Rohan's Bytes
Enhanced Transformer architecture for…
Rohan Paul
Nov 6, 2024
Split long sequences into bite-sized chunks, and suddenly your Transformer can eat 100x more data