Rohan's Bytes
AI Paper Explained
MoH: Multi-Head Attention as Mixture-of-Head…
Rohan Paul
Nov 11, 2024
Smart routing system tells Transformer heads when to pay attention, boosting efficiency
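The core idea in the subtitle is a router that decides, for each token, which attention heads contribute to the output. Below is a minimal sketch under stated assumptions, not the paper's implementation. It assumes each head's output has already been projected by its slice of the output matrix, so standard multi-head attention would simply sum every head with weight 1. It also assumes a hypothetical linear router that scores heads with a softmax, keeps the top-k heads, and renormalizes their gates to sum to 1. The names `route_heads` and `mixture_of_heads` are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_heads(x: Vector, router_w: List[Vector], k: int) -> Dict[int, float]:
    """Hypothetical router: score each head from token x, keep the top-k.

    Returns {head_index: gate}; gates over the kept heads sum to 1
    (the renormalization is an assumption of this sketch).
    """
    # One linear score per head: dot product of that head's router row with x.
    scores = [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in router_w]
    probs = softmax(scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def mixture_of_heads(x: Vector, head_outputs: List[Vector],
                     router_w: List[Vector], k: int) -> Vector:
    """Weighted sum of only the selected heads' (already projected) outputs.

    Standard multi-head attention would instead add all heads with weight 1.
    """
    gates = route_heads(x, router_w, k)
    dim = len(head_outputs[0])
    out = [0.0] * dim
    for i, g in gates.items():
        for j in range(dim):
            out[j] += g * head_outputs[i][j]
    return out


if __name__ == "__main__":
    x = [1.0, 0.0]                                              # toy token representation
    router_w = [[2.0, 0.0], [0.0, 0.0], [-1.0, 0.0], [1.0, 0.0]]  # 4 heads
    head_outputs = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.0, 2.0]]
    print(route_heads(x, router_w, k=2))
    print(mixture_of_heads(x, head_outputs, router_w, k=2))
```

In this toy example, the router scores are 2, 0, -1 and 1, so only heads 0 and 3 are activated. Head 2 contributes nothing even though its output is the largest. Skipping unselected heads is where the efficiency gain comes from: their outputs never need to be mixed in.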