Rohan's Bytes
MoH: Multi-Head Attention as Mixture-of-Head Attention
AI Paper Explained
Rohan Paul
Nov 11, 2024
A smart routing system tells Transformer heads when to pay attention, boosting efficiency.
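To make the idea concrete, here is a minimal PyTorch sketch of mixture-of-head attention as the title and teaser describe it: each attention head is treated as an expert, and a learned router activates only the top-k heads for each token. The class and parameter names (`MoHAttention`, `router`, `top_k`) are illustrative assumptions, not the paper's reference implementation, and the paper's always-active shared heads are omitted here for brevity.

```python
import torch
import torch.nn as nn

class MoHAttention(nn.Module):
    """Hypothetical sketch of Mixture-of-Head attention: a per-token
    router scores every attention head, keeps only the top-k, and the
    output is the routing-weighted sum of the selected heads."""

    def __init__(self, dim: int, num_heads: int = 8, top_k: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.top_k = top_k
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.router = nn.Linear(dim, num_heads)  # one score per head, per token
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        H, Hd = self.num_heads, self.head_dim
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, H, Hd).transpose(1, 2)  # (B, H, T, Hd)
        k = k.view(B, T, H, Hd).transpose(1, 2)
        v = v.view(B, T, H, Hd).transpose(1, 2)

        # Standard scaled dot-product attention, one output per head.
        attn = (q @ k.transpose(-2, -1)) / (Hd ** 0.5)
        heads = attn.softmax(dim=-1) @ v          # (B, H, T, Hd)
        heads = heads.transpose(1, 2)             # (B, T, H, Hd)

        # Router: keep the top-k heads for each token, zero out the rest,
        # and weight the survivors by their normalized routing scores.
        scores = self.router(x)                   # (B, T, H)
        topk_val, topk_idx = scores.topk(self.top_k, dim=-1)
        gates = torch.zeros_like(scores)
        gates.scatter_(-1, topk_idx, topk_val.softmax(dim=-1))

        out = (heads * gates.unsqueeze(-1)).reshape(B, T, D)
        return self.proj(out)


x = torch.randn(2, 16, 64)
moh = MoHAttention(dim=64, num_heads=8, top_k=4)
print(moh(x).shape)  # torch.Size([2, 16, 64])
```

Note that this dense sketch computes every head and only then zeroes the unselected ones, so it demonstrates the routing logic rather than the speedup; an efficient implementation would skip the attention computation for heads the router did not select.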