A new categorical framework illuminates the inner workings of transformer self-attention.
This paper provides a categorical framework for self-attention, viewing its linear components as morphisms in a 2-category. This clarifies positional encodings, equivariance properties, and connections to "circuit" interpretability.
-----
https://arxiv.org/abs/2501.02931
Methods from this Paper 💡:
→ Self-attention's query, key, and value maps are unified as a single parametric 1-morphism in the 2-category Para(Vect) (see the first sketch after this list).
→ On the underlying category Vect, this induces an endofunctor.
→ Stacking multiple self-attention layers forms the free monad on this endofunctor.
→ Positional encodings are treated as monoid actions (in the additive case) or as injective maps (second sketch below).
→ The linear components of self-attention are shown to be equivariant under permutations of the input sequence (checked numerically in the third sketch below).
→ These categorical constructions align with "circuit"-based interpretability methods.
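A minimal numpy sketch of the parametric view (the names, shapes, and softmax normalization here are illustrative choices, not constructions taken from the paper): the query, key, and value weights are bundled into one parameter object, so a single-head attention layer is a map (parameters, input) → output in the spirit of a parametric morphism, and stacking layers is plain repeated composition of maps from the model space to itself, which is the composition the paper's free-monad description organizes.

```python
import numpy as np

# Illustrative parameter bundle: the Q, K, V weight matrices treated as one
# parameter object, so a layer is a map (params, X) -> Y rather than three
# separate linear maps. (All shapes and names are assumptions for this sketch.)
def init_params(rng, d_model, d_head):
    return {
        "W_Q": rng.standard_normal((d_model, d_head)) / np.sqrt(d_model),
        "W_K": rng.standard_normal((d_model, d_head)) / np.sqrt(d_model),
        "W_V": rng.standard_normal((d_model, d_model)) / np.sqrt(d_model),
    }

def self_attention(params, X):
    """Single-head self-attention on a sequence X of shape (n, d_model)."""
    Q = X @ params["W_Q"]
    K = X @ params["W_K"]
    V = X @ params["W_V"]
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # shape (n, d_model): same space in, same space out

# Because input and output live in the same space, layers compose by iteration,
# a concrete stand-in for stacking self-attention layers.
def stack(layers_params, X):
    for p in layers_params:
        X = self_attention(p, X)
    return X

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))                      # 5 tokens, model dimension 8
params = [init_params(rng, 8, 4) for _ in range(3)]  # 3 stacked layers
print(stack(params, X).shape)                        # (5, 8)
```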
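For the additive case, a familiar concrete instance (chosen here for illustration, not necessarily the paper's own example) is the sinusoidal encoding: shifting a position by k corresponds to applying a fixed block rotation to the encoding vector, and composing two shifts applies the rotation for their sum, which is exactly an action of the additive monoid of offsets on the encoding space.

```python
import numpy as np

def sinusoidal_pe(pos, d_model):
    """Sinusoidal positional encoding of a single integer position (length d_model)."""
    i = np.arange(d_model // 2)
    freqs = 1.0 / np.power(10000.0, 2 * i / d_model)
    pe = np.empty(d_model)
    pe[0::2] = np.sin(pos * freqs)
    pe[1::2] = np.cos(pos * freqs)
    return pe

def shift_action(k, d_model):
    """Block-diagonal rotation implementing the action of a shift by k positions."""
    i = np.arange(d_model // 2)
    freqs = 1.0 / np.power(10000.0, 2 * i / d_model)
    c, s = np.cos(k * freqs), np.sin(k * freqs)
    R = np.zeros((d_model, d_model))
    R[0::2, 0::2] = np.diag(c)
    R[0::2, 1::2] = np.diag(s)
    R[1::2, 0::2] = np.diag(-s)
    R[1::2, 1::2] = np.diag(c)
    return R

d = 8
p, k1, k2 = 5, 3, 4
# Acting by a shift of k moves the encoding of position p to that of p + k ...
assert np.allclose(shift_action(k1, d) @ sinusoidal_pe(p, d), sinusoidal_pe(p + k1, d))
# ... and composing two shifts equals shifting by their sum: a monoid action.
assert np.allclose(shift_action(k1, d) @ shift_action(k2, d), shift_action(k1 + k2, d))
```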
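The equivariance claim can be checked numerically for a standard single-head attention layer without positional encodings (a sketch under the assumption that the paper's linear components behave like the textbook maps below; the identity also holds with the softmax included): permuting the rows of the input permutes the rows of the output in the same way.

```python
import numpy as np

def self_attention(X, W_Q, W_K, W_V):
    """Single-head self-attention without positional encodings; X has shape (n, d)."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 6, 8
X = rng.standard_normal((n, d))
W_Q, W_K, W_V = (rng.standard_normal((d, d)) for _ in range(3))

# A random permutation of the token positions, as a permutation matrix P.
perm = rng.permutation(n)
P = np.eye(n)[perm]

# Equivariance: attending to the permuted sequence equals permuting the output.
lhs = self_attention(P @ X, W_Q, W_K, W_V)
rhs = P @ self_attention(X, W_Q, W_K, W_V)
assert np.allclose(lhs, rhs)
```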
-----
Key Insights from this Paper 🧐:
→ The framework unifies geometric, algebraic, and interpretability perspectives on transformers.
→ It explains how parameters are shared and composed.
→ It makes the algebraic structure of attention explicit, going beyond prior geometric or logical approaches.