FlashAttention-2 already made attention 4-8x faster, but it has yet to take full advantage of the power of modern GPUs like the H100.