Meet Cake: the LLM speedup tool that attacks the KV cache from both sides. Cake makes KV cache loading twice as fast by computing the cache from the start of the sequence while simultaneously loading it from the end.
Compute Or Load KV Cache? Why Not Both?
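To make the "both sides" idea concrete, here is a minimal sketch of how computing from the start and loading from the end can meet in the middle. It is a toy simulation, not Cake's actual implementation: the function name `bidirectional_schedule`, the chunking of the cache, the uniform per-chunk times, and the "whichever worker is free first takes the next chunk on its side" policy are all assumptions made for illustration. The compute side walks forward because, with causal attention, each chunk's KV entries depend on the chunks before it, while previously saved chunks can be loaded in any order.

```python
def bidirectional_schedule(num_chunks, compute_time, load_time):
    """Simulate two workers filling a chunked KV cache from opposite ends.

    Hypothetical model: every chunk takes `compute_time` to recompute
    and `load_time` to fetch from storage.
    Returns (computed_chunks, loaded_chunks, finish_time).
    """
    front, back = 0, num_chunks - 1    # next chunk for each worker
    compute_clock = load_clock = 0.0   # time at which each worker is free
    computed, loaded = [], []

    # Keep going until the two pointers cross in the middle.
    while front <= back:
        if compute_clock <= load_clock:
            # Compute worker is free first: take the next chunk from the start.
            computed.append(front)
            front += 1
            compute_clock += compute_time
        else:
            # Load worker is free first: take the next chunk from the end.
            loaded.append(back)
            back -= 1
            load_clock += load_time

    return computed, loaded, max(compute_clock, load_clock)


if __name__ == "__main__":
    computed, loaded, finish = bidirectional_schedule(8, 1.0, 1.0)
    print("computed:", computed)
    print("loaded:  ", loaded)
    print("finish time:", finish, "vs load-only:", 8 * 1.0)
```

In this toy model, when computing and loading a chunk take the same time, each side handles half of the chunks and the cache is ready in half the load-only time, which is where a "twice as fast" figure would come from. When one side is slower, the meeting point simply shifts toward it, so the faster side picks up more of the work.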