AQLM is a great paper for GPU-constrained LLM setups - "Extreme Compression of Large Language Models via Additive Quantization" 💡