"LoQT: Low-Rank Adapters for Quantized Pretraining"

The podcast on this paper is generated with Google's Illuminate.

Train massive neural networks on your gaming PC with LoQT's memory tricks

LoQT introduces a memory-efficient training method that combines low-rank adapters with quantization, enabling training of 7B-parameter models on a single 24GB GPU without model sharding or offloading. This makes LLM training accessible on consumer hardware.

-----

https://arxiv.org/abs/2405.16528

🔍 Original Problem:

Training large neural networks demands far more memory than a consumer GPU provides, chiefly for gradients and optimizer states, making it impractical without complex workarounds such as model sharding or gradient offloading.
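
For scale, a rough back-of-envelope (my assumptions of bf16 weights/gradients and fp32 Adam moments, not figures from the paper) shows why standard training of a 7B model does not fit on a 24GB card:

```python
# Approximate memory for standard Adam training of a 7B-parameter model
# (bf16 weights and gradients, two fp32 Adam moments; activations excluded).
params = 7e9
weights_gb = params * 2 / 1e9        # ~14 GB
grads_gb   = params * 2 / 1e9        # ~14 GB
adam_gb    = params * 2 * 4 / 1e9    # ~56 GB
print(weights_gb + grads_gb + adam_gb)  # ~84 GB total, vs. a 24 GB GPU
```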

-----

🛠️ Solution in this Paper:

→ LoQT initializes two low-rank factors for each weight matrix: one from a projection of the weight's gradient, and the other initialized to compensate the quantization error of the frozen weight matrix.

→ Only one of the two factors is actively optimized, which keeps gradient and optimizer-state memory far below that of full-parameter training.

→ The product of the two factors is periodically merged into the full-rank quantized weight matrix, with exponentially increasing intervals between merges.

→ Weights that are not being updated stay quantized, keeping memory usage low; the overall scheme is sketched in the code after this list.
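
A minimal NumPy sketch of this update scheme, not the authors' implementation: the uniform quantizer stands in for NF4-style quantization, gradients are random placeholders rather than backprop outputs, and the rank, learning rate, and interval growth factor of 1.2 are illustrative assumptions.

```python
import numpy as np

def quantize(w, n_bits=4):
    """Symmetric uniform quantizer, a stand-in for NF4-style quantization."""
    levels = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / levels + 1e-12
    return np.round(w / scale) * scale

def init_factors(W, G, rank):
    """P: top-r left singular vectors of the gradient (frozen and quantized).
    B: least-squares solution so that q(W) + q(P) @ B compensates the
    quantization error of W (B is the only trainable matrix)."""
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    P_q = quantize(U[:, :rank])
    B, *_ = np.linalg.lstsq(P_q, W - quantize(W), rcond=None)
    return P_q, B

rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 64, 8
W = rng.standard_normal((d_out, d_in)) * 0.02     # full-precision weight (conceptually)
G = rng.standard_normal((d_out, d_in))            # simulated gradient of W
W_q = quantize(W)                                 # frozen, quantized weight
P_q, B = init_factors(W, G, rank)                 # only B will be optimized

merge_step, gap = 100, 100
for step in range(1, 1001):
    # The forward pass would use W_eff = W_q + P_q @ B; only B gets gradients,
    # so optimizer state scales with rank * d_in instead of d_out * d_in.
    grad_B = rng.standard_normal(B.shape) * 0.01  # simulated gradient w.r.t. B
    B -= 1e-2 * grad_B                            # plain SGD step on B alone

    if step == merge_step:
        # Merge the accumulated low-rank update into the full-rank weight,
        # re-quantize it, and re-initialize the factors from a fresh gradient.
        W = W_q + P_q @ B
        W_q = quantize(W)
        G = rng.standard_normal(W.shape)          # would come from backprop in practice
        P_q, B = init_factors(W, G, rank)
        gap = int(gap * 1.2)                      # exponentially growing merge interval
        merge_step += gap
```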

-----

💡 Key Insights:

→ Accumulating large updates before merging is what makes quantized training work: small per-step changes would simply round away in the quantized weights (see the toy example after this list).

→ Gradient-based matrix factorization provides an effective initialization for the low-rank adapters.

→ Exponentially increasing merge intervals match training dynamics: as updates shrink later in training, longer accumulation is needed before a merge registers.
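
The rounding effect behind the first insight can be seen with a toy example (a simple 4-bit uniform quantizer with a fixed scale and made-up step sizes, not the paper's NF4 setup):

```python
import numpy as np

def quantize(w, scale, n_bits=4):
    """Round to the nearest 4-bit level under a fixed scale."""
    levels = 2 ** (n_bits - 1) - 1
    return np.clip(np.round(w / scale), -levels, levels) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal(1000) * 0.02
scale = np.abs(W).max() / 7                  # one scale for the whole tensor
W_q = quantize(W, scale)

delta = np.full(1000, 1e-4)                  # one small step's worth of change

# Re-quantizing after a single small step loses the update entirely:
print(np.array_equal(quantize(W_q + delta, scale), W_q))          # True

# Accumulating many steps before merging produces a change that survives:
print(np.array_equal(quantize(W_q + 100 * delta, scale), W_q))    # False
```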

-----

📊 Results:

→ Successfully trains 7B parameter models on 24GB consumer GPUs

→ Enables training of 13B-parameter models with per-layer gradient updates on the same hardware

→ Suitable for both pretraining and fine-tuning
