Performance-Guided Knowledge Distillation (PGKD) shrinks LLMs into tiny models while keeping their classification superpowers
https://arxiv.org/abs/2411.05045
🎯 Original Problem:
LLMs excel at text classification but are hard to deploy due to high inference costs and latency. Production environments need faster, cheaper solutions that still maintain LLM-level performance.
-----
🛠️ Solution in this Paper:
→ Performance-Guided Knowledge Distillation (PGKD) transfers LLM knowledge into a smaller, task-specific student model through an active-learning loop (see the sketch after this list)
→ The student model's validation metrics guide the teacher LLM to generate better-targeted training data
→ Hard negative mining surfaces samples the student misclassified with high confidence
→ Early stopping halts the loop before performance drift and overfitting set in
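The loop roughly looks like the sketch below. This is a minimal illustration under assumptions, not the authors' code: the student is a simple scikit-learn classifier standing in for a fine-tuned BERT-class model, and `generate_samples` is a hypothetical placeholder for the prompted teacher LLM.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline


def generate_samples(metrics, hard_negatives, n=50):
    """Hypothetical teacher call: prompt the LLM with the student's validation
    metrics and hard negatives, asking for new labeled texts targeting the
    classes it confuses. Returns (texts, labels)."""
    raise NotImplementedError("plug in your own LLM client here")


def pgkd_loop(train_texts, train_labels, val_texts, val_labels,
              max_rounds=5, patience=2, conf_threshold=0.8):
    best_f1, best_student, stale = 0.0, None, 0
    for _ in range(max_rounds):
        # 1. Train a small task-specific student on the current training pool.
        student = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        student.fit(train_texts, train_labels)

        # 2. Evaluate on held-out data; these metrics feed back into the teacher prompt.
        preds = student.predict(val_texts)
        macro_f1 = f1_score(val_labels, preds, average="macro")

        # 3. Hard negative mining: samples the student got wrong with high confidence.
        confidences = student.predict_proba(val_texts).max(axis=1)
        hard_negs = [(text, true, pred)
                     for text, true, pred, conf
                     in zip(val_texts, val_labels, preds, confidences)
                     if pred != true and conf >= conf_threshold]

        # 4. Early stopping guards against performance drift and overfitting
        #    to the growing synthetic training set.
        if macro_f1 > best_f1:
            best_f1, best_student, stale = macro_f1, student, 0
        else:
            stale += 1
            if stale >= patience:
                break

        # 5. Ask the teacher LLM for new training data aimed at the student's weak spots.
        new_texts, new_labels = generate_samples({"macro_f1": macro_f1}, hard_negs)
        train_texts = list(train_texts) + list(new_texts)
        train_labels = list(train_labels) + list(new_labels)

    return best_student, best_f1
```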
-----
💡 Key Insights from this Paper:
→ PGKD's benefit grows with dataset complexity and the number of classes
→ Feeding validation metrics back to the LLM helps it generate better training samples (a prompt-building sketch follows this list)
→ Hard negative samples sharpen the student's decision boundaries
→ Performance gains diminish as the training set grows
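To make the feedback idea concrete, here is a hypothetical sketch of how per-class validation metrics and hard negatives could be folded into the teacher's data-generation prompt; the exact prompt format in the paper may differ.

```python
def build_feedback_prompt(per_class_f1, hard_negatives, allowed_labels, n_samples=20):
    """Compose a data-generation prompt that tells the teacher LLM which classes
    the student struggles with and shows confidently misclassified examples."""
    weakest = sorted(per_class_f1, key=per_class_f1.get)[:5]
    lines = [
        f"Generate {n_samples} new labeled training examples for a text classifier.",
        f"Allowed labels: {', '.join(allowed_labels)}.",
        f"Prioritize these weak classes (lowest validation F1): {', '.join(weakest)}.",
        "The student confidently misclassified the examples below; write samples that",
        "make the distinction between the true and predicted labels clearer.",
    ]
    for text, true_label, pred_label in hard_negatives[:10]:
        lines.append(f'- text: "{text}" | true: {true_label} | predicted: {pred_label}')
    lines.append("Return one example per line as: <label>\\t<text>")
    return "\n".join(lines)


# Toy usage with made-up classes and one hard negative:
print(build_feedback_prompt(
    {"billing": 0.31, "shipping": 0.72, "returns": 0.45},
    [("where is my refund", "returns", "billing")],
    ["billing", "shipping", "returns"],
))
```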
-----
📊 Results:
→ Up to 130X faster inference compared to LLMs
→ 25X lower operational costs
→ Accuracy improvement from 0.320 to 0.443 on complex datasets (335 classes)
→ Consistently outperforms the base BERT model across all dataset sizes