
"SlimLM: An Efficient Small Language Model for On-Device Document Assistance"

The podcast on this paper is generated with Google's Illuminate.

SlimLM, proposed in this paper, shrinks document processing to phone size while matching larger models' performance.

SlimLM is a series of efficient small language models optimized for document processing on mobile devices, achieving performance comparable to larger models while maintaining speed and privacy through on-device computation. The paper addresses the critical need for efficient document assistance on smartphones by developing models ranging from 125M to 1B parameters, tested extensively on the Samsung Galaxy S24.

-----

https://arxiv.org/abs/2411.09944

🤖 Original Problem:

The real-world performance of small language models on smartphones remains understudied, particularly for document processing tasks that require longer context handling.

-----

📱 Solution in this Paper:

→ SlimLM models are pretrained on SlimPajama-627B dataset and finetuned on DocAssist, a specialized dataset built from 83K documents.

→ The models are optimized for three key document tasks: summarization, question answering, and question suggestion.

→ A sweet spot analysis identifies optimal trade-offs between model size, context length (up to 800 tokens), and inference time on the Samsung S24 (a prompt-construction sketch under that token budget follows this list).
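
A minimal Python sketch of how task-specific prompts might be assembled under the 800-token budget. The template wording, function names, and whitespace "tokenizer" are illustrative assumptions, not taken from the paper.

```python
# Hypothetical prompt construction for the three DocAssist tasks.
# Templates and names are assumptions; only the three task types and
# the ~800-token practical limit come from the paper.
MAX_CONTEXT_TOKENS = 800  # practical limit from the sweet spot analysis

TEMPLATES = {
    "summarization": "Summarize the following document:\n{doc}",
    "question_answering": (
        "Answer the question using the document.\n"
        "Document:\n{doc}\nQuestion: {q}"
    ),
    "question_suggestion": (
        "Suggest three questions a reader might ask about this document:\n{doc}"
    ),
}

def truncate_to_budget(text: str, limit: int = MAX_CONTEXT_TOKENS) -> str:
    """Crude word-level truncation standing in for the model's real tokenizer."""
    return " ".join(text.split()[:limit])

def build_prompt(task: str, document: str, question: str = "") -> str:
    """Render an instruction prompt for one of the three document tasks."""
    return TEMPLATES[task].format(doc=truncate_to_budget(document), q=question)

print(build_prompt("question_answering", "SlimLM runs on-device...", "Where does it run?"))
```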

-----

🔍 Key Insights:

→ Smaller models (125M-350M) achieve higher inference speeds but lower accuracy

→ Mid-sized models (450M-760M) offer the best balance of speed and performance

→ Context length significantly impacts model efficiency, with 800 tokens being the practical limit (a timing-sweep sketch follows this list)
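
To make the speed/context trade-off concrete, here is a small timing harness of the kind such a sweep implies. The stub generator and the context-length grid are assumptions; the paper's on-device benchmark code is not shown.

```python
import time

def measure_latency(generate, prompt: str, runs: int = 5) -> float:
    """Average wall-clock seconds per call to `generate`."""
    start = time.perf_counter()
    for _ in range(runs):
        generate(prompt)
    return (time.perf_counter() - start) / runs

def context_sweep(generate, document: str, lengths=(200, 400, 600, 800)):
    """Time generation at increasing context lengths (word-level proxy)."""
    words = document.split()
    return {n: measure_latency(generate, " ".join(words[:n])) for n in lengths}

# Stub generator so the sketch runs without a real on-device model.
dummy_generate = lambda prompt: prompt[:64]
print(context_sweep(dummy_generate, "token " * 1000))
```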

-----

📊 Results:

→ SlimLM-125M outperforms SmolLM-135M while maintaining efficient mobile performance

→ SlimLM-450M matches Qwen2-0.5B despite being smaller

→ Models achieve 99.86-100% accuracy on intent classification after fine-tuning (a scoring sketch follows below)
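
A hedged sketch of how intent-classification accuracy could be scored against the three task intents. The label set mirrors the DocAssist tasks, but the string parsing and function names are assumptions, not the paper's evaluation code.

```python
# Assumed label set mirroring the three DocAssist tasks; the string
# matching below is illustrative, not the paper's evaluation code.
INTENTS = ("summarization", "question_answering", "question_suggestion")

def parse_intent(model_output: str) -> str:
    """Map a model's free-text reply onto one of the known intents."""
    text = model_output.strip().lower().replace(" ", "_")
    return next((i for i in INTENTS if i in text), "unknown")

def accuracy(predictions, labels) -> float:
    """Fraction of predictions that exactly match the gold intents."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

preds = [parse_intent(o) for o in ["Summarization", "question answering"]]
print(accuracy(preds, ["summarization", "question_answering"]))  # 1.0
```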
