SlimLM, proposed in this paper, shrinks document processing to phone size while matching larger models' performance.
SlimLM is a series of efficient small language models optimized for document processing on mobile devices, achieving performance comparable to larger models while maintaining speed and privacy through on-device computation. The paper addresses the critical need for efficient document assistance on smartphones by developing models ranging from 125M to 1B parameters, tested extensively on the Samsung Galaxy S24.
-----
https://arxiv.org/abs/2411.09944
🤖 Original Problem:
Small language models' real-world performance on smartphones remains understudied, particularly for document processing tasks that require longer context handling.
-----
📱 Solution in this Paper:
→ SlimLM models are pretrained on SlimPajama-627B dataset and finetuned on DocAssist, a specialized dataset built from 83K documents.
→ The models are optimized for three key document tasks: summarization, question answering, and question suggestion.
→ A sweet-spot analysis identifies the optimal trade-off between model size, context length (up to 800 tokens), and inference time on the Samsung Galaxy S24.
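
A rough sense of this sweet-spot sweep can be reproduced off-device. Below is a minimal latency-vs-context-length sketch; it assumes the public SmolLM-135M checkpoint as a stand-in, since the SlimLM weights and the paper's on-device runtime are not assumed here:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODELS = ["HuggingFaceTB/SmolLM-135M"]  # stand-in; swap in SlimLM sizes if released
CONTEXT_LENGTHS = [200, 400, 800]       # 800 tokens is the paper's practical limit

for name in MODELS:
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)
    model.eval()
    for ctx in CONTEXT_LENGTHS:
        # Build a prompt truncated to exactly `ctx` tokens.
        ids = tok("Summarize this document. " * 300,
                  return_tensors="pt").input_ids[:, :ctx]
        start = time.perf_counter()
        with torch.no_grad():
            out = model.generate(ids, max_new_tokens=64, do_sample=False,
                                 pad_token_id=tok.eos_token_id)
        elapsed = time.perf_counter() - start
        new_tokens = out.shape[1] - ids.shape[1]
        print(f"{name} | ctx={ctx:4d} | {new_tokens / elapsed:.1f} tok/s")
```

On a laptop this only approximates relative trends (longer prompts slow decoding), not the absolute Galaxy S24 numbers the paper reports.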
-----
🔍 Key Insights:
→ Smaller models (125M-350M) achieve higher inference speeds but lower accuracy
→ Mid-sized models (450M-760M) offer the best balance of speed and performance
→ Context length significantly impacts model efficiency, with 800 tokens being the practical limit
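
The 800-token limit implies capping document input before prompting. A minimal sketch, assuming a HuggingFace tokenizer as a stand-in (the paper's own tokenizer and prompt template are not assumed):

```python
from transformers import AutoTokenizer

MAX_CONTEXT_TOKENS = 800  # practical on-device limit reported in the paper

def truncate_document(text: str, tokenizer, budget: int = MAX_CONTEXT_TOKENS) -> str:
    """Keep only the first `budget` tokens of a document before prompting."""
    ids = tokenizer(text, truncation=True, max_length=budget).input_ids
    return tokenizer.decode(ids, skip_special_tokens=True)

tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")  # stand-in
doc = "Quarterly report text... " * 500  # placeholder long document
prompt = f"Document:\n{truncate_document(doc, tok)}\n\nSummarize the document."
print(len(tok(prompt).input_ids), "prompt tokens")
```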
-----
📊 Results:
→ SlimLM-125M outperforms SmolLM-135M while maintaining efficient mobile performance
→ SlimLM-450M matches Qwen2-0.5B despite being smaller
→ Models achieve 99.86-100% accuracy on intent classification after fine-tuning
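
Intent classification here means routing a user request to one of the three supported tasks. A toy sketch of how that accuracy metric is computed; the rule-based classifier, label names, and examples are illustrative stand-ins, not DocAssist's actual schema:

```python
# The fine-tuned model would produce the intent; a keyword rule stands in here.
INTENTS = {"summarization", "question_answering", "question_suggestion"}

def classify_intent(request: str) -> str:
    """Toy stand-in for the fine-tuned model's intent prediction."""
    r = request.lower()
    if "summar" in r:
        return "summarization"
    if r.endswith("?"):
        return "question_answering"
    return "question_suggestion"

examples = [
    ("Summarize this contract for me.", "summarization"),
    ("What is the termination clause?", "question_answering"),
    ("Suggest questions about this report.", "question_suggestion"),
]
correct = sum(classify_intent(q) == y for q, y in examples)
print(f"intent accuracy: {correct / len(examples):.2%}")
```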