EXAONE 3.5 introduces three instruction-tuned LLMs (32B, 7.8B, and 2.4B parameters) with exceptional real-world performance and long-context understanding up to 32K tokens.
The models deliver strong bilingual capabilities in Korean and English while being more computationally efficient than similarly sized models.
-----
https://arxiv.org/abs/2412.04862
🤔 Original Problem:
Existing LLMs are limited by compute requirements, deployment flexibility, and context-length handling. Academic researchers need smaller models that fit limited GPU resources, while industry demands both larger models for stronger performance and smaller ones for on-device deployment.
-----
🔧 Solution in this Paper:
→ EXAONE 3.5 offers three model sizes: 2.4B for resource-constrained devices, 7.8B for balanced performance, and 32B for exceptional capabilities.
→ The models employ long-context fine-tuning with replay-based methods to extend the context length from 4K to 32K tokens (see the first sketch after this list).
→ A rigorous decontamination process removes test-set leakage from the training data and improves generalization (second sketch below).
→ The training pipeline combines supervised fine-tuning using taxonomic knowledge extraction with preference optimization through direct alignment algorithms (third sketch below).
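A minimal sketch of what replay-based long-context data mixing could look like: long documents are combined with a replayed slice of earlier short-context data so the model keeps its short-context skills while learning 32K contexts. The function name and the 20% replay ratio are assumptions for illustration, not values reported in the paper.

```python
import random

def build_long_context_mixture(long_docs, replay_short_docs, replay_ratio=0.2):
    """Mix long-context documents with replayed short-context samples.

    replay_ratio is a hypothetical setting; the paper does not report
    the exact mixing proportion used for EXAONE 3.5.
    """
    n_replay = int(len(long_docs) * replay_ratio)
    replay_sample = random.sample(replay_short_docs, min(n_replay, len(replay_short_docs)))
    mixture = long_docs + replay_sample
    random.shuffle(mixture)  # interleave long and replayed short samples
    return mixture
```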
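A hedged illustration of test-set decontamination: any training document that shares a long n-gram with a benchmark is dropped. The 10-gram whitespace matching rule below is an assumption for illustration, not the paper's exact procedure.

```python
def ngrams(text, n=10):
    """Whitespace 10-grams; an assumed granularity, not the paper's rule."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def decontaminate(train_docs, test_docs, n=10):
    """Drop any training document that shares an n-gram with the test set."""
    test_ngrams = set()
    for doc in test_docs:
        test_ngrams |= ngrams(doc, n)
    return [d for d in train_docs if ngrams(d, n).isdisjoint(test_ngrams)]
```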
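For the preference-optimization step, here is a minimal DPO-style direct alignment loss, assuming precomputed sequence log-probabilities from the policy and a frozen reference model. The beta value is a hypothetical setting; the paper's exact alignment recipe is not reproduced here.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """-log sigmoid(beta * [(log pi_w - log ref_w) - (log pi_l - log ref_l)])."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with dummy log-probabilities:
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
```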
-----
💡 Key Insights:
→ Smaller models can achieve competitive performance with efficient training
→ Long-context understanding doesn't require massive computation
→ Bilingual capabilities can be maintained across model sizes
→ Decontamination significantly impacts generalization ability
-----
📊 Results:
→ Outperforms baselines in real-world use cases and long-context tasks
→ The 2.4B model surpasses larger models in general-domain tasks
→ Achieves 32K-token context understanding across all three model sizes
→ Uses 2.77x less computation than Qwen 2.5 32B while maintaining competitive performance