LLMs teach themselves to write longer by splitting and expanding their own outputs
📚 https://arxiv.org/abs/2410.23933
🤖 Original Problem:
LLMs excel at processing long inputs but struggle to generate high-quality text beyond 2,000 words. Existing solutions rely on human-written long texts or proprietary models like GPT-4, which makes them costly and hard to scale.
-----
🔧 Solution in this Paper:
→ Introduces "Self-Lengthen": a two-component framework with a Generator and an Extender model
→ The Generator produces an initial response; the Extender expands it in two stages (sketched after this list):
- Stage 1: Extends first half of content
- Stage 2: Uses extended first half as reference to complete second half
→ Uses iterative training cycles:
- Micro-iteration: Progressive expansion of text length
- Macro-iteration: Fine-tuning both models on the expanded outputs
→ Requires only seed instructions and an open-source instruction model
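A minimal sketch of the generate-then-extend loop, assuming a generic `call_llm` helper plus illustrative prompts and a paragraph-based split (not the paper's exact implementation):

```python
# Minimal sketch of one Self-Lengthen micro-iteration.
# `call_llm`, the prompt wording, and the paragraph-based split are
# illustrative assumptions, not the paper's exact recipe.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to the open-source instruction model."""
    return "First paragraph of the reply.\n\nSecond paragraph of the reply."

def split_in_half(text: str) -> tuple[str, str]:
    """Split a response roughly in the middle, here by paragraphs."""
    paras = text.split("\n\n")
    mid = max(1, len(paras) // 2)
    return "\n\n".join(paras[:mid]), "\n\n".join(paras[mid:])

def two_stage_extend(response: str) -> str:
    """Extender: expand the first half, then use the result as a
    reference to rewrite and expand the second half."""
    first, second = split_in_half(response)
    longer_first = call_llm(
        "Expand the following passage to roughly twice its length, "
        f"keeping its meaning:\n\n{first}"
    )
    longer_second = call_llm(
        "This expanded opening sets the target length and style:\n\n"
        f"{longer_first}\n\nNow expand the remaining part to match:\n\n{second}"
    )
    return longer_first + "\n\n" + longer_second

def micro_iterate(instruction: str, rounds: int = 2) -> str:
    """Generator drafts a short reply; the Extender lengthens it over
    several rounds. Macro-iteration (not shown) fine-tunes both models
    on the resulting (instruction, lengthened response) pairs."""
    response = call_llm(f"Instruction: {instruction}\nRespond:")
    for _ in range(rounds):
        response = two_stage_extend(response)
    return response

print(len(micro_iterate("Write an essay on the history of typography").split()))
```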
-----
💡 Key Insights:
→ LLMs can self-improve long-text generation without external data
→ Two-stage extension bypasses model length constraints
→ Length-biased sampling accelerates the increase in output length
→ Random line removal during training enhances extension capabilities (both sketched below)
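Two of these mechanisms are easy to picture in code. A hedged sketch, where the function names, length weighting, and drop rate are assumptions rather than the paper's exact settings:

```python
import random

def length_biased_sample(candidates: list[str], k: int = 1) -> list[str]:
    """Sample candidate responses with probability proportional to their
    word count, so longer outputs are favored as training targets."""
    weights = [len(c.split()) for c in candidates]
    return random.choices(candidates, weights=weights, k=k)

def drop_random_lines(text: str, drop_rate: float = 0.1) -> str:
    """Randomly drop a fraction of lines from the Extender's training
    input so it learns to fill in and expand missing content."""
    lines = text.splitlines()
    kept = [line for line in lines if random.random() > drop_rate]
    return "\n".join(kept) if kept else text
```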
-----
📊 Results:
→ Increased output length from 1,000 to 8,000 words while maintaining quality
→ Outperformed instruction backtranslation and behavior imitation methods
→ No negative impact on MMLU benchmark performance
→ Successfully integrated into Qwen 2.5 series models