Make LLMs contextually aware by strategically placing distracting content, forcing them to discern the true dependencies.

This paper proposes NExtLong, a framework that synthesizes long-context data for LLMs. It strengthens long-range dependency learning by interleaving hard negative distractors within documents.
-------------
Paper - https://arxiv.org/abs/2501.12766
Original Problem 😕:
→ LLMs with extended context windows struggle because genuinely long training documents are scarce.
→ Existing methods synthesize long-context data to fill this gap.
→ But they lack mechanisms that effectively teach the model to capture long-range dependencies.
-------------
Solution in this Paper 💡:
→ This paper proposes NExtLong, a framework for synthesizing long-context data via Negative document Extension.
→ NExtLong decomposes documents into meta-chunks.
→ It extends context by interleaving hard negative distractors.
→ Distractors are retrieved from pretraining corpora.
→ This forces the model to distinguish the truly relevant long-range context from distracting text, strengthening its ability to model long-range dependencies.
→ NExtLong has two stages: Negative Document Extension and Long-Range Dependence Modeling.
→ In stage 1, documents are chunked into meta-chunks.
→ Hard negatives are mined for each meta-chunk from pre-training data.
→ Hard negatives are concatenated with meta-chunks to create a long document.
→ In stage 2, the model is trained on this synthesized long document.
→ Training focuses on modeling long-range dependencies: the model must identify and connect meta-chunks across the interleaved distractors (a minimal sketch of the pipeline follows below).
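To make stage 1 concrete, here is a minimal Python sketch of negative document extension, assuming a generic sentence-transformers embedding model for hard-negative mining; the model name, chunk size, number of negatives, and interleaving order are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of NExtLong-style negative document extension (stage 1).
# Assumptions: sentence-transformers for embeddings, an in-memory list of
# pretraining passages as the corpus; chunk_size, k, and the embedding
# model are placeholders, not the paper's settings.
from sentence_transformers import SentenceTransformer
import numpy as np

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

def chunk_document(text: str, chunk_size: int = 512) -> list[str]:
    """Split a seed document into fixed-size meta-chunks (by whitespace tokens)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]

def mine_hard_negatives(chunk: str, corpus: list[str], corpus_emb: np.ndarray, k: int = 4) -> list[str]:
    """Retrieve the k corpus passages most similar to the meta-chunk.
    High similarity but different provenance makes them hard distractors."""
    q = encoder.encode([chunk], normalize_embeddings=True)
    scores = (corpus_emb @ q.T).squeeze(-1)          # cosine similarity
    top = np.argsort(-scores)[:k]
    return [corpus[i] for i in top]

def extend_document(doc: str, corpus: list[str]) -> str:
    """Interleave each meta-chunk with its mined hard negatives to build
    one long synthetic document while keeping the meta-chunks in order."""
    corpus_emb = encoder.encode(corpus, normalize_embeddings=True)
    pieces = []
    for chunk in chunk_document(doc):
        pieces.extend(mine_hard_negatives(chunk, corpus, corpus_emb))
        pieces.append(chunk)                          # original dependency chain preserved
    return "\n\n".join(pieces)

# Usage: long_doc = extend_document(seed_document, pretraining_passages)
```

Stage 2 then trains the model with ordinary next-token prediction on these synthesized long documents, so predicting each meta-chunk rewards attending back past the distractors to earlier meta-chunks of the same source document.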
-------------
Key Insights from this Paper 🤔:
→ NExtLong effectively improves LLMs' ability to capture long-range dependencies.
→ Negative document extension strengthens the model's ability to distinguish relevant information from distractors.
→ Synthesized data from NExtLong reduces reliance on real long documents.
→ NExtLong shows potential for training ultra-long context models without long document scarcity limitations.
-------------
Results 📈:
→ NExtLong improves average performance by 7.33% over Quest, a prior synthesis method.
→ NExtLong outperforms existing models trained on long documents on HELMET and RULER benchmarks.
→ Achieves a 13.43% gain in Recall and a 9.12% gain in Re-Rank over the Quest method.
→ Llama-3-8B-NExtLong-512K-Base surpasses Llama-3-8B-ProLong-512K-Base by +5.42% and Llama-3.1-8B by +4.69%.