Make LLMs contextually aware by strategically placing distracting content, forcing them to discern the true dependencies.

This paper proposes NExtLong, a framework that synthesizes long-context data for LLMs. It strengthens long-range dependency learning by interleaving hard negative distractors within documents.
-------------
Paper - https://arxiv.org/abs/2501.12766
Original Problem 😕:
→ LLMs with extended context windows struggle because genuinely long training documents are scarce.
→ Existing methods synthesize long-context data to fill this gap.
→ But they lack mechanisms that effectively teach the model to capture long-range dependencies.
-------------
Solution in this Paper 💡:
→ This paper proposes NExtLong, a framework for synthesizing long-context data via Negative document Extension.
→ NExtLong decomposes documents into meta-chunks.
→ It extends context by interleaving hard negative distractors.
→ Distractors are retrieved from pretraining corpora.
→ This forces the model to distinguish the truly relevant long-range context from distracting text, strengthening its ability to model long-range dependencies.
→ NExtLong has two stages: Negative Document Extension and Long-Range Dependence Modeling.
→ In stage 1, documents are chunked into meta-chunks.
→ Hard negatives are mined for each meta-chunk from pre-training data.
→ Hard negatives are concatenated with meta-chunks to create a long document.
→ In stage 2, the model is trained on this synthesized long document.
→ Training focuses on modeling long-range dependencies: the model must identify and connect meta-chunks across the interleaved distractors (a minimal sketch of the pipeline follows below).
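To make stage 1 concrete, here is a minimal Python sketch of negative document extension, assuming a generic sentence-transformers embedding model for hard-negative mining; the model name, chunk size, number of negatives, and interleaving order are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of NExtLong-style negative document extension (stage 1).
# Assumptions: sentence-transformers for embeddings, an in-memory list of
# pretraining passages as the corpus; chunk_size, k, and the embedding
# model are placeholders, not the paper's settings.
from sentence_transformers import SentenceTransformer
import numpy as np

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

def chunk_document(text: str, chunk_size: int = 512) -> list[str]:
    """Split a seed document into fixed-size meta-chunks (by whitespace tokens)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]

def mine_hard_negatives(chunk: str, corpus: list[str], corpus_emb: np.ndarray, k: int = 4) -> list[str]:
    """Retrieve the k corpus passages most similar to the meta-chunk.
    High similarity but different provenance makes them hard distractors."""
    q = encoder.encode([chunk], normalize_embeddings=True)
    scores = (corpus_emb @ q.T).squeeze(-1)          # cosine similarity
    top = np.argsort(-scores)[:k]
    return [corpus[i] for i in top]

def extend_document(doc: str, corpus: list[str]) -> str:
    """Interleave each meta-chunk with its mined hard negatives to build
    one long synthetic document while keeping the meta-chunks in order."""
    corpus_emb = encoder.encode(corpus, normalize_embeddings=True)
    pieces = []
    for chunk in chunk_document(doc):
        pieces.extend(mine_hard_negatives(chunk, corpus, corpus_emb))
        pieces.append(chunk)                          # original dependency chain preserved
    return "\n\n".join(pieces)

# Usage: long_doc = extend_document(seed_document, pretraining_passages)
```

Stage 2 then trains the model with ordinary next-token prediction on these synthesized long documents, so predicting each meta-chunk rewards attending back past the distractors to earlier meta-chunks of the same source document.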
-------------
Key Insights from this Paper 🤔:
→ NExtLong effectively improves LLMs' ability to capture long-range dependencies.
→ Negative document extension strengthens the model's ability to distinguish relevant information from distractors.
→ Synthesized data from NExtLong reduces reliance on real long documents.
→ NExtLong shows potential for training ultra-long context models without long document scarcity limitations.
-------------
Results 📈:
→ NExtLong improves average performance by 7.33% over Quest, a prior synthesis method.
→ NExtLong outperforms existing models trained on long documents on HELMET and RULER benchmarks.
→ Achieves a 13.43% gain in Recall and a 9.12% gain in Re-Rank over the Quest method.
→ Llama-3-8B-NExtLong-512K-Base surpasses Llama-3-8B-ProLong-512K-Base by +5.42% and Llama-3.1-8B by +4.69%.