A Controlled Study on Long Context Extension and Generalization in LLMs
This study exposes trade-offs among long-context extension methods, highlighting the superiority of exact attention and the persistent difficulty of length extrapolation.
Original Problem 🔍:
It's challenging to compare long-context extension methods because prior studies differ in training data, model classes, and evaluation approaches.
Solution in this Paper 💡:
• Implements a controlled protocol for comparing extension methods (a minimal sketch follows this list)
• Uses consistent base models and extension data
• Standardizes evaluation across methods
• Considers both intrinsic metrics (perplexity, retrieval) and extrinsic tasks
• Evaluates both within the extension length and under extrapolation to longer contexts
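In spirit, the protocol is straightforward to script: hold the base model lineage and extension data fixed, then sweep each method over within-range and extrapolation lengths. Below is a minimal sketch assuming Hugging Face transformers; the `lab/...` checkpoint names, the document file, and the evaluation lengths are illustrative placeholders, not the paper's exact configuration.

```python
# Minimal sketch of a controlled perplexity evaluation across methods.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity_at_length(model, tokenizer, text, ctx_len):
    """Perplexity over the first ctx_len tokens of a long document."""
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :ctx_len]
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token NLL
    return torch.exp(loss).item()

text = open("long_document.txt").read()
# Same base model lineage and extension data for every method under test.
for name in ["pi-32k", "ntk-32k", "yarn-32k", "landmark-32k"]:  # placeholders
    model = AutoModelForCausalLM.from_pretrained(f"lab/{name}")
    tok = AutoTokenizer.from_pretrained(f"lab/{name}")
    for ctx in (8_192, 32_768, 65_536):  # within-range and extrapolation
        print(name, ctx, perplexity_at_length(model, tok, text, ctx))
```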
Key Insights from this Paper 💡:
• Perplexity strongly correlates with downstream task performance for exact fine-tuned methods (a toy check follows this list)
• Approximate attention methods generally underperform across benchmarks
• Continual fine-tuning with exact attention works well within extended context length
• Extrapolation to longer lengths remains challenging
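To make the first insight concrete, here is a toy rank-correlation check between perplexity and downstream scores. All numbers are invented placeholders for illustration only, not results from the paper; `scipy` is assumed available.

```python
# Toy check: does lower perplexity track better downstream scores?
from scipy.stats import spearmanr

# (method, validation perplexity, aggregate downstream score) -- made-up values
records = [
    ("exact-ft-a",    5.2, 41.0),
    ("exact-ft-b",    5.5, 39.5),
    ("approx-attn-a", 7.1, 31.2),
    ("approx-attn-b", 8.0, 28.9),
]
ppl = [r[1] for r in records]
score = [r[2] for r in records]
rho, p = spearmanr(ppl, score)
# A strongly negative rho means lower perplexity tracks better task scores.
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```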
Results 📊:
• Dynamic NTK performs best among exact attention methods (see the RoPE-scaling sketch after this list)
• Exact fine-tuned methods outperform approximate attention and frozen methods
• NTK-32K achieves 0.96 faithfulness, 0.96 answer relevancy, and 1.0 context recall
• Improved performance on LongBench, RULER, and retrieval tasks
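For context on the Dynamic NTK result, the sketch below shows the base-rescaling idea behind Dynamic NTK, following the formula used in common open-source implementations (e.g., the dynamic-NTK rotary embedding in Hugging Face transformers). Parameter values are illustrative, not the paper's settings.

```python
# Sketch of Dynamic NTK scaling for RoPE inverse frequencies.
import torch

def dynamic_ntk_inv_freq(dim: int, seq_len: int, max_trained_len: int = 4096,
                         base: float = 10000.0, factor: float = 8.0) -> torch.Tensor:
    """Return RoPE inverse frequencies, rescaling the base when the
    current sequence exceeds the trained context length so positional
    angles stay closer to the training distribution."""
    if seq_len > max_trained_len:
        base = base * (factor * seq_len / max_trained_len
                       - (factor - 1)) ** (dim / (dim - 2))
    return 1.0 / base ** (torch.arange(0, dim, 2).float() / dim)

print(dynamic_ntk_inv_freq(dim=128, seq_len=32_768)[:4])  # flatter than vanilla RoPE
```

Because the base grows with the observed sequence length, the scaling activates only beyond the trained context, which is what lets the frozen Dynamic NTK variant extend context without any fine-tuning.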