SEGMENT+: Long Text Processing with Short-Context Language Models
Combining structured notes and filtering
Solution in this Paper 🛠️:
• Two-stage process (a hedged code sketch follows this list):
Gathering Information: SEGMENT+ collects structured notes in parallel from all segments of the long input. Each note contains an Evidence part (relevant sentences quoted verbatim from the source) and a Reasoning part (the model's analysis/interpretation).
Deriving Answer: SEGMENT+ filters out unhelpful notes, then divides the remaining notes into batches. Each batch is merged into one updated structured note, and this iterates until the notes fit the context window for answering the question.
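A minimal Python sketch of this two-stage pipeline, assuming a generic `llm(prompt) -> str` completion callable. The function names (`gather_note`, `keep_note`, `merge_notes`), the prompt wording, and the whitespace-based segmenter are illustrative assumptions, not the paper's exact interface; the 3000-token segment size comes from the Results section below.

```python
from dataclasses import dataclass

@dataclass
class Note:
    evidence: str   # sentences quoted verbatim from the segment
    reasoning: str  # the model's analysis/interpretation

def split_into_segments(text: str, size: int = 3000) -> list[str]:
    # Rough word-based stand-in for token-based segmentation;
    # a real system would count tokens with the model's tokenizer.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def parse_note(raw: str) -> Note:
    # Expects the model to reply with "EVIDENCE: ... REASONING: ..."
    evidence, _, reasoning = raw.partition("REASONING:")
    return Note(evidence.replace("EVIDENCE:", "").strip(), reasoning.strip())

def gather_note(llm, segment: str, question: str) -> Note:
    # Stage 1: one structured note per segment (parallelizable in practice).
    return parse_note(llm(
        f"Question: {question}\n\nSegment:\n{segment}\n\n"
        "Quote the relevant sentences after EVIDENCE: and give your "
        "analysis after REASONING:"
    ))

def keep_note(llm, note: Note, question: str) -> bool:
    # Filtering step: drop notes the model judges unhelpful.
    verdict = llm(
        f"Question: {question}\nEvidence: {note.evidence}\n"
        "Does this evidence help answer the question? Answer yes or no."
    )
    return verdict.strip().lower().startswith("yes")

def merge_notes(llm, batch: list[Note], question: str) -> Note:
    # Stage 2 merge: compress a batch of notes into one updated note.
    joined = "\n\n".join(
        f"EVIDENCE: {n.evidence}\nREASONING: {n.reasoning}" for n in batch
    )
    return parse_note(llm(
        f"Question: {question}\n\nNotes:\n{joined}\n\n"
        "Merge these into one note with EVIDENCE: and REASONING: sections."
    ))

def segment_plus(llm, document: str, question: str, batch_size: int = 4) -> str:
    # Stage 1: gather structured notes from every segment.
    notes = [gather_note(llm, seg, question)
             for seg in split_into_segments(document)]
    # Stage 2: filter, then iteratively merge batches until one note
    # remains (a simple proxy for "the notes fit the context window").
    notes = [n for n in notes if keep_note(llm, n, question)]
    while len(notes) > 1:
        notes = [merge_notes(llm, notes[i:i + batch_size], question)
                 for i in range(0, len(notes), batch_size)]
    final = notes[0] if notes else Note("", "")
    return llm(
        f"Question: {question}\n\nEVIDENCE: {final.evidence}\n"
        f"REASONING: {final.reasoning}\n\nAnswer the question."
    )
```

Merging in fixed-size batches rather than all at once is what keeps every individual call within the short context window, no matter how long the original input is.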
Key Insights from this Paper 💡:
• Combining precision (Evidence) and recall (Reasoning) improves information gathering
• Filtering mechanism reduces noise and enhances performance, especially for smaller models
• Structured information flow allows for better control and interpretability
• Performance remains stable as input length increases
Results 📊:
• Outperformed baselines on long document QA tasks across various datasets
• Superior performance on the BABILong benchmark for needle-in-a-haystack tasks
• Greater performance gains with stronger base models
• Stable performance as input length increased (up to 128k tokens)
• Optimal segment size: 3000 tokens
Paper - "SEGMENT+: Long Text Processing with Short-Context Language Models"