
"Large Language Models Can Self-Improve in Long-context Reasoning"

The podcast on this paper is generated with Google's Illuminate.

Self-improving LLMs crack the long-context reasoning puzzle.

This paper introduces SEALONG, a method for LLMs to self-improve in long-context reasoning without relying on human experts or advanced models. It leverages multiple sampled outputs and consensus-based evaluation to create self-supervision for fine-tuning.

-----

https://arxiv.org/abs/2411.08147

🤔 Original Problem:

LLMs struggle with long-context reasoning despite strong retrieval capabilities. Existing approaches rely on human experts or advanced models for data synthesis, limiting further advancements.

-----

💡 Solution in this Paper:

→ SEALONG enables LLMs to self-improve in long-context reasoning through a two-stage process.

→ First, it samples multiple reasoning trajectories for each question and long context.

→ These outputs are then scored with Minimum Bayes Risk (MBR), prioritizing outputs that show higher semantic consistency with the other sampled outputs.

→ The scoring method uses sentence embedding similarity to measure consistency between outputs.

→ Finally, SEALONG applies either supervised fine-tuning on high-scoring outputs or preference optimization using both high- and low-scoring outputs (a rough sketch of this pipeline follows below).
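
To make the pipeline concrete, here is a minimal sketch of the consensus-scoring and data-construction step, assuming a sentence-transformers embedding model. The function names, the specific embedding model (`all-MiniLM-L6-v2`), and the prompt format are illustrative assumptions, not the paper's exact implementation.

```python
from sentence_transformers import SentenceTransformer, util

# Assumed embedding model; any sentence-embedding model could be swapped in.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def mbr_score(outputs):
    """Score each sampled reasoning trajectory by its average embedding
    similarity to the other samples (consensus / Minimum Bayes Risk style).
    Requires at least two sampled outputs."""
    emb = embedder.encode(outputs, convert_to_tensor=True)
    sim = util.cos_sim(emb, emb)            # pairwise cosine similarities
    n = len(outputs)
    # Average similarity to the other n-1 outputs (drop self-similarity).
    return ((sim.sum(dim=1) - sim.diag()) / (n - 1)).tolist()

def build_self_supervision(question, context, outputs):
    """Turn sampled outputs into (i) an SFT example from the top-scoring
    trajectory and (ii) a chosen/rejected pair for preference optimization."""
    scores = mbr_score(outputs)
    ranked = sorted(zip(outputs, scores), key=lambda p: p[1], reverse=True)
    best, worst = ranked[0][0], ranked[-1][0]
    prompt = f"{context}\n\nQuestion: {question}"  # illustrative prompt format
    sft_example = {"prompt": prompt, "completion": best}
    pref_example = {"prompt": prompt, "chosen": best, "rejected": worst}
    return sft_example, pref_example
```

In practice, several trajectories would be sampled per (long context, question) pair with temperature-based decoding, and the resulting SFT or chosen/rejected data fed into standard fine-tuning or preference-optimization tooling.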

-----

🔑 Key Insights from this Paper:

→ LLMs have untapped potential in long-context reasoning, revealed through refined prompting and multiple output sampling.

→ Consensus-based evaluation effectively identifies high-quality outputs without external supervision.

→ Self-improvement in long-context reasoning is possible without relying on human experts or advanced models.

-----

📊 Results:

→ SEALONG improved Llama-3.1-8B-Instruct's performance from 50.8 to 55.0 on long-context tasks.

→ Outperformed GPT-4o on some tasks (55.0 vs 54.4).

→ Demonstrated strong data efficiency, achieving competitive performance with only 1K examples.
