
"Hindsight: Posterior-guided training of retrievers for improved open-ended generation"

The podcast on this paper is generated with Google's Illuminate.

Hindsight, proposed in this paper, uses posterior-guided training to find truly relevant passages for open-ended generation tasks.

A guide retriever with access to the target output improves passage selection, yielding better generation through joint optimization.

📚 https://arxiv.org/abs/2110.07752

🎯 Original Problem:

Existing retrieval-augmented generation systems struggle with open-ended tasks like conversation, where multiple passages can be equally relevant. Current methods fail to find truly relevant passages even in the top-10 results, and generators don't effectively learn to ground their outputs in the retrieved passages.

-----

🔧 Solution in this Paper:

→ Introduces Hindsight, which trains a guide retriever that can access the target output during training to identify truly relevant passages

→ Jointly optimizes the retriever, guide retriever, and generator using the evidence lower bound (ELBO)

→ Uses iterative closed-set training to efficiently update passage indices

→ Implements distributional repositioning before inference using α-mixture sampling
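The joint objective above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the score inputs, the loss weighting, and the function names (`joint_loss`, `reverse_kl`) are assumptions; the only grounded pieces are the ELBO-style structure (expected generator log-likelihood under the guide posterior, plus a reverse-KL term tying the retriever to the guide).

```python
import math

def softmax(scores):
    """Convert raw passage scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(p_retriever, p_guide):
    """KL(retriever || guide): mode-seeking, so the retriever is pushed
    toward a subset of the guide's high-probability passages rather than
    spreading mass over all of them."""
    return sum(p * math.log(p / q)
               for p, q in zip(p_retriever, p_guide) if p > 0)

def joint_loss(retriever_scores, guide_scores, gen_logprobs):
    """ELBO-style loss sketch (illustrative): negative expected generator
    log-likelihood under the guide posterior, plus the reverse-KL term."""
    p_ret = softmax(retriever_scores)
    p_guide = softmax(guide_scores)
    expected_ll = sum(q * lp for q, lp in zip(p_guide, gen_logprobs))
    return -expected_ll + reverse_kl(p_ret, p_guide)
```

Because the guide sees the target output, its posterior concentrates on passages that actually support the answer; the retriever, which only sees the input, is then trained to imitate that posterior.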

-----

💡 Key Insights:

→ Separating relevance signal from generation allows better training

→ Guide retriever with "hindsight" access to target output improves passage selection

→ Reverse KL divergence is mode-seeking, so it encourages the retriever to match some modes of the guide's distribution rather than averaging over all of them

→ Iterative closed-set training with rounds allows efficient index updates
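The α-mixture repositioning mentioned above can be pictured as blending the retriever and guide distributions before inference. The geometric form below is an assumed reading of the α-mixture, and `alpha_mixture` is a hypothetical name; consult the paper for the exact definition:

```python
def alpha_mixture(p_retriever, p_guide, alpha=0.5):
    """Blend retriever and guide passage distributions before inference.
    Geometric form (an assumption, not necessarily the paper's definition):
    weights proportional to p_retriever**alpha * p_guide**(1 - alpha).
    alpha=1.0 recovers the plain retriever distribution."""
    weights = [r ** alpha * g ** (1.0 - alpha)
               for r, g in zip(p_retriever, p_guide)]
    total = sum(weights)
    return [w / total for w in weights]
```

With α between 0 and 1, the blended distribution shifts probability toward passages the guide considers relevant while staying usable at inference time, when the guide's target output is unavailable.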

-----

📊 Results:

→ 23% relative improvement in retriever success@10

→ 19% relative improvement in generator groundedness (Novel-F1)

→ 6.4% relative improvement in end-to-end performance

→ Improvements validated on the MS-MARCO NLGen dataset
