Hindsight, proposed in this paper, uses posterior-guided training to find truly relevant passages for open-ended generation tasks.
A guide retriever with access to the target output improves passage selection for better generation, with the retriever and generator optimized jointly.
📚 https://arxiv.org/abs/2110.07752
🎯 Original Problem:
Existing retrieval-augmented generation systems struggle with open-ended tasks like conversations where multiple passages can be equally relevant. Current methods fail to find truly relevant passages even in top-10 results, and generators don't effectively learn to ground outputs in retrieved passages.
-----
🔧 Solution in this Paper:
→ Introduces Hindsight - uses a guide retriever that can access the target output during training to find truly relevant passages
→ Employs joint optimization of the retriever, guide retriever, and generator using an evidence lower bound (ELBO)
→ Uses iterative closed-set training to efficiently update passage indices
→ Implements distributional repositioning before inference using α-mixture sampling
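The ELBO objective from the bullets above can be sketched with toy numbers. This is a minimal illustration, not the paper's implementation: `guide_post` stands in for the guide retriever's posterior q(z|x,y) over a handful of candidate passages, `retr_prior` for the standard retriever's distribution p(z|x), and `gen_loglik` for the generator's log-likelihood of the target output given each passage — all values here are made up.

```python
import math

def elbo(guide_post, gen_loglik, retr_prior):
    """ELBO = E_q[log p(y|x,z)] - KL(q(z|x,y) || p(z|x)).
    Maximizing this jointly trains the generator (first term)
    and pulls the retriever toward the guide's posterior (second term)."""
    expected_ll = sum(q * ll for q, ll in zip(guide_post, gen_loglik))
    kl = sum(q * math.log(q / p) for q, p in zip(guide_post, retr_prior) if q > 0)
    return expected_ll - kl

# Toy distributions over three candidate passages (illustrative only).
guide_post = [0.7, 0.2, 0.1]      # q(z|x,y): guide has seen the target output
retr_prior = [0.5, 0.3, 0.2]      # p(z|x): retriever sees only the input
gen_loglik = [-1.0, -3.0, -4.0]   # log p(y|x,z) under each passage

bound = elbo(guide_post, gen_loglik, retr_prior)
# Sanity check: the ELBO lower-bounds the marginal log-likelihood log p(y|x).
marginal = math.log(sum(p * math.exp(ll) for p, ll in zip(retr_prior, gen_loglik)))
```

Because the guide conditions on the target, its posterior concentrates on the passages that actually explain the output, and the KL term passes that signal back to the retriever.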
-----
💡 Key Insights:
→ Separating relevance signal from generation allows better training
→ Guide retriever with "hindsight" access to target output improves passage selection
→ Reverse KL divergence is mode-seeking, so the retriever learns to match some modes of the guide's distribution rather than spreading mass over all of them
→ Iterative closed-set training with rounds allows efficient index updates
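The mode-seeking behavior of reverse KL can be checked numerically. In this hypothetical setup (the distributions are invented for illustration), a bimodal guide posterior is matched better under reverse KL by a retriever that commits to a single mode, while forward KL would instead favor spreading mass over everything:

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions (assumes p[i] > 0 implies q[i] > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

guide = [0.45, 0.45, 0.05, 0.05]        # bimodal guide posterior over 4 passages
one_mode = [0.90, 0.05, 0.025, 0.025]   # retriever committed to one mode
spread = [0.25, 0.25, 0.25, 0.25]       # retriever hedging over all passages

# Reverse KL (retriever || guide) rewards latching onto a mode ...
rev_one, rev_spread = kl(one_mode, guide), kl(spread, guide)
# ... while forward KL (guide || retriever) rewards covering all of the mass.
fwd_one, fwd_spread = kl(guide, one_mode), kl(guide, spread)
```

Here `rev_one < rev_spread` but `fwd_spread < fwd_one`: reverse KL lets the retriever pick out some of the guide's relevant passages without being penalized for ignoring the rest, which suits open-ended tasks where several passages are equally valid.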
-----
📊 Results:
→ 23% relative improvement in retriever success@10
→ 19% relative improvement in generator groundedness (Novel-F1)
→ 6.4% relative improvement in end-to-end performance
→ Improvements validated on the MS MARCO NLGEN dataset