
"CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation"

The podcast on this paper is generated with Google's Illuminate.

CORAL, proposed in this paper, bridges the gap between single-turn and multi-turn RAG with Wikipedia-derived conversations and conversation-compression strategies

Wikipedia's hierarchical structure is transformed into natural conversations for better RAG evaluation

📚 https://arxiv.org/abs/2410.23090

🎯 Original Problem:

Current academic research focuses mainly on single-turn Retrieval-Augmented Generation (RAG), while real-world applications require handling multi-turn conversations. This gap creates challenges in managing conversation history, handling topic shifts, and maintaining response quality across extended dialogues.

-----

🔧 Solution in this Paper:

→ Introduced CORAL: A benchmark with 8,000 diverse conversations derived from Wikipedia

→ Developed a three-stage construction approach:

- Extract title trees from Wikipedia pages

- Sample conversation flows using 4 strategies (Linear Descent, Sibling-Inclusive, Single-Tree Random Walk, Dual-Tree Random Walk)

- Use GPT-4 to rewrite the sampled titles into contextualized, natural-language questions
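The flow-sampling step above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy title tree, node layout, and the two strategy functions (`linear_descent`, `sibling_inclusive`) are hypothetical stand-ins for the paper's Wikipedia title trees and four sampling strategies.

```python
import random

# Hypothetical title tree: each node is (title, [children]),
# standing in for a Wikipedia page's section hierarchy.
tree = ("Solar energy", [
    ("History", []),
    ("Technologies", [
        ("Photovoltaics", []),
        ("Concentrated solar power", []),
    ]),
    ("Economics", []),
])

def linear_descent(node, rng):
    """Linear Descent: follow one random root-to-leaf path of titles."""
    path = [node[0]]
    while node[1]:                      # while the node has children
        node = rng.choice(node[1])      # pick one child at random
        path.append(node[0])
    return path

def sibling_inclusive(node, rng):
    """Sibling-Inclusive: at each level, visit all siblings in a random
    order before descending into the last one visited."""
    flow = [node[0]]
    while node[1]:
        children = node[1][:]
        rng.shuffle(children)
        flow.extend(c[0] for c in children)  # cover siblings at this level
        node = children[-1]                  # then descend
    return flow

print(linear_descent(tree, random.Random(0)))
```

Each resulting title sequence is then handed to GPT-4 to be rewritten into a coherent question flow; the random-walk variants differ only in how the tree (or a pair of trees) is traversed.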

→ Created unified framework supporting 3 core tasks:

- Conversational passage retrieval

- Response generation

- Citation labeling
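The three tasks share one conversational structure, which can be sketched as a per-turn record. The field names below are illustrative, not the benchmark's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    question: str
    retrieved_passages: list[str] = field(default_factory=list)  # task 1: retrieval output
    response: str = ""                                           # task 2: generated answer
    citations: list[int] = field(default_factory=list)           # task 3: indices into retrieved_passages

@dataclass
class Conversation:
    turns: list[Turn] = field(default_factory=list)

    def history(self):
        """Flatten prior turns into the context fed to the next turn."""
        return [(t.question, t.response) for t in self.turns]
```

The point of the shared structure is that each task consumes the same history: retrieval conditions the search query on it, generation conditions the answer on it, and citation labeling links the answer back to the retrieved passages.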

→ Implemented conversation compression strategies:

- Last Response Strategy

- Rewrite Strategy

- LLM Summarization Strategy
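The three compression strategies can be sketched as alternative ways of building the model input from the history. This is a minimal sketch: `rewriter` and `summarizer` stand in for LLM calls, and the toy lambdas exist only so the sketch runs without a model.

```python
def last_response(history, question):
    """Last Response: condition only on the most recent answer."""
    prev = history[-1][1] if history else ""
    return f"{prev}\n{question}".strip()

def rewrite(history, question, rewriter):
    """Rewrite: turn the context-dependent question into a standalone one.
    `rewriter` is a stand-in for an LLM call."""
    return rewriter(history, question)

def llm_summarize(history, question, summarizer):
    """LLM Summarization: compress the whole history into a summary,
    then append the current question. `summarizer` is a stand-in."""
    return f"{summarizer(history)}\n{question}"

# Toy stand-ins (hypothetical) so the sketch is runnable:
toy_rewriter = lambda h, q: q
toy_summarizer = lambda h: " ".join(f"Q: {q} A: {a}" for q, a in h)

hist = [("What is CORAL?", "A multi-turn RAG benchmark.")]
print(last_response(hist, "How was it built?"))
```

All three trade context for input length: Last Response is cheapest but drops earlier turns, while summarization keeps a compressed view of the whole conversation.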

-----

💡 Key Insights:

→ Fine-tuned open-source LLMs outperform commercial closed-source LLMs in retrieval tasks

→ Shortening input length maintains response quality while improving citation accuracy

→ Performance gains plateau after 3B parameters for generation tasks

→ Citation labeling improves with larger models (3B to 7B parameters)

-----

📊 Results:

→ Achieved 23.2 MRR and 33.6 MAP in retrieval using KD-ANCE-C

→ Obtained 26.3 BLEU-1 score with Llama-3.1-8B-SFT for response generation

→ Reached 31.1% Citation Precision using Qwen2.5-7B-SFT with LLM Summarization
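The retrieval metrics reported above are standard ranking measures, computable as follows (a generic sketch; each inner list marks the relevance, 0 or 1, of a ranked passage list for one query):

```python
def mrr(rankings):
    """Mean Reciprocal Rank: average of 1/rank of the first relevant hit."""
    total = 0.0
    for relevant in rankings:
        for i, rel in enumerate(relevant, start=1):
            if rel:
                total += 1.0 / i
                break
    return total / len(rankings)

def average_precision(relevant):
    """Precision averaged over the ranks where relevant passages appear."""
    hits, score = 0, 0.0
    for i, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            score += hits / i
    return score / max(hits, 1)

def map_score(rankings):
    """Mean Average Precision over all queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```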
