"Multi-task retriever fine-tuning for domain-specific and efficient RAG"

A podcast on this paper was generated with Google's Illuminate.

One small retriever does all the heavy lifting for domain RAG applications.

A single retriever model fine-tuned on multiple domain tasks enables efficient, scalable RAG applications while maintaining multilingual capabilities and strong generalization across domains.

-----

https://arxiv.org/abs/2501.04652

🤔 Original Problem:

Deploying RAG applications faces two key challenges: retrieving domain-specific information accurately, and the computational cost and inefficiency of maintaining a separate retriever for every application.

-----

🔧 Solution in this Paper:

→ The paper instruction-fine-tunes a small retriever encoder (mGTE-base, 305M parameters) on a variety of domain-specific tasks (see the training sketch after this list)

→ Training mixes workflow step, database table, and database field retrieval tasks into a single model

→ Data balancing through frequency-based downsampling prevents bias from overrepresented steps

→ The multi-task approach enables one retriever to serve multiple applications efficiently
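
As a rough illustration of the approach, here is a minimal multi-task instruction fine-tuning sketch using the sentence-transformers library. The Hugging Face checkpoint id, the instruction phrasings, and the example pairs are assumptions for illustration, not the paper's exact data or setup.

```python
# Minimal sketch: instruction-style multi-task fine-tuning of a small retriever
# with an in-batch-negative contrastive loss. Checkpoint id, instructions, and
# example pairs are illustrative assumptions, not the paper's exact setup.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# mGTE-base (~305M params); assumed checkpoint id.
model = SentenceTransformer("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)

# Each task contributes (instruction + query, positive passage) pairs.
# Mixing tasks in one training set yields a single multi-task retriever.
train_examples = [
    InputExample(texts=["Retrieve the workflow step for: reset a user's password",
                        "Step: look up the user record and send a password-reset email"]),
    InputExample(texts=["Retrieve the database table for: open incidents by priority",
                        "Table: incident (number, priority, state, assigned_to)"]),
    InputExample(texts=["Retrieve the database field for: who an incident is assigned to",
                        "Field: incident.assigned_to"]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)  # other in-batch positives act as negatives

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```

At inference time the same model serves every retrieval task; only the instruction prefix of the query changes.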

-----

💡 Key Insights:

→ Multi-task instruction fine-tuning improves generalization across domains

→ Downsampling frequent components boosts step retrieval performance by 8% (see the sketch after this list)

→ The model preserves multilingual capabilities despite English-only training
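
The frequency-based downsampling mentioned above can be as simple as capping how many training examples any single step (or other target) contributes. A minimal sketch, with an assumed cap value and example structure:

```python
# Minimal sketch of frequency-based downsampling: cap the number of training
# examples per target (e.g., per workflow step). The cap value and the
# example dictionary keys are illustrative assumptions.
import random
from collections import defaultdict

def downsample_by_target(examples, cap=50, seed=0):
    """Keep at most `cap` examples per target, sampled uniformly at random."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for ex in examples:
        buckets[ex["target"]].append(ex)
    kept = []
    for bucket in buckets.values():
        kept.extend(bucket if len(bucket) <= cap else rng.sample(bucket, cap))
    rng.shuffle(kept)
    return kept
```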

-----

📊 Results:

→ Step retrieval: 0.90 recall@15 on out-of-domain tests

→ Table retrieval: 0.90 recall@5 across domains

→ Field retrieval: 0.60 recall@5 for database fields

→ Workflow retrieval: 0.94 recall@5 on unseen tasks
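
All results above are reported as recall@k. Recall@k is typically the fraction of queries whose gold item appears among the top k retrieved results; a minimal sketch of that computation:

```python
# Minimal recall@k sketch: fraction of queries with at least one gold item
# in the top-k retrieved results (this equals standard recall when each
# query has a single gold item).
def recall_at_k(ranked_ids_per_query, gold_ids_per_query, k):
    hits = sum(
        1 for ranked, gold in zip(ranked_ids_per_query, gold_ids_per_query)
        if set(ranked[:k]) & set(gold)
    )
    return hits / len(gold_ids_per_query)

# Toy example: recall@2 over three queries = 2/3.
ranked = [["a", "b", "c"], ["x", "y"], ["m", "n"]]
gold = [["b"], ["z"], ["m"]]
print(recall_at_k(ranked, gold, k=2))  # ≈ 0.67
```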
