"ScalingNote: Scaling up Retrievers with Large Language Models for Real-World Dense Retrieval"

The podcast on this paper is generated with Google's Illuminate.

ScalingNote introduces a two-stage method to scale up dense retrieval using LLMs while maintaining fast query response times for real-world applications.

-----

https://arxiv.org/abs/2411.15766

🔍 Original Problem:

→ Real-world dense retrieval systems face a critical trade-off between model performance and query latency. While using LLMs can improve retrieval quality, they significantly slow down online query processing.

→ Because of these deployment constraints, current systems focus on negative sampling strategies rather than on scaling up model size.

-----

🛠️ Solution in this Paper:

→ ScalingNote employs a two-stage training approach to leverage LLMs effectively.

→ Stage 1 trains dual towers initialized from the same LLM using contrastive learning and hard negative mining.

→ Stage 2 performs Query-based Knowledge Distillation to transfer knowledge from the LLM query tower to a smaller, faster BERT-based query tower.

→ The system uses residual K-means clustering for document indexing and IVFPQ for efficient retrieval.
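The two training objectives above can be sketched in a few lines. This is a minimal numpy illustration, not the paper's implementation: Stage 1 is shown as an InfoNCE-style contrastive loss over a positive document and hard negatives, and Stage 2 as a simple MSE pull of the small student query embedding toward the frozen LLM teacher's query embedding (the paper's exact distillation objective may differ).

```python
import numpy as np

def info_nce_loss(q, d_pos, d_negs, tau=0.05):
    """Stage-1 contrastive loss for one query.
    q: (dim,) query embedding, d_pos: (dim,) positive doc,
    d_negs: (n_neg, dim) hard negatives. All L2-normalized."""
    sims = np.concatenate([[q @ d_pos], d_negs @ q]) / tau
    sims -= sims.max()                       # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return -np.log(probs[0])                 # positive sits at index 0

def query_distill_loss(student_q, teacher_q):
    """Stage-2 query-based distillation (illustrative MSE variant):
    pull the small student query tower's embedding toward the
    frozen LLM teacher's query embedding."""
    return np.mean((student_q - teacher_q) ** 2)

# Toy embeddings standing in for tower outputs (hypothetical data)
rng = np.random.default_rng(0)
norm = lambda v: v / np.linalg.norm(v, axis=-1, keepdims=True)
q = norm(rng.normal(size=16))
pos = norm(q + 0.1 * rng.normal(size=16))    # positive close to the query
negs = norm(rng.normal(size=(4, 16)))        # random stand-ins for hard negatives
loss = info_nce_loss(q, pos, negs)
```

In a real run both losses would be computed in a deep-learning framework and backpropagated; here the point is only the shape of the two objectives.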

-----

💡 Key Insights:

→ Scaling laws in dense retrieval show that larger models and more data improve performance, though with diminishing returns.

→ The query tower can be distilled into a much smaller model while the document tower keeps its full LLM capacity, since documents are encoded offline and only queries must be embedded at serving time.

→ Title-specific query prediction improves document relevance.
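The offline indexing side mentioned earlier (residual k-means plus IVFPQ) can also be sketched compactly. Below is a minimal numpy toy of the general IVF+PQ idea: a coarse k-means partitions the documents, residuals to the coarse centroids are product-quantized, and search probes a few cells and scores candidates by asymmetric distance. Cell counts, codebook sizes, and the tiny k-means are illustrative choices, not the paper's production configuration (real systems use a library such as Faiss).

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Tiny k-means, just enough for a sketch."""
    rng = np.random.default_rng(seed)
    cents = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((x[:, None] - cents[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (assign == j).any():
                cents[j] = x[assign == j].mean(0)
    return cents, assign

# Toy corpus of document embeddings (hypothetical data)
rng = np.random.default_rng(1)
docs = rng.normal(size=(200, 8)).astype(np.float32)

# IVF: coarse k-means over documents; residuals get product-quantized
n_list, m, ksub = 4, 2, 16            # 4 coarse cells, 2 sub-vectors, 16 codes each
coarse, assign = kmeans(docs, n_list)
resid = docs - coarse[assign]

# PQ: split each residual into m sub-vectors, run k-means per sub-space
d_sub = docs.shape[1] // m
codebooks, codes = [], np.empty((len(docs), m), dtype=np.int64)
for i in range(m):
    cb, c = kmeans(resid[:, i * d_sub:(i + 1) * d_sub], ksub, seed=i)
    codebooks.append(cb)
    codes[:, i] = c

def search(q, nprobe=2, topk=5):
    """Probe the nprobe nearest coarse cells, then score candidates by
    asymmetric distance between the query residual and PQ codewords."""
    cell_d = ((coarse - q) ** 2).sum(1)
    probe = np.argsort(cell_d)[:nprobe]
    cand = np.flatnonzero(np.isin(assign, probe))
    qr = q - coarse[assign[cand]]              # per-candidate query residual
    dist = np.zeros(len(cand))
    for i in range(m):
        sub = qr[:, i * d_sub:(i + 1) * d_sub]
        dist += ((codebooks[i][codes[cand, i]] - sub) ** 2).sum(1)
    return cand[np.argsort(dist)[:topk]]

hits = search(docs[0])
```

Probing only a few cells and comparing against compact codes instead of full vectors is what keeps retrieval fast even when the document embeddings come from a large model.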

-----

📊 Results:

→ Achieved 83.01% AUC-SAT and 82.14% AUC-REL on a manually annotated test set

→ Reduced the proportion of irrelevant documents in top-20 results by 1.546%

→ Maintained high throughput (33,810 QPS) with a 4-layer BERT student query tower
