"Mixture-of-PageRanks: Replacing Long-Context with Real-Time, Sparse GraphRAG"

The podcast on this paper was generated with Google's Illuminate.

Old-school PageRank meets RAG: A CPU-friendly solution for million-token context processing.

MixPR is a CPU-efficient algorithm that replaces expensive long-context processing in LLMs with sparse graph-based retrieval, enabling fast handling of million-token contexts.

-----

https://arxiv.org/abs/2412.06078

🤔 Original Problem:

Processing long contexts in LLMs is computationally expensive and often impractical because attention cost grows quadratically with sequence length. Existing Retrieval-Augmented Generation (RAG) solutions focus mainly on simple Q&A tasks and are not themselves computationally efficient.

-----

🔧 Solution in this Paper:

→ MixPR uses Personalized PageRank with a dynamic alpha parameter to balance query-relatedness against structural importance.

→ The system runs entirely on CPU with sparse matrix operations, using TF-IDF to produce keyword-based sparse embeddings.

→ It builds the adjacency matrix directly from similarities between chunk embeddings, with no neural network in the loop, making graph construction extremely fast.

→ The algorithm automatically adjusts its retrieval strategy to the task type, using higher alpha values for query-dependent tasks and lower values for global document understanding (a minimal code sketch follows this list).
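
To make the pipeline concrete, here is a minimal Python sketch of a MixPR-style retriever built on scikit-learn's TfidfVectorizer and SciPy sparse matrices. The function names, the cosine-similarity adjacency, the power-iteration solver, and the alpha values (0.8 / 0.1) are illustrative assumptions for this sketch, not the paper's exact implementation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

def build_graph(chunks):
    """Embed chunks with sparse TF-IDF and link them by cosine similarity."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(chunks)        # sparse (n_chunks x vocab), L2-normalized rows
    A = X @ X.T                          # sparse cosine-similarity adjacency
    A.setdiag(0)                         # drop self-loops
    A.eliminate_zeros()
    P = normalize(A, norm="l1", axis=1)  # row-stochastic transition matrix
    return vec, X, P

def personalized_pagerank(P, p, alpha, iters=50):
    """Power iteration for r = alpha * p + (1 - alpha) * P^T r."""
    r = p.copy()
    for _ in range(iters):
        r = alpha * p + (1 - alpha) * (P.T @ r)
    return r

def retrieve(chunks, query, query_dependent=True, k=5):
    vec, X, P = build_graph(chunks)
    # Personalization vector: TF-IDF similarity of each chunk to the query.
    p = (X @ vec.transform([query]).T).toarray().ravel()
    # Fall back to a uniform vector (plain PageRank) if the query shares
    # no vocabulary with the corpus.
    p = p / p.sum() if p.sum() > 0 else np.full(len(chunks), 1.0 / len(chunks))
    # Dynamic alpha: a high restart probability keeps the walk near the
    # query (local, query-dependent tasks); a low one lets graph structure
    # dominate (global tasks like summarization). Values are illustrative.
    alpha = 0.8 if query_dependent else 0.1
    scores = personalized_pagerank(P, p, alpha)
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```

Because every step here is a sparse matrix operation, the whole pipeline stays fast on CPU even at million-token scale, which is the paper's central efficiency argument.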

-----

💡 Key Insights:

→ Graph-based retrieval outperforms traditional RAG for complex multi-hop reasoning tasks

→ Sparse embeddings enable efficient processing without GPU requirements

→ The dynamic alpha parameter is crucial for handling diverse task types

-----

📊 Results:

→ Processes 1.5M+ tokens in seconds on CPU

→ Outperforms existing RAG methods on the BABILong benchmark

→ Matches specialized fine-tuned models on Hash-Hop tasks

→ Achieves state-of-the-art results on the RULER benchmark when paired with GPT-4o
