Old-school PageRank meets RAG: A CPU-friendly solution for million-token context processing.
This paper introduces MixPR, a CPU-efficient algorithm that replaces expensive long-context processing in LLMs with sparse graph-based retrieval, enabling fast handling of million-token contexts.
-----
https://arxiv.org/abs/2412.06078
🤔 Original Problem:
Processing long contexts in LLMs is computationally expensive and often impractical because self-attention cost grows quadratically with sequence length. Current Retrieval-Augmented Generation (RAG) solutions focus mainly on simple Q&A tasks and are not built for computational efficiency.
-----
🔧 Solution in this Paper:
→ MixPR runs Personalized PageRank with a dynamic alpha parameter that trades off query-relatedness against structural importance.
→ All retrieval reduces to sparse matrix operations on CPU, with TF-IDF vectors serving as keyword-based embeddings.
→ The adjacency matrix is built directly from text-chunk similarity, with no neural network involved, so graph construction is extremely cheap (see the sketch after this list).
→ The retrieval strategy adapts to the task type: higher alpha values for query-dependent tasks, lower values for global document understanding.
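Here is a minimal sketch of this pipeline, assuming scikit-learn's TfidfVectorizer and SciPy sparse matrices. The chunking granularity, the similarity threshold, the concrete alpha values, and the exact PageRank parameterization below are illustrative assumptions, not the paper's reported settings:

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

def build_graph(chunks):
    """Embed chunks with TF-IDF and build a sparse similarity adjacency matrix."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(chunks)   # (n_chunks, vocab); rows are L2-normalized
    A = X @ X.T                            # cosine similarity, since rows are unit-norm
    A = A.tocsr()
    A.setdiag(0)                           # drop self-loops
    A.data[A.data < 0.1] = 0               # sparsify: drop weak edges (threshold is a guess)
    A.eliminate_zeros()
    # Column-normalize so A acts as a transition matrix for PageRank.
    col_sums = np.asarray(A.sum(axis=0)).ravel()
    col_sums[col_sums == 0] = 1.0
    A = A @ sparse.diags(1.0 / col_sums)
    return vectorizer, X, A.tocsr()

def mixed_pagerank(A, personalization, alpha, n_iter=50, tol=1e-8):
    """Power iteration for r = alpha * p + (1 - alpha) * A r.

    alpha = 1 reduces to pure query similarity (retrieve what matches the query);
    alpha = 0 reduces to standard PageRank (retrieve structurally central chunks).
    """
    p = personalization / personalization.sum()
    r = p.copy()
    for _ in range(n_iter):
        r_next = alpha * p + (1 - alpha) * (A @ r)
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r

def retrieve(chunks, query, is_query_dependent, k=8):
    """Score chunks with mixed PageRank and return the top-k.

    In practice the graph would be built once and cached, not per query.
    """
    vectorizer, X, A = build_graph(chunks)
    q = vectorizer.transform([query])
    sims = (X @ q.T).toarray().ravel()        # query-chunk similarity
    if sims.sum() == 0:                       # query shares no terms with the corpus
        sims = np.ones(len(chunks))
    # Dynamic alpha: bias toward the query for local tasks, toward
    # graph structure for global, summarization-style tasks.
    alpha = 0.9 if is_query_dependent else 0.1
    scores = mixed_pagerank(A, sims, alpha)
    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]
```

Because everything is a sparse matrix product over term overlaps, both graph construction and retrieval avoid dense neural embeddings entirely, which is what keeps million-token contexts tractable on a plain CPU.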
-----
💡 Key Insights:
→ Graph-based retrieval outperforms traditional RAG on complex multi-hop reasoning tasks (see the toy example after this list)
→ Sparse embeddings enable efficient processing without GPU requirements
→ Dynamic alpha parameter is crucial for handling diverse task types
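To see why propagation helps multi-hop questions, here is a toy run of the `retrieve` sketch above (hypothetical data; the bridging entity "John" links the two relevant chunks):

```python
chunks = [
    "Maria is married to John.",
    "John works at the observatory in Chile.",
    "The observatory studies distant galaxies.",
    "An unrelated note about cooking pasta.",
]
# Pure similarity retrieval keyed on "Maria" would return only the first
# chunk; PageRank also pulls in that chunk's graph neighbor, which shares
# the bridging term "John", answering the two-hop question.
print(retrieve(chunks, "Where does Maria's husband work?",
               is_query_dependent=True, k=2))
```

Even with a query-biased alpha of 0.9, enough score flows along the shared-term edge to rank the second-hop chunk right behind the directly matching one.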
-----
📊 Results:
→ Processes 1.5M+ tokens in seconds on CPU
→ Outperforms existing RAG methods on BABILong benchmark
→ Matches specialized fine-tuned models on Hash-Hop tasks
→ Achieves state-of-the-art on RULER benchmark with GPT-4o