Why choose between similar and related when you can have both? SiReRAG indexes on both.
SiReRAG introduces a dual-tree indexing approach for RAG systems that combines semantic similarity and information relatedness, improving performance on multihop reasoning tasks.
-----
https://arxiv.org/abs/2412.06206
🤔 Original Problem:
Current RAG systems use either semantic similarity or information relatedness for indexing, but not both. This leads to incomplete knowledge synthesis and suboptimal performance on complex reasoning tasks.
-----
🔧 Solution in this Paper:
→ SiReRAG builds two separate trees: a similarity tree using recursive summarization and a relatedness tree based on entity connections
→ The similarity tree follows RAPTOR's approach with 4-level recursive clustering of semantically similar content
→ The relatedness tree extracts fine-grained propositions about entities and groups them via shared entities
→ Both trees are flattened into a unified retrieval pool for comprehensive knowledge access
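
To make the steps above concrete, here is a minimal Python sketch of the relatedness grouping and the final flattening. It is an illustration under stated assumptions, not the paper's implementation: the function names (`group_by_entity`, `build_relatedness_nodes`, `flatten_pool`), the node dictionaries, and the `min_size` threshold are all hypothetical, the propositions and their entities are supplied by hand instead of being extracted by an LLM, and the similarity-tree nodes are placeholders standing in for chunks and recursive summaries.

```python
from collections import defaultdict

def group_by_entity(propositions):
    """Map each entity to the proposition texts that mention it.

    propositions: list of (text, entities) pairs. In a real pipeline an
    LLM would extract these; here they are supplied by hand (assumption).
    """
    groups = defaultdict(list)
    for text, entities in propositions:
        for entity in entities:
            groups[entity].append(text)
    return dict(groups)

def build_relatedness_nodes(propositions, min_size=2):
    """Create one aggregate node per entity shared by >= min_size propositions.

    min_size is a hypothetical knob for this sketch, not a value from the paper.
    """
    groups = group_by_entity(propositions)
    return [
        {"source": "relatedness", "entity": entity, "text": " ".join(texts)}
        for entity, texts in sorted(groups.items())
        if len(texts) >= min_size
    ]

def flatten_pool(similarity_nodes, relatedness_nodes):
    """Merge nodes from both trees into one flat pool, dropping exact duplicate texts."""
    seen, pool = set(), []
    for node in list(similarity_nodes) + list(relatedness_nodes):
        if node["text"] not in seen:
            seen.add(node["text"])
            pool.append(node)
    return pool

# Toy data (made up for illustration).
propositions = [
    ("Marie Curie won the Nobel Prize in Physics in 1903.", ["Marie Curie", "Nobel Prize"]),
    ("Marie Curie was born in Warsaw.", ["Marie Curie", "Warsaw"]),
    ("Pierre Curie shared the 1903 Nobel Prize.", ["Pierre Curie", "Nobel Prize"]),
]
# Placeholder similarity-tree nodes: one raw chunk and one summary.
similarity_nodes = [
    {"source": "similarity", "text": "Raw chunk about the Curie family."},
    {"source": "similarity", "text": "Summary of chunks about early Nobel laureates."},
]

relatedness_nodes = build_relatedness_nodes(propositions)
pool = flatten_pool(similarity_nodes, relatedness_nodes)
```

A retriever would then embed every entry in `pool` and rank them together, so a query can surface a similarity summary and an entity aggregate side by side. The sketch leaves out the clustering and summarization calls, along with any further summarization over the entity aggregates.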
-----
💡 Key Insights:
→ Combining similarity and relatedness captures more comprehensive knowledge connections
→ Entity-based proposition aggregates reduce noise and redundancy better than raw text chunks
→ Independent tree structures maintain clear distinction between similarity and relatedness signals
-----
📊 Results:
→ 1.9% average F1 score improvement over state-of-the-art indexing methods across the MuSiQue, 2WikiMultiHopQA, and HotpotQA datasets
→ Up to 7.8% improvement when enhancing existing reranking methods
→ Maintains efficiency with TPER consistently below 1.0