LeSeR (Lexical reranking of Semantic Retrieval), proposed in this paper, combines semantic search with lexical reranking to improve the accuracy of regulatory document retrieval and question answering.
-----
Paper - https://arxiv.org/abs/2412.06009
🤔 Original Problem:
Regulatory documents are complex and ever-changing, making it challenging for organizations to find relevant information and ensure compliance. Traditional search methods often miss important context or struggle with regulatory terminology.
-----
🔧 Solution in this Paper:
→ LeSeR (Lexical reranking of Semantic Retrieval) introduces a two-stage approach that first uses semantic embeddings for high-recall retrieval.
→ The system fine-tunes embedding models with Multiple Negatives Symmetric Ranking (MNSR) loss on query-passage pairs.
→ Retrieved passages are then reranked using BM25 lexical scoring to improve precision.
→ The final system integrates BGE_LeSeR with Qwen2.5 7B for answer generation.
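The retrieve-then-rerank flow described above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the paper's implementation: the embeddings are toy 2-D vectors standing in for a fine-tuned BGE model, tokenization is lowercase word splitting, BM25 uses the common choices k1=1.5 and b=0.75 with document statistics computed over the k candidates only, the final order uses the BM25 score alone, and all function names (`retrieve_then_rerank`, `bm25_scores`, etc.) are hypothetical. Answer generation with Qwen2.5 7B is omitted.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens (a simplifying assumption, not the paper's tokenizer)."""
    return re.findall(r"\w+", text.lower())

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def semantic_retrieve(query_vec, passage_vecs, k):
    """Stage 1 (recall): indices of the k passages most similar to the query."""
    order = sorted(range(len(passage_vecs)),
                   key=lambda i: cosine(query_vec, passage_vecs[i]), reverse=True)
    return order[:k]

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of the query against each tokenized document.

    Document frequencies and average length are computed over docs_tokens only
    (hypothetical choice: here, the semantic candidates).
    """
    n = len(docs_tokens)
    if n == 0:
        return []
    avgdl = sum(len(d) for d in docs_tokens) / n
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def retrieve_then_rerank(query, query_vec, passages, passage_vecs, k=10):
    """Stage 1: dense top-k for recall. Stage 2: reorder those k by BM25 for precision."""
    candidates = semantic_retrieve(query_vec, passage_vecs, k)
    scores = bm25_scores(tokenize(query), [tokenize(passages[i]) for i in candidates])
    # sorted() is stable, so candidates with equal BM25 scores keep their semantic order.
    order = sorted(range(len(candidates)), key=lambda j: scores[j], reverse=True)
    return [candidates[j] for j in order]

passages = [
    "Capital adequacy requirements for banks",
    "Anti-money laundering reporting obligations",
    "Capital requirements reporting for banks",
    "Weather forecast for tomorrow",
]
passage_vecs = [[0.9, 0.1], [0.2, 0.9], [0.8, 0.3], [0.0, -1.0]]  # toy embeddings
query = "capital requirements reporting"
query_vec = [1.0, 0.2]

print(semantic_retrieve(query_vec, passage_vecs, 3))                        # → [0, 2, 1]
print(retrieve_then_rerank(query, query_vec, passages, passage_vecs, k=3))  # → [2, 0, 1]
```

In this toy run the dense stage ranks passage 0 first, but passage 2 matches all three query terms lexically, so the BM25 stage moves it to the top while the off-topic passage never enters the candidate set.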
-----
💡 Key Insights:
→ Pure semantic models excel at recall but struggle with ranking precision
→ Lexical reranking significantly improves mean Average Precision (mAP)
→ Fine-tuning with MNSR loss enhances retrieval performance
→ Hybrid approaches outperform both pure semantic and lexical methods
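The MNSR loss mentioned above can be sketched in pure Python as a symmetric cross-entropy with in-batch negatives, in the style of the sentence-transformers `MultipleNegativesSymmetricRankingLoss` (whose default scale is 20). Assumptions: the embeddings are toy 2-D vectors, the batch contains no extra hard negatives, the function names are hypothetical, and the paper's actual training setup and hyperparameters are not given in this summary.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def log_softmax_at(logits, idx):
    """log(softmax(logits)[idx]), computed stably by subtracting the max."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[idx] - log_sum_exp

def mnsr_loss(query_embs, passage_embs, scale=20.0):
    """Symmetric in-batch-negatives ranking loss.

    Row i of the score matrix compares query i with every passage in the batch;
    passage i is its positive and the other passages act as negatives. The
    backward term does the same column-wise (passage i must pick out query i).
    """
    n = len(query_embs)
    scores = [[scale * cos_sim(q, p) for p in passage_embs] for q in query_embs]
    # Query -> passage: cross-entropy over each row, target on the diagonal.
    forward = -sum(log_softmax_at(scores[i], i) for i in range(n)) / n
    # Passage -> query: cross-entropy over each column, target on the diagonal.
    backward = -sum(
        log_softmax_at([scores[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return (forward + backward) / 2

queries = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]     # passage i points the same way as query i
mismatched = [[0.0, 1.0], [1.0, 0.0]]  # positives swapped between the two queries
print(mnsr_loss(queries, matched))     # close to 0
print(mnsr_loss(queries, mismatched))  # about 20
```

The symmetric form penalizes a model both when a query fails to pick out its passage and when a passage fails to pick out its query, which is the difference from the one-directional multiple-negatives ranking loss.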
-----
📊 Results:
→ BGE_LeSeR achieved 0.8201 Recall@10 and 0.6655 mAP@10
→ The Qwen2.5 7B integration delivered the highest RePASs score of 0.4340
→ The system outperformed Mistral, Nemo, and Gemma models across metrics
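For readers unfamiliar with the retrieval metrics reported above, here is a minimal pure-Python sketch of Recall@k and mAP@k. Definitions vary between evaluation scripts (for example, whether AP@k divides by the number of relevant passages or by min(k, number relevant)); this sketch assumes the min(k, number relevant) variant and uses made-up passage IDs, so it does not reproduce the paper's numbers. The function names are hypothetical.

```python
def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant passages that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)

def average_precision_at_k(ranked, relevant, k=10):
    """Sum of precision at each rank (<= k) holding a relevant passage,
    divided by min(number of relevant passages, k)."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    denom = min(len(relevant), k)
    return precision_sum / denom if denom else 0.0

def mean_over_queries(metric, runs, k=10):
    """Average a per-query metric over (ranked_list, relevant_set) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)

runs = [
    (["p1", "p5", "p2"], {"p1", "p2"}),  # relevant passages at ranks 1 and 3
    (["p9", "p3"], {"p3"}),              # relevant passage at rank 2
]
mean_recall = mean_over_queries(recall_at_k, runs)
mean_ap = mean_over_queries(average_precision_at_k, runs)
print(mean_recall, mean_ap)
```

Here Recall@10 is 1.0 for both queries, yet mAP@10 is only about 0.67 because the relevant passages are not all ranked first; this is the recall-versus-ranking gap that the key insights attribute to pure semantic models and that lexical reranking targets.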