"BioRAGent: A Retrieval-Augmented Generation System for Showcasing Generative Query Expansion and Domain-Specific Search for Scientific Q&A"

A podcast on this paper was generated with Google's Illuminate.

BioRAGent is a web-based RAG system that combines query expansion, document retrieval, and answer generation to provide accurate biomedical answers with transparent citations.

-----

https://arxiv.org/abs/2412.12358

Original Problem 🤔:

→ Biomedical search requires complex queries for evidence-based answers

→ LLMs tend to hallucinate in professional settings, making direct use challenging

-----

Solution in this Paper 🛠️:

→ BioRAGent uses 3-shot learning for query expansion with synonyms and related terms

→ The system retrieves the top 50 PubMed articles using Elasticsearch with BM25 ranking

→ Relevant snippets are extracted from the retrieved articles in parallel, guided by an LLM

→ Generates two answer types: short paragraphs and responses with PubMed citations

→ Built on the Gradio framework, using Gemini 1.5 Flash 002 as the LLM
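
The first two steps above (few-shot query expansion, then BM25 retrieval of the top 50 hits) can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the three example pairs, the prompt wording, and the `title`/`abstract` field names are all assumptions made here. The sketch only builds the prompt string and the Elasticsearch request body; sending them to an LLM and to a cluster is left out. BM25 is Elasticsearch's default similarity, so the body needs no explicit ranking setting.

```python
# Hypothetical few-shot examples (question, expanded query).
# The prompt examples actually used by BioRAGent are not reproduced here.
FEW_SHOT_EXAMPLES = [
    ("What causes cystic fibrosis?",
     '("cystic fibrosis" OR mucoviscidosis) AND (cause OR etiology OR CFTR)'),
    ("Is aspirin used to prevent stroke?",
     '(aspirin OR "acetylsalicylic acid") AND (stroke OR "cerebrovascular accident") AND prevention'),
    ("Which drugs treat hypertension?",
     '(hypertension OR "high blood pressure") AND (drug OR treatment OR antihypertensive)'),
]


def build_expansion_prompt(question, examples=FEW_SHOT_EXAMPLES):
    """Assemble a 3-shot prompt asking an LLM to expand a question into a search query."""
    parts = [
        "Expand the biomedical question into a search query "
        "with synonyms and related terms.",
        "",
    ]
    for example_question, expanded in examples:
        parts += [f"Question: {example_question}", f"Query: {expanded}", ""]
    # The final slot is left open for the model to complete.
    parts += [f"Question: {question}", "Query:"]
    return "\n".join(parts)


def build_search_body(expanded_query, size=50, fields=("title", "abstract")):
    """Build an Elasticsearch request body for a query_string search.

    The field names are assumed; a real PubMed index may use different ones.
    """
    return {
        "size": size,
        "query": {
            "query_string": {
                "query": expanded_query,
                "fields": list(fields),
            }
        },
    }


if __name__ == "__main__":
    print(build_expansion_prompt("What is the role of BRCA1 in breast cancer?"))
    print(build_search_body('(BRCA1 OR "breast cancer 1") AND "breast cancer"'))
```

Because the expanded query is a plain string, it can be shown to the user and edited before the search runs.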

-----

Key Insights 💡:

→ Transparent query expansion makes the search process controllable

→ Few-shot learning effectively handles specialized biomedical domains

→ Parallel document processing maintains real-time performance
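
To illustrate why parallel processing helps here: each retrieved article needs its own extraction call, and those calls are independent, so they can run concurrently instead of one after another. The sketch below uses a thread pool for this. It is an assumption-laden illustration, not the system's implementation: `extract_snippets_stub` is a hypothetical keyword-overlap stand-in for the LLM call, and the document dictionaries with `pmid` and `abstract` keys are an invented shape.

```python
from concurrent.futures import ThreadPoolExecutor


def _words(text):
    """Lowercase tokens with surrounding punctuation stripped."""
    return {w.lower().strip("?.,") for w in text.split()}


def extract_snippets_stub(question, doc):
    """Hypothetical stand-in for an LLM extraction call.

    Keeps every sentence of the abstract that shares a word
    (longer than 3 characters) with the question.
    """
    terms = {w for w in _words(question) if len(w) > 3}
    sentences = [s.strip() for s in doc["abstract"].split(".") if s.strip()]
    return [s for s in sentences if terms & _words(s)]


def extract_parallel(question, docs, extractor=extract_snippets_stub, max_workers=8):
    """Run the extractor over all documents concurrently.

    Returns a dict mapping each document's pmid to its list of snippets.
    pool.map yields results in input order, so zip pairs them correctly.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda d: extractor(question, d), docs)
        return {d["pmid"]: snippets for d, snippets in zip(docs, results)}


if __name__ == "__main__":
    # Placeholder documents; the pmid values are not real PubMed IDs.
    docs = [
        {"pmid": "1", "abstract": "Cystic fibrosis is caused by CFTR mutations. Treatment options vary."},
        {"pmid": "2", "abstract": "Aspirin reduces fever."},
    ]
    print(extract_parallel("Which gene is mutated in cystic fibrosis?", docs))
```

Threads suit this workload because a real extraction call would spend most of its time waiting on a network response, so total latency approaches that of the slowest single call instead of the sum of all calls.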

-----

Results 📊:

→ Won multiple first and second places in the BioASQ 2024 challenge

→ Strong performance in question answering tasks (Phase A+ and B)

→ Competitive results across different question formats