This paper's method teaches LLMs to cite exact sentences, rather than coarse chunks, when answering questions over long documents, making fact-checking easier and responses more accurate.
Results 📊:
• LongCite-8B/9B outperform GPT-4o by 6.4%/3.6% in citation F1 score
• 2x finer citation granularity vs proprietary models
• 7-9% improvement in response correctness over vanilla long-context SFT
• High agreement between human evaluation and automated metrics
📚 https://arxiv.org/abs/2409.02897
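The citation F1 above is the harmonic mean of citation recall (is each statement supported by its cited sentences?) and citation precision (is each citation relevant?). A minimal sketch of combining per-statement LLM-judge scores into F1 (the scoring scale and function name are assumptions, not the paper's exact implementation):

```python
def citation_f1(recall_scores, precision_scores):
    """Combine per-statement citation judgments into an F1 score.

    recall_scores: 1 if a statement is fully supported by its cited
    sentences, 0.5 if partially, 0 if not (assumed scale).
    precision_scores: 1 per citation judged relevant, else 0.
    """
    recall = sum(recall_scores) / len(recall_scores)
    precision = sum(precision_scores) / len(precision_scores)
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

# Example: 3 statements judged for support, 4 citations for relevance
score = citation_f1([1, 0.5, 1], [1, 1, 0, 1])
print(round(score, 4))  # harmonic mean of recall 5/6 and precision 3/4
```

In LongBench-Cite, the recall/precision judgments themselves come from an LLM judge rather than exact matching, so the aggregation above is the only deterministic part.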
Original Problem 🔍:
Current long-context LLMs lack citation capabilities, making it difficult for users to verify information and raising concerns about hallucinations.
-----
Key Insights from this Paper 💡:
• LongBench-Cite: Automated benchmark for LQAC (Long-Context Question Answering with Citations)
• "Coarse to Fine" (CoF): Pipeline for generating high-quality LQAC data
• SFT with citations improves response correctness and citation quality
• Sentence-level citations are more user-friendly than chunk-level
-----
Solution in this Paper 🧠:
• CoF pipeline for LongCite-45k dataset creation:
- Generate QA pairs via self-instruct
- Retrieve chunks and add coarse citations
- Extract fine-grained sentence-level citations
- Filter low-quality instances
• Fine-tune LongCite-8B and LongCite-9B on LongCite-45k
• One-pass generation of responses with sentence-level citations
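For sentence-level citations to work in one pass, the context sentences need stable indices the model can refer to, and the response needs an inline citation markup that can be parsed back out. A minimal sketch of both sides (the `<C{i}>` and `<cite>[start-end]</cite>` tag formats here are assumptions for illustration, not necessarily the paper's exact markup):

```python
import re

def number_sentences(context: str) -> str:
    """Prefix each context sentence with an index tag, e.g. <C3>,
    so the model can cite spans like [3-5]."""
    sentences = re.split(r'(?<=[.!?])\s+', context.strip())
    return " ".join(f"<C{i}>{s}" for i, s in enumerate(sentences))

def parse_citations(response: str):
    """Extract (statement, [cited sentence indices]) pairs from a
    response that marks each statement with <cite>[start-end]</cite>."""
    pairs = []
    for stmt, start, end in re.findall(
            r'(.*?)<cite>\[(\d+)-(\d+)\]</cite>', response):
        pairs.append((stmt.strip(), list(range(int(start), int(end) + 1))))
    return pairs

ctx = "Solar output rose. Temperatures climbed. Ice melted."
print(number_sentences(ctx))
print(parse_citations("Ice loss followed warming.<cite>[1-2]</cite>"))
```

The same parsing step is what makes the CoF pipeline's filtering possible: instances whose extracted citations don't actually support their statements can be dropped before fine-tuning.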
-----
Are you into AI and LLMs❓ Join 31.7K others following me on Twitter to stay on the bleeding edge every day.
𝕏/🐦 https://x.com/rohanpaul_ai