"Parametric Retrieval Augmented Generation"
The podcast below on this paper was generated with Google's Illuminate.
https://arxiv.org/abs/2501.15915
The problem with current LLMs is that their knowledge is frozen after training, which makes it hard to incorporate new information. Traditional Retrieval-Augmented Generation (RAG) addresses this by prepending retrieved documents to the input context, but this inflates computation and integrates the knowledge only shallowly.
This paper introduces Parametric RAG. It injects external knowledge directly into the LLM's parameters. This is done by converting documents into a small set of parameters that are merged into the LLM's feed-forward networks.
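To make the mechanism concrete, here is a minimal PyTorch sketch of a feed-forward weight carrying a low-rank (LoRA) update; the class and variable names are illustrative, not the paper's code:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank (LoRA) update.

    The base weight W stays fixed; a document's knowledge lives in the
    small matrices A and B, so the effective weight is
    W + (alpha / rank) * B @ A.
    """
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original LLM weight
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no effect before training
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```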
-----
📌 Parametric Retrieval Augmented Generation (RAG) directly modifies feed-forward network parameters via LoRA. This in-parameter knowledge injection improves knowledge utilization compared to input context methods.
📌 Offline document parameterization is a key innovation. It pre-computes knowledge representations, enabling faster online inference by avoiding long context processing during queries.
📌 Retrieve-Update-Generate (RUG) workflow with LoRA merging offers a practical framework for Parametric RAG. It dynamically combines knowledge from multiple documents efficiently for each query.
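A minimal sketch of that merge step, assuming the retrieved documents' low-rank updates are simply averaged before being added to the frozen feed-forward weight (the averaging rule and function names here are illustrative assumptions):

```python
import torch

def merge_document_loras(ffn_weight, doc_loras, rank=8, alpha=32):
    """Merge the LoRA pairs of the retrieved documents into one FFN weight.

    ffn_weight: frozen (d_out, d_in) feed-forward weight of the base LLM.
    doc_loras:  list of (B, A) pairs, one per retrieved document, with
                B of shape (d_out, rank) and A of shape (rank, d_in).
    Returns the updated weight used to answer the current query.
    """
    delta = torch.zeros_like(ffn_weight)
    for B, A in doc_loras:
        delta += (alpha / rank) * (B @ A)
    return ffn_weight + delta / len(doc_loras)  # average across documents

# Toy usage: two retrieved documents, each with an offline-trained LoRA pair.
d_out, d_in, rank = 64, 32, 8
W = torch.randn(d_out, d_in)
doc_loras = [(torch.randn(d_out, rank) * 0.01, torch.randn(rank, d_in))
             for _ in range(2)]
W_query = merge_document_loras(W, doc_loras, rank=rank)
```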
----------
Methods Explored in this Paper 🔧:
→ The paper proposes Parametric Retrieval Augmented Generation (Parametric RAG).
→ Parametric RAG converts each document into a set of parameters offline. This is called 'Document Parameterization'.
→ Document Parameterization has two steps: 'Document Augmentation' and 'Parametric Document Encoding'.
→ Document Augmentation rewrites documents and generates question-answer pairs from them to create richer training data.
→ Parametric Document Encoding uses this augmented data to train low-rank adaptation (LoRA) matrices for each document. These LoRA matrices are the document's parametric representation (see the training sketch after this list).
→ During online inference, Parametric RAG retrieves relevant documents.
→ It then merges the parametric representations (LoRA matrices) of these documents, as in the merge sketch above.
→ These merged parameters are inserted into the LLM's feed-forward network.
→ Finally, the updated LLM generates the answer. This is called the Retrieve-Update-Generate (RUG) workflow.
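A sketch of the offline Document Parameterization stage using Hugging Face PEFT; the checkpoint name, target modules, augmentation placeholder, and hyperparameters are assumptions for illustration, not the paper's exact setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-3.2-1B"  # assumed checkpoint; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)

# LoRA over the feed-forward projections, mirroring the paper's focus on FFN layers.
lora_cfg = LoraConfig(r=8, lora_alpha=32, task_type="CAUSAL_LM",
                      target_modules=["gate_proj", "up_proj", "down_proj"])

def augment_document(doc: str) -> list[str]:
    """Document Augmentation placeholder: the paper has an LLM rewrite the
    document and generate QA pairs from it; returning the raw text keeps
    this sketch self-contained."""
    return [doc]  # replace with rewrites plus "Q: ... A: ..." strings

def parameterize_document(doc: str, steps: int = 50, lr: float = 1e-4) -> dict:
    """Train one LoRA adapter on a document's augmented data and keep only
    the small LoRA matrices as its parametric representation."""
    model = get_peft_model(AutoModelForCausalLM.from_pretrained(model_name),
                           lora_cfg)  # fresh base model per document avoids adapter stacking
    opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad),
                            lr=lr)
    texts = augment_document(doc)
    model.train()
    for step in range(steps):
        batch = tok(texts[step % len(texts)], return_tensors="pt",
                    truncation=True, max_length=512)
        loss = model(**batch, labels=batch["input_ids"]).loss  # next-token loss
        loss.backward()
        opt.step()
        opt.zero_grad()
    return {k: v.detach().cpu() for k, v in model.state_dict().items()
            if "lora_" in k}
```

At query time, RUG loads the stored LoRA matrices of the retrieved documents, merges them as in the earlier merge sketch, and generates the answer with the updated model.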
-----
Key Insights 💡:
→ Injecting knowledge into LLM parameters is more effective than in-context knowledge injection for RAG.
→ Parametric RAG reduces online computational cost by eliminating the need to process long input contexts.
→ Combining Parametric RAG with traditional in-context RAG can further improve performance.
→ Document Augmentation with rewriting and QA pair generation is crucial for effective document parameterization.
-----
Results 📊:
→ Parametric RAG outperforms standard RAG, DA-RAG, FLARE, and DRAGIN on 2WikiMultihopQA, HotpotQA, PopQA, and ComplexWebQuestions datasets.
→ On 2WikiMultihopQA with LLaMA-1B, Parametric RAG achieves an F1 score of 0.2764, while standard RAG scores 0.2520.
→ With LLaMA-8B on 2WikiMultihopQA, Parametric RAG achieves an F1 score of 0.3932, and standard RAG scores 0.3372.
→ Combining Parametric RAG with in-context RAG ('Combine Both' method) achieves the best performance overall, reaching an F1 score of 0.4258 on 2WikiMultihopQA with LLaMA-8B.
→ Parametric RAG reduces inference time by 29% to 36% compared to standard RAG with LLaMA-8B.