Reflection-Tuning paper from 2023: an approach for data recycling, where an LLM reflects on existing instruction-tuning data and fixes its flaws.
- Improves the quality of instruction-tuning data.
- Models trained on recycled data show significant improvements in performance across multiple benchmarks.
📚 https://arxiv.org/abs/2310.11716v1
🧠 The method operates in two main phases:
1. Instruction Reflection
2. Response Reflection
Both phases use an oracle model (e.g., GPT-4) to analyze and improve the existing data.
------
🛠️ The process step-by-step:
1️⃣ Data Preparation
- The process starts with an existing instruction-tuning dataset (e.g., Alpaca or WizardLM).
- Each data point consists of an instruction-response pair.
2️⃣ Instruction Reflection
- For each instruction-response pair:
   a. The pair is fed into the oracle model (e.g., GPT-4) along with specific criteria for reflection.
b. Criteria include:
- Complexity of the Topic
- Level of Detail Required
- Knowledge Required
- Ambiguity of the Instruction
- Logical Reasoning or Problem-Solving Involved
c. The oracle model analyzes the instruction based on these criteria.
d. It then generates a new, improved instruction.
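The instruction-reflection step can be sketched as a prompt-building helper plus a call to the oracle. The prompt wording below is an assumption for illustration (the paper's exact prompts are in its repo), and `oracle` stands for any callable that maps a prompt string to the model's text output:

```python
# Criteria listed in the post, used to steer the oracle's reflection.
INSTRUCTION_CRITERIA = [
    "Complexity of the Topic",
    "Level of Detail Required",
    "Knowledge Required",
    "Ambiguity of the Instruction",
    "Logical Reasoning or Problem-Solving Involved",
]

def build_instruction_reflection_prompt(instruction: str, response: str) -> str:
    """Assemble the reflection prompt sent to the oracle model (e.g., GPT-4)."""
    criteria = "\n".join(f"- {c}" for c in INSTRUCTION_CRITERIA)
    return (
        "Analyze the instruction below against these criteria, "
        "then write a new, improved instruction.\n"
        f"Criteria:\n{criteria}\n\n"
        f"Instruction: {instruction}\n"
        f"Response: {response}\n"
    )

def reflect_instruction(oracle, instruction: str, response: str) -> str:
    """`oracle` is any callable mapping a prompt to the model's text output."""
    return oracle(build_instruction_reflection_prompt(instruction, response))
```

In practice `oracle` would wrap an API call to the oracle model; keeping it as a plain callable makes the step easy to test offline.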
3️⃣ Response Reflection
- For each newly generated instruction:
a. The instruction is fed into the oracle model along with new criteria.
b. Criteria for response reflection include:
- Helpfulness
- Relevance
- Accuracy
- Level of Details
c. The oracle model generates an improved response based on these criteria.
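The response-reflection step follows the same pattern with its own criteria. Again, the prompt text is an illustrative assumption, not the paper's verbatim prompt:

```python
# Response criteria listed in the post.
RESPONSE_CRITERIA = ["Helpfulness", "Relevance", "Accuracy", "Level of Details"]

def reflect_response(oracle, instruction: str) -> str:
    """Ask the oracle model for an improved response to `instruction`,
    optimized against the four response criteria."""
    criteria = "\n".join(f"- {c}" for c in RESPONSE_CRITERIA)
    prompt = (
        "Write the best possible response to the instruction below, "
        "optimizing for these criteria:\n"
        f"{criteria}\n\n"
        f"Instruction: {instruction}\n"
    )
    return oracle(prompt)
```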
4️⃣ Data Recycling
- The new instruction-response pair replaces the original in the dataset.
- This process is repeated for all pairs in the dataset.
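Putting the two reflection passes together, the recycling loop replaces every pair in the dataset. This is a minimal self-contained sketch: the inline prompt strings are placeholders for the two reflection prompts above, and `oracle` is any prompt-to-text callable:

```python
def recycle_dataset(oracle, pairs):
    """pairs: list of (instruction, response) tuples.
    Returns the recycled dataset, one new pair per original pair."""

    def reflect_instruction(inst, resp):
        # Placeholder for the instruction-reflection prompt.
        return oracle(f"Improve this instruction: {inst}\nCurrent response: {resp}")

    def reflect_response(inst):
        # Placeholder for the response-reflection prompt.
        return oracle(f"Write a helpful, relevant, accurate, detailed response to: {inst}")

    recycled = []
    for inst, resp in pairs:
        new_inst = reflect_instruction(inst, resp)   # phase 1
        new_resp = reflect_response(new_inst)        # phase 2, conditioned on the new instruction
        recycled.append((new_inst, new_resp))        # replaces the original pair
    return recycled
```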
5️⃣ Model Training
- The recycled dataset is used to fine-tune a base LLM (e.g., LLaMA-7B, as in the original paper).
📊 Evaluation
The recycled dataset is evaluated using various metrics:
- Instruction and response lengths
- Perplexity scores
- Coherence between instructions and responses
- IFD (Instruction-Following Difficulty) scores
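The IFD score compares the model's perplexity on a response with and without the instruction as context: a low ratio means the instruction genuinely helps predict the response. A sketch under an assumed interface, where `logprob_fn(context, text)` returns the total log-probability of `text` given `context` (in practice computed with a causal LM):

```python
import math

def ifd_score(logprob_fn, instruction: str, response: str) -> float:
    """Instruction-Following Difficulty:
    PPL(response | instruction) / PPL(response).
    `logprob_fn(context, text)` -> total log-prob of `text` given `context`
    (an assumed interface for this sketch)."""
    n = max(len(response.split()), 1)  # crude token count, for illustration only
    ppl_cond = math.exp(-logprob_fn(instruction, response) / n)
    ppl_uncond = math.exp(-logprob_fn("", response) / n)
    return ppl_cond / ppl_uncond
```

Scores well below 1 indicate the instruction strongly conditions the response; scores near or above 1 flag pairs where the instruction adds little.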
📚 https://github.com/tianyi-lab/Reflection_Tuning
------
Are you into AI and LLMs❓ Join me on X (Twitter) along with 30.3K others to stay on the bleeding edge every day.
𝕏/🐦 https://x.com/rohanpaul_ai