Synthetic data, DPO and LoRA teach LLMs the art of human conversation.
LLMs fine-tuned on synthetic conversational data can generate more natural, human-like responses while keeping their technical capabilities largely intact across standard benchmarks.
-----
https://arxiv.org/abs/2501.05032
Original Problem 🤖:
LLMs often produce formal, impersonal responses that lack natural conversational flow, making interactions feel mechanical and distant.
-----
Solution in this Paper 🔧:
→ The researchers created a synthetic dataset using Llama 3 models (405B to generate questions, 70B to generate answers) so the training data itself reads as natural, conversational dialogue (see the generation sketch after this list).
→ They applied Direct Preference Optimization (DPO) to balance casual conversation with structured dialogue.
→ Low-Rank Adaptation (LoRA) was used for fine-tuning to prevent catastrophic forgetting while adapting to the new conversational style.
→ The dataset covered 256 topics with 10,884 samples, spanning diverse conversation scenarios.
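A minimal sketch of that two-stage generation loop, assuming the Hugging Face transformers chat pipeline; the checkpoint names, prompts, and topics are illustrative stand-ins (running the 405B/70B models realistically requires a hosted endpoint or a large GPU cluster), not the paper's exact setup.

```python
# Two-stage synthetic data generation: a large model drafts questions per topic,
# a smaller model answers them in a casual, conversational tone.
# NOTE: checkpoint names, prompts, and topics below are illustrative assumptions.
from transformers import pipeline

question_gen = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-405B-Instruct",  # assumed checkpoint for the question generator
    device_map="auto",
)
answer_gen = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-70B-Instruct",   # assumed checkpoint for the answer generator
    device_map="auto",
)

def make_sample(topic: str) -> dict:
    # Stage 1: draft one everyday question about the topic.
    q_messages = [{"role": "user",
                   "content": f"Write one casual question a person might ask a friend about {topic}."}]
    question = question_gen(q_messages, max_new_tokens=64)[0]["generated_text"][-1]["content"]

    # Stage 2: answer it like a friendly person, not a formal assistant.
    a_messages = [
        {"role": "system", "content": "Reply warmly and conversationally, as a person would in chat."},
        {"role": "user", "content": question},
    ]
    answer = answer_gen(a_messages, max_new_tokens=256)[0]["generated_text"][-1]["content"]
    return {"topic": topic, "question": question, "answer": answer}

# The paper spans 256 topics and 10,884 samples; three topics here just to show the loop.
samples = [make_sample(t) for t in ["cooking", "weekend travel", "home workouts"]]
```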
-----
Key Insights 💡:
→ Synthetic datasets can effectively train models for more human-like responses
→ Fine-tuning with DPO and LoRA maintains model performance while improving conversational abilities
→ Lower LoRA r-values (r=8) help preserve core model knowledge (see the fine-tuning sketch after this list)
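A minimal sketch of how DPO with low-rank (r=8) LoRA adapters could be set up, assuming the Hugging Face TRL and PEFT libraries; the base checkpoint, hyperparameters, and the toy preference pair are illustrative, not the paper's exact recipe.

```python
# DPO fine-tuning with LoRA adapters (small rank, r=8) so the base weights barely move.
# NOTE: generic TRL + PEFT sketch; model name, hyperparameters, and data are assumptions.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "meta-llama/Llama-3.1-8B-Instruct"  # illustrative base model
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Low-rank adapters: r=8 keeps the weight update small, which helps preserve core knowledge.
peft_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)

# Preference pairs: "chosen" is the natural, conversational reply; "rejected" is the stiff one.
train_dataset = Dataset.from_list([{
    "prompt": "How do I make my coffee less bitter?",
    "chosen": "Oh, easy fix - try a coarser grind and slightly cooler water.",
    "rejected": "Bitterness in coffee is primarily caused by over-extraction of chlorogenic acids...",
}])

training_args = DPOConfig(
    output_dir="dpo-conversational",
    beta=0.1,                       # strength of the preference constraint
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,     # `tokenizer=` on older TRL versions
    peft_config=peft_config,        # LoRA adapters are applied inside the trainer
)
trainer.train()
```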
-----
Results 📊:
→ Fine-tuned Llama and Qwen models achieved a ~90% selection rate in human-likeness evaluations
→ Minimal impact on technical benchmarks (-0.2% to +1.07%, excluding IFEval)
→ Training completed in 2-4 hours on 2xNVIDIA A100 GPUs
-----
Are you into AI and LLMs❓ Join my daily AI newsletter. I will send you 7 emails a week analyzing the highest signal AI developments. ↓↓
🎉 https://rohanpaul.substack.com/