DogeRM (Domain knowledge merged Reward Model) injects domain expertise into a general reward model by merging its weights with those of a domain-specific fine-tuned language model.
Teaching reward models new domains without collecting expert preferences
📚 https://arxiv.org/abs/2407.01470
🎯 Original Problem:
Training reward models for domain-specific tasks requires extensive preference data collection from experts, making it costly and time-consuming.
-----
🔧 Solution in this Paper:
→ Introduces DogeRM (Domain knowledge merged Reward Model), which merges the parameters of a general reward model with those of a domain-specific fine-tuned LLM
→ Uses weighted averaging over the shared token embeddings and transformer layers while keeping the reward model's regression layer intact (see the sketch after this list)
→ Controls the merge with a hyperparameter λ that sets the weight given to the SFT model's parameters
→ Compatible with any domain-specific model that shares the same architecture as the base model
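A minimal sketch of this weighted merge, assuming both models share the same transformer architecture and are given as plain parameter state dicts; the parameter naming and function signature are illustrative, not the paper's released code:

```python
import torch

def merge_reward_model(rm_state, sft_state, lam=0.3):
    """Merge a general reward model with a domain-specific SFT model.

    rm_state / sft_state: state dicts mapping parameter names to tensors.
    lam: weight given to the SFT parameters (lam=0 keeps the original RM).
    """
    merged = {}
    for name, rm_param in rm_state.items():
        if name in sft_state and sft_state[name].shape == rm_param.shape:
            # Shared token embeddings and transformer layers: weighted average.
            merged[name] = (1.0 - lam) * rm_param + lam * sft_state[name]
        else:
            # Parameters unique to the reward model (e.g., the regression
            # head that outputs the scalar reward) are kept unchanged.
            merged[name] = rm_param.clone()
    return merged
```

The merged state dict can then be loaded back into the reward model and used for scoring with no additional training.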
-----
💡 Key Insights:
→ Domain-specific SFT data is more accessible than preference data
→ Model merging can integrate domain knowledge without additional training
→ Weighted parameter averaging effectively transfers domain expertise
→ Performance improves most when λ is between 0.2 and 0.5
-----
📊 Results:
→ Math performance improved by 11.4% and 17% on RewardBench when merged with MetaMath-7B and MAmmoTH-7B, respectively
→ Coding performance improved by 5.4% on RewardBench when merged with a code model
→ GSM8K accuracy increased by 5% in the best-of-16 setting (see the sketch after this list)
→ No significant degradation in other domains
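For context on the best-of-16 setting, here is a hedged sketch of best-of-n reranking with a reward model; the callables `policy_generate` and `reward_score` are hypothetical stand-ins, not the paper's evaluation harness:

```python
def best_of_n(prompt, policy_generate, reward_score, n=16):
    """Sample n candidate answers and return the one the reward model scores highest."""
    candidates = [policy_generate(prompt) for _ in range(n)]
    scores = [reward_score(prompt, c) for c in candidates]
    return candidates[max(range(n), key=lambda i: scores[i])]
```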