A two-stage fusion approach that optimizes hyperparameters and model weights using Bayesian methods
BOMF (Bayesian Optimization Model Fusion) uses Bayesian optimization to fuse fine-tuned language models, considering both loss and task metrics to improve final performance
https://arxiv.org/abs/2411.06710
🤖 Original Problem:
Fine-tuning pre-trained language models faces challenges in selecting optimal hyperparameters and checkpoints. Existing model fusion techniques that work well for computer vision don't perform as effectively for language models due to misalignment between loss and metric landscapes.
-----
🔧 Solution in this Paper:
→ Introduces BOMF (Bayesian Optimization Model Fusion) - a two-stage approach combining hyperparameter optimization and model fusion
→ First stage uses lightweight models (frozen layers/reduced LoRA rank) to find optimal hyperparameters through Bayesian Optimization
→ Second stage employs Multi-Objective Bayesian Optimization to find optimal weights for combining models while considering both loss and metrics
→ Collects fusion members from a single training trajectory, keeping checkpoints saved after 50% of the training epochs (the full pipeline is sketched below)
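
A minimal sketch of the two-stage pipeline, assuming PyTorch-style checkpoints and Optuna as the Bayesian optimizer; the paper does not prescribe a specific library, and `build_proxy_model`, `train`, `evaluate_loss`, `evaluate_metric`, `train_full_model_and_collect`, and `weighted_average_state_dicts` are hypothetical helpers (the last one is sketched after the Key Insights below).

```python
import optuna

# Stage 1: Bayesian optimization of hyperparameters on a lightweight proxy
# (frozen layers or a reduced LoRA rank); the paper reports that the best
# hyperparameters transfer to the full model.
def hpo_objective(trial):
    lr = trial.suggest_float("lr", 1e-6, 1e-3, log=True)
    weight_decay = trial.suggest_float("weight_decay", 0.0, 0.3)
    model = build_proxy_model()                      # hypothetical helper
    train(model, lr=lr, weight_decay=weight_decay)   # hypothetical helper
    return evaluate_loss(model)                      # hypothetical helper

hpo_study = optuna.create_study(direction="minimize")
hpo_study.optimize(hpo_objective, n_trials=30)
best_hparams = hpo_study.best_params

# Collect fusion members from a single trajectory of the full model,
# keeping checkpoints saved after 50% of the training epochs.
checkpoints = train_full_model_and_collect(best_hparams, collect_after=0.5)

# Stage 2: multi-objective Bayesian optimization over the mixing weights,
# treating validation loss and the task metric as separate objectives
# because their landscapes are misaligned for language models.
def fusion_objective(trial):
    raw = [trial.suggest_float(f"w{i}", 0.0, 1.0) for i in range(len(checkpoints))]
    weights = [w / (sum(raw) + 1e-12) for w in raw]  # normalize to a convex combination
    fused = weighted_average_state_dicts(checkpoints, weights)
    return evaluate_loss(fused), evaluate_metric(fused)

fusion_study = optuna.create_study(directions=["minimize", "maximize"])
fusion_study.optimize(fusion_objective, n_trials=50)
pareto_trials = fusion_study.best_trials  # Pareto front over (loss, metric)
```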
-----
💡 Key Insights:
→ Loss and metric landscapes are significantly misaligned in language models, unlike in computer vision models
→ Optimal hyperparameters transfer well across model configurations (frozen layers, reduced LoRA ranks), which is why a lightweight proxy suffices for the search
→ The best-performing checkpoints in the training trajectory correlate with fusion performance (weight-space fusion sketched below)
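
For the fusion step itself, a hedged sketch of the `weighted_average_state_dicts` helper referenced above, assuming PyTorch state dicts taken from the same architecture and trajectory:

```python
import torch

def weighted_average_state_dicts(state_dicts, weights):
    """Hypothetical helper: fuse checkpoints via a convex combination of
    their parameters. Assumes all state dicts share the same keys and
    shapes because they come from a single training trajectory."""
    fused = {}
    for key in state_dicts[0]:
        fused[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return fused
```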
-----
📊 Results:
→ Improved performance across Natural Language Understanding and Generation tasks
→ Successfully tested on RoBERTa, T5 and LLaMA models
→ Reduced computational cost by using a single training trajectory instead of multiple runs