Preference optimization to improve the semantics of Spoken Language Models (SLMs).
The Align-SLM framework proposed in this paper teaches speech models to talk better by learning from AI feedback.
https://arxiv.org/abs/2411.01834
🎯 Original Problem:
Textless Spoken Language Models (SLMs) generate speech continuations with poor semantic coherence, repetitive phrases, and grammatical errors compared to text-based LLMs.
-----
🛠️ Solution in this Paper:
→ Align-SLM framework enhances SLM semantics through Reinforcement Learning from AI Feedback
→ Generates multiple speech continuations per prompt with the pre-trained TWIST model
→ Creates preference data pairs using automated selection with LLM-guided semantic feedback
→ Applies Direct Preference Optimization to learn from this feedback
→ Couples DPO with curriculum learning, iteratively raising the difficulty of the preference data (minimal sketches of the pair-creation and DPO steps below)
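To make the pipeline concrete, here is a minimal sketch of the AI-feedback preference-pair step. The `generate` (SLM sampler), `transcribe` (ASR), and `score` (LLM judge) callables are hypothetical placeholders, not the paper's actual code:

```python
from typing import Any, Callable, Dict, List, Tuple

def build_preference_pair(
    speech_prompt: Any,
    generate: Callable[[Any], Any],      # SLM speech-continuation sampler (e.g. TWIST)
    transcribe: Callable[[Any], str],    # ASR model, so a text LLM can judge the speech
    score: Callable[[str, str], float],  # LLM judge of semantic coherence
    num_samples: int = 4,
) -> Dict[str, Any]:
    """Sample several continuations, score them with LLM-guided feedback,
    and keep the best/worst as the chosen/rejected pair for DPO."""
    prompt_text = transcribe(speech_prompt)
    scored: List[Tuple[float, Any]] = []
    for _ in range(num_samples):
        continuation = generate(speech_prompt)
        scored.append((score(prompt_text, transcribe(continuation)), continuation))
    scored.sort(key=lambda s: s[0], reverse=True)  # sort by semantic score only
    return {
        "prompt": speech_prompt,
        "chosen": scored[0][1],     # highest-scoring continuation
        "rejected": scored[-1][1],  # lowest-scoring continuation
    }
```

A curriculum can be layered on top by requiring progressively smaller score gaps between the chosen and rejected continuations in later training rounds.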
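The resulting pairs then feed a standard DPO objective. The sketch below shows the generic DPO loss over sequence log-probabilities; it is not tied to the paper's exact implementation:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss: push the policy to prefer the 'chosen' continuation
    over the 'rejected' one, relative to a frozen reference model."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```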
-----
💡 Key Insights:
→ First framework to improve the long-term semantics of speech-only SLMs through preference optimization
→ Pure speech-to-speech approach without requiring text injection
→ Data-efficient, automated preference data selection strategy
→ Novel coupling with curriculum learning for iterative improvements
-----
📊 Results:
→ 77.9% on sWUGGY benchmark
→ 61.1% on SStoryCloze benchmark
→ 86.8% on T-StoryCloze benchmark
→ Superior Meaningfulness Mean Opinion Scores (MOS) in human evaluations