"Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback"

The podcast on this paper is generated with Google's Illuminate.

Preference optimization to improve the semantics of Spoken Language Models (SLMs).

Align-SLM, the framework proposed in this paper, teaches speech models to talk more coherently by learning from AI feedback.

https://arxiv.org/abs/2411.01834

🎯 Original Problem:

Textless Spoken Language Models (SLMs) generate speech continuations with poor semantic coherence, repetitive phrases, and grammatical errors compared to text-based LLMs.

-----

🛠️ Solution in this Paper:

→ The Align-SLM framework enhances SLM semantics through reinforcement learning from AI feedback (RLAIF)

→ Generates multiple speech continuations per prompt using the pre-trained TWIST model

→ Creates preference data pairs through automated selection guided by LLM-based semantic feedback

→ Applies Direct Preference Optimization (DPO) to learn from this feedback (see the sketch after this list)

→ Couples this with curriculum learning, iteratively regenerating preference data that is harder to discriminate
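
Below is a minimal, hypothetical Python sketch of that loop. The helper names (generate_continuations, semantic_score, make_preference_pair) are placeholders rather than the paper's code: TWIST sampling, ASR transcription, and the LLM judge are stubbed out, and only the standard DPO loss formula is written out explicitly.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # continuation the LLM judge scored highest
    rejected: str  # continuation the LLM judge scored lowest

def generate_continuations(prompt: str, n: int = 4) -> list[str]:
    # Stand-in for sampling n speech continuations from the pre-trained TWIST SLM
    # (in the real pipeline these would be decoded to text via ASR for scoring).
    return [f"{prompt} ... sampled continuation {i}" for i in range(n)]

def semantic_score(prompt: str, continuation: str) -> float:
    # Stand-in for the LLM judge rating the semantic coherence of a continuation.
    return random.random()

def make_preference_pair(prompt: str) -> PreferencePair:
    # Automated preference selection: best-scored continuation becomes "chosen",
    # worst-scored becomes "rejected".
    ranked = sorted(generate_continuations(prompt),
                    key=lambda c: semantic_score(prompt, c))
    return PreferencePair(prompt=prompt, chosen=ranked[-1], rejected=ranked[0])

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    # Standard DPO objective: -log sigmoid(beta * (policy margin - reference margin)).
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

if __name__ == "__main__":
    pair = make_preference_pair("once upon a time")
    print(pair.chosen, "|", pair.rejected)
    print("DPO loss:", round(dpo_loss(-10.0, -14.0, -11.0, -13.0), 4))
```

In a curriculum-learning round, the improved model would replace the generator stub and the loop would repeat, producing preference pairs that are progressively harder to tell apart.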

-----

💡 Key Insights:

→ First framework to improve long-term semantics of speech-only SLMs through preference optimization

→ Pure speech-to-speech approach without requiring text injection

→ Data-efficient and automated preference data selection strategy

→ Novel coupling with curriculum learning for iterative improvements

-----

📊 Results:

→ 77.9% on the sWUGGY benchmark

→ 61.1% on the S-StoryCloze (Spoken StoryCloze) benchmark

→ 86.8% on the T-StoryCloze (Topic StoryCloze) benchmark

→ Superior Meaningfulness Mean Opinion Scores in human evaluation
