Pre-trained models: the secret sauce for better code-switching speech recognition.
This paper proposes a novel approach to enhance code-switching Automatic Speech Recognition (ASR) by leveraging pre-trained language models and multilingual speech representations.
-----
https://arxiv.org/abs/2412.08651
Original Problem 😕:
Code-switching ASR is hard because labeled code-switched training data is scarce and the language-mixing patterns are complex.
-----
Solution in this Paper 💡:
→ The paper introduces a two-stage fine-tuning method for code-switching ASR (a rough sketch of the recipe follows this list).
→ It utilizes pre-trained wav2vec 2.0 models for multilingual speech representations.
→ The approach incorporates BERT-based models to strengthen language modeling over mixed-language text.
→ A novel masked language modeling task is employed during fine-tuning to enhance code-switching capabilities.
→ The method adapts to target code-switched languages without requiring extensive code-switched data.
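Below is a minimal sketch of what this kind of recipe can look like in practice, assuming a HuggingFace-style setup. It is not the paper's released code: the checkpoint name, the "vocab.json" vocabulary file, and all hyperparameters are illustrative assumptions.
```python
import torch
from transformers import (Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor,
                          Wav2Vec2ForCTC, Wav2Vec2Processor)

# "vocab.json" is a hypothetical character/sub-word vocabulary you would build to
# cover both languages (e.g. Mandarin characters plus English letters).
tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]", pad_token="[PAD]")
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0,
    do_normalize=True, return_attention_mask=True)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Multilingual pre-trained encoder; XLSR-53 is shown only as an example checkpoint.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)
model.freeze_feature_encoder()  # keep the convolutional front-end frozen

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def ctc_step(audio_batch, text_batch):
    """One CTC fine-tuning step on raw 16 kHz audio and code-switched transcripts."""
    inputs = processor(audio_batch, sampling_rate=16_000,
                       return_tensors="pt", padding=True)
    label_batch = tokenizer(text_batch, return_tensors="pt", padding=True)
    # Padded label positions are set to -100 so the CTC loss ignores them.
    labels = label_batch.input_ids.masked_fill(label_batch.attention_mask.ne(1), -100)
    loss = model(input_values=inputs.input_values,
                 attention_mask=inputs.attention_mask, labels=labels).loss
    loss.backward(); optimizer.step(); optimizer.zero_grad()
    return loss.item()

# Stage 1 (illustrative): run ctc_step over monolingual data from each language.
# Stage 2 (illustrative): continue on the code-switched corpus, typically at a lower LR.
```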
-----
Key Insights from this Paper 💡:
→ Pre-trained models can effectively transfer knowledge to code-switching tasks.
→ Two-stage fine-tuning helps balance general and code-switching specific knowledge.
→ Masked language modeling improves code-switching understanding in ASR systems (a small MLM sketch follows this list).
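A small sketch of the MLM-style adaptation idea, again illustrative rather than the paper's exact setup: a multilingual BERT is further trained with masked language modeling on code-switched transcripts, so the language model sees typical switch points. The checkpoint, example sentences, and masking rate are assumptions.
```python
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
mlm_model = BertForMaskedLM.from_pretrained("bert-base-multilingual-cased")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Hypothetical Mandarin-English code-switched transcripts (SEAME-style).
sentences = ["我 今天 要 去 meeting", "这个 project 的 deadline 是 Friday"]
examples = [tokenizer(s, truncation=True, max_length=64) for s in sentences]
batch = collator(examples)  # randomly masks ~15% of tokens and builds MLM labels

loss = mlm_model(input_ids=batch["input_ids"],
                 attention_mask=batch["attention_mask"],
                 labels=batch["labels"]).loss
loss.backward()  # one adaptation step; in practice wrap this in an optimizer loop
```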
-----
Results 📊:
→ Achieved 21.9% and 29.8% relative WER improvements on the SEAME and MSC datasets, respectively (the metric is worked through below).
→ Outperformed previous state-of-the-art methods in code-switching ASR performance.
→ Demonstrated effective cross-lingual transfer, even with limited target language data.
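For readers unfamiliar with the metric, here is how a relative WER improvement is computed. The baseline and improved WER values in the snippet are hypothetical, since the post only quotes the relative numbers (21.9% on SEAME, 29.8% on MSC).
```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction in word error rate, expressed as a % of the baseline WER."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# Hypothetical: cutting a 30.0% baseline WER to 23.43% is a ~21.9% relative improvement.
print(relative_wer_improvement(30.0, 23.43))  # ≈ 21.9
```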