Pre-trained models: the secret sauce for better code-switching speech recognition.
This paper proposes a novel approach to enhance code-switching Automatic Speech Recognition (ASR) by leveraging pre-trained language models and multilingual speech representations.
-----
https://arxiv.org/abs/2412.08651
Original Problem 😕:
Code-switching ASR is hard because labeled code-switched training data is scarce and the language-mixing patterns are complex.
-----
Solution in this Paper 💡:
→ The paper introduces a two-stage fine-tuning method for code-switching ASR (a rough sketch of the recipe follows this list).
→ It utilizes pre-trained wav2vec 2.0 models for multilingual speech representations.
→ The approach incorporates BERT-based models to strengthen language modeling over mixed-language text.
→ A novel masked language modeling task is employed during fine-tuning to enhance code-switching capabilities.
→ The method adapts to target code-switched languages without requiring extensive code-switched data.
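Below is a minimal sketch of what this kind of recipe can look like in practice, assuming a HuggingFace-style setup. It is not the paper's released code: the checkpoint name, the "vocab.json" vocabulary file, and all hyperparameters are illustrative assumptions.
```python
import torch
from transformers import (Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor,
                          Wav2Vec2ForCTC, Wav2Vec2Processor)

# "vocab.json" is a hypothetical character/sub-word vocabulary you would build to
# cover both languages (e.g. Mandarin characters plus English letters).
tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]", pad_token="[PAD]")
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0,
    do_normalize=True, return_attention_mask=True)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Multilingual pre-trained encoder; XLSR-53 is shown only as an example checkpoint.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)
model.freeze_feature_encoder()  # keep the convolutional front-end frozen

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def ctc_step(audio_batch, text_batch):
    """One CTC fine-tuning step on raw 16 kHz audio and code-switched transcripts."""
    inputs = processor(audio_batch, sampling_rate=16_000,
                       return_tensors="pt", padding=True)
    label_batch = tokenizer(text_batch, return_tensors="pt", padding=True)
    # Padded label positions are set to -100 so the CTC loss ignores them.
    labels = label_batch.input_ids.masked_fill(label_batch.attention_mask.ne(1), -100)
    loss = model(input_values=inputs.input_values,
                 attention_mask=inputs.attention_mask, labels=labels).loss
    loss.backward(); optimizer.step(); optimizer.zero_grad()
    return loss.item()

# Stage 1 (illustrative): run ctc_step over monolingual data from each language.
# Stage 2 (illustrative): continue on the code-switched corpus, typically at a lower LR.
```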
-----
Key Insights from this Paper 💡:
→ Pre-trained models can effectively transfer knowledge to code-switching tasks.
→ Two-stage fine-tuning helps balance general and code-switching specific knowledge.
→ Masked language modeling improves code-switching understanding in ASR systems (a small MLM sketch follows this list).
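A small sketch of the MLM-style adaptation idea, again illustrative rather than the paper's exact setup: a multilingual BERT is further trained with masked language modeling on code-switched transcripts, so the language model sees typical switch points. The checkpoint, example sentences, and masking rate are assumptions.
```python
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
mlm_model = BertForMaskedLM.from_pretrained("bert-base-multilingual-cased")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Hypothetical Mandarin-English code-switched transcripts (SEAME-style).
sentences = ["我 今天 要 去 meeting", "这个 project 的 deadline 是 Friday"]
examples = [tokenizer(s, truncation=True, max_length=64) for s in sentences]
batch = collator(examples)  # randomly masks ~15% of tokens and builds MLM labels

loss = mlm_model(input_ids=batch["input_ids"],
                 attention_mask=batch["attention_mask"],
                 labels=batch["labels"]).loss
loss.backward()  # one adaptation step; in practice wrap this in an optimizer loop
```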
-----
Results 📊:
→ Achieved 21.9% and 29.8% relative WER improvements on the SEAME and MSC datasets, respectively (the metric is worked through below).
→ Outperformed previous state-of-the-art methods in code-switching ASR performance.
→ Demonstrated effective cross-lingual transfer, even with limited target language data.
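For readers unfamiliar with the metric, here is how a relative WER improvement is computed. The baseline and improved WER values in the snippet are hypothetical, since the post only quotes the relative numbers (21.9% on SEAME, 29.8% on MSC).
```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction in word error rate, expressed as a % of the baseline WER."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# Hypothetical: cutting a 30.0% baseline WER to 23.43% is a ~21.9% relative improvement.
print(relative_wer_improvement(30.0, 23.43))  # ≈ 21.9
```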