BLR-MoE decouples language identification from acoustic modeling in multilingual automatic speech recognition (ASR), simplifying domain adaptation and improving accuracy.
This paper introduces BLR-MoE, a new architecture for multilingual ASR that reduces errors caused by language confusion, especially when training and test domains differ.
-----
https://arxiv.org/abs/2501.12602
Original Problem 🤔:
→ Multilingual ASR models often struggle with language confusion, especially when training and testing data have different domains.
→ Existing Mixture-of-Experts (MoE) models like LR-MoE only address confusion in the feed-forward network, not in the attention mechanism.
→ Language-identification-based routers, which direct data to the correct expert, are prone to errors that domain mismatch makes worse.
-----
Key Insights 💡:
→ Decoupling language identification from acoustic modeling simplifies each task and improves robustness to domain mismatch.
→ Applying MoE to both the attention and feed-forward networks further reduces language confusion.
→ Adapting the language identification router to the target domain with minimal data improves overall performance.
→ Pruning irrelevant experts based on prior language information significantly boosts performance when the language set is known in advance.
-----
Solution in this Paper 🛠️:
→ This paper proposes BLR-MoE, which extends LR-MoE by applying MoE to the attention mechanism as well as the feed-forward network (sketched after this list).
→ BLR-MoE uses a more robust language identification (LID) router, built on a Time Delay Neural Network (TDNN).
→ The router is fine-tuned independently on audio-language pairs, decoupling it from the main acoustic-model training (see the router sketch below).
→ The paper also introduces expert pruning: given prior knowledge of which languages will occur, unnecessary experts are removed, simplifying the model and reducing potential confusion (see the pruning sketch below).
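To make the routing idea concrete, here is a minimal PyTorch sketch of a language-routed layer where both the attention block and the FFN are expert-specific. This is an illustration, not the authors' code: the class name, dimensions, and hard per-utterance routing are assumptions, and the real system routes inside a full encoder architecture.

```python
import torch
import torch.nn as nn

class LanguageRoutedLayer(nn.Module):
    """Hypothetical sketch: one expert attention block and one expert
    FFN per language, both selected by the same LID router decision."""
    def __init__(self, d_model: int, n_heads: int, n_langs: int):
        super().__init__()
        # BLR-MoE's key extension over LR-MoE: attention is also expert-specific.
        self.attn_experts = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_langs)
        )
        self.ffn_experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_langs)
        )

    def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
        # Route the whole utterance to the expert chosen by the LID router.
        attn = self.attn_experts[lang_id]
        x = x + attn(x, x, x, need_weights=False)[0]  # residual attention
        x = x + self.ffn_experts[lang_id](x)          # residual FFN
        return x
```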
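The TDNN-based LID router can be approximated with stacked dilated 1-D convolutions over frame features, pooled to one language posterior per utterance. Again a hedged sketch with assumed layer widths and dilations; the key point is the decoupled adaptation loop at the bottom, which updates only the router on target-domain (audio, language) pairs while the acoustic model stays untouched.

```python
import torch
import torch.nn as nn

class TDNNRouter(nn.Module):
    """Sketch of a TDNN-style LID router. Layer sizes are illustrative."""
    def __init__(self, feat_dim: int, n_langs: int, hidden: int = 256):
        super().__init__()
        self.tdnn = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=3), nn.ReLU(),
        )
        self.classifier = nn.Linear(hidden, n_langs)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, feat_dim); Conv1d wants (batch, feat_dim, frames).
        h = self.tdnn(feats.transpose(1, 2))
        return self.classifier(h.mean(dim=2))  # utterance-level LID logits

# Decoupled adaptation: fine-tune only the router on target-domain
# audio-language pairs; the acoustic model is frozen and never sees gradients.
router = TDNNRouter(feat_dim=80, n_langs=8)
opt = torch.optim.Adam(router.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
feats, lang = torch.randn(4, 200, 80), torch.randint(0, 8, (4,))  # dummy batch
opt.zero_grad()
loss_fn(router(feats), lang).backward()
opt.step()
```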
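Expert pruning with prior language knowledge can be as simple as masking the router's logits so only experts for the expected languages remain reachable. A hypothetical helper, not taken from the paper:

```python
import torch

def prune_router_logits(logits: torch.Tensor, allowed: list[int]) -> torch.Tensor:
    """Mask out experts for languages known to be absent, so the
    router can only choose among the `allowed` language IDs."""
    mask = torch.full_like(logits, float("-inf"))
    mask[..., allowed] = 0.0
    return logits + mask

# e.g. a deployment that only ever sees languages 0 and 3:
logits = torch.randn(2, 8)
lang_id = prune_router_logits(logits, allowed=[0, 3]).argmax(dim=-1)
```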
-----
Results 💯:
→ BLR-MoE achieves a 16.09% relative Word Error Rate (WER) reduction over the LR-MoE baseline on a 10,000-hour dataset.
→ Shows relative improvements of 3.98% in in-domain and 19.09% in out-of-domain scenarios.
→ The accuracy of the LID-based router improves by 9.41% absolute compared to LR-MoE.
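For clarity, "relative WER reduction" measures the error-rate drop as a fraction of the baseline error rate. A quick worked example, using an illustrative baseline figure that is not from the paper:

```python
# Relative WER reduction = (WER_baseline - WER_new) / WER_baseline.
# Illustrative numbers only: a 10.00% baseline WER with a 16.09%
# relative reduction corresponds to 10.00 * (1 - 0.1609) = 8.39% WER.
wer_baseline, rel_reduction = 10.00, 0.1609
wer_new = wer_baseline * (1 - rel_reduction)
print(f"{wer_new:.2f}% WER")  # 8.39% WER
```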