BLR-MoE decouples language identification from acoustic modeling in multilingual automatic speech recognition (ASR), simplifying domain adaptation and improving accuracy.
This paper introduces BLR-MoE, a new architecture for multilingual ASR that reduces errors caused by language confusion, especially when training and test domains differ.
-----
https://arxiv.org/abs/2501.12602
Original Problem 🤔:
→ Multilingual ASR models often struggle with language confusion, especially when training and testing data have different domains.
→ Existing Mixture-of-Experts (MoE) models like LR-MoE only address confusion in the feed-forward network, not in the attention mechanism.
→ Language-identification-based routers, which direct data to the correct expert, are prone to errors that domain mismatch makes worse.
-----
Key Insights 💡:
→ Decoupling language identification from acoustic modeling simplifies each task and improves robustness to domain mismatch.
→ Applying MoE to both the attention and feed-forward networks further reduces language confusion.
→ Adapting the language identification router to the target domain with minimal data improves overall performance.
→ Pruning irrelevant experts based on prior language information significantly boosts performance when the language set is known in advance.
-----
Solution in this Paper 🛠️:
→ This paper proposes BLR-MoE, which extends LR-MoE by applying MoE to the attention mechanism as well as the feed-forward network (sketched after this list).
→ BLR-MoE uses a more robust language identification (LID) router, built on a Time Delay Neural Network (TDNN).
→ The router is fine-tuned independently on audio-language pairs, decoupling it from the main acoustic-model training (see the router sketch below).
→ The paper also introduces expert pruning: given prior knowledge of which languages will occur, unnecessary experts are removed, simplifying the model and reducing potential confusion (see the pruning sketch below).
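To make the routing idea concrete, here is a minimal PyTorch sketch of a language-routed layer where both the attention block and the FFN are expert-specific. This is an illustration, not the authors' code: the class name, dimensions, and hard per-utterance routing are assumptions, and the real system routes inside a full encoder architecture.

```python
import torch
import torch.nn as nn

class LanguageRoutedLayer(nn.Module):
    """Hypothetical sketch: one expert attention block and one expert
    FFN per language, both selected by the same LID router decision."""
    def __init__(self, d_model: int, n_heads: int, n_langs: int):
        super().__init__()
        # BLR-MoE's key extension over LR-MoE: attention is also expert-specific.
        self.attn_experts = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_langs)
        )
        self.ffn_experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_langs)
        )

    def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
        # Route the whole utterance to the expert chosen by the LID router.
        attn = self.attn_experts[lang_id]
        x = x + attn(x, x, x, need_weights=False)[0]  # residual attention
        x = x + self.ffn_experts[lang_id](x)          # residual FFN
        return x
```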
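The TDNN-based LID router can be approximated with stacked dilated 1-D convolutions over frame features, pooled to one language posterior per utterance. Again a hedged sketch with assumed layer widths and dilations; the key point is the decoupled adaptation loop at the bottom, which updates only the router on target-domain (audio, language) pairs while the acoustic model stays untouched.

```python
import torch
import torch.nn as nn

class TDNNRouter(nn.Module):
    """Sketch of a TDNN-style LID router. Layer sizes are illustrative."""
    def __init__(self, feat_dim: int, n_langs: int, hidden: int = 256):
        super().__init__()
        self.tdnn = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=3), nn.ReLU(),
        )
        self.classifier = nn.Linear(hidden, n_langs)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, feat_dim); Conv1d wants (batch, feat_dim, frames).
        h = self.tdnn(feats.transpose(1, 2))
        return self.classifier(h.mean(dim=2))  # utterance-level LID logits

# Decoupled adaptation: fine-tune only the router on target-domain
# audio-language pairs; the acoustic model is frozen and never sees gradients.
router = TDNNRouter(feat_dim=80, n_langs=8)
opt = torch.optim.Adam(router.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
feats, lang = torch.randn(4, 200, 80), torch.randint(0, 8, (4,))  # dummy batch
opt.zero_grad()
loss_fn(router(feats), lang).backward()
opt.step()
```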
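Expert pruning with prior language knowledge can be as simple as masking the router's logits so only experts for the expected languages remain reachable. A hypothetical helper, not taken from the paper:

```python
import torch

def prune_router_logits(logits: torch.Tensor, allowed: list[int]) -> torch.Tensor:
    """Mask out experts for languages known to be absent, so the
    router can only choose among the `allowed` language IDs."""
    mask = torch.full_like(logits, float("-inf"))
    mask[..., allowed] = 0.0
    return logits + mask

# e.g. a deployment that only ever sees languages 0 and 3:
logits = torch.randn(2, 8)
lang_id = prune_router_logits(logits, allowed=[0, 3]).argmax(dim=-1)
```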
-----
Results 💯:
→ BLR-MoE achieves a 16.09% relative Word Error Rate (WER) reduction over the LR-MoE baseline on a 10,000-hour dataset.
→ Shows relative improvements of 3.98% in in-domain and 19.09% in out-of-domain scenarios.
→ The accuracy of the LID-based router improves by 9.41% absolute compared to LR-MoE.
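For clarity, "relative WER reduction" measures the error-rate drop as a fraction of the baseline error rate. A quick worked example, using an illustrative baseline figure that is not from the paper:

```python
# Relative WER reduction = (WER_baseline - WER_new) / WER_baseline.
# Illustrative numbers only: a 10.00% baseline WER with a 16.09%
# relative reduction corresponds to 10.00 * (1 - 0.1609) = 8.39% WER.
wer_baseline, rel_reduction = 10.00, 0.1609
wer_new = wer_baseline * (1 - rel_reduction)
print(f"{wer_new:.2f}% WER")  # 8.39% WER
```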