This paper introduces LCB, a new benchmark that reveals why LLMs struggle with language consistency and how to fix their linguistic mix-ups.
📚 https://arxiv.org/abs/2406.20052
Original Problem 🎯:
LLMs often fail to consistently generate text in the user's desired language, exhibiting "language confusion": mixing languages at the word, line, or full-response level. This significantly degrades the experience of non-English users.
-----
Solution in this Paper 🛠️:
• Created Language Confusion Benchmark (LCB) covering 15 diverse languages
• Evaluated models on monolingual generation (queries in language L, expecting response in L)
• Tested cross-lingual generation (English instructions to generate in target language)
• Developed metrics: Line-level Pass Rate (LPR), Word-level Pass Rate (WPR), and Language Confusion Pass Rate (LCPR), the harmonic mean of the two; a scoring sketch follows this list
• Implemented mitigation strategies: few-shot prompting, multilingual instruction tuning, beam search decoding
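To make the metrics concrete, here is a minimal scoring sketch in Python. It is not the paper's implementation: it assumes the `langdetect` package as a stand-in line-level language identifier and a simple Latin-script regex as the word-level mixing check, both of which may differ from the paper's exact rules.

```python
# Minimal LCB-style scoring sketch (stand-in detector, not the paper's exact rules).
import re
from langdetect import detect  # pip install langdetect

LATIN_WORD = re.compile(r"[A-Za-z]{2,}")

def line_pass(response: str, target: str) -> bool:
    """Line-level pass: every non-empty line is detected as the target language."""
    lines = [ln for ln in response.splitlines() if ln.strip()]
    return bool(lines) and all(detect(ln) == target for ln in lines)

def word_pass(response: str) -> bool:
    """Word-level pass (non-Latin-script targets): no embedded Latin-script words."""
    return LATIN_WORD.search(response) is None

def lcb_scores(responses: list[str], target: str) -> dict:
    n = len(responses)
    lpr = sum(line_pass(r, target) for r in responses) / n
    wpr = sum(word_pass(r) for r in responses) / n
    lcpr = 2 * lpr * wpr / (lpr + wpr) if lpr + wpr else 0.0  # harmonic mean
    return {"LPR": lpr, "WPR": wpr, "LCPR": lcpr}

print(lcb_scores(["こんにちは。\n元気です。", "こんにちは world"], target="ja"))
```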
-----
Key Insights 💡:
• Base models show less language confusion than their English-centric instruction-tuned variants
• Complex prompts increase language confusion
• High sampling temperatures aggravate the problem (decoding sketch after this list)
• Multilingual instruction tuning helps reduce confusion
• Where the language instruction sits within the prompt impacts model performance
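The decoding-side findings map directly onto standard generation settings. A hedged sketch with Hugging Face `transformers`; the checkpoint and parameter values here are illustrative, not the paper's exact setup:

```python
# Decoding-side mitigations: lower temperature and beam search (illustrative values).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # any causal LM; illustrative choice
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Réponds en français : quelle est la capitale du Japon ?"
inputs = tok(prompt, return_tensors="pt")

# Lower sampling temperature: less randomness, less drift into other languages.
out_low_temp = model.generate(**inputs, do_sample=True, temperature=0.3, max_new_tokens=128)

# Beam search: deterministic decoding, reported to reduce confusion further.
out_beams = model.generate(**inputs, do_sample=False, num_beams=5, max_new_tokens=128)

print(tok.decode(out_beams[0], skip_special_tokens=True))
```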
-----
Results 📊:
• Command R and OpenAI models excel in monolingual generation (LPR 98.6-99.3%)
• Llama 2/3 and Mistral models struggle with consistency (LPR 48.3-73.0%)
• Cross-lingual performance: Command R+ Refresh leads with 95.4% LPR
• Few-shot prompting lifts base-model LPR from 1.1% to 95.0% (prompt sketch below)
• Lowering the temperature to 0.3 and using beam search both significantly reduce confusion
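For the few-shot mitigation, the idea is to prepend in-language demonstrations so a base model locks onto the target language before answering. A sketch with hypothetical French demonstrations; the paper's exact few-shot format may differ:

```python
# Few-shot prompt sketch: in-language demonstrations steer a base model's output
# language. The demonstrations below are illustrative, not taken from the paper.
FEW_SHOT_TEMPLATE = """Instruction : Décris le cycle de l'eau en une phrase.
Réponse : L'eau s'évapore, se condense en nuages, puis retombe en pluie.

Instruction : Explique la photosynthèse en une phrase.
Réponse : Les plantes transforment la lumière, l'eau et le CO2 en sucres et en oxygène.

Instruction : {query}
Réponse :"""

def build_prompt(query: str) -> str:
    return FEW_SHOT_TEMPLATE.format(query=query)

print(build_prompt("Quels sont les bienfaits du sommeil ?"))
```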