"Do LLMs Provide Consistent Answers to Health-Related Questions across Languages?"

The accompanying podcast was generated with Google's Illuminate.

This paper reveals inconsistencies in Large Language Model (LLM) answers to health-related questions across four languages: English, Chinese, Turkish, and German. Such inconsistencies can spread healthcare misinformation.

-----

Paper - https://arxiv.org/abs/2501.14719

Methods in this Paper 💡:

→ The paper introduces a novel prompt-based evaluation workflow.

→ This workflow assesses LLM consistency across languages for health-related questions.

→ It expands the HealthFC dataset by adding Turkish and Chinese translations and disease categories.

→ The workflow parses LLM responses into segments using a defined discourse ontology (see the sketch after this list).

→ This ontology comprises Answer-Summary, Health Benefits, Clinical Guidelines, Individual Considerations, and Public Health Advice.

→ A consistency-check prompt then compares the parsed English answers with their counterparts in the other languages.

→ Each comparison is labeled Consistent, Partially Consistent, Contradict, or Irrelevant.

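A minimal sketch of this parse-then-compare loop is below, assuming a generic `llm(prompt) -> str` callable. The prompt wording and the helper names `parse_segments` and `check_consistency` are illustrative assumptions, not the authors' exact prompts.

```python
# Minimal sketch of the prompt-based consistency workflow.
# `llm` is an assumed callable: it sends a prompt to a model and
# returns its text reply. Prompt wording is illustrative only.

# Discourse ontology from the paper: every answer is parsed into
# these information units before cross-language comparison.
ONTOLOGY = [
    "Answer-Summary",
    "Health Benefits",
    "Clinical Guidelines",
    "Individual Considerations",
    "Public Health Advice",
]

# Labels assigned when comparing a non-English segment to English.
LABELS = {"Consistent", "Partially Consistent", "Contradict", "Irrelevant"}

SEGMENT_PROMPT = (
    "Split the answer below into these sections: {units}.\n"
    "Return one line per section as '<section>: <text>'.\n\n"
    "Answer:\n{answer}"
)

CONSISTENCY_PROMPT = (
    "English answer segment ({unit}):\n{english}\n\n"
    "{language} answer segment ({unit}):\n{other}\n\n"
    "Label the {language} segment relative to the English one as one of: "
    "Consistent, Partially Consistent, Contradict, Irrelevant. "
    "Reply with the label only."
)


def parse_segments(llm, answer: str) -> dict:
    """Ask the model to decompose an answer into the ontology units."""
    raw = llm(SEGMENT_PROMPT.format(units=", ".join(ONTOLOGY), answer=answer))
    segments = {}
    for line in raw.splitlines():
        unit, _, text = line.partition(":")
        if unit.strip() in ONTOLOGY and text.strip():
            segments[unit.strip()] = text.strip()
    return segments


def check_consistency(llm, english_answer: str, other_answer: str,
                      language: str) -> dict:
    """Label each shared information unit of a non-English answer
    against the parsed English answer."""
    en = parse_segments(llm, english_answer)
    other = parse_segments(llm, other_answer)
    labels = {}
    for unit in ONTOLOGY:
        if unit in en and unit in other:
            reply = llm(CONSISTENCY_PROMPT.format(
                unit=unit, english=en[unit],
                language=language, other=other[unit])).strip()
            # Fall back to Irrelevant if the model returns an off-list label.
            labels[unit] = reply if reply in LABELS else "Irrelevant"
    return labels
```

Aggregating the per-unit labels over all questions yields the inconsistency percentages reported under Results.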
-----

Key Insights from this Paper 🧠:

→ LLMs show inconsistencies in health answers when questions are posed in different languages.

→ Inconsistencies are more pronounced in non-English languages compared to English.

→ Specific disease topics and information types show higher inconsistency.

→ Answer summaries are generally consistent, but other parts like guidelines and advice are not.

→ LLMs tend to provide longer answers in English and German than in Turkish.

-----

Results 📊:

→ Inconsistent Answer Summaries range from 14.55% to 37.56% across languages and models.

→ Inconsistent Clinical Guidelines and Evidence reach up to 77.93%.

→ Average inconsistency across all information units ranges from 40.56% to 68.12%.
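
As an illustration of how such percentages could be aggregated (an assumption, since the paper's exact counting rule, e.g. how Partially Consistent is treated, may differ), the inconsistency rate for one information unit is the share of comparisons not labeled Consistent:

```python
def inconsistency_rate(labels: list[str]) -> float:
    # Share of comparisons not labeled 'Consistent'; whether
    # 'Partially Consistent' counts as inconsistent is an assumption here.
    return sum(label != "Consistent" for label in labels) / len(labels)
```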
