
"Can LLMs assist with Ambiguity? A Quantitative Evaluation of various Large Language Models on Word Sense Disambiguation"

The podcast on this paper is generated with Google's Illuminate.

Combining LLMs with knowledge bases makes word sense disambiguation more accurate.

This paper proposes a novel approach that combines prompt augmentation with a sense knowledge base to improve Word Sense Disambiguation (WSD) with LLMs, achieving higher accuracy on ambiguous words.

-----

https://arxiv.org/abs/2411.18337

🤔 Original Problem:

Lexical ambiguity in modern digital communication degrades machine translation and information retrieval, and Word Sense Disambiguation (WSD) systems that could resolve it are held back by limited training data.

-----

🔧 Solution in this Paper:

→ The paper introduces a systematic prompt augmentation mechanism combined with a knowledge base that stores the different sense interpretations of a word

→ Implements a human-in-the-loop approach in which prompts are enriched with Part-of-Speech tags and synonyms of the ambiguous word

→ Uses aspect-based sense filtering to narrow down the candidate meanings

→ Employs few-shot Chain-of-Thought prompting to guide the LLM's decision process

→ Incorporates a hybrid Retrieval-Augmented-Generation-inspired model that blends the LLM with the knowledge base
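The pipeline above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the mini knowledge base, the aspect labels, and all helper names (`SENSE_KB`, `filter_senses`, `build_prompt`) are hypothetical stand-ins for the paper's knowledge-base lookup and prompt-augmentation steps.

```python
# Hypothetical sketch of knowledge-base-backed prompt augmentation for WSD.
# A real system would retrieve senses from a full knowledge base (e.g. WordNet)
# and send the prompt to an LLM; here the KB is a tiny hand-coded dictionary.

SENSE_KB = {
    "bank": [
        {"sense": "financial institution", "aspect": "finance",
         "synonyms": ["lender", "depository"]},
        {"sense": "sloping land beside a river", "aspect": "geography",
         "synonyms": ["riverside", "shore"]},
    ],
}

def filter_senses(word, aspect):
    """Aspect-based sense filtering: keep only senses matching the context aspect."""
    return [s for s in SENSE_KB.get(word, []) if s["aspect"] == aspect]

def build_prompt(sentence, word, pos_tag, aspect):
    """Augment the prompt with POS tag, synonyms, and filtered candidate senses,
    then ask for step-by-step (Chain-of-Thought-style) reasoning."""
    senses = filter_senses(word, aspect)
    lines = [
        f'Sentence: "{sentence}"',
        f'Ambiguous word: "{word}" (POS: {pos_tag})',
        "Candidate senses:",
    ]
    for i, s in enumerate(senses, 1):
        lines.append(f'  {i}. {s["sense"]} (synonyms: {", ".join(s["synonyms"])})')
    lines.append("Think step by step, then answer with the number of the correct sense.")
    return "\n".join(lines)

prompt = build_prompt("She deposited cash at the bank.", "bank", "NOUN", "finance")
print(prompt)
```

With the aspect set to "finance", the geographic sense is filtered out before the LLM ever sees the prompt, which is the point of the aspect-based filtering step.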

-----

💡 Key Insights:

→ Commercial LLMs like GPT-4 outperform open-source models in WSD tasks

→ Prompt augmentation with knowledge bases significantly improves disambiguation accuracy

→ Human-in-the-loop prompt refinement yields better performance

→ Aspect-based filtering reduces sense ambiguity effectively

-----

📊 Results:

→ GPT-4 Turbo achieved 81% accuracy in prediction-level assessment

→ Llama-2-70B showed 83% accuracy in suggestion-level disambiguation

→ The model predicted the correct sense for 64% of instances on the test sets