"Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting"

The podcast on this paper was generated with Google's Illuminate.

LLMs solve analogies better with targeted semantic knowledge than with raw knowledge-graph dumps.

Teaching LLMs the "why" behind analogies boosts their accuracy by up to 45%.

Knowledge alone isn't enough: LLMs need guidance to reason like humans when solving analogies.

This paper explores LLMs' ability to solve proportional analogies through knowledge-enhanced prompting. The researchers created a 15K-item multiple-choice dataset and evaluated nine LLMs across several prompting techniques, finding that targeted knowledge improves performance significantly more than structured knowledge or zero-shot prompting.

-----

https://arxiv.org/abs/2412.00869

🤔 Original Problem:

LLMs struggle with proportional analogies of the form "A is to B as C is to D" (e.g., "Oxygen is to Gas as ___ is to ___"), which are fundamental to human cognition and reasoning. Existing analogy datasets were limited in size and relational scope.
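
To make the task concrete, here is what one multiple-choice item could look like (the field names and options below are invented for illustration; the actual dataset's schema may differ):

```python
# Hypothetical shape of one item in a proportional-analogy MCQA dataset.
# The model sees the source pair and must pick the candidate pair that
# is linked by the same semantic relation.
item = {
    "source_pair": ("oxygen", "gas"),   # A : B
    "relation": "IsA",                  # one of the 238 relation types
    "options": [                        # candidate C : D pairs
        ("aluminum", "metal"),          # correct: same IsA relation
        ("water", "fire"),
        ("breathing", "lung"),
        ("gas", "oxygen"),
    ],
    "answer_index": 0,
}
```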

-----

🔧 Solution in this Paper:

→ Created a 15K-item multiple-choice dataset spanning 238 distinct relation types for testing analogical reasoning

→ Implemented four prompting techniques: Zero-shot, Few-shot with examples, Structured Knowledge using WordNet/ConceptNet/Wikidata, and Targeted Knowledge with specific semantic relationships (sketched after this list)

→ Developed a semantic filtering mechanism to select relevant knowledge paths from structured sources

→ Introduced a modified Chain-of-Thought approach focusing on semantic relationships and cognitive processes
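
A minimal sketch of what a Targeted Knowledge prompt could look like (the wording and the `build_targeted_prompt` helper are hypothetical, not the paper's exact templates); the final instruction also mirrors the modified Chain-of-Thought focus on semantic relationships:

```python
# Targeted Knowledge Prompting, sketched: instead of dumping raw
# knowledge-graph triples, state the specific semantic relation that
# links the source pair, then ask the model to pick the option that
# shares that relation.

def build_targeted_prompt(pair, options, relation):
    """Construct a multiple-choice prompt enriched with the target relation."""
    choices = "\n".join(f"{chr(65 + i)}. {a} : {b}" for i, (a, b) in enumerate(options))
    return (
        f'In the analogy "{pair[0]} is to {pair[1]}", the semantic relation is: {relation}.\n'
        f"Which option expresses the same relation?\n{choices}\n"
        "Explain the relation between each candidate pair step by step, "
        "then answer with a single letter."
    )

if __name__ == "__main__":
    prompt = build_targeted_prompt(
        pair=("oxygen", "gas"),
        options=[("aluminum", "metal"), ("water", "fire"), ("gas", "oxygen")],
        relation="IsA (the first term is an instance of the second)",
    )
    print(prompt)  # send to any chat LLM; zero-shot would omit the relation line
```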

-----

💡 Key Insights:

→ Simply adding structured knowledge doesn't improve analogy solving; targeted knowledge is more effective

→ Code-focused models perform poorly compared to general-purpose LLMs

→ Semantic filtering slightly outperforms random filtering for knowledge selection
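
A toy sketch of the semantic filtering idea (assumptions: the path format and scoring below are illustrative only; a real implementation would score structured knowledge paths with embeddings rather than token overlap):

```python
# Semantic filtering, sketched: score candidate knowledge paths (e.g., from
# ConceptNet/WordNet/Wikidata) against the analogy query and keep the top-k.
# Token overlap is used only to keep this sketch dependency-free.

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def filter_paths(query: str, paths: list[str], k: int = 2) -> list[str]:
    """Keep the k knowledge paths most similar to the analogy query."""
    return sorted(paths, key=lambda p: jaccard(query, p), reverse=True)[:k]

print(filter_paths(
    "oxygen gas",
    ["oxygen IsA gas", "oxygen RelatedTo breathing", "gas UsedFor cooking"],
))
# -> ['oxygen IsA gas', 'oxygen RelatedTo breathing']; random filtering, the
#    baseline compared against above, would instead sample paths uniformly
```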

-----

📊 Results:

→ Best model (GPT-3.5-Turbo) achieved 55.25% accuracy with Targeted Knowledge Prompting

→ 21% improvement over zero-shot prompting

→ 45% improvement compared to structured knowledge prompting
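
For reference, the reported accuracies reduce to a simple exact-match computation over the multiple-choice items (a sketch; `ask_llm` is a hypothetical stand-in for any prompted model call):

```python
# Sketch of the metric behind the numbers above: accuracy is the fraction
# of MCQA items where the model's chosen option matches the gold answer.
def accuracy(items, ask_llm):
    """items: list of (prompt, gold_letter) pairs; ask_llm: prompt -> letter."""
    hits = sum(ask_llm(prompt) == gold for prompt, gold in items)
    return hits / len(items)

# e.g., accuracy(dataset, gpt35_with_targeted_knowledge) would yield ~0.5525
```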
