
"Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning"

The podcast below on this paper was generated with Google's Illuminate.

LLMs need both correct mental state attribution and an appropriate depth of mentalization for true Theory of Mind (ToM).

Current ToM benchmarks miss half the puzzle: knowing when to mentalize matters as much as knowing how.

This paper identifies a critical gap in evaluating Theory of Mind (ToM) capabilities in LLMs, highlighting the need to assess both depth of mentalization and correct inference.

-----

https://arxiv.org/abs/2412.13631

🤔 Original Problem:

→ Current AI research focuses solely on testing whether LLMs can correctly attribute mental states, ignoring whether ToM should be invoked in the first place.

→ Existing benchmarks use static scenarios that don't properly evaluate interactive ToM capabilities.

-----

🔬 Solution in this Paper:

→ The paper proposes a two-step evaluation framework for ToM capabilities.

→ The first step determines whether to invoke ToM at all and, if so, at what depth of mentalization.

→ The second step applies the correct inference given the chosen depth.

→ This approach distinguishes three types of ToM errors: unnecessary ToM use (Type A), insufficient depth of mentalization (Type B), and incorrect reasoning at the chosen depth (Type C), as sketched below.
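To make the two-step framework and the error taxonomy concrete, here is a minimal sketch of how such an evaluation could be wired up. The class, function names, depth scale, and the "Type A" label are illustrative assumptions for exposition, not the paper's implementation (only the Type B and Type C labels appear in the summary above).

```python
# Illustrative sketch of the two-step ToM evaluation described above.
# Names (ToMJudgment, classify_error, Type A label) are assumptions, not the paper's code.
from dataclasses import dataclass

@dataclass
class ToMJudgment:
    depth: int        # 0 = no mentalizing, 1 = "she believes X", 2 = "she believes he believes X", ...
    inference: str    # the mental-state attribution produced at that depth

def classify_error(pred: ToMJudgment, gold: ToMJudgment) -> str:
    """Classify a model response against a gold judgment.

    Step 1 checks whether ToM was invoked at an appropriate depth;
    Step 2 checks whether the inference at that depth is correct.
    """
    # Step 1: should ToM be invoked, and how deep?
    if gold.depth == 0 and pred.depth > 0:
        return "Type A: unnecessary ToM use"
    if pred.depth < gold.depth:
        return "Type B: insufficient depth of mentalization"
    # Step 2: given an adequate depth, is the inference correct?
    if pred.inference != gold.inference:
        return "Type C: incorrect reasoning at the chosen depth"
    return "correct"

# Example: the scenario requires second-order mentalizing, but the model stops at first order.
gold = ToMJudgment(depth=2, inference="Anne thinks Sally believes the marble is in the basket")
pred = ToMJudgment(depth=1, inference="Sally believes the marble is in the basket")
print(classify_error(pred, gold))  # -> Type B: insufficient depth of mentalization
```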

-----

🎯 Key Insights:

→ Biological agents adaptively choose their depth of mentalizing based on context and resource constraints (see the sketch after this list)

→ Linear probing methods cannot distinguish between different types of ToM failures

→ Interactive benchmarks are needed to test appropriate ToM invocation
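To spell out what "depth of mentalization" means in these insights, the small snippet below enumerates nested belief statements of increasing depth. The agents, proposition, and helper function are hypothetical, chosen only to illustrate the concept.

```python
# Illustrative only: "depth of mentalization" as nested belief attribution.
def nested_belief(agents: list[str], proposition: str) -> str:
    """Build a depth-len(agents) belief statement, e.g.
    depth 2 -> 'Alice believes that Bob believes that the key is in the drawer'."""
    statement = proposition
    for agent in reversed(agents):
        statement = f"{agent} believes that {statement}"
    return statement

for depth in range(4):
    agents = ["Alice", "Bob", "Carol"][:depth]
    print(f"depth {depth}: {nested_belief(agents, 'the key is in the drawer')}")
# depth 0: the key is in the drawer  (no mentalizing needed)
# depth 1: Alice believes that the key is in the drawer
# depth 2: Alice believes that Bob believes that the key is in the drawer
# depth 3: Alice believes that Bob believes that Carol believes that the key is in the drawer
```

An adaptive agent picks the smallest depth the situation actually calls for, rather than always recursing as deeply as possible.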

-----

📊 Results:

→ Current benchmarks fail to distinguish between Type B (insufficient depth) and Type C (incorrect reasoning) errors

→ Static vignette-based tasks preclude testing true ToM capabilities
