"Can Large Language Models Adapt to Other Agents In-Context?"

A podcast on this paper was generated with Google's Illuminate.

Perfect prediction doesn't guarantee optimal action in LLMs.

LLMs have strong prediction capabilities but struggle to use those predictions effectively for decision-making in multi-agent interactions.

https://arxiv.org/abs/2412.19726

Original Problem 🤔:

→ Current research claims LLMs have near-human theory of mind capabilities, but these evaluations only test prediction ability, not actual decision-making.

→ There's a critical gap between predicting other agents' behavior and using those predictions rationally.

-----

Solution in this Paper 💡:

→ The paper introduces two distinct measures: literal theory of mind (ability to predict others' actions) and functional theory of mind (ability to respond optimally to those predictions).

→ It evaluates LLMs through canonical game theory scenarios like Rock-Paper-Scissors, Battle of the Sexes, and the Prisoner's Dilemma.

→ The study compares performance against both simple fixed-strategy agents and adaptive tit-for-tat policies; a minimal sketch of this evaluation setup follows below.
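
The split between the two measures can be made concrete with a small evaluation harness. The sketch below is only illustrative: the `agent` callable is a hypothetical stand-in for an LLM call, and regret is computed per step against the best response to the opponent's realized move, a simplification of the paper's metric.

```python
# Minimal sketch (not the paper's code) of scoring literal and functional
# theory of mind separately in an iterated Prisoner's Dilemma.
from typing import Callable, List, Tuple

# Row player's payoffs: (my_move, their_move) -> my payoff. C = cooperate, D = defect.
PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
History = List[Tuple[str, str]]  # list of (agent_move, opponent_move)

def best_response(their_move: str) -> str:
    """Best reply to a known opponent move."""
    return max("CD", key=lambda m: PAYOFFS[(m, their_move)])

def always_defect(history: History) -> str:
    """Simple fixed-strategy opponent."""
    return "D"

def tit_for_tat(history: History) -> str:
    """Adaptive opponent: cooperate first, then copy the agent's last move."""
    return "C" if not history else history[-1][0]

def evaluate(agent: Callable[[History], Tuple[str, str]],
             opponent: Callable[[History], str],
             rounds: int = 100) -> Tuple[float, float]:
    """Return (prediction accuracy, mean per-step regret) for an agent that
    maps the game history to (predicted opponent move, own move)."""
    history: History = []
    correct, regret = 0, 0.0
    for _ in range(rounds):
        opp_move = opponent(history)
        predicted, own_move = agent(history)
        correct += predicted == opp_move                      # literal ToM
        regret += (PAYOFFS[(best_response(opp_move), opp_move)]
                   - PAYOFFS[(own_move, opp_move)])           # functional ToM
        history.append((own_move, opp_move))
    return correct / rounds, regret / rounds

# An agent that predicts an always-defecting opponent perfectly yet keeps
# cooperating scores 100% on prediction but accrues regret every round.
print(evaluate(lambda history: ("D", "C"), always_defect))  # -> (1.0, 1.0)
```

The same harness accepts the adaptive tit_for_tat opponent in place of always_defect; wiring in an actual LLM only requires wrapping the model call so it returns a (prediction, move) pair.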

-----

Key Insights 🔍:

→ LLMs show high accuracy (>90%) in predicting other agents' actions

→ Despite good predictions, LLMs make sub-optimal decisions in response

→ Current prompting techniques, including chain-of-thought, don't bridge this gap

→ Inductive bias helps short-term performance but hinders long-term convergence

-----

Results 📊:

→ Top LLMs achieve 96.7% prediction accuracy but show 0.542 regret per step

→ Simple tabular models outperform LLMs, reaching 0.083 regret per step despite similar prediction accuracy (see the toy illustration after these results)

→ Social prompting and oracle knowledge don't significantly improve decision quality
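
To make the gap between those two numbers intuitive, here is a toy Rock-Paper-Scissors simulation (the figures it prints are synthetic, not the paper's): an agent that predicts a fixed-strategy opponent perfectly can still accumulate large per-step regret if it fails to act on its own predictions consistently.

```python
# Toy illustration of near-perfect prediction coexisting with high regret
# against an opponent that always plays rock. All numbers are synthetic.
import random

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # value beats key

def payoff(mine: str, theirs: str) -> int:
    """+1 win, 0 draw, -1 loss for the row player."""
    if mine == theirs:
        return 0
    return 1 if BEATS[theirs] == mine else -1

random.seed(0)
rounds = 1_000
correct, regret = 0, 0.0
for _ in range(rounds):
    opp = "rock"                              # fixed-strategy opponent
    prediction = "rock"                       # literal ToM: always correct
    # Suboptimal decision rule: acts on the prediction only half the time.
    move = BEATS[prediction] if random.random() < 0.5 else random.choice(list(BEATS))
    correct += prediction == opp
    regret += payoff(BEATS[opp], opp) - payoff(move, opp)

print(f"prediction accuracy: {correct / rounds:.1%}")  # 100.0%
print(f"mean regret per step: {regret / rounds:.3f}")  # roughly 0.5
```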
