MaKTO (Multi-agent KTO) outperforms GPT-4o by 23.0% and two-stage RL agents by 10.9% in win rate.
MaKTO learns social strategy through language game immersion.
This paper introduces Multi-agent Kahneman & Tversky's Optimization (MaKTO) to train LLMs for strategic interaction in language games. MaKTO enables models to learn via in-context interaction, unlike traditional decoupled decision-making approaches.
-----
Paper - https://arxiv.org/abs/2501.14225
Original Problem:
→ Current AI agents in language games often separate decision-making from language generation.
→ This decoupling limits generalization and strategic depth in complex social interactions.
→ Existing methods fail to fully integrate language and action as described by Wittgenstein's Language Game Theory.
-----
Solution in this Paper:
→ The paper proposes Multi-agent Kahneman & Tversky's Optimization (MaKTO).
→ MaKTO trains LLMs through direct interaction in a multi-agent game environment.
→ Behavior cloning (BC) initializes the model using expert game data and strategies.
→ MaKTO employs Kahneman & Tversky Optimization (KTO) for fine-tuning decision-making.
→ It uses multi-agent gameplay with diverse models to prevent strategy fixation.
→ Stepwise preference selection refines actions using heuristic, voting, and verifier-based methods.
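
As a rough illustration of the KTO step above, here is a minimal Python sketch of a KTO-style loss over unpaired samples that are each labeled desirable or undesirable. It follows the general shape of the published KTO objective. The function name `kto_loss`, the default hyperparameters, and passing the KL reference point in as a constant `kl_ref` are simplifying assumptions for illustration, not details taken from the MaKTO paper.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def kto_loss(policy_logps, ref_logps, desirable,
             kl_ref=0.0, beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """KTO-style loss averaged over unpaired, binary-labeled samples.

    policy_logps / ref_logps: log-probabilities of each response under the
    trained policy and the frozen reference model.
    desirable: True if the response is labeled desirable, else False.
    kl_ref: reference point; assumed here to be a precomputed constant,
    whereas KTO normally estimates it from the batch.
    """
    losses = []
    for logp, ref_logp, good in zip(policy_logps, ref_logps, desirable):
        reward = logp - ref_logp  # implicit reward: log-ratio to the reference
        if good:
            # Desirable: push the reward above the reference point.
            losses.append(lambda_d * (1.0 - sigmoid(beta * (reward - kl_ref))))
        else:
            # Undesirable: push the reward below the reference point.
            losses.append(lambda_u * (1.0 - sigmoid(beta * (kl_ref - reward))))
    return sum(losses) / len(losses)
```

Because each sample carries its own binary label, this kind of objective does not need paired "chosen vs. rejected" responses, which suits per-step labels collected from gameplay.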
-----
Key Insights from this Paper:
→ Integrating language and decision-making is crucial for advanced AI agents.
→ Wittgenstein's Language Game Theory inspires a more unified approach to AI development.
→ Social deduction games like Werewolf are excellent testbeds for strategic language agents.
→ Multi-agent interaction during training enhances robustness and generalization.
→ Stepwise feedback provides more granular optimization than win-loss outcomes alone.
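
To make the contrast between stepwise and outcome-only feedback concrete, here is a small Python sketch. It labels each step of a game as desirable or undesirable using three hypothetical rules that loosely mirror the heuristic, voting, and verifier-based methods named in the post. The `Step` fields, the specific rules, and the `verifier` callable are all assumptions made for illustration; the paper's actual criteria may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    role: str                          # acting player's true role, e.g. "villager"
    kind: str                          # "vote", "action", or "speech"
    target_role: Optional[str] = None  # true role of the targeted player, if any
    text: str = ""                     # utterance, for speech steps


def team(role: str) -> str:
    return "wolf" if role == "werewolf" else "village"


def label_step(step: Step, verifier: Callable[[str], bool]) -> Optional[bool]:
    """Return True (desirable), False (undesirable), or None (unlabeled)."""
    if step.kind == "vote":
        # Voting-based (assumed rule): a village-side vote is desirable
        # only when it lands on a werewolf.
        if team(step.role) == "wolf" or step.target_role is None:
            return None
        return step.target_role == "werewolf"
    if step.kind == "action":
        # Heuristic (assumed rule): a harmful action such as a kill or poison
        # is desirable only when it targets the opposing team.
        if step.target_role is None:
            return None
        return team(step.target_role) != team(step.role)
    if step.kind == "speech":
        # Verifier-based: delegate to an external judge, stubbed as a callable.
        return verifier(step.text)
    return None


def label_by_outcome(steps: List[Step], team_won: bool) -> List[bool]:
    """Outcome-only baseline: every step inherits the final win or loss."""
    return [team_won for _ in steps]
```

In this sketch a winning game can still contain steps labeled undesirable, for example a Witch poisoning the Seer, whereas `label_by_outcome` would mark every step in that game as desirable.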
-----
Results:
→ MaKTO achieves a 61% average win rate in 9-player Werewolf games.
→ MaKTO outperforms GPT-4o by 23.0% and two-stage RL agents by 10.9% in win rate.
→ MaKTO achieves a 60% win rate against human expert players.
→ In Turing tests, MaKTO shows only 48.9% detectability, indicating human-like gameplay.