AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios

This paper evaluates LLMs' social abilities through complex multi-agent interactions and private information handling

Nov 08, 2024

The paper finds that LLMs struggle with complex social scenarios and high-level growth goals.

🤖 Original Problem:

Evaluating the social intelligence of LLMs in complex interactions remains challenging. Current benchmarks lack scenario diversity, oversimplify real interactions, and focus only on explicit goal achievement, ignoring how agents handle private information.

🔧 Solution in this Paper:

• Builds AgentSense, a benchmark of 1,225 diverse social scenarios extracted from scripts using a bottom-up approach

• Uses Dramaturgical Theory to create realistic social interactions

• Evaluates both goal completion and implicit reasoning abilities

• Implements multi-turn conversations between agents with private information

• Measures performance through interviews and multiple-choice questions

• Introduces Profile Sensitivity Index (PSI) to assess stability across different character profiles
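The summary above does not spell out how PSI is computed. As a minimal sketch, assuming PSI is read as the average spread of a model's scores when the same scenario template is played with different character profiles (an assumption, not the paper's stated formula), the idea could look like this. The function name and input layout are hypothetical.

```python
from statistics import mean, pstdev


def profile_sensitivity(scores_by_template: dict[str, list[float]]) -> float:
    """Assumed reading of PSI: mean, over scenario templates, of the
    population standard deviation of scores across character profiles.

    scores_by_template maps a (hypothetical) template id to the scores one
    model obtained when that template was run with different profiles.
    """
    # Templates with a single profile carry no information about sensitivity.
    spreads = [
        pstdev(scores)
        for scores in scores_by_template.values()
        if len(scores) > 1
    ]
    return mean(spreads) if spreads else 0.0
```

Under this reading, a lower value means the model behaves more consistently when only the character profile changes.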

💡 Key Insights:

• Being a "sender" (actively sharing information) is more challenging than being a "receiver"

• Models handle relationship management and cooperation better than competition

• Even GPT-4 struggles with balancing goal achievement and private information protection

• Social intelligence varies significantly based on character profiles

📊 Results:

• GPT-4 leads overall performance but still needs improvement in private information reasoning

• Qwen2.5-14b shows strong performance in both goal completion and information reasoning

• Llama-2 series models perform poorly, with some improvement in Llama-3 series

• PSI results show that models with higher social intelligence are less sensitive to profile changes

• Models achieve 88.36% goal completion rate and 76.86% information reasoning accuracy

🎭 The key components and methodology of AgentSense

• Scenario Construction: Extracts templates from scripts and synthesizes diverse characters to create scenarios

• Social Interaction Simulation: Agents interact through multi-turn conversations, trying to achieve social goals while protecting private information

• Evaluation: Uses interviews and multiple-choice questions to assess goal completion and information reasoning abilities
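The three stages above can be sketched roughly as follows. This is an illustrative outline, not the benchmark's actual code: the `Agent` class, the `respond` callback (a stand-in for an LLM call), and the two scoring helpers are all hypothetical names, and the turn-taking rule (simple round robin) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    name: str
    goal: str            # social goal the agent tries to achieve
    private_info: str    # information the agent should protect
    # Stand-in for an LLM call: maps the dialogue so far to the next utterance.
    respond: Callable[[list[str]], str]


def simulate(agents: list[Agent], max_turns: int = 4) -> list[str]:
    """Run a round-robin multi-turn conversation and return the transcript."""
    history: list[str] = []
    for turn in range(max_turns):
        speaker = agents[turn % len(agents)]
        utterance = speaker.respond(list(history))  # pass a copy of the log
        history.append(f"{speaker.name}: {utterance}")
    return history


def goal_completion_rate(judgements: list[bool]) -> float:
    """Fraction of goals judged as achieved (e.g. from post-dialogue interviews)."""
    return sum(judgements) / len(judgements) if judgements else 0.0


def mcq_accuracy(answers: list[str], gold: list[str]) -> float:
    """Share of multiple-choice questions answered correctly."""
    if len(answers) != len(gold):
        raise ValueError("answers and gold must have the same length")
    if not gold:
        return 0.0
    return sum(a == g for a, g in zip(answers, gold)) / len(gold)
```

In a real run, `respond` would wrap a model prompt that includes the agent's profile, goal, and private information. Here it is left as a plain callable so the loop can be exercised without any model.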

🎯 The types of social goals and scenarios covered

• Personal domain (54%): Home, private gatherings, intimate settings

• Small society (37%): Schools, workplaces, communities

• Large society (9%): Public spaces, online platforms

Social goals are categorized using ERG theory into:

• Existence needs: Information exchange

• Relatedness needs: Building/maintaining relationships

• Growth needs: Cooperation, competition, conflict resolution
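The ERG grouping above amounts to a lookup from goal type to need level. A small sketch, assuming goal types are stored as plain lowercase strings (the labels and the `need_for` helper are hypothetical, chosen only to mirror the list above):

```python
from enum import Enum


class ERGNeed(Enum):
    EXISTENCE = "existence"
    RELATEDNESS = "relatedness"
    GROWTH = "growth"


# Goal-type labels are illustrative; they follow the categories listed above.
GOAL_TYPE_TO_NEED = {
    "information exchange": ERGNeed.EXISTENCE,
    "relationship building": ERGNeed.RELATEDNESS,
    "relationship maintenance": ERGNeed.RELATEDNESS,
    "cooperation": ERGNeed.GROWTH,
    "competition": ERGNeed.GROWTH,
    "conflict resolution": ERGNeed.GROWTH,
}


def need_for(goal_type: str) -> ERGNeed:
    """Return the ERG need level for a goal type (case-insensitive)."""
    return GOAL_TYPE_TO_NEED[goal_type.strip().lower()]
```

Grouping results by `need_for(...)` would make it straightforward to compare performance on growth goals against existence and relatedness goals.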

Rohan's Bytes
