"MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf"
The podcast below on this paper was generated with Google's Illuminate.
https://arxiv.org/abs/2502.04376
Attending meetings is time-consuming, prone to scheduling conflicts, and often inefficient when full attendance is unnecessary. This paper addresses these issues with an LLM-powered meeting delegate system: an LLM attends meetings on a user's behalf and participates as their representative.
-----
📌 The system uses a modular architecture: Information Gathering, Meeting Engagement, and Voice Generation are distinct modules, so each component can be improved and tested independently.
📌 The paper introduces a benchmark dataset built from real meeting transcripts, enabling quantitative evaluation of meeting delegate systems in realistic scenarios (a plausible instance shape is sketched after these bullets).
📌 Performance varies significantly across LLMs. GPT-4 and GPT-4o balance speaking up and staying silent; Gemini 1.5 Pro is cautious, while smaller models are more active in meetings.
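
The paper does not publish its benchmark schema, so the sketch below is only a plausible shape for one test instance under the description above; every field name here is an illustrative assumption, not the authors' format.

```python
# Hypothetical shape of one benchmark instance (field names are illustrative,
# not the paper's schema): the transcript up to a decision point, whether the
# delegate should speak there, and the ground-truth response with the key
# points a good reply should cover.
from dataclasses import dataclass

@dataclass
class BenchmarkInstance:
    context: list[str]        # transcript turns up to the decision point
    should_speak: bool        # ground truth: respond or stay silent here
    reference_response: str   # what the real participant actually said, if anything
    key_points: list[str]     # key points a good response should address
```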
----------
Methods Explored in this Paper 🔧:
→ The proposed LLM-powered meeting delegate system consists of three key modules.
→ The Information Gathering module collects user preferences and relevant data before meetings. This includes topics of interest and background knowledge.
→ The Meeting Engagement module monitors meeting transcripts in real time and uses LLMs to decide when and how the delegate should participate: leading discussions, responding to others, or chiming in. This study focuses on the participant role (a minimal sketch of the decision loop follows this list).
→ The Voice Generation module converts the LLM's text responses into speech. It uses text-to-speech technology to mimic the user's voice. Streaming is used to minimize latency.
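
The authors do not release reference code, but the three modules map naturally onto a small pipeline. The sketch below is a minimal illustration under that reading: `UserProfile`, the `call_llm` hook, the SILENCE convention, and the prompt wording are all assumptions for illustration, not the paper's implementation, and a print stub stands in for streamed, voice-cloned text-to-speech.

```python
# Minimal sketch of the three-module delegate pipeline (illustrative only).
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class UserProfile:
    """Information Gathering output: preferences collected before the meeting."""
    name: str
    topics_of_interest: list[str]
    background_notes: str = ""

def should_engage(profile: UserProfile, transcript: list[str],
                  call_llm: Callable[[str], str]) -> Optional[str]:
    """Meeting Engagement: ask the LLM whether to respond or stay silent.

    Returns response text, or None to remain silent. The SILENCE sentinel
    and the prompt wording are assumptions, not the paper's actual prompts.
    """
    prompt = (
        f"You are attending a meeting on behalf of {profile.name}.\n"
        f"Topics of interest: {', '.join(profile.topics_of_interest)}\n"
        f"Background: {profile.background_notes}\n"
        "Transcript so far:\n" + "\n".join(transcript) + "\n\n"
        "If the latest turn calls for input from the delegate (a direct "
        "question or a topic of interest), reply with the response text. "
        "Otherwise reply with exactly: SILENCE"
    )
    reply = call_llm(prompt).strip()
    return None if reply == "SILENCE" else reply

def speak(text: str) -> None:
    """Voice Generation stand-in: the real system streams text-to-speech in
    the user's voice; printing keeps this sketch self-contained."""
    print(f"[delegate]: {text}")

if __name__ == "__main__":
    profile = UserProfile("Alice", ["Q3 roadmap", "hiring"], "Owns the roadmap doc.")
    transcript = [
        "Bob: Let's review the Q3 roadmap.",
        "Bob: Alice, any updates on your side?",
    ]
    # Stubbed LLM so the sketch runs without an API key.
    stub = lambda _prompt: "Yes - the roadmap doc is updated; two items moved to Q4."
    response = should_engage(profile, transcript, stub)
    if response is not None:
        speak(response)
```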
-----
Key Insights 💡:
→ GPT-4 and GPT-4o demonstrate balanced engagement. They effectively respond when needed and remain silent when appropriate.
→ Gemini 1.5 Pro tends to be more cautious. It has a higher silence rate and lower response rate.
→ Gemini 1.5 Flash and Llama3 models are more active. They tend to engage more frequently, sometimes when silence is preferable.
→ Across all models, approximately 60% of generated responses address at least one key point from the ground truth, which shows promise for LLM meeting delegates.
-----
Results 📊:
→ GPT-4 and GPT-4o achieved Response/Silence Rates between 0.7 and 0.8.
→ Gemini 1.5 Pro reached a Silence Rate of approximately 0.9.
→ A loose recall rate of approximately 60% was achieved across models, indicating that responses contained at least one key point from the ground truth.
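
For intuition about these numbers, here are back-of-envelope versions of the metrics. The paper's exact definitions and matching procedure may differ; this follows the plain reading above, with naive substring matching standing in for however the authors match key points.

```python
# Illustrative metric definitions (the paper's exact formulas may differ).

def response_rate(spoke: list[bool], should_speak: list[bool]) -> float:
    """Share of turns where the delegate should speak and actually did."""
    total = sum(should_speak)
    hits = sum(1 for s, g in zip(spoke, should_speak) if s and g)
    return hits / total if total else 0.0

def silence_rate(spoke: list[bool], should_speak: list[bool]) -> float:
    """Share of turns where the delegate should stay silent and actually did."""
    total = sum(1 for g in should_speak if not g)
    hits = sum(1 for s, g in zip(spoke, should_speak) if not s and not g)
    return hits / total if total else 0.0

def loose_recall(responses: list[str], key_points: list[list[str]]) -> float:
    """Share of responses containing at least one ground-truth key point
    (case-insensitive substring match as a stand-in)."""
    if not responses:
        return 0.0
    hits = sum(
        1 for resp, kps in zip(responses, key_points)
        if any(kp.lower() in resp.lower() for kp in kps)
    )
    return hits / len(responses)
```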