SmartAgent and Chain-of-User-Thought (COUT) teach AI to read between clicks, understanding not just what users do but why they do it, making interfaces truly personal.
SmartAgent introduces Chain-of-User-Thought (COUT), enabling AI agents to understand user preferences while interacting with interfaces, bridging the gap between task completion and personalization.
-----
https://arxiv.org/abs/2412.07472v1
🤔 Original Problem:
→ Current embodied AI agents excel at following instructions but lack understanding of user preferences, making them ineffective for personal assistant applications.
→ Existing systems rely on fixed action paths, limiting their ability to adapt to different user needs.
-----
🔧 Solution in this Paper:
→ SmartAgent implements COUT through a three-step reasoning process: basic GUI actions, understanding explicit requirements, and making personalized recommendations.
→ The framework uses a two-stage training approach with embodiment and personalization phases.
→ It uses Qwen-VL as its backbone, with a Perceiver that handles GUI actions and a Reasoner that infers user preferences.
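
To make the three-step chain concrete, here is a minimal Python sketch of how the stages could feed into one another. It is an illustration only, not the paper's implementation: `UserThoughtChain`, `run_cout`, and the three toy stage functions are hypothetical names, and the keyword matching stands in for what the Qwen-VL based Perceiver and Reasoner would actually do.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class UserThoughtChain:
    """Hypothetical container for the three COUT stages."""
    gui_actions: List[str] = field(default_factory=list)
    explicit_requirements: List[str] = field(default_factory=list)
    recommendations: List[str] = field(default_factory=list)


def run_cout(
    instruction: str,
    perceive: Callable[[str], List[str]],
    reason_explicit: Callable[[str, List[str]], List[str]],
    recommend: Callable[[List[str]], List[str]],
) -> UserThoughtChain:
    """Run the three stages in order; each stage sees the previous output."""
    chain = UserThoughtChain()
    # Step 1: basic GUI actions (the Perceiver's role in the paper).
    chain.gui_actions = perceive(instruction)
    # Step 2: explicit requirements stated by the user.
    chain.explicit_requirements = reason_explicit(instruction, chain.gui_actions)
    # Step 3: personalized recommendations (the Reasoner's role in the paper).
    chain.recommendations = recommend(chain.explicit_requirements)
    return chain


# Toy stand-ins (assumptions): a real agent would call a vision-language model.
KNOWN_REQUIREMENTS = {"cheap", "spicy", "vegan"}


def toy_perceive(instruction: str) -> List[str]:
    return ["tap:search_box", f"type:{instruction}"]


def toy_reason_explicit(instruction: str, actions: List[str]) -> List[str]:
    return [w for w in instruction.lower().split() if w in KNOWN_REQUIREMENTS]


def toy_recommend(requirements: List[str]) -> List[str]:
    return [f"item tagged '{r}'" for r in requirements]


if __name__ == "__main__":
    result = run_cout(
        "cheap spicy noodles", toy_perceive, toy_reason_explicit, toy_recommend
    )
    print(result)
```

Passing the stages in as callables keeps the chain's ordering separate from whatever model produces each step, so the toy functions could be swapped for real model calls without changing `run_cout`.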
-----
🎯 Key Insights:
→ Presented as the first framework to combine embodied AI with personalization capabilities
→ Introduces SmartSpot benchmark for evaluating personalized embodied agents
→ Demonstrates effective zero-shot reasoning in new scenarios
-----
📊 Results:
→ Achieves 64% Element Accuracy in GUI interactions
→ Maintains 71% Explicit Preference Understanding
→ Shows 24% Implicit Preference Accuracy
→ Performs comparably to specialized GUI agents while adding personalization