Task decomposition: an agentic framework that converts user queries into optimized task graphs while managing tool dependencies
📚 https://arxiv.org/abs/2410.22457
🤖 Original Problem:
LLM-based systems face limitations in industrial settings due to costly fine-tuning requirements and challenges in real-time decision making. Current agentic frameworks lack comprehensive evaluation methods and suffer from high latency, limited adaptability, and insufficient support for dynamic tool integration.
-----
🔧 Solution in this Paper:
→ Introduces a framework with five core components: Orchestrator (generates task graphs), Delegator (manages task distribution), Agents (execute tasks using LLMs), Tools (provide callable functions), and Executor (handles the execution sequence); see the sketch after this list
→ Proposes novel evaluation metrics: Node F1 Score, Structural Similarity Index (SSI), and Tool F1 Score
→ Develops a specialized dataset, derived from the AsyncHow dataset, for analyzing agentic behavior
→ Implements Task-Aware Semantic Tool Filtering for real-time tool selection
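The post summarizes the architecture without code, so here is a minimal, self-contained Python sketch of how the five components and the Task-Aware Semantic Tool Filtering step could fit together. All class and function names, the bag-of-words `embed` stand-in (a placeholder for a real embedding model), and the similarity threshold are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field
import math

def embed(text: str) -> dict[str, float]:
    # Assumption: bag-of-words vector as a stand-in for a real embedding model.
    vec: dict[str, float] = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class Tool:
    name: str
    description: str

    def run(self, payload: str) -> str:
        return f"{self.name}({payload})"

@dataclass
class TaskNode:
    task_id: str
    description: str
    depends_on: list[str] = field(default_factory=list)

class Orchestrator:
    """Converts a user query into a task graph (toy decomposition; the paper uses an LLM)."""

    def plan(self, query: str) -> list[TaskNode]:
        return [
            TaskNode("t1", f"search the web for documents about {query}"),
            TaskNode("t2", "summarize the retrieved documents into an answer", depends_on=["t1"]),
        ]

class Delegator:
    """Task-aware semantic tool filtering: keep only tools similar to the task description."""

    def __init__(self, tools: list[Tool], threshold: float = 0.2):
        self.tools, self.threshold = tools, threshold

    def filter_tools(self, task: TaskNode) -> list[Tool]:
        task_vec = embed(task.description)
        scored = [(cosine(task_vec, embed(t.description)), t) for t in self.tools]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [tool for score, tool in scored if score >= self.threshold]

class Agent:
    """Executes a single task with the filtered tool set (the LLM call is omitted here)."""

    def execute(self, task: TaskNode, tools: list[Tool]) -> str:
        return tools[0].run(task.description) if tools else f"no tool matched {task.task_id}"

class Executor:
    """Runs tasks in dependency order with a simple topological pass."""

    def run(self, graph: list[TaskNode], delegator: Delegator, agent: Agent) -> dict[str, str]:
        done: dict[str, str] = {}
        pending = list(graph)
        while pending:
            ready = [t for t in pending if all(d in done for d in t.depends_on)]
            if not ready:
                raise ValueError("cyclic or unsatisfied dependencies in task graph")
            for task in ready:
                done[task.task_id] = agent.execute(task, delegator.filter_tools(task))
                pending.remove(task)
        return done

tools = [
    Tool("web_search", "search the web for documents matching a query"),
    Tool("summarizer", "summarize documents into a short answer"),
]
plan = Orchestrator().plan("agentic task decomposition")
print(Executor().run(plan, Delegator(tools), Agent()))
```

In the actual framework, the Orchestrator and Agents wrap LLM calls and the task graph comes from the model rather than a hard-coded plan; the sketch only illustrates the data flow among the five components.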
-----
💡 Key Insights:
→ Task graph decomposition strategies (coarse- vs. fine-grained) significantly impact system efficiency
→ Structural metrics are more critical in sequential tasks, while tool-related metrics dominate parallel tasks
→ SSI emerged as the strongest predictor of performance in sequential tasks (r=0.470, p<0.001)
→ Tool F1 Score proved essential in parallel tasks (r=0.476, p<0.001)
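For reference, here is a minimal sketch of how set-based Node F1 and Tool F1 scores can be computed from a generated task graph and a reference graph. The paper's matching criteria (e.g., semantic node-label matching and the SSI computation) are more involved; the node labels and tool names below are made up for illustration.

```python
def set_f1(predicted: set[str], reference: set[str]) -> float:
    # Standard set-based F1: harmonic mean of precision and recall over exact matches.
    if not predicted or not reference:
        return 0.0
    tp = len(predicted & reference)
    precision, recall = tp / len(predicted), tp / len(reference)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Node F1: labels of the generated task graph vs. a reference decomposition.
node_f1 = set_f1({"search web", "summarize results"},
                 {"search web", "summarize results", "cite sources"})

# Tool F1: tools selected for the query vs. the reference tool set.
tool_f1 = set_f1({"web_search", "summarizer"}, {"web_search"})

print(f"Node F1 = {node_f1:.2f}, Tool F1 = {tool_f1:.2f}")  # Node F1 = 0.80, Tool F1 = 0.67
```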
-----
📊 Results:
→ The regression model for sequential tasks achieved an R-squared of 0.3631, explaining 36.31% of the variance
→ For parallel tasks, the R-squared reached 0.3933, explaining 39.33% of the variance
→ Node Label Similarity showed a substantial correlation (r=0.447, p<0.01)
→ The system demonstrated high Node Precision/Recall but lower Edge F1 Scores on tasks with complex dependencies