"ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization"
The podcast below on this paper was generated with Google's Illuminate.
https://arxiv.org/abs/2502.04306
This paper addresses the challenge of optimizing LLM agent workflows for complex tasks. Existing methods are inflexible, poorly adaptable, and struggle to scale because they rely on discrete optimization techniques.
This paper introduces ScoreFlow, a framework that optimizes workflows with gradient-based methods in a continuous space. At its core is Score-DPO, a variant of direct preference optimization that incorporates quantitative evaluation feedback for improved workflow generation.
-----
📌 ScoreFlow's strength lies in shifting workflow optimization from discrete search to continuous gradient-based methods. This enables efficient exploration of complex agentic workflows, overcoming limitations of prior discrete approaches like Monte Carlo Tree Search used in AFlow.
📌 Score-DPO innovatively integrates quantitative scores into Direct Preference Optimization. This score-aware preference learning refines workflow generation more effectively than standard DPO, which only uses binary preference pairs, leading to faster convergence and better performance.
📌 The code representation for workflows in ScoreFlow provides a flexible structure. This allows for complex logic, loops, and conditional execution within agent workflows, unlike graph-based methods. This code-based approach enhances adaptability and expressiveness for diverse tasks.
----------
Methods Explored in this Paper 🔧:
→ ScoreFlow framework is introduced for automated LLM agent workflow generation and optimization.
→ It employs code as the representation for workflows, enabling flexible and robust search (a hedged sketch follows after this list).
→ Operators are used as predefined, reusable agent combinations, customizable by the generator.
→ ScoreFlow uses an open-source LLM as the base model for workflow generation, minimizing costs.
→ ScoreFlow optimizes workflow generators using preference data derived from evaluation scores.
→ Score-DPO, a variant of Direct Preference Optimization, is proposed.
→ Score-DPO incorporates quantitative evaluation scores directly into the preference optimization process.
→ Score-DPO enhances the sampling distribution by up-weighting preference pairs with larger score differences.
→ Score-DPO incorporates evaluation scores into the Bradley-Terry ranking objective for improved learning (an illustrative loss sketch follows below).
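As a rough illustration of the code-as-workflow representation mentioned above, here is a minimal sketch. The operator names (generate, ensemble, review) and the plain-function interface are assumptions for illustration, not ScoreFlow's actual operators or API:

```python
# Minimal sketch of the "workflow as code" idea. Operator names and
# signatures are illustrative assumptions, not ScoreFlow's actual API.
from typing import Callable, List

LLM = Callable[[str], str]  # stand-in for a call to the executor LLM

def generate(llm: LLM, problem: str) -> str:
    # Draft one candidate solution.
    return llm(f"Solve step by step:\n{problem}")

def ensemble(llm: LLM, problem: str, candidates: List[str]) -> str:
    # Aggregate several candidates into a single answer.
    joined = "\n---\n".join(candidates)
    return llm(f"Problem:\n{problem}\nCandidates:\n{joined}\nReturn the best final answer.")

def review(llm: LLM, problem: str, answer: str) -> str:
    # Check and, if needed, revise the answer.
    return llm(f"Verify and correct if necessary.\nProblem: {problem}\nAnswer: {answer}")

def workflow(llm: LLM, problem: str) -> str:
    # Because the workflow is code rather than a fixed graph, the generator
    # can emit loops, branches, and task-specific operator choices.
    candidates = [generate(llm, problem) for _ in range(3)]
    answer = ensemble(llm, problem, candidates)
    return review(llm, problem, answer)
```

The point of the code representation is that control flow (how many candidates, whether to revise, which operators to call) is decided per task by the generator rather than fixed in a static graph.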
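And a hedged sketch of how quantitative scores could be folded into a DPO-style objective. The exact Score-DPO formulation (how scores enter the Bradley-Terry term and the pair-sampling distribution) is given in the paper; treat the simple score-gap weighting below as one plausible reading, not the paper's formula:

```python
# Hedged sketch of a score-weighted DPO loss. Assumption: scores enter by
# weighting each preference pair by its score gap; the paper's exact
# Score-DPO objective may differ in detail.
import torch
import torch.nn.functional as F

def score_dpo_loss(logp_w: torch.Tensor, logp_l: torch.Tensor,
                   ref_logp_w: torch.Tensor, ref_logp_l: torch.Tensor,
                   score_w: torch.Tensor, score_l: torch.Tensor,
                   beta: float = 0.1) -> torch.Tensor:
    # logp_*: policy log-probs of the preferred (w) / rejected (l) workflow.
    # ref_logp_*: reference-model log-probs of the same workflows.
    # score_*: evaluation scores of the two workflows (score_w >= score_l).
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))  # standard DPO implicit-reward margin
    weight = (score_w - score_l).clamp(min=0.0)  # pairs with larger score gaps count more
    return -(weight * F.logsigmoid(margin)).mean()
```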
-----
Key Insights 💡:
→ ScoreFlow achieves high performance, scalability, and adaptability in LLM agent workflow optimization.
→ Gradient-based optimization in ScoreFlow offers more flexibility and scalability than discrete search methods.
→ Score-DPO effectively addresses inaccuracies in evaluation scores, improving optimization convergence.
→ Adaptive workflow generation in ScoreFlow allows for task-specific operator selection and workflow complexity.
→ ScoreFlow demonstrates robustness across different LLM architectures for both generators and executors.
→ ScoreFlow enables smaller models to outperform larger models with better cost efficiency.
-----
Results 📊:
→ ScoreFlow achieves an 8.2% average performance improvement over baselines across six benchmarks.
→ ScoreFlow outperforms automated workflow optimization baselines such as ADAS and AFlow.
→ ScoreFlow with Score-DPO outperforms optimization baselines such as SFT, PPO, and standard DPO in workflow optimization.
→ ScoreFlow shows better scalability and a clear performance advantage over AFlow on diverse combined datasets.