WorkflowLLM enables LLMs to handle workflows with 70+ actions, roughly 10x more than current models manage
An LLM that can orchestrate real-world automation workflows at the scale of apps like Apple Shortcuts
https://arxiv.org/abs/2411.05451
Original Problem 🤔:
Current LLMs can only handle small workflows with around 6 actions and simple logical structures. This falls short of real-world needs: workflows in applications like Apple Shortcuts involve 70+ actions and complex branching and looping patterns.
-----
Solution in this Paper 🛠️:
→ Created WorkflowBench - a dataset with 106,763 workflow samples covering 1,503 APIs from 83 applications
→ Collected real workflows from Apple Shortcuts and RoutineHub, converted them to Python code, and used ChatGPT to add hierarchical thoughts
→ Used ChatGPT to generate diverse task queries and expand dataset coverage
→ Trained an annotator model on the collected data to generate workflows for new queries
→ Fine-tuned Llama-3.1-8B on this dataset to create WorkflowLlama
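To make the data-construction steps concrete, here is a minimal sketch of what one Python-formatted workflow sample with hierarchical thoughts might look like. Everything in it is a hypothetical illustration, not the paper's actual WorkflowBench schema: the node types (`Call`, `If`, `Var`), the API names (`get_clipboard`, `translate_text`, `show_result`), and the layout of a high-level query, a mid-level plan, and per-action comments.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class Var:
    """Reference to an earlier action's output (rendered as a bare name)."""
    name: str

    def __repr__(self) -> str:
        return self.name


@dataclass
class Call:
    """One action: a call to a hypothetical app API."""
    api: str
    args: dict[str, object] = field(default_factory=dict)
    out: str | None = None      # variable bound to the action's result
    comment: str = ""           # low-level thought for this step


@dataclass
class If:
    """A branch whose body runs only when `condition` holds."""
    condition: str              # Python expression over earlier outputs
    body: list[Node]
    comment: str = ""


Node = Union[Call, If]


def render(nodes: list[Node], indent: int = 0) -> list[str]:
    """Emit Python source lines, placing each node's comment above it."""
    pad = "    " * indent
    lines: list[str] = []
    for n in nodes:
        if n.comment:
            lines.append(f"{pad}# {n.comment}")
        if isinstance(n, Call):
            args = ", ".join(f"{k}={v!r}" for k, v in n.args.items())
            call = f"{n.api}({args})"
            lines.append(f"{pad}{n.out} = {call}" if n.out else f"{pad}{call}")
        else:
            lines.append(f"{pad}if {n.condition}:")
            lines.extend(render(n.body, indent + 1))
    return lines


def to_sample(query: str, plan: list[str], nodes: list[Node]) -> str:
    """Put the query and plan in a docstring, followed by the workflow code."""
    header = [f'"""Query: {query}', "Plan:"]
    header += [f"  {i}. {step}" for i, step in enumerate(plan, 1)]
    header.append('"""')
    return "\n".join(header + render(nodes))


def count_actions(nodes: list[Node]) -> int:
    """Count API calls, including those nested inside branches."""
    return sum(1 if isinstance(n, Call) else count_actions(n.body) for n in nodes)


workflow = [
    Call("get_clipboard", out="text", comment="Read the copied text"),
    If("len(text) > 0", [
        Call("translate_text", {"text": Var("text"), "target": "en"},
             out="translated", comment="Translate it to English"),
        Call("show_result", {"text": Var("translated"), "title": "Translation"}),
    ], comment="Only continue when the clipboard is non-empty"),
]

sample = to_sample("Translate whatever is on my clipboard",
                   ["Read the clipboard", "Translate if non-empty", "Show the result"],
                   workflow)
print(sample)
print("actions:", count_actions(workflow))
```

Rendering samples as ordinary Python lets branching be expressed with native syntax, and counting only `Call` nodes gives an action count comparable to the "6 vs 70+" figures in the post.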
-----
Key Insights from this Paper 💡:
→ Data quality and scale are crucial for workflow orchestration capability
→ Three-phase data construction (collection, query expansion, annotator-driven generation) targets both diversity and complexity
→ Hierarchical thought generation improves the model's understanding of workflow structure
→ Quality confirmation steps maintain dataset integrity
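The post does not describe how quality confirmation works. As one plausible illustration (an assumption, not the paper's procedure), a filter could reject generated samples that fail to parse, call APIs outside a known catalog, or contain too few actions. The API names and the `ALLOWED_BUILTINS` set are hypothetical.

```python
import ast

# Plain built-ins tolerated inside conditions (assumption for this sketch).
ALLOWED_BUILTINS = {"len", "str", "int", "float"}


def called_functions(src: str) -> list[str]:
    """Names of plain `name(...)` calls; method calls like `x.f()` are ignored."""
    tree = ast.parse(src)  # raises SyntaxError on invalid code
    return [
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    ]


def passes_quality_check(src: str, known_apis: set[str],
                         min_actions: int = 1) -> tuple[bool, str]:
    """Reject samples that don't parse, call unknown APIs, or are too short."""
    try:
        calls = called_functions(src)
    except SyntaxError as err:
        return False, f"syntax error: {err.msg}"
    allowed = known_apis | ALLOWED_BUILTINS
    unknown = sorted({c for c in calls if c not in allowed})
    if unknown:
        return False, f"unknown APIs: {unknown}"
    n_actions = sum(1 for c in calls if c in known_apis)
    if n_actions < min_actions:
        return False, f"only {n_actions} action(s)"
    return True, "ok"


known = {"get_clipboard", "translate_text", "show_result"}
good = "text = get_clipboard()\nif len(text) > 0:\n    show_result(text=translate_text(text=text))\n"
hallucinated = "weather = fetch_weather(city='Paris')\n"
broken = "if True\n    pass\n"

for name, src in [("good", good), ("hallucinated", hallucinated), ("broken", broken)]:
    print(name, passes_quality_check(src, known))
```

Checking against a fixed API catalog catches hallucinated calls cheaply; a real pipeline would likely also validate argument names and types.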
-----
Results 📊:
→ Outperformed all baselines including GPT-4
→ Handled complex workflows with 70+ actions vs 6 actions for GPT-4
→ Demonstrated strong generalization to unseen APIs and instructions
→ Achieved 77.5% F1 score on out-of-distribution T-Eval benchmark
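F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch scores a predicted list of API calls against a reference list, treating both as multisets. This is a generic illustration: T-Eval's exact matching rules may differ, and the call names are hypothetical.

```python
from collections import Counter


def f1_over_calls(predicted: list[str], reference: list[str]) -> float:
    """Multiset F1 between predicted and reference API calls."""
    if not predicted and not reference:
        return 1.0
    # Counter & Counter keeps the minimum count of each shared call.
    overlap = sum((Counter(predicted) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


reference = ["get_clipboard", "translate_text", "show_result"]
predicted = ["get_clipboard", "translate_text", "show_result", "show_result"]
print(round(f1_over_calls(predicted, reference), 3))
```

Here the duplicated `show_result` lowers precision to 3/4 while recall stays at 1, giving F1 = 6/7.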