New benchmark reveals that LLMs struggle with complex graph-based workflows, scoring 15% lower on them than on linear tasks.
Benchmarking Agentic Workflow Generation