"WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning"

The podcast on this paper is generated with Google's Illuminate.

WebRL, proposed in this paper, transforms weak open-source LLMs into capable web agents via self-evolving online curriculum reinforcement learning.

https://arxiv.org/abs/2411.02337

Original Problem 🤔:

Open-source LLMs perform poorly as web agents compared to proprietary models like GPT-4: base Llama-3.1-8B reaches only a 4.8% success rate on the WebArena-Lite benchmark, making open LLMs impractical for web automation tasks.

-----

Solution in this Paper 🛠️:

→ Introduces WebRL, a self-evolving online curriculum reinforcement learning framework for training LLM web agents (a training-loop sketch follows this list)

→ Uses a self-evolving curriculum that automatically generates new training tasks from the agent's failed attempts, keeping task difficulty matched to its current skill level

→ Implements an outcome-supervised reward model to evaluate task success

→ Applies a KL-divergence constraint between the actor and reference policies to prevent catastrophic forgetting during online updates

→ Maintains an experience replay buffer with confidence (perplexity) filtering to retain and reuse successful trajectories
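
Pulling these pieces together, here is a minimal Python sketch of the self-evolving loop as the bullets above describe it. All helper names (`propose_tasks_from_failures`, `run_episode`, `orm_score`, `policy_update`) are hypothetical stand-ins, not the paper's code: in the real system the task generator and the agent are LLMs acting in a browser environment, and the policy update is a gradient step.

```python
# Minimal sketch of WebRL's outer loop, assuming hypothetical stub helpers.
import random


def propose_tasks_from_failures(failed_tasks, n_new=4):
    """Self-evolving curriculum (hypothetical stub): mutate failed
    instructions into variants near the agent's current skill level."""
    return [f"{t} (variant {i})" for t in failed_tasks for i in range(n_new)]


def run_episode(policy, task):
    """Roll out the agent on a task; stubbed with a random outcome here."""
    return {"task": task, "steps": [], "final_state": random.random()}


def orm_score(trajectory):
    """Outcome-supervised reward model: a binary success signal computed
    from the final state only, so no per-step reward labels are needed."""
    return 1.0 if trajectory["final_state"] > 0.5 else 0.0


def policy_update(policy, batch, reference, beta=0.1):
    """KL-constrained actor update (schematic): maximize reward-weighted
    log-likelihood while penalizing divergence from the reference policy,
    i.e.  L = -A * log pi(a|s) + beta * KL(pi || pi_ref)."""
    pass  # gradient step against the objective above


def train(policy, reference, seed_tasks, phases=10):
    replay_buffer, task_pool = [], list(seed_tasks)
    for _ in range(phases):
        failures = []
        for task in task_pool:
            traj = run_episode(policy, task)
            if orm_score(traj) == 1.0:
                replay_buffer.append(traj)  # keep successes only
            else:
                failures.append(task)
        # Curriculum: next phase's tasks are generated from this
        # phase's failures, so difficulty tracks the agent's ability.
        task_pool = propose_tasks_from_failures(failures)
        policy_update(policy, replay_buffer, reference)
    return policy
```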

-----

Key Insights 💡:

→ Curriculum learning with dynamic task generation is crucial for continuous improvement

→ KL-divergence constraints effectively prevent policy drift during online learning

→ Filtering replay data by action perplexity balances familiar and challenging examples (see the sketch after this list)

→ Training only on successful trajectories sidesteps error-prone value estimation for intermediate states
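
As a concrete illustration of the perplexity filter: score each replayed action under the current actor and keep only those in a middle band, so replay mixes tasks the agent has learned but not yet saturated. The function names and the (1.05, 2.0) band below are assumptions for illustration, not values taken from the paper.

```python
# Illustrative perplexity-based replay filtering; thresholds are assumed.
# Too-low perplexity: the action is already mastered (little signal).
# Too-high perplexity: the action is too far from the current policy.
import math


def action_perplexity(token_logprobs):
    """Perplexity of one action under the current actor:
    exp of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def filter_replay(trajectories, logprob_fn, low=1.05, high=2.0):
    """Keep replayed trajectories whose action perplexity is in (low, high).
    `logprob_fn` (hypothetical) returns the actor's token log-probs."""
    return [
        traj for traj in trajectories
        if low < action_perplexity(logprob_fn(traj)) < high
    ]
```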

-----

Results 📊:

→ Improves Llama-3.1-8B from 4.8% to 42.4% success rate

→ Boosts GLM-4-9B from 6.1% to 43% success rate

→ Achieves 49.1% with Llama-3.1-70B

→ Outperforms GPT-4-Turbo (17.6%) and AutoWebGLM (18.2%)
