WebRL, proposed in this paper, turns weak open-source LLMs into strong web agents via self-evolving online curriculum reinforcement learning
https://arxiv.org/abs/2411.02337
Original Problem 🤔:
Open-source LLMs perform poorly as web agents compared to proprietary models like GPT-4. The base Llama-3.1-8B achieves only a 4.8% success rate on the WebArena-Lite benchmark, making open LLMs impractical for web automation tasks.
-----
Solution in this Paper 🛠️:
→ Introduces WebRL, a self-evolving online curriculum reinforcement learning framework for training LLM web agents
→ Uses a curriculum that automatically generates new training tasks from the agent's failed attempts, matched to its current skill level (first sketch below)
→ Trains an outcome-supervised reward model (ORM) to judge whether each task attempt succeeded
→ Applies a KL-divergence constraint between the actor and a reference policy to prevent catastrophic forgetting (second sketch below)
→ Keeps an experience replay buffer with perplexity-based filtering to retain and reuse past successful trajectories
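First, a minimal sketch of the self-evolving task-generation step, assuming a text-completion callable `llm` (hypothetical) that proposes new instructions conditioned on a failed task. The prompt wording, the variant count, and the helper name are illustrative, not the paper's exact setup:

```python
from typing import Callable, List

def evolve_tasks(failed_task: str,
                 llm: Callable[[str], str],
                 n_variants: int = 4) -> List[str]:
    """Propose new candidate instructions from a task the agent failed.
    In WebRL, candidates are further filtered (e.g. by the critic's value
    estimate) so difficulty matches the agent's current skill level."""
    prompt = (
        "The web agent failed this task:\n"
        f"{failed_task}\n"
        f"Propose {n_variants} related tasks of similar or slightly lower "
        "difficulty, one per line."
    )
    lines = [line.strip("- ").strip() for line in llm(prompt).splitlines()]
    return [line for line in lines if line][:n_variants]
```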
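Second, a minimal PyTorch sketch of an ORM-scored, KL-regularized policy update in the spirit of WebRL. The function names, the YES/NO verdict protocol, the scalar-advantage form, and the `beta` value are illustrative assumptions, not the paper's exact formulation:

```python
import torch

def orm_reward(judge_verdict: str) -> float:
    # Outcome-supervised reward: map the reward model's binary verdict
    # ("YES" = task completed) to a scalar trajectory reward.
    # The YES/NO protocol here is an assumption for illustration.
    return 1.0 if judge_verdict.strip().upper() == "YES" else 0.0

def kl_constrained_policy_loss(logp_actor: torch.Tensor,
                               logp_ref: torch.Tensor,
                               advantage: torch.Tensor,
                               beta: float = 0.1) -> torch.Tensor:
    """KL-regularized policy-gradient loss (sketch).

    logp_actor: (T,) log-probs of the taken action tokens under the actor
    logp_ref:   (T,) log-probs of the same tokens under the frozen reference
    advantage:  scalar advantage for the trajectory (e.g. reward - baseline)
    beta:       strength of the KL penalty (illustrative value)
    """
    pg_term = -(advantage * logp_actor).mean()         # push up rewarded actions
    kl_term = (logp_actor - logp_ref.detach()).mean()  # stay near the reference
    return pg_term + beta * kl_term
```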
-----
Key Insights 💡:
→ Curriculum learning with dynamic task generation is crucial for continuous improvement
→ KL-divergence constraints effectively prevent policy drift during online learning
→ Filtering replay data by action perplexity balances familiar and challenging examples (see the sketch below)
→ Learning only from successful trajectories avoids error-prone value estimation of intermediate states
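As a rough illustration of that replay filter, the sketch below scores a stored trajectory's actions under the current actor and keeps only trajectories whose perplexity falls in a middle band. The `logp_fn` helper and the `lo`/`hi` bounds are hypothetical; the paper's exact thresholds differ:

```python
import math
from typing import Callable, List, Sequence

def action_perplexity(token_logps: Sequence[float]) -> float:
    """Perplexity of an action sequence from its per-token log-probs."""
    return math.exp(-sum(token_logps) / len(token_logps))

def filter_replay(buffer: List[dict],
                  logp_fn: Callable[[dict], Sequence[float]],
                  lo: float = 1.0,
                  hi: float = 1.5) -> List[dict]:
    """Keep successful trajectories whose perplexity under the *current*
    actor lies in (lo, hi): familiar enough to be learnable, novel enough
    to still carry training signal. Bounds are illustrative.

    logp_fn: hypothetical helper scoring a stored trajectory's action
             tokens under the current actor policy.
    """
    return [traj for traj in buffer
            if lo < action_perplexity(logp_fn(traj)) < hi]
```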
-----
Results 📊:
→ Improves Llama-3.1-8B from 4.8% to 42.4% success rate
→ Boosts GLM-4-9B from 6.1% to 43% success rate
→ Achieves 49.1% with Llama-3.1-70B
→ Outperforms GPT-4-Turbo (17.6%) and AutoWebGLM (18.2%)