Creates explainable RL agents by evolving programs instead of training neural networks.
This paper proposes to make black-box RL agents transparent by evolving readable programs guided by TD3 (Twin Delayed Deep Deterministic Policy Gradient) critics.
📚 https://arxiv.org/abs/2410.21940
🎯 Original Problem:
Deep Reinforcement Learning (DRL) agents use black-box neural networks, making them hard to understand and trust. This lack of transparency hinders their adoption in real-world control systems where engineers need guarantees of stability and robustness.
-----
🔧 Solution in this Paper:
→ Combines TD3 with Genetic Programming to produce human-readable control programs
→ Programs are represented as a genome: a sequence of real values encoding operators and literals
→ Uses stack-based execution, with the stack pre-populated with the input state
→ TD3 critics guide the genetic algorithm by providing gradients for program optimization
→ Introduces stochasticity in the operator mapping to smooth the optimization landscape (see the sketch after this list)
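
A minimal sketch (not the paper's code) of how a real-valued genome could be decoded and executed as a stack program with stochastic operator mapping. The operator set, the token layout (operator logits plus one literal per token), and the softmax temperature are illustrative assumptions:

```python
import numpy as np

OPERATORS = ["add", "sub", "mul", "neg"]  # assumed simple operator set

def softmax(z, temp=0.1):
    z = np.asarray(z) / temp
    e = np.exp(z - z.max())
    return e / e.sum()

def run_program(genome, state, rng, temp=0.1):
    """Decode and execute a genome as a stack program.

    Each group of len(OPERATORS)+1 genome values encodes one token:
    logits over operators plus one literal to push (assumed layout).
    """
    stack = list(state)               # stack pre-populated with the input state
    width = len(OPERATORS) + 1
    for i in range(0, len(genome) - width + 1, width):
        logits = genome[i:i + len(OPERATORS)]
        literal = genome[i + len(OPERATORS)]
        # Stochastic operator mapping: sample instead of argmax,
        # smoothing the genome -> behavior landscape.
        op = OPERATORS[rng.choice(len(OPERATORS), p=softmax(logits, temp))]
        stack.append(literal)         # push the literal operand
        b, a = stack.pop(), stack.pop()
        if op == "add":
            stack.append(a + b)
        elif op == "sub":
            stack.append(a - b)
        elif op == "mul":
            stack.append(a * b)
        else:                         # "neg": keep a, push -b
            stack.extend([a, -b])
    return float(np.tanh(stack[-1])) # bounded scalar action (assumed readout)

rng = np.random.default_rng(0)
genome = rng.normal(size=3 * 5)      # 3 tokens x 5 values each
print(run_program(genome, state=[0.5, -0.2], rng=rng))
```

Because an operator is sampled rather than picked by argmax, small changes to the genome change behavior gradually instead of abruptly flipping the decoded program.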
-----
💡 Key Insights:
→ Direct optimization through TD3 critics instead of environment interactions improves sample efficiency (see the sketch after this list)
→ Programs can influence exploration during training rather than being distilled after the fact
→ Stochastic elements in the program representation create a smoother optimization landscape
→ A simple operator set allows for readable yet effective policies
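
A minimal sketch of critic-guided evaluation under stated assumptions: a TD3 critic trained alongside the population, a batch of states from the replay buffer, and a toy differentiable linear rule standing in for a decoded stack program. None of the names below come from the paper:

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Stand-in for a trained TD3 Q-network Q(s, a) -> scalar."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def program_action(genome, states):
    # Toy differentiable stand-in for a decoded stack program:
    # action = tanh(w . state + b), readable but purely illustrative.
    w, b = genome[:-1], genome[-1]
    return torch.tanh(states @ w + b).unsqueeze(-1)

def critic_fitness(genome, critic, states):
    """Fitness = mean Q over replay states; no environment rollouts needed."""
    with torch.no_grad():
        return critic(states, program_action(genome, states)).mean().item()

state_dim, pop_size = 2, 8
critic = Critic(state_dim, action_dim=1)   # assumed trained alongside TD3
states = torch.randn(256, state_dim)       # batch sampled from the replay buffer
population = [torch.randn(state_dim + 1) for _ in range(pop_size)]
scores = [critic_fitness(g, critic, states) for g in population]
best = max(range(pop_size), key=scores.__getitem__)

# Because the relaxed program is differentiable, the critic also
# provides a gradient signal for the genome:
g = population[best].clone().requires_grad_(True)
critic(states, program_action(g, states)).mean().backward()
# g.grad now points uphill in Q and can guide mutation / local search
```

Scoring every candidate against the critic costs only forward passes, which is why this approach can be orders of magnitude cheaper in environment samples than rollout-based genetic programming.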
-----
📊 Results:
→ Achieves comparable performance to vanilla TD3
→ Several orders of magnitude more sample-efficient than pure genetic programming
→ Successfully generates interpretable navigation programs for SimpleGoal environment
→ Maintains policy quality while providing explainability