Teaching RL agents to explain decisions by showing their training journey
https://arxiv.org/abs/2411.07200
🎯 Original Problem:
This study examines the reproducibility of a novel approach in explainable reinforcement learning that attributes agent decisions to specific trajectory clusters from training data. The original paper claimed four key findings about trajectory influence on agent decisions, but lacked public code and comprehensive implementation details.
The main claims of the original paper were:
(i) training on fewer trajectories induces a lower initial state value,
(ii) trajectories in a cluster present similar high-level patterns,
(iii) distant trajectories influence the decision of an agent, and
(iv) humans correctly identify the attributed trajectories to the decision of the agent.
This reproduction study could not support claim (iv).
-----
🔧 Solution in this Paper:
→ Implemented and tested the original paper's claims across five environments: Grid-World, Seaquest, HalfCheetah, Breakout, and Q*Bert
→ Used trajectory encoders (LSTM for Grid-World, GPT for others) to embed state-action sequences
→ Applied XMeans clustering to group similar trajectories and identify behavior patterns
→ Created complementary datasets by removing one cluster at a time to measure impact
→ Evaluated using metrics like Initial State Value, Action Contrast, and Wasserstein distance
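The complementary-dataset step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `complementary_datasets` and `wasserstein_1d` are hypothetical, cluster labels are assumed to come from a clustering step such as XMeans, and the distance is a simplified 1-D Wasserstein distance over one scalar summary per trajectory (for example its return), whereas the paper may compute distances over full embeddings.

```python
def complementary_datasets(trajectories, cluster_ids):
    """Map each cluster id to the dataset with that cluster removed.

    `trajectories` and `cluster_ids` are parallel lists; the labels are
    assumed to come from an earlier clustering step (hypothetical input).
    """
    return {
        c: [t for t, cid in zip(trajectories, cluster_ids) if cid != c]
        for c in sorted(set(cluster_ids))
    }


def wasserstein_1d(xs, ys):
    """1-D Wasserstein-1 distance between two empirical samples.

    Integrates the absolute difference of the two empirical CDFs.
    Simplifying assumption: each trajectory is summarised by one scalar.
    """
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total, i, j = 0.0, 0, 0
    for left, right in zip(points, points[1:]):
        # Advance counters so i/len(xs) and j/len(ys) equal the CDFs at `left`.
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        total += abs(i / len(xs) - j / len(ys)) * (right - left)
    return total


# Toy usage: four trajectories in three clusters, one scalar summary each.
trajs = ["t0", "t1", "t2", "t3"]
labels = [0, 1, 0, 2]
summaries = {"t0": 1.0, "t1": 5.0, "t2": 1.5, "t3": 9.0}

comp = complementary_datasets(trajs, labels)
full = [summaries[t] for t in trajs]
# Distance between the full dataset and each complementary dataset.
shift = {c: wasserstein_1d(full, [summaries[t] for t in d]) for c, d in comp.items()}
```

In this toy setup, `shift` gives a rough sense of how much the data distribution moves when each cluster is removed. An agent retrained on each complementary dataset would then be compared against the original agent using the metrics listed above.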
-----
💡 Key Insights:
→ Training with fewer trajectories consistently leads to lower initial state values
→ Trajectories within clusters exhibit similar high-level behavioral patterns
→ Distant trajectories can significantly influence agent decisions
→ Human validation of trajectory attribution remains inconclusive
-----
📊 Results:
→ Reproduced 3 of the 4 original claims, at least partially; claim (iv) on human identification could not be supported
→ Grid-World showed closest alignment with original findings
→ Other environments faced reproducibility challenges due to code limitations
→ Introduced new quantitative metrics for better claim validation