Quantifying how much reinforcement learning really depends on hyperparameter optimization.
This paper introduces a systematic way to measure how sensitive reinforcement learning algorithms are to hyperparameter tuning across different environments.
-----
https://arxiv.org/abs/2412.07165v1
🤖 Original Problem:
→ Modern reinforcement learning algorithms rely heavily on tuning many hyperparameters, and the best settings can vary drastically from one environment to the next.
→ The field lacks standardized methods to measure and compare how different algorithms depend on hyperparameter tuning.
-----
🔧 Solution in this Paper:
→ The paper proposes two metrics: hyperparameter sensitivity and effective hyperparameter dimensionality (both sketched in the example after this list).
→ Hyperparameter sensitivity measures how much of an algorithm's peak performance depends on tuning hyperparameters per environment rather than using one setting fixed across all environments.
→ Effective hyperparameter dimensionality quantifies how many hyperparameters must actually be tuned to reach near-peak performance.
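A minimal, illustrative sketch of both metrics in NumPy. The performance table, the 95% threshold, and the specific ratio used for sensitivity are assumptions for illustration; the paper's exact normalization and aggregation may differ.

```python
import itertools
import numpy as np

# Hypothetical data: two hyperparameters (e.g. learning rate and entropy
# coefficient), each on a 4-point grid, evaluated on 3 environments.
# perf[env, i, j] is a normalized score in [0, 1].
rng = np.random.default_rng(0)
perf = rng.uniform(0.2, 1.0, size=(3, 4, 4))

# --- Hyperparameter sensitivity (illustrative form) ---
# Per-environment tuning: best configuration chosen separately for each env.
per_env_tuned = perf.reshape(3, -1).max(axis=1).mean()
# Fixed setting: the single configuration that is best on average.
best_fixed = perf.reshape(3, -1).mean(axis=0).max()
# Relative gap: 0 means one setting works everywhere; larger values mean the
# algorithm leans harder on per-environment tuning.
sensitivity = (per_env_tuned - best_fixed) / per_env_tuned
print(f"sensitivity ~ {sensitivity:.3f}")

# --- Effective hyperparameter dimensionality (illustrative form) ---
# Smallest number of hyperparameters that must be tuned per environment to
# recover 95% of the fully tuned score, with the rest held at their best
# shared values.
def score_tuning_subset(tuned):
    grid = {0: perf.shape[1], 1: perf.shape[2]}  # grid sizes of the two hyperparameters
    shared_axes = [a for a in grid if a not in tuned]
    best = -np.inf
    for shared in itertools.product(*(range(grid[a]) for a in shared_axes)):
        sub = perf
        # Fix the shared hyperparameters (index from the last axis inward so
        # earlier axis positions stay valid); +1 skips the environment axis.
        for a, v in sorted(zip(shared_axes, shared), reverse=True):
            sub = np.take(sub, v, axis=a + 1)
        # Tune whatever remains per environment, then average over environments.
        best = max(best, sub.reshape(sub.shape[0], -1).max(axis=1).mean())
    return best

target = 0.95 * per_env_tuned
for k in range(3):
    if any(score_tuning_subset(set(c)) >= target
           for c in itertools.combinations((0, 1), k)):
        print(f"effective dimensionality ~ {k}")
        break
```

With real data, `perf` would hold the per-environment scores from a hyperparameter sweep; the same two reductions (tune per environment vs. best shared setting) drive both metrics.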
-----
💡 Key Insights:
→ Performance improvements often come at the cost of increased hyperparameter sensitivity
→ Contrary to the common belief that normalization makes PPO more robust, several normalization variants actually increase its hyperparameter sensitivity
→ Different environments require vastly different hyperparameter settings for optimal performance
-----
📊 Results:
→ The study covered 4.3 million training runs across Brax MuJoCo-style domains
→ PPO variants with normalization reached higher performance but also showed higher hyperparameter sensitivity
→ Some normalization methods required every hyperparameter to be tuned to reach peak performance