- ppo-implementation-details: The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization"
I'm currently working with A2C. The model was able to learn OpenAI Gym Pong, which I ran as a sanity check that I haven't introduced any bugs. Now I'm trying to make the model play Breakout, but after 10M steps it still has not made any significant progress. I'm using the Baselines hyperparameters, which can be found here: https://github.com/openai/baselines/blob/master/baselines/a2c/a2c.py, except that I have tried buffer sizes from 512 to 4096. I've noticed that entropy decreases extremely slowly for every buffer size in that range. So my questions are: how do I make the entropy decrease, and how do I increase the reward per buffer? I've tried lowering the entropy coefficient to almost zero, but the model still behaves very strangely.
You might find our PPO blog post helpful: https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
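For reference, here is a minimal NumPy sketch of how an entropy bonus typically enters an A2C loss, which may help when checking how fast the logged entropy is falling. The function names `categorical_entropy` and `a2c_loss` are hypothetical, and the defaults `ent_coef=0.01` and `vf_coef=0.5` are assumed to match the linked Baselines file, so verify them there. This is not the Baselines implementation, just an illustration of the usual formula.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def categorical_entropy(logits):
    """Entropy in nats of softmax(logits), one value per row."""
    log_probs = log_softmax(logits)
    return -(np.exp(log_probs) * log_probs).sum(axis=-1)


def a2c_loss(logits, actions, advantages, values, returns,
             ent_coef=0.01, vf_coef=0.5):
    """Policy-gradient loss, minus an entropy bonus, plus a value loss.

    ent_coef and vf_coef are assumed defaults; check them against the
    implementation you are comparing with.
    """
    log_probs = log_softmax(logits)
    chosen = log_probs[np.arange(len(actions)), actions]
    pg_loss = -(advantages * chosen).mean()
    vf_loss = ((values - returns) ** 2).mean()
    entropy = categorical_entropy(logits).mean()
    loss = pg_loss - ent_coef * entropy + vf_coef * vf_loss
    return loss, entropy


# A uniform policy over 4 actions (assuming Breakout's 4 discrete actions)
# has the maximum possible entropy, log(4) nats.
uniform_entropy = categorical_entropy(np.zeros((1, 4)))[0]
```

Comparing the logged entropy against `np.log(n_actions)` shows how close the policy still is to uniform. Because the entropy term is subtracted from the loss, a larger `ent_coef` pushes the policy toward higher entropy, so setting it near zero only removes that push and does not by itself force the entropy down.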