- Trust Region Policy Optimization (https://arxiv.org/abs/1502.05477)
- Proximal Policy Optimization (https://arxiv.org/abs/1707.06347)
- Sample Efficient Actor-Critic with Experience Replay (https://arxiv.org/abs/1611.01224)
- Continuous control with deep reinforcement learning (https://arxiv.org/abs/1509.02971)
- Dueling Network Architectures for Deep Reinforcement Learning (https://arxiv.org/abs/1511.06581)
- Deep Reinforcement Learning that Matters (https://arxiv.org/abs/1709.06560)
- Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control (https://arxiv.org/abs/1708.04133)
Tests were run on three different environments: OpenAI Gym (MountainCarContinuous), MuJoCo (Reacher), and Atari (Breakout).
- DQN (code in `src/mountaincar-continuous/dqn`, results in `results/gym-mountaincarcontinuous/dqn`)
- DDPG (code in `src/mountaincar-continuous/ddpg`, results in `results/gym-mountaincarcontinuous/ddpg`)
- PPO (code in `src/mountaincar-continuous/ppo`, results in `results/gym-mountaincarcontinuous/ppo`)
- DDPG (code in `src/baselines/baselines/ddpg`, results in `results/mujoco-reacher/ddpg`)
- TRPO (code in `src/baselines/baselines/trpo_mpi`, results in `results/mujoco-reacher/trpo`)
- PPO (code in `src/baselines/baselines/ppo2`, results in `results/mujoco-reacher/ppo`)
- ACER (code in `src/baselines/baselines/acer`, results in `results/atari-breakout/acer`)
- TRPO (code in `src/baselines/baselines/trpo_mpi`, results in `results/atari-breakout/trpo`)
- PPO (code in `src/baselines/baselines/ppo2`, results in `results/atari-breakout/ppo`)
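Several of the algorithms above (DQN, DDPG, ACER) rely on experience replay to decorrelate training samples. As a rough illustration of the idea only — this sketch is not taken from any of the repositories listed above — a minimal replay buffer might look like:

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest transitions automatically
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between
        # consecutive transitions collected from the environment.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

The actual implementations (e.g. in OpenAI Baselines) add details such as array storage and, for ACER, sampling of whole trajectories rather than single transitions.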
To get the source code, execute the following commands:

```sh
git clone https://github.com/lajoiepy/Reinforcement_Learning_PPO.git
cd Reinforcement_Learning_PPO
git submodule init
git submodule update
```
- The source code in `src/baselines` is a fork of https://github.com/openai/baselines.
- The source code in `src/mountaincar-continuous/ddpg` is mostly from https://github.com/lirnli/OpenAI-gym-solutions/blob/master/Continuous_Deep_Deterministic_Policy_Gradient_Net/DDPG%20Class%20ver2.ipynb.
- The source code in `src/mountaincar-continuous/dqn` is inspired by http://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html.
- The source code in `src/mountaincar-continuous/ppo` is a fork of https://github.com/tpbarron/pytorch-ppo.
- PyTorch for `src/mountaincar-continuous/dqn` and `src/mountaincar-continuous/ppo`.
- TensorFlow for `src/mountaincar-continuous/ddpg` and `src/baselines`.
- Gym for `src/mountaincar-continuous`.
- MuJoCo and Atari for `src/baselines`.
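As a rough setup sketch — the package names below are assumptions about a typical installation, exact versions depend on your system, and the MuJoCo licensing steps are not covered here:

```shell
# Install the Python dependencies used by the different sub-projects.
pip3 install torch         # src/mountaincar-continuous/dqn and src/mountaincar-continuous/ppo
pip3 install tensorflow    # src/mountaincar-continuous/ddpg and src/baselines
pip3 install gym           # src/mountaincar-continuous

# The MuJoCo and Atari environments require additional setup;
# follow the README files in src/baselines.
```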
- Follow the README files for the code in `src/baselines`.
- For PPO on Gym environments, run `python3 src/mountaincar-continuous/pytorch-ppo/main.py --env-name MountainCarContinuous-v0`.
- For the DQN implementation, run `python3 mountaincar_dqn.py`.
- For the DDPG implementation, run `python3 mountaincar_ddpg.py`.