Implemented for TensorFlow 2.0+:
- DDPG with prioritized replay (a minimal replay-buffer sketch follows this list)
- Primal-Dual DDPG for CMDPs (constrained MDPs)
- SAC Discrete
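For reference, here is a minimal sketch of the proportional prioritized replay idea used by the DDPG variant; the class and parameter names are illustrative stand-ins, not the repo's actual buffer.

```python
import numpy as np

class SimplePrioritizedBuffer:
    """Proportional prioritized replay: transitions are sampled with
    probability p_i^alpha / sum_k p_k^alpha, and importance-sampling
    weights correct the resulting bias."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so they are replayed at least once.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.data)] ** self.alpha
        probs = prios / prios.sum()
        idxs = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights, normalized by their max for stability.
        weights = (len(self.data) * probs[idxs]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idxs], idxs, weights

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        # Priority is the absolute TD error plus a small constant.
        self.priorities[idxs] = np.abs(td_errors) + eps
```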
- Install the imported dependencies (my TF2 conda environment is included as a reference)
- Each file contains example code that runs training on the CartPole environment
- Training: `python3 TF2_DDPG_LSTM.py`
- TensorBoard: `tensorboard --logdir=DDPG/logs` (a minimal training-and-logging skeleton is sketched below)
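The per-file example code follows roughly this pattern (a self-contained stand-in, not the exact contents of `TF2_DDPG_LSTM.py`; it assumes the classic `gym` step/reset API and uses random actions in place of the real agent):

```python
import gym
import tensorflow as tf

# Build the environment and a TensorBoard writer pointing at the log dir
# that `tensorboard --logdir=DDPG/logs` expects.
env = gym.make("CartPole-v1")
writer = tf.summary.create_file_writer("DDPG/logs")

for episode in range(10):
    state, done, total_reward = env.reset(), False, 0.0
    while not done:
        action = env.action_space.sample()  # stand-in for agent.act(state)
        state, reward, done, _ = env.step(action)
        total_reward += reward
    # Log the episode reward so it shows up as a scalar in TensorBoard.
    with writer.as_default():
        tf.summary.scalar("episode_reward", total_reward, step=episode)
```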
- Install hyperopt: https://github.com/hyperopt/hyperopt
- Optional: switch the agent used and configure the parameter space in `hyperparam_tune.py` (see the hyperopt sketch below)
- Run: `python3 hyperparam_tune.py`
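A hyperopt search in `hyperparam_tune.py` boils down to a parameter space plus a call to `fmin`; the space below and the `train_and_evaluate` helper are hypothetical stand-ins, not the file's actual contents.

```python
from hyperopt import fmin, tpe, hp, Trials

# Example parameter space (illustrative ranges only).
space = {
    "actor_lr": hp.loguniform("actor_lr", -10, -4),    # ~4.5e-5 .. 1.8e-2
    "critic_lr": hp.loguniform("critic_lr", -10, -4),
    "gamma": hp.uniform("gamma", 0.9, 0.999),
    "batch_size": hp.choice("batch_size", [32, 64, 128]),
}

def objective(params):
    # Train the chosen agent with these params and return a value to minimize,
    # e.g. the negative average episode reward on CartPole.
    avg_reward = train_and_evaluate(params)  # hypothetical helper
    return -avg_reward

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)
```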
All agents were tested using the CartPole environment.
| Name | On/off policy | Model | Action space support |
|---|---|---|---|
| DQN | off-policy | Dense, LSTM | discrete |
| DDPG | off-policy | Dense, LSTM | discrete, continuous |
| AE-DDPG | off-policy | Dense | discrete, continuous |
| SAC :bug: | off-policy | Dense | continuous |
| PPO | on-policy | Dense | discrete, continuous |
| Name | On/off policy | Model | Action space support |
|---|---|---|---|
| Primal-Dual DDPG | off-policy | Dense | discrete, continuous |
Models used to generate the demos are included in the repo; you can also find Q-value, reward, and/or loss graphs there.
| DQN Basic, time step = 4, 500 reward | DQN LSTM, time step = 4, 500 reward |
|---|---|
| ![]() | ![]() |

| DDPG Basic, 500 reward | DDPG LSTM, time step = 5, 500 reward |
|---|---|
| ![]() | ![]() |

| AE-DDPG Basic, 500 reward | PPO Basic, 500 reward |
|---|---|
| ![]() | ![]() |