Two lines should be changed in train.py:

    state, _ = env.reset()  # at line 170
    state, reward, done, _, _ = env.step(action)  # at line 177
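For context, these unpacking patterns match the newer Gym (0.26+) / Gymnasium API, where `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`. Assuming that is why the change is needed, here is a minimal sketch of the loop shape. `StubEnv` and `run_episode` are hypothetical names made up for illustration and are not part of train.py. Note that the sketch combines `terminated` and `truncated` into `done`, whereas the one-line fix above keeps only `terminated`.

```python
# Hypothetical stand-in for an environment that follows the newer
# Gym/Gymnasium API; it is NOT part of train.py.
class StubEnv:
    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        # Newer API: reset() returns (observation, info).
        self.t = 0
        return 0, {}

    def step(self, action):
        # Newer API: step() returns
        # (observation, reward, terminated, truncated, info).
        self.t += 1
        terminated = False                  # no terminal state in this stub
        truncated = self.t >= self.horizon  # stop on a step limit instead
        return self.t, 1.0, terminated, truncated, {}


def run_episode(env):
    # Unpack the two-element return of reset().
    state, _ = env.reset()
    total_reward, steps = 0.0, 0
    done = False
    while not done:
        action = 0  # placeholder action; a real agent would choose one
        # Unpack the five-element return of step().
        state, reward, terminated, truncated, _ = env.step(action)
        # Treat either signal as the end of the episode.
        done = terminated or truncated
        total_reward += reward
        steps += 1
    return total_reward, steps
```

If the environment in train.py can end an episode through a time limit, discarding `truncated` (as in `state, reward, done, _, _ = ...`) may leave `done` false at that point, so it may be worth checking whether the training loop has its own step cap.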