Open
Description
I have read through your paper in depth but don't understand how CtRL-Sim was trained. Maybe I just missed something :)
I have a question:
Is it correct that you pretrained the CtRL-Sim model and then fine-tuned it with PPO in online RL (sampling trajectories in gpudrive and using them for several policy updates)?
If the answer is no, what did you use as the old policy in PPO?