Offline RL? #9

@artbelyaev0

Description

I have read through your paper carefully, but I don't understand how CtRL-Sim was trained; maybe I just missed something :)
I have a question:

Is it correct that you pretrained the CtRL-Sim model and then fine-tuned it with PPO via online RL (sampling trajectories in GPUDrive with the policy as it gets updated)?

If not, what did you use as the old policy in PPO?
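For context on what I mean by "old policy": in standard PPO (independent of how CtRL-Sim specifically handles it), the old policy is just a frozen snapshot of the current weights, captured before each round of gradient steps, that generated the sampled trajectories and supplies the denominator of the importance ratio. A minimal sketch, with illustrative names not taken from the CtRL-Sim code:

```python
import math

def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized).

    `logp_old` holds log-probs of the taken actions under the frozen
    snapshot ("old policy") -- not a separate model, just the weights
    the policy had when the trajectories were sampled.
    """
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)

# Right after the snapshot, new and old policies coincide, so the
# ratio is 1 and the objective is just the mean advantage.
print(ppo_clipped_objective([0.0, 0.0], [0.0, 0.0], [1.0, 2.0]))  # 1.5
```

So my question is whether the pretrained CtRL-Sim weights serve as that initial snapshot, or whether something else plays the role of the old policy.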
