Problem Description
Would it be useful to add a variant of the PPO algorithm that supports complex (nested/dictionary) action and observation spaces? I implemented this for MineRL and wondered whether it would be worth contributing to the main library. I'd be happy to open a PR.
Checklist
- I have checked that there is no similar issue in the repo.
- I have checked the documentation site and found no relevant information in GitHub issues.
Current Behavior
Currently PPO supports either continuous or discrete actions, but not both in the same action space, and only observations given as a single array.
Expected Behavior
PPO should support arbitrarily complex (nested) action and observation spaces.
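One way PPO could handle a nested action space: if the action components are sampled independently given the observation, the joint log-probability that enters PPO's probability ratio is simply the sum of the per-component log-probabilities. The sketch below assumes a hypothetical encoding in which each leaf of the distribution-parameter dict is a `("categorical", logits)` or `("gaussian", (mean, log_std))` pair; the function names are illustrative and not part of any existing library.

```python
import math
from typing import Any, Mapping, Sequence


def categorical_log_prob(logits: Sequence[float], action: int) -> float:
    # log softmax(logits)[action], using the max-shift trick for stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[action] - log_z


def diag_gaussian_log_prob(mean: Sequence[float], log_std: Sequence[float],
                           action: Sequence[float]) -> float:
    # Sum of independent 1-D Normal log-densities.
    total = 0.0
    for mu, ls, a in zip(mean, log_std, action):
        std = math.exp(ls)
        total += -0.5 * ((a - mu) / std) ** 2 - ls - 0.5 * math.log(2 * math.pi)
    return total


def joint_log_prob(dist_params: Any, actions: Any) -> float:
    """Sum leaf log-probs over a nested dict of action components.

    Assumes (hypothetically) that components are independent given the
    observation, so the joint log-prob is the sum over leaves.
    """
    if isinstance(actions, Mapping):
        return sum(joint_log_prob(dist_params[k], actions[k]) for k in actions)
    kind, params = dist_params
    if kind == "categorical":
        return categorical_log_prob(params, actions)
    if kind == "gaussian":
        mean, log_std = params
        return diag_gaussian_log_prob(mean, log_std, actions)
    raise ValueError(f"unknown distribution kind: {kind}")


def ppo_clipped_objective(new_logp: float, old_logp: float,
                          advantage: float, clip_eps: float = 0.2) -> float:
    # Standard PPO clipped surrogate for a single sample, using the joint
    # log-probs of the whole nested action.
    ratio = math.exp(new_logp - old_logp)
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
    return min(ratio * advantage, clipped * advantage)


# Example: a MineRL-like action with a continuous camera and binary buttons.
params = {
    "camera": ("gaussian", ([0.0, 0.0], [0.0, 0.0])),
    "buttons": {
        "attack": ("categorical", [0.0, 0.0]),
        "jump": ("categorical", [0.0, 0.0]),
    },
}
actions = {"camera": [0.0, 0.0], "buttons": {"attack": 1, "jump": 0}}
logp = joint_log_prob(params, actions)
```

Because the rest of PPO only ever consumes the scalar joint log-probability (and, for the entropy bonus, a sum of per-component entropies under the same independence assumption), the loss itself does not need to know the shape of the action space.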
Possible Solution
- Use tree to map over actions and observations.
- Store arrays in the same nested structure as the observation space, or flatten them for storage and unflatten them when passing them to the network.
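The flatten-for-storage option could look roughly like this. Assuming "tree" refers to DeepMind's dm-tree package, it already provides `tree.flatten`, `tree.unflatten_as` and `tree.map_structure`. To keep the sketch self-contained, it reimplements a dict-only subset in plain Python: only dicts are treated as internal nodes, and everything else (arrays, lists, scalars) is a leaf. Keys are visited in sorted order, which mirrors dm-tree's convention for dicts. `FlatRolloutBuffer` is a hypothetical name, not an existing class.

```python
from typing import Any, Callable, Dict, List


def flatten(tree: Any) -> List[Any]:
    # Leaves in sorted-key order; non-dict values are leaves.
    if isinstance(tree, dict):
        leaves: List[Any] = []
        for key in sorted(tree):
            leaves.extend(flatten(tree[key]))
        return leaves
    return [tree]


def unflatten_as(structure: Any, flat: List[Any]) -> Any:
    # Rebuild a nested dict shaped like `structure` from a flat leaf list.
    it = iter(flat)

    def build(node: Any) -> Any:
        if isinstance(node, dict):
            return {key: build(node[key]) for key in sorted(node)}
        return next(it)

    result = build(structure)
    if next(it, None) is not None:
        raise ValueError("too many leaves for structure")
    return result


def map_structure(fn: Callable[..., Any], *trees: Any) -> Any:
    # Apply fn leaf-wise across trees that share the same structure.
    leaf_lists = [flatten(t) for t in trees]
    if len({len(leaves) for leaves in leaf_lists}) != 1:
        raise ValueError("trees have different numbers of leaves")
    return unflatten_as(trees[0], [fn(*xs) for xs in zip(*leaf_lists)])


class FlatRolloutBuffer:
    """Hypothetical buffer: one flat column per leaf of the obs space."""

    def __init__(self, example_obs: Dict[str, Any]):
        self.structure = example_obs
        self.columns: List[List[Any]] = [[] for _ in flatten(example_obs)]

    def add(self, obs: Dict[str, Any]) -> None:
        leaves = flatten(obs)
        if len(leaves) != len(self.columns):
            raise ValueError("observation does not match buffer structure")
        for column, leaf in zip(self.columns, leaves):
            column.append(leaf)

    def batch(self) -> Dict[str, Any]:
        # Unflatten so the network receives the original nested layout,
        # with each leaf now a list over time steps.
        return unflatten_as(self.structure, self.columns)


obs0 = {"pov": [0, 1], "inventory": {"dirt": 3, "log": 1}}
obs1 = {"pov": [2, 3], "inventory": {"dirt": 4, "log": 0}}
buf = FlatRolloutBuffer(obs0)
buf.add(obs0)
buf.add(obs1)
batch = buf.batch()
```

Storing flat columns keeps the buffer code independent of the observation space, at the cost of one flatten per step and one unflatten per batch; storing the nested structure directly avoids that round trip but requires the buffer itself to be written in terms of tree-mapping operations.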