Fixes ray initialization to correctly direct subprocess output #3533

Conversation
LGTM, thanks for catching this!

@ozhanozen Hi, I have tried this PR. I just ran the tuning command, and the output log shows the following:

(IsaacLabTuneTrainable pid=1414) [ERROR]: Could not find experiment logs within 200.0 seconds.
Hi @PierrePeng, I believe you also need the fix from #3531.

I'd also check out #3276, which depends on this PR and the other one I linked above.
Thanks @garylvov! I have applied the fixes from both #3531 and #3533, but it still didn't work, and I get the same error as before. It seems that the process hasn't invoked the scripts/reinforcement_learning/rl_games/train.py script at all.
Hi @PierrePeng, could you add a way to surface the subprocess output and track what is wrong? Assuming you already have #3531, you should be able to see some output that might give clues about the problem. If the train.py script is not executed at all, this problem is not directly linked to this PR, and it would be better to create a new issue for it.
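As a rough illustration of that suggestion, one way to surface a launched training script's output is to stream it line by line. This is a minimal sketch: the exact command line is assumed, and it is not the snippet from the comment above.

```python
# Sketch: echo the training subprocess's combined stdout/stderr so that
# failures inside train.py become visible on the tuner's console.
import subprocess

proc = subprocess.Popen(
    ["python", "scripts/reinforcement_learning/rl_games/train.py"],  # assumed invocation
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)
for line in proc.stdout:  # print every line, including errors from the child
    print(line, end="")
proc.wait()
```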
Hi @ozhanozen. Here is the log, based on #3531 and #3533 with the suggested addition applied.
Thank you for the log @PierrePeng (and for suggesting this functionality @ozhanozen). It looks like each training run started as needed, but the exact experiment name couldn't be extracted. I think we need to add the "Exact experiment name requested from command line: " print to RL Games. Previously, this was handled implicitly for RL Games (and explicitly for the rest); however, I don't see this experiment name being printed, so we need to add it explicitly.
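A sketch of the marker-based extraction being described (the `MARKER` constant and `extract_experiment_name` helper are illustrative names, not the actual tuner code):

```python
# Sketch: the training script prints a known marker line, and the tuner
# scans the subprocess output for it to recover the experiment name.
MARKER = "Exact experiment name requested from command line: "

def extract_experiment_name(log_lines):
    """Return the experiment name following the marker, or None."""
    for line in log_lines:
        if MARKER in line:
            return line.split(MARKER, 1)[1].strip()
    return None  # if the marker never appears, the tuner times out as above
```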
Hi @PierrePeng, I believe commit fe6d188 in #3531 should resolve this issue; please let me know if it persists.
Description

When running Ray directly from `tuner.py`, Ray is not correctly initialized within `invoke_tuning_run()`. The two problems associated with this are discussed in #3532. To solve them, this PR:

1. Removes `ray_init()` from `util.get_gpu_node_resources()`. Ray now needs to be initialized before calling `util.get_gpu_node_resources()`. This reverses #3350 (Fixes the missing Ray initialization), which was merged to add the missing initialization when using `tuner.py`, but it is safer to explicitly initialize Ray with the correct arguments outside of `util.get_gpu_node_resources()`.
2. Moves the Ray initialization within `invoke_tuning_run()` to before `util.get_gpu_node_resources()`, so we explicitly initialize it first and do not raise an exception later.
3. Skips re-initialization within `ray_init()` if Ray was already initialized.

Fixes #3532
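A condensed sketch of the resulting flow, assuming simplified function bodies: `ray.init`, `ray.is_initialized`, and `ray.cluster_resources` are real Ray APIs, but `ray.cluster_resources()` here merely stands in for `util.get_gpu_node_resources()`.

```python
# Sketch of the intended initialization order: initialize Ray exactly once,
# with explicit arguments, before any resource queries.
import ray

def ray_init(ray_address=None, log_to_driver=True):
    """Initialize Ray unless it is already running (change 3)."""
    if ray.is_initialized():
        return
    # log_to_driver=True forwards worker/subprocess output to the driver,
    # the kind of output redirection the PR title refers to.
    ray.init(address=ray_address, log_to_driver=log_to_driver)

def invoke_tuning_run():
    ray_init()  # explicit init happens first (change 2) ...
    # ... so the resource query no longer initializes Ray itself (change 1)
    return ray.cluster_resources()
```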
Type of change

- Bug fix (non-breaking change which fixes an issue)
Screenshots
Change 1:

Change 2:

Change 3:

Checklist

- I have run the `pre-commit` checks with `./isaaclab.sh --format`
- I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- I have added my name to `CONTRIBUTORS.md`, or my name already exists there