Expose a WandB run_name argument #101

@lewtun

Currently, WandB runs are named after the value of output_dir. To avoid collisions in WandB on repeated runs, every run therefore needs its own output_dir; otherwise one sporadically hits this error:

[actor]: 2025-11-07 08:15:29,293 - pipelinerl.utils - ERROR - Exception in actor: Run initialization has timed out after 90.0 sec. Please try increasing the timeout with the `init_timeout` setting: `wandb.init(settings=wandb.Settings(init_timeout=120))`.
[preprocessor]: 2025-11-07 08:15:29,293 - pipelinerl.utils - ERROR - Exception in preprocess: Run initialization has timed out after 90.0 sec. Please try increasing the timeout with the `init_timeout` setting: `wandb.init(settings=wandb.Settings(init_timeout=120))`.
[preprocessor]: 2025-11-07 08:15:29,298 - pipelinerl.utils - ERROR - Traceback: Traceback (most recent call last):
  File "/fsx/lewis/git/pipeline-rl-cmu/prl/lib/python3.11/site-packages/wandb/sdk/wandb_init.py", line 997, in init
    result = wait_with_progress(
             ^^^^^^^^^^^^^^^^^^^
  File "/fsx/lewis/git/pipeline-rl-cmu/prl/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 23, in wait_with_progress
    return wait_all_with_progress(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/lewis/git/pipeline-rl-cmu/prl/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 77, in wait_all_with_progress
    return asyncer.run(progress_loop_with_timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/lewis/git/pipeline-rl-cmu/prl/lib/python3.11/site-packages/wandb/sdk/lib/asyncio_manager.py", line 136, in run
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/admin/home/lewis/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/admin/home/lewis/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
TimeoutError: Timed out waiting for response on p9imn6j7ynez

Moreover, repeated runs write to the same WandB run, which is counterintuitive and differs from other frameworks, which assign a new WandB ID per run (usually the auto-generated one).

It would be good to expose a run_name argument so that users can set the desired run name in the config or at runtime while keeping a fixed value for output_dir (e.g. useful when debugging).
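
For illustration, here is a rough sketch of how the name resolution could look. The helper name `build_wandb_init_kwargs` and its arguments are hypothetical and not taken from the pipelinerl code; the only real API assumed is that `wandb.init` accepts `name` and `dir` keyword arguments. The fallback to the last component of output_dir is a guess at keeping the current naming as the default.

```python
from pathlib import Path
from typing import Optional


def build_wandb_init_kwargs(output_dir: str, run_name: Optional[str] = None) -> dict:
    """Hypothetical helper: choose the WandB display name independently of output_dir.

    If run_name is set, it is used as the run name. Otherwise we fall back to the
    last path component of output_dir (an assumption about the current default).
    No `id` is set, so WandB would auto-generate a fresh run ID on every launch
    instead of reusing one tied to output_dir.
    """
    name = run_name or Path(output_dir).name
    return {"name": name, "dir": output_dir}


# Intended use (not executed here): wandb.init(**build_wandb_init_kwargs(...))
kwargs = build_wandb_init_kwargs("results/debug", run_name="lr-1e-5-attempt-2")
print(kwargs)
```

With something like this, output_dir can stay fixed while debugging, and each launch still gets its own name and ID.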
