Commit 2291b93

Add ntasks for older slurm versions. (#1100)
* Add ntasks for older slurm versions.

Without this flag, on slurm 23.11.10 we get this error:

```
sbatch: error: Failed to validate job spec. --gpus-per-task or --tres-per-task used without either --gpus or -n/--ntasks is not allowed.
sbatch: error: Invalid generic resource (gres) specification
```

* .
1 parent 50da5af commit 2291b93

2 files changed: +3 -0 lines changed

torchx/schedulers/slurm_scheduler.py

Lines changed: 1 addition & 0 deletions
@@ -210,6 +210,7 @@ def from_role(
                 sbatch_opts.setdefault("gpus-per-node", str(resource.gpu))
             else:
                 sbatch_opts.setdefault("gpus-per-task", str(resource.gpu))
+                sbatch_opts.setdefault("ntasks", "1")
 
         srun_opts = {
             "output": f"slurm-{macros.app_id}-{name}.out",

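The effect of the added `setdefault` call can be illustrated with a minimal sketch. This is not the torchx implementation: `build_gpu_opts` and its `use_gpus_per_node` flag are hypothetical stand-ins, since the condition that selects the branch is not shown in this diff. The sketch only assumes that options are collected in a dict and rendered as `--key=value` flags, as the test expectations suggest.

```python
from typing import Dict, List


def build_gpu_opts(
    gpu: int, use_gpus_per_node: bool, sbatch_opts: Dict[str, str]
) -> List[str]:
    """Hypothetical helper mirroring the GPU branch touched by this commit."""
    opts = dict(sbatch_opts)  # copy so the caller's dict is left untouched
    if gpu > 0:
        if use_gpus_per_node:
            opts.setdefault("gpus-per-node", str(gpu))
        else:
            opts.setdefault("gpus-per-task", str(gpu))
            # Per the commit message, slurm 23.11.10 rejects --gpus-per-task
            # unless --gpus or -n/--ntasks is also given.
            opts.setdefault("ntasks", "1")
    # Render in insertion order, e.g. {"ntasks": "1"} becomes "--ntasks=1".
    return [f"--{key}={value}" for key, value in opts.items()]


older = build_gpu_opts(3, use_gpus_per_node=False, sbatch_opts={})
preset = build_gpu_opts(3, use_gpus_per_node=False, sbatch_opts={"ntasks": "4"})
newer = build_gpu_opts(3, use_gpus_per_node=True, sbatch_opts={})
```

Because `setdefault` only writes a key that is missing, an `ntasks` value that is already present in the options is left alone (`preset` keeps `--ntasks=4`), and the `gpus-per-node` path is unchanged.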
torchx/schedulers/test/slurm_scheduler_test.py

Lines changed: 2 additions & 0 deletions
@@ -128,6 +128,7 @@ def test_replica_request(self, mock_version: MagicMock) -> None:
                 "--cpus-per-task=2",
                 "--mem=10",
                 "--gpus-per-task=3",
+                "--ntasks=1",
             ],
         )
         self.assertEqual(
@@ -163,6 +164,7 @@ def test_replica_request_nomem(self, mock_version: MagicMock) -> None:
                 "--ntasks-per-node=1",
                 "--cpus-per-task=2",
                 "--gpus-per-task=3",
+                "--ntasks=1",
             ],
         )
 