This is a design/discussion issue for the command line arguments and syntax for a jobspec-oriented run/submit interface.
The main idea is that there is a "slot shape" and a target entity for task scheduling. I'm not 100% sold on my own terminology here, so please feel free to propose alternatives. Essentially, whether the task is per-slot or per-resource, and which resource, comes from the --target parameter, which defaults to slot. The number of tasks can be either per-target or a total count, and the shape is specified with a restricted version of the original short-form jobspec I proposed. Here's a sketch of the interface:
flux run
- --file : read jobspec from a file; TODO: determine override behavior, for now mutually exclusive with everything else

OR

- --target : either slot or a specific resource specified in the request; default: slot
- --slot-shape : short-form resource shape; default: node. Current format thought: <resource-type>[\[<min>\[:<max>\]\]]\[><resource>|,<resource at same level>] -- basically what we discussed long ago, but limited in what can be specified for now. It is still set up to parse as YAML, so you could also put actual YAML/JSON here if you were sufficiently motivated.
- --shape-file : read shape as a resource-set from a file
- --nslots : number of slots to request; default: 1; also accepts a range to populate count
- --tasks-per-target : number of tasks to run per target, either slot or resource; default: 1
- --total-tasks : total number of tasks to run in some arrangement across resources; mutually exclusive with --tasks-per-target
- --time : walltime, using flux duration
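To make the shape grammar above concrete, here is a hypothetical sketch of parsing the short form into jobspec-style nested resource dicts. This is illustration only, not the actual flux implementation: the function names are invented, and unit-suffixed amounts like 2g are kept as raw strings since unit handling is still TBD.

```python
import re

# Hypothetical sketch only -- not the real flux parser. Grammar covered:
#   <type>[\[<min>[:<max>]\]], with '>' descending a level and ','
#   adding a sibling resource at the same level.
TOKEN = re.compile(r'^([a-zA-Z]+)(?:\[([0-9a-zA-Z]+)(?::([0-9a-zA-Z]+))?\])?$')

def _amount(tok):
    # keep unit-suffixed amounts like '2g' as strings; unit handling is TBD
    return int(tok) if tok.isdigit() else tok

def parse_resource(text):
    m = TOKEN.match(text)
    if m is None:
        raise ValueError("bad resource spec: %r" % text)
    rtype, lo, hi = m.groups()
    if hi:  # range form min:max populates a count range
        count = {"min": _amount(lo), "max": _amount(hi)}
    else:
        count = _amount(lo) if lo else 1
    return {"type": rtype.lower(), "count": count}

def parse_shape(shape):
    # split into levels on '>' and siblings on ','; each deeper level
    # nests under the last resource of the level above
    levels = [[parse_resource(tok) for tok in level.split(",")]
              for level in shape.strip("()").split(">")]
    for parent, child in zip(levels, levels[1:]):
        parent[-1]["with"] = child
    return levels[0]
```

For example, parse_shape("socket[2]>core[4]") returns [{'type': 'socket', 'count': 2, 'with': [{'type': 'core', 'count': 4}]}], which resembles the nested resource lists in a full jobspec.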
Use cases, drawn from RFC 14:
- Request 4 nodes:
flux run --nslots 4
- Request between 3 and 30 nodes:
flux run --nslots 3:30
- Request 4 tasks (sic: was nodes, but that would be the same as the following) with at least 2 sockets each and 4 cores per socket (not planning to support sockets yet, but):
flux run --nslots 4 --shape socket[2]>core[4]
- Request an exclusive allocation of 4 nodes that have at least two sockets and 4 cores per socket:
flux run --nslots 4 --shape node>socket[2]>core[4]
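For reference, the last example would correspond roughly to an RFC 14 jobspec along these lines. This is a sketch, not output of any existing tool: the label, the tasks section, and the placeholder command are assumptions, and the exclusivity flag is omitted.

```yaml
version: 1
resources:
  - type: slot
    count: 4
    label: default
    with:
      - type: node
        count: 1
        with:
          - type: socket
            count: 2
            with:
              - type: core
                count: 4
tasks:
  - command: ["app"]
    slot: default
    count:
      per_slot: 1
```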
Skipping the complex examples, as we don't plan to support them yet; for now the recommended mechanism for those would be writing the jobspec directly.
Use-case set 2:
- Run hostname 20 times on 4 nodes, 5 per node:
flux run --nslots 4 --total-tasks 20 hostname
flux run --nslots 4 --tasks-per-slot 5 hostname
flux run --slot-shape node[4] --tasks-per-resource node:5 hostname
- Run 5 copies of hostname across 4 nodes, default distribution:
flux run --nslots 4 --total-tasks 5 hostname
- Run 10 copies of myapp, require 2 cores per copy, for a total of 20 cores:
flux run --nslots 10 --shape core[2] myapp
- Multiple binaries are not necessarily on tap yet, but I'm thinking of allowing you to have multiple of these on the same command line with a separator, which would probably get you to the same place.
- Run 10 copies of app across 10 cores with at least 2GB per core:
flux run --shape (core,memory[2g]) app
(possibly amounts we may need to revisit)
- Run 10 copies of app across 2 nodes with at least 4GB per node:
flux run --shape node>memory[4g] --total-tasks 10 app
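Since "default distribution" above is left unspecified, here is a hedged sketch of one plausible policy for spreading --total-tasks across slots: a block distribution where earlier slots absorb any remainder. The function name and the policy itself are assumptions for illustration, not the proposed semantics.

```python
def distribute(total_tasks, nslots):
    """One plausible block distribution of --total-tasks across slots:
    each slot gets total_tasks // nslots tasks, and the first
    total_tasks % nslots slots get one extra."""
    base, extra = divmod(total_tasks, nslots)
    return [base + 1 if i < extra else base for i in range(nslots)]
```

Under this policy the 20-tasks-on-4-nodes case gives [5, 5, 5, 5], matching --tasks-per-slot 5, while 5 tasks on 4 nodes gives [2, 1, 1, 1].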
One possible issue here is that several of our use cases require the slot to be outside the node to be easily expressible. I'll open another issue shortly for discussion of jobspec V1 and ordering.