
Design/prototype new run interface #2213

@trws

Description

This is a design/discussion issue for the command line arguments and syntax for a jobspec-oriented run/submit interface.

The main idea is that there is a "slot shape" and a target entity for task scheduling. I'm not 100% sold on my own terminology here, so please feel free to propose alternatives. Essentially, the --target parameter (default: slot) determines whether tasks are counted per-slot or per-resource, and which resource. The number of tasks can be given either per-target or as a total count, and the shape is specified with a restricted version of the original short-form jobspec I proposed. Here's a sketch of the interface:

flux run

  • --file: read jobspec from a file; TODO: determine override behavior; for now, mutually exclusive with everything else
    OR
  • --target: either slot or a specific resource type named in the request; default: slot
  • --slot-shape: short-form resource shape; default: node. Current format thought: <resource-type>["["<min>[":"<max>]"]"][">"<resource> | ","<resource-at-same-level>], where quoted brackets are literal; basically what we discussed long ago, but limited in what can be specified for now. It is still set up to parse as YAML, so you could also put actual YAML/JSON here if you were sufficiently motivated
  • --shape-file: read the shape as a resource-set from a file
  • --nslots: number of slots to request; default: 1; also accepts a <min>:<max> range to populate count
  • --tasks-per-target: number of tasks to run per target (either slot or resource); default: 1
  • --total-tasks: total number of tasks to run in some arrangement across resources; mutually exclusive with --tasks-per-target
  • --time: walltime, using Flux duration syntax
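
For concreteness, here is a hedged sketch of the canonical jobspec I'd expect a simple invocation to expand to, assuming the RFC 14 structure (the exact keys, especially around count, are still up for discussion):

    # Hypothetical expansion of: flux run --nslots 4 hostname
    # (default slot shape is one node; default is 1 task per target, target=slot)
    version: 1
    resources:
      - type: slot
        count: 4
        label: default
        with:
          - type: node
            count: 1
    tasks:
      - command: ["hostname"]
        slot: default
        count:
          per_slot: 1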

Use-cases, drawn from RFC 14:

  1. Request 4 nodes: flux run --nslots 4
  2. Request between 3 and 30 nodes: flux run --nslots 3:30
  3. Request 4 tasks (sic: was nodes, but that would be the same as the following) with at least 2 sockets each and 4 cores per socket (not planning to support sockets yet, but): flux run --nslots 4 --slot-shape socket[2]>core[4]
  4. Request an exclusive allocation of 4 nodes that have at least 2 sockets and 4 cores per socket (expansion sketched below): flux run --nslots 4 --slot-shape node>socket[2]>core[4]
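
A hedged sketch of how I'd expect the nested shape in case 4 to expand (exclusivity handling omitted, since how that's expressed is still open):

    # flux run --nslots 4 --slot-shape node>socket[2]>core[4]
    resources:
      - type: slot
        count: 4
        label: default
        with:
          - type: node
            count: 1
            with:
              - type: socket
                count: 2
                with:
                  - type: core
                    count: 4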

Skipping the complex examples since we don't plan to support them yet; for now, the recommended mechanism would be writing the jobspec directly.

Use-case set 2:

  1. Run hostname 20 times on 4 nodes, 5 per node (see the count sketch after this list):
    1. flux run --nslots 4 --total-tasks 20 hostname
    2. flux run --nslots 4 --tasks-per-target 5 hostname
    3. flux run --slot-shape node[4] --target node --tasks-per-target 5 hostname
  2. Run 5 copies of hostname across 4 nodes, default distribution: flux run --nslots 4 --total-tasks 5 hostname
  3. Run 10 copies of myapp, requiring 2 cores per copy, for a total of 20 cores: flux run --nslots 10 --slot-shape core[2] myapp
  4. Multiple binaries are not necessarily on tap yet, but I'm thinking of allowing multiple of these on the same command line with a separator, which would probably get to the same place.
  5. Run 10 copies of app across 10 cores with at least 2GB per core: flux run --nslots 10 --slot-shape (core,memory[2g]) app (the amounts here are something we may need to revisit)
  6. Run 10 copies of app across 2 nodes with at least 4GB per node: flux run --nslots 2 --slot-shape node>memory[4g] --total-tasks 10 app
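
For reference, the two counting styles from case 1 would presumably land in the tasks section differently; the count dictionary in RFC 14 already distinguishes per-slot from total counts (again a sketch, not a settled mapping):

    # flux run --nslots 4 --tasks-per-target 5 hostname   (target defaults to slot)
    tasks:
      - command: ["hostname"]
        slot: default
        count:
          per_slot: 5

    # flux run --nslots 4 --total-tasks 20 hostname
    tasks:
      - command: ["hostname"]
        slot: default
        count:
          total: 20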

One possible issue here is that several of our use-cases require the slot to sit outside the node to be easily expressible. I'll open another issue shortly to discuss jobspec V1 and ordering.
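
To make the ordering question concrete, here are the two placements side by side (sketches only):

    # Slot outside the node (slot contains node), which shapes like
    # node>memory[4g] above end up forcing:
    resources:
      - type: slot
        count: 2
        label: default
        with:
          - type: node
            count: 1

    # Slot inside the node (slots packed within nodes), which this interface
    # can't currently express without writing the jobspec by hand:
    resources:
      - type: node
        count: 2
        with:
          - type: slot
            count: 5
            label: default
            with:
              - type: core
                count: 1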
