This is a design/discussion issue for the command line arguments and syntax for a jobspec-oriented run/submit interface.
The main idea is that there is a "slot shape" and a target entity for task scheduling. I'm not 100% sold on my own terminology here, so please feel free to propose alternatives. Essentially, whether the task is per-slot or per-resource, and which resource, comes from the --target parameter, which defaults to slot. The number of tasks can be either per-target or a total count, and the shape is specified with a restricted version of the original short-form jobspec I proposed. Here's a sketch of the interface:
flux run
- --file : read jobspec from a file; TODO: determine override behavior, for now mutually exclusive with everything else

OR

- --target : either slot or a specific resource specified in the request; default: slot
- --slot-shape : short-form resource shape; default: node. Current format thought: <resource-type>[\[<min>\[:<max>\]\]]\[><resource>|,<resource at same level>] -- basically what we discussed long ago, but limited in what can be specified for now. It is still set up to parse as YAML, so you could also put actual YAML/JSON here if you were sufficiently motivated.
- --shape-file : read shape as a resource-set from a file
- --nslots : number of slots to request; default: 1; also accepts a range to populate count
- --tasks-per-target : number of tasks to run per target, either slot or resource; default: 1
- --total-tasks : total number of tasks to run in some arrangement across resources; mutually exclusive with --tasks-per-target
- --time : walltime, using flux duration
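To make the shape grammar above concrete, here is a hypothetical sketch of parsing the short form into jobspec-style nested resource dicts. This is illustration only, not the actual flux implementation: the function names are invented, and unit-suffixed amounts like 2g are kept as raw strings since unit handling is still TBD.

```python
import re

# Hypothetical sketch only -- not the real flux parser. Grammar covered:
#   <type>[\[<min>[:<max>]\]], with '>' descending a level and ','
#   adding a sibling resource at the same level.
TOKEN = re.compile(r'^([a-zA-Z]+)(?:\[([0-9a-zA-Z]+)(?::([0-9a-zA-Z]+))?\])?$')

def _amount(tok):
    # keep unit-suffixed amounts like '2g' as strings; unit handling is TBD
    return int(tok) if tok.isdigit() else tok

def parse_resource(text):
    m = TOKEN.match(text)
    if m is None:
        raise ValueError("bad resource spec: %r" % text)
    rtype, lo, hi = m.groups()
    if hi:  # range form min:max populates a count range
        count = {"min": _amount(lo), "max": _amount(hi)}
    else:
        count = _amount(lo) if lo else 1
    return {"type": rtype.lower(), "count": count}

def parse_shape(shape):
    # split into levels on '>' and siblings on ','; each deeper level
    # nests under the last resource of the level above
    levels = [[parse_resource(tok) for tok in level.split(",")]
              for level in shape.strip("()").split(">")]
    for parent, child in zip(levels, levels[1:]):
        parent[-1]["with"] = child
    return levels[0]
```

For example, parse_shape("socket[2]>core[4]") returns [{'type': 'socket', 'count': 2, 'with': [{'type': 'core', 'count': 4}]}], which resembles the nested resource lists in a full jobspec.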
Use cases, drawn from RFC 14:
- Request 4 nodes:
flux run --nslots 4
- Request between 3 and 30 nodes:
flux run --nslots 3:30
- Request 4 tasks (sic: was nodes, but that would be the same as the following) with at least 2 sockets each and 4 cores per socket (not planning to support sockets yet, but):
flux run --nslots 4 --shape socket[2]>core[4]
- Request an exclusive allocation of 4 nodes that have at least two sockets and 4 cores per socket:
flux run --nslots 4 --shape node>socket[2]>core[4]
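For reference, the last example would correspond roughly to an RFC 14 jobspec along these lines. This is a sketch, not output of any existing tool: the label, the tasks section, and the placeholder command are assumptions, and the exclusivity flag is omitted.

```yaml
version: 1
resources:
  - type: slot
    count: 4
    label: default
    with:
      - type: node
        count: 1
        with:
          - type: socket
            count: 2
            with:
              - type: core
                count: 4
tasks:
  - command: ["app"]
    slot: default
    count:
      per_slot: 1
```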
Skipping the complex examples, as we don't plan to support them yet; for now the recommended mechanism for those would be writing the jobspec directly.
Use-case set 2:
- Run hostname 20 times on 4 nodes, 5 per node:
flux run --nslots 4 --total-tasks 20 hostname
flux run --nslots 4 --tasks-per-slot 5 hostname
flux run --slot-shape node[4] --tasks-per-resource node:5 hostname
- Run 5 copies of hostname across 4 nodes, default distribution:
flux run --nslots 4 --total-tasks 5 hostname
- Run 10 copies of myapp, require 2 cores per copy, for a total of 20 cores:
flux run --nslots 10 --shape core[2] myapp
- Multiple binaries are not necessarily on tap yet, but I'm thinking of allowing you to have multiple of these on the same command line with a separator, which would probably get you to the same place.
- Run 10 copies of app across 10 cores with at least 2GB per core:
flux run --shape (core,memory[2g]) app
(possibly amounts we may need to revisit)
- Run 10 copies of app across 2 nodes with at least 4GB per node:
flux run --shape node>memory[4g] --total-tasks 10 app
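Since "default distribution" above is left unspecified, here is a hedged sketch of one plausible policy for spreading --total-tasks across slots: a block distribution where earlier slots absorb any remainder. The function name and the policy itself are assumptions for illustration, not the proposed semantics.

```python
def distribute(total_tasks, nslots):
    """One plausible block distribution of --total-tasks across slots:
    each slot gets total_tasks // nslots tasks, and the first
    total_tasks % nslots slots get one extra."""
    base, extra = divmod(total_tasks, nslots)
    return [base + 1 if i < extra else base for i in range(nslots)]
```

Under this policy the 20-tasks-on-4-nodes case gives [5, 5, 5, 5], matching --tasks-per-slot 5, while 5 tasks on 4 nodes gives [2, 1, 1, 1].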
One possible issue here is that several of our use cases require the slot to be outside the node to be easily expressible. I'll open another issue shortly for discussion of jobspec V1 and ordering.