sam-maloney
Contributor

Problem: we cannot define arbitrary jobspec resources lists easily on the command line

Define a compact string representation that can be mapped to a full jobspec resources request

Apologies for stepping on any toes, but I was reading through @vsoch's draft in #371 and the discussion in flux-framework/flux-core#2213 and this idea started forming in my head, and wouldn't leave me alone until I had written it down, and then it snowballed into this, hahaha!

I'm not sure if people are still interested in this feature or not, but figured I'd open the PR in any case. And I think my main takeaway would be that whatever is settled on, it shouldn't be V1-specific, otherwise it's just one more thing that needs to be transitioned eventually. This syntax makes simple requests very simple and compact, but imposes no restrictions on arbitrarily complex RFC 14-compliant resources lists. Of course at some point it may make little sense to ever express such things directly on the command line, but one might as well avoid restricting anything needlessly!

And the fundamental idea could likely be extended to full jobspec requests instead of just resources if that was ultimately desirable, but would likely make more sense as a later step/addition.

There's a prototype parser on a branch in my flux-core fork modeled after the constraint parser.

@garlick
Member

garlick commented Aug 18, 2025

This is pretty neat! It's nice how this spec incorporates the resource range spec. It's a thorough job with a formal spec and an experimental parser. Great work!

Any thoughts from @vsoch or @trws? The other work stalled out "because El Capitan" I would guess :-)

@vsoch
Member

vsoch commented Aug 18, 2025

I like the idea (the concepts are similar between what we have talked about previously). I wasn't particularly busy with El Cap - I assumed shape just wasn't a priority so the issue would linger and ultimately be closed.

I think the part I'm having trouble with is looking at the shape objects and having a hard time reading them out. We have to figure out a way to make it easy to show an (understandable) shape and it still looks like a really steep learning curve. In other words, looking at the examples I'm not convinced users (beyond a small set of advanced users) are going to want to learn how to create these.

@garlick
Member

garlick commented Aug 18, 2025

Yeah I figured "wasn't a priority" == "doesn't affect El Cap" given the timing :-)

@sam-maloney
Contributor Author

> I think the part I'm having trouble with is looking at the shape objects and having a hard time reading them out. We have to figure out a way to make it easy to show an (understandable) shape and it still looks like a really steep learning curve. In other words, looking at the examples I'm not convinced users (beyond a small set of advanced users) are going to want to learn how to create these.

Understandability is definitely a valid concern, although also one that runs somewhat orthogonal to compactness, which is also important for CLI usability. For myself, I think the full YAML/JSON is likely going to be the most understandable form to begin with, so this idea was focused on making things compact (while still allowing full complexity).

And I'll be honest I don't really have any great ideas for making resource 'shapes' more understandable to users 😅 Directly specifying resources to Flux will at some level always require users to have at least basic understanding of the graph resource model, and while I like the "directory structure" analogy implied by the forward slashes (as used by OAR) I agree that there's nothing inherently intuitive about the rest of it if one isn't already comfortable with jobspec...

@vsoch
Member

vsoch commented Aug 19, 2025

The compactness (or any feature of a CLI) is somewhat irrelevant if most users can't grok it easily. My opinion is that if something is hard to use, it isn't an issue of "but there is no other way" but that we didn't do a good enough job of thinking it through. I think a command-line representation minimally needs to be easy to look at and "get" without having to be an expert user, and (speaking as a more advanced user) I'm staring at a lot of these examples (my previous PR included) and I don't. Here is an example of something that is more intuitive to me: an explicit definition of shape that just defines parent-child relationships in a simple, readable path format, and then a separate configuration of those components:

flux job submit \
    --shape slot/node/socket/core \
    --set slot:count=4,label=nodelevel \
    --set node:count=1,exclusive=false \
    --set socket:count.min=2 \
    --set core:count.min=4

It's easy to think about parent-child relationships, hence the --shape flag. More explicitly, the structure is slot -> node -> socket -> core, and that is shown as slot/node/socket/core. The --set flag works like --set <target>:<key>=<value> to reference named entities in the shape. That is also easy to understand, and similar to what other tools already do (e.g., helm). That does mean that each target entity needs a unique name. Dot notation is then used for a nested attribute. Scripts can then define each piece separately, for example:

# Build the command as an array so each argument stays a separate word
cmd=(flux job submit --shape slot/node/socket/core --set slot:count=4)
if [ "$I_NEED_EXCLUSIVE_MODE" = "true" ]; then
  cmd+=(--set node:exclusive=true)
fi
"${cmd[@]}"

And you can tweak things by editing individual flags, and you don't have to totally rewrite your shape command (or the parser for it). It's a bit more verbose, but I can look at it and "get it" if that makes sense. I'm not saying this PR is wrong, but I think if we are going to add a feature like shape we need to prioritize simplicity and ability to understand, even if it's a challenging thing to do.
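To make the --shape / --set idea in this comment concrete, here is a minimal sketch of how such flags might be folded into a nested resources list using the `type`, `count`, and `with` keys from RFC 14. Everything here is an assumption for illustration: `build_resources` and `parse_value` are hypothetical helpers, the default count of 1 is a guess, and none of this is an existing Flux API or the parser from the PR.

```python
def parse_value(raw):
    """Convert 'true'/'false' to bool and digit strings to int; keep anything else as str."""
    if raw in ("true", "false"):
        return raw == "true"
    return int(raw) if raw.isdigit() else raw


def build_resources(shape, settings):
    """Combine an 'a/b/c' shape with 'target:key=value,...' settings (hypothetical helper)."""
    names = shape.split("/")
    if len(set(names)) != len(names):
        raise ValueError("each entity in the shape needs a unique name")
    # Assumption: an entity with no explicit count defaults to 1.
    vertices = {name: {"type": name, "count": 1} for name in names}

    for setting in settings:
        target, _, assignments = setting.partition(":")
        if target not in vertices:
            raise ValueError(f"unknown target: {target!r}")
        for assignment in assignments.split(","):
            key, _, raw = assignment.partition("=")
            node = vertices[target]
            *parents, leaf = key.split(".")
            # Dot notation: 'count.min=2' becomes {'count': {'min': 2}}.
            for part in parents:
                if not isinstance(node.get(part), dict):
                    node[part] = {}
                node = node[part]
            node[leaf] = parse_value(raw)

    # Chain parent -> child in shape order using a 'with' list.
    for parent, child in zip(names, names[1:]):
        vertices[parent]["with"] = [vertices[child]]
    return [vertices[names[0]]]


resources = build_resources(
    "slot/node/socket/core",
    [
        "slot:count=4,label=nodelevel",
        "node:count=1,exclusive=false",
        "socket:count.min=2",
        "core:count.min=4",
    ],
)
```

Because every --set is keyed by entity name, the order of the flags does not matter in this sketch, which is what makes the conditional script above work; the cost is the unique-name requirement already noted.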

@garlick
Member

garlick commented Aug 19, 2025

For me personally, the proposed denser representation does click once the simple rules are known, and the rules could be explained in a man page very quickly IMHO. I'm not bothered by

$ flux submit --shape=slot=4/node=1{-x}/socket=2+/core=4+ ...

in fact I quite like it 🤷

I would guess longer term we would shunt most users off to simpler ways of asking for resources, like maybe with predefined shapes from site- or user- provided config files (similar to flux jobs --format=NAME I guess). Meanwhile having a powerful, compact way to do it is appealing to me.
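The compact form quoted in this comment lines up term by term with the earlier --shape / --set example, which suggests that a trailing `+` means "at least N" and that `{-x}` turns exclusivity off. That reading is inferred from this thread only, not taken from the spec text, so the sketch below is a toy parser for just that subset. `parse_compact` is a hypothetical name, and the prototype parser on the PR author's branch may work quite differently.

```python
import re

# One path segment: type=N, optionally followed by '+' and/or a '{...}' flag group.
_SEGMENT = re.compile(
    r"^(?P<type>[A-Za-z_]\w*)=(?P<count>\d+)(?P<open>\+)?(?:\{(?P<flags>[^}]*)\})?$"
)


def parse_compact(shape):
    """Parse a small, assumed subset of the compact shape syntax (hypothetical helper)."""
    vertices = []
    for segment in shape.split("/"):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unsupported segment: {segment!r}")
        n = int(match["count"])
        # Assumption: 'N+' means a minimum of N with no upper bound.
        vertex = {"type": match["type"], "count": {"min": n} if match["open"] else n}
        flags = match["flags"]
        if flags == "-x":
            # Assumption: '{-x}' disables exclusivity.
            vertex["exclusive"] = False
        elif flags is not None:
            raise ValueError(f"unsupported flags: {flags!r}")
        vertices.append(vertex)

    # Each segment becomes the single child of the segment before it.
    for parent, child in zip(vertices, vertices[1:]):
        parent["with"] = [child]
    return vertices[:1]


resources = parse_compact("slot=4/node=1{-x}/socket=2+/core=4+")
```

The sketch does not assign the slot a label, which RFC 14 requires; how the real syntax handles labels is left to the spec.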

@trws
Member

trws commented Aug 19, 2025

As @vsoch said, the syntax actually looks a lot like what we had talked about long, long ago before trimming it down for jobspec-v1 (slot[4]>node[1]>socket[2:]) adjusted to be more shell-friendly so I have a feeling we're a bit more adjusted to it than others might be. I need to dig into this a bit to grok what some of it means, and might want to tweak how some of it behaves because of how some shells treat chars like + and how we express things elsewhere, but I like the direction a lot. Having it be a simple "path" like this (which is actually something we use in fluxion regularly) makes a lot of sense to me too.

@grondo
Contributor

grondo commented Aug 19, 2025

One idea for now -- now that we have CLI plugins -- is to implement this as a CLI plugin that users requiring the more advanced specification can install and use. I think @wihobbs was talking about working on something similar that acts as a filter, but embedded in the cmdline via a plugin with a unique option namespace is an enticing way to offer the functionality.

Perhaps this would allow us to offer something without fully committing, eventually perhaps absorbing the implementation as a native option or new command when we feel it is working as desired.

@sam-maloney
Contributor Author

sam-maloney commented Aug 20, 2025

I think @garlick's example was what I had in mind, i.e. a single option (in my mind I was visualizing --resources=...) to precisely specify the desired resources (obviously one could just have such an option directly accept a valid JSON/YAML string representation, but that seems very verbose for the CLI).

I also agree with what @vsoch is saying, but that feels to me more like the current interface of specifying things through a set of more "intuitive" options (like --nodes --ntasks --cores-per-task) which I agree is very useful for most users, and the evolution of that interface will be a challenging design problem for more complicated jobspec requests. But I feel like that's perhaps complementary to the idea here, which would target more advanced users who have (or are willing to learn) more of an understanding of the flux resource model and jobspec and want a way to directly specify exactly what they want.

So I guess what I'm saying is both might have a place: something like this as the compact, direct, power-user option, and then an evolution & expansion of the current options for the verbose, intuitive, basic-user mode. 🤷‍♂️

Member

@garlick garlick left a comment

Approved. This seems great, and I think most of the concerns raised in discussion have been answered? And @grondo has suggested a path forward that doesn't prevent competition with this spec. Thanks @sam-maloney for your excellent work here.

@garlick
Member

garlick commented Sep 16, 2025

@Mergifyio rebase

Contributor

mergify bot commented Sep 16, 2025

rebase

✅ Branch has been successfully rebased

@garlick garlick force-pushed the command-line-jobspec branch from 30156e3 to 1808669 on September 16, 2025 21:06
@garlick
Member

garlick commented Sep 16, 2025

I'll set merge-when-passing. Thanks again!

@mergify mergify bot merged commit fce83c0 into flux-framework:master Sep 16, 2025
7 checks passed
@sam-maloney sam-maloney deleted the command-line-jobspec branch September 17, 2025 05:45