
Conversation

vsoch
Member

@vsoch vsoch commented Sep 15, 2025

This feature adds the ability to create a Flux Operator MiniCluster running across some subset of rabbit nodes when the rabbit.mpi directive is defined. For example, this is the minimum a user needs to do:

flux run -N 4 --setattr=rabbit.mpi=yes sleep 400

That does require the rabbit directives to get pushed through the process, but note that I'm separating the logic for the MiniCluster into attributes so it's easier to read and understand - the strings that Marty was showing me were really not intuitive. By default, setting rabbit.mpi to true (or anything) uses a default container base (the base we built with Flux and cxi on Ubuntu 24.04) and interactive mode, and everything beyond that can be customized. Here are more examples:

--setattr=rabbit.mpi.image="ghcr.io/converged-computing/lammps-reax:ubuntu2404-cxi"
--setattr=rabbit.mpi.workdir="/opt/lammps/examples/reaxff/HNS"
--setattr=rabbit.mpi.command='lmp -v x 2 -v y 2 -v z 2 -in in.reax.hnx -nocite'
--setattr=rabbit.mpi.add_flux=false
--setattr=rabbit.mpi.succeed=true
--setattr=rabbit.mpi.tasks=96
--setattr=rabbit.mpi.env.one=ketchup
--setattr=rabbit.mpi.env.two=mustard
--setattr=rabbit.mpi.rabbits=hetchy201,hetchy202
--setattr=rabbit.mpi.nodes=4
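As a rough sketch of how these attributes might be consumed: --setattr nests dotted keys under attributes.system in the jobspec, so the generation code can pull out the whole rabbit.mpi block as a dict and merge it over defaults. The function name and default values below are hypothetical, not the actual implementation:

```python
# Hypothetical sketch of extracting rabbit.mpi settings from a jobspec dict.
# --setattr=rabbit.mpi.image=... nests keys under attributes.system, so the
# settings arrive as a plain nested dict. Defaults here are illustrative only.

def get_rabbit_mpi(jobspec):
    """Return rabbit.mpi settings merged over defaults, or None if unset."""
    system = jobspec.get("attributes", {}).get("system", {})
    mpi = system.get("rabbit", {}).get("mpi")
    if mpi is None:
        return None
    defaults = {"interactive": True, "add_flux": True, "env": {}}
    if not isinstance(mpi, dict):
        # rabbit.mpi=yes (or any scalar) means "use all defaults"
        return defaults
    return {**defaults, **mpi}

jobspec = {
    "attributes": {
        "system": {
            "rabbit": {"mpi": {"tasks": 96, "env": {"one": "ketchup"}}}
        }
    }
}
settings = get_rabbit_mpi(jobspec)
print(settings["tasks"], settings["add_flux"])  # → 96 True
```

The merge means any attribute the user sets wins over the default, while rabbit.mpi=yes alone gives the default interactive MiniCluster.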

Notes

I'll include additional notes here.

flux hop

I added a flux hop command that is able to interact with the same generation classes, but without the requirement of the HPE / workflow operator stuff. This would mimic us manually creating a MiniCluster via CRD on the command line. It's just done with Python. Here is an example:

flux hop python rabbit_client.py \
    --image "ghcr.io/converged-computing/lammps-reax:ubuntu2404-cxi" \
    --command 'lmp -v x 2 -v y 2 -v z 2 -in in.reax.hnx -nocite' \
    --workdir "/opt/lammps/examples/reaxff/HNS" \
    --tasks 96 \
    --nodes 4 \
    --rabbits "hetchy201,hetchy202,hetchy203,hetchy204" \
    --no-add-flux \
    --succeed \
    --env one=ketchup \
    --env two=mustard

It likely won't be used for production given the permissions needed for that, but it will provide us with a means to test (and the command is pretty fun too). It was Marty's idea and I kind of love it. 🐰
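The flags above map naturally onto argparse. A minimal sketch of what the option handling for a command like this could look like (names mirror the example; this is not the actual flux hop source):

```python
import argparse

def make_hop_parser():
    """Sketch of an argparse front end mirroring the flux hop flags above."""
    parser = argparse.ArgumentParser(prog="flux-hop")
    parser.add_argument("--image", help="container image for the MiniCluster")
    parser.add_argument("--command", help="command to run (interactive if unset)")
    parser.add_argument("--workdir", help="working directory in the container")
    parser.add_argument("--tasks", type=int)
    parser.add_argument("--nodes", type=int)
    parser.add_argument("--rabbits", help="comma separated rabbit node names")
    # --no-add-flux flips the add_flux default of True to False
    parser.add_argument("--no-add-flux", dest="add_flux", action="store_false")
    parser.add_argument("--succeed", action="store_true")
    # repeatable --env KEY=VALUE pairs accumulate into a list
    parser.add_argument("--env", action="append", default=[], metavar="KEY=VALUE")
    return parser

args = make_hop_parser().parse_args(
    ["--tasks", "96", "--no-add-flux", "--env", "one=ketchup"]
)
print(args.tasks, args.add_flux, args.env)  # → 96 False ['one=ketchup']
```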

TODO

I wrote TODO for all items we can discuss. I have opinions on most of them but I want to know what you think. Some of them are about defaults, and others about features. Don't feel like you need to read before the Hackathon, I can talk through most of them.

MiniCluster Types

As mentioned, we have two modes of operation:

  1. Creation of a MiniCluster with a Flux job / workflow on rabbits (the path we talked about through coral2_dws.py)
  2. Creation of an a-la-carte MiniCluster with flux hop (primarily for testing or fun)

For the second, we require the rabbit node names, since there is no actual job to get them from. A second class, RabbitMiniCluster, is based on the first and is customized to expect the Workflow CRD object and to get node names from Flux.

RabbitMPI

The RabbitMPI class is a wrapper around a jobspec that translates it into MiniCluster needs (e.g., what container to use, whether to add Flux, whether it should be interactive). I like this design because it means we can populate and generate MiniClusters in ways that don't require Flux jobs. We use the jobspec, but that's just a dictionary of attributes that can be created in another way (e.g., flux hop). I thought about removing the jobspec entirely, but I don't think that's necessary - it just serves as a "standardized" data structure to derive metadata from.
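To make that design concrete, here is a toy stand-in (not the real RabbitMPI class; the default image and attribute names are invented): the jobspec-derived settings are just a dict, and properties answer the questions the MiniCluster generation asks.

```python
class RabbitMPISketch:
    """Toy stand-in for the RabbitMPI idea: wrap a settings dict and
    answer MiniCluster questions via properties. Defaults are invented."""

    DEFAULT_IMAGE = "example.io/flux-cxi-base:ubuntu2404"  # hypothetical name

    def __init__(self, settings):
        self.settings = settings or {}

    @property
    def image(self):
        return self.settings.get("image", self.DEFAULT_IMAGE)

    @property
    def add_flux(self):
        # attribute values may arrive as strings, so normalize "false"/"no"
        value = self.settings.get("add_flux", True)
        if isinstance(value, str):
            return value.lower() not in ("false", "no", "0")
        return bool(value)

    @property
    def interactive(self):
        # no command means we want an interactive MiniCluster
        return "command" not in self.settings

mpi = RabbitMPISketch({"add_flux": "false", "command": "lmp -in input.lmp"})
print(mpi.add_flux, mpi.interactive)  # → False False
```

Because the input is a plain dict, the same class works whether the settings came from a Flux jobspec or from flux hop's command line.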

Todo Items

These are primarily if we move forward with adding this integration. It's just testing for now.

  • We need a testing suite, likely to run on Hetchy. I could make something for GitHub actions but it would have to operate without actual rabbits.
  • Documentation for the set of attributes that can be set (see RabbitMPI for what is currently exposed).
  • If getting job info is redundant, we can add attributes to the jobtap plugin that might be needed.
  • We also likely want the flux_operator.py to be an actual module somewhere in there. I don't really like the style of "dump everything into one file" so I'd want to have like:
flux_k8s/
  ...
  operator/
     minicluster.py
     rabbit_mpi.py

And since the top level module is flux_k8s we can probably just call it operator to avoid a dreaded underscore.

Apologies for the list of dumb names for the flux hop command - this is for fun, and the only piece I asked Gemini to help produce, and I asked for a docker-like generation style with adjective and noun, and mentioned that I'd contribute to the set. I was horrified when it added a comment with my name to do that. I never told it my name. It claimed "statistical anomaly." 🙃 🤯 😨

ping @jameshcorbett @mcfadden8 @milroy

This feature adds the ability to create a Flux Operator MiniCluster
running across some subset of rabbit nodes when the rabbit.mpi
directive is defined. By default, setting that to true (or anything)
uses a default container base and interactive mode, and everything
from that can be customized. In addition, we have a "flux hop" command
that is able to take the same metadata, populate the RabbitMPI Job
object, and create the Flux MiniCluster using the same classes/logic
but without requiring the HPE stuff and Workflow. This could be used
in production, but likely will be for testing or for fun.

Signed-off-by: vsoch <[email protected]>
@vsoch
Member Author

vsoch commented Sep 20, 2025

For our notes, here is the command that worked (for an interactive run) on hetchy. The reason we needed to ask for all 12 nodes was to get around fluxion scheduling and the compute-node-to-rabbit assignment.

flux alloc -N12 -Sdw=xfs_small -Srabbit.mpi.image="ghcr.io/converged-computing/lammps-reax:ubuntu2404-cxi" -Srabbit.mpi.workdir="/opt/lammps/examples/reaxff/HNS" -Srabbit.mpi.add_flux=false -Srabbit.mpi.nodes=2  -qparrypeak  echo success

We need to test:

  • Non-interactive (the command added to the above)
  • Adding a scoped user set allowed to execute this
  • Adding a pre command for the worker nodes to wait slightly to allow the lead broker to come up.

For the last, the workers typically have a retry and it isn't clear why this is failing. It would have to be the case that they are able to connect and then something forces the exit (and that is when they typically cleanly exit, which is what we are seeing).
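For the worker pre-command idea, a small sketch of what the wait could look like in Python (the host and port are assumptions here; the real values would be whatever the MiniCluster exposes for the lead broker):

```python
import socket
import time

def wait_for_lead_broker(host, port, timeout=60, interval=2):
    """Poll until the lead broker's TCP port accepts connections.

    Returns True once a connection succeeds, or False if the timeout
    expires first. host/port are hypothetical in this sketch - they
    would come from the MiniCluster's lead broker service.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Running this as a pre command on the workers would give the lead broker a bounded window to come up before the workers attempt to connect (and potentially hit whatever is forcing the clean exit).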

@mcfadden8

mcfadden8 commented Sep 22, 2025

Does the flux hop command offer a mechanism to provide the rabbit with the paths to the ephemeral file systems that have been created? Having file system access from the compute and rabbit nodes is a key feature required by the demo. For the ephemeral Lustre file system, the path is the same for images running on the rabbit nodes as it is for things running on the compute nodes. For GFS2, there is a necessary mapping that needs to occur. We will only be using Lustre for the demo, but will be looking to use GFS2 when it is supported.

The file system paths are stored in environment variables available to the NNF user containers. This, and semantics for gfs2 path naming are documented here: https://nearnodeflash.github.io/dev/guides/user-containers/readme/?h=user+container#putting-it-all-together
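Picking those paths up inside the container could be as simple as scanning the environment for the storage variables. The "DW_" prefix below is an assumption for illustration only; the exact variable names (and the GFS2 per-node mapping semantics) are in the NNF user-container docs linked above:

```python
import os

def find_storage_paths(environ=None, prefix="DW_"):
    """Collect environment variables that look like ephemeral file system
    mount paths. The prefix is an assumption for this sketch - consult
    the NNF user-container docs for the real variable names, and note
    that GFS2 paths need an additional per-node mapping."""
    environ = os.environ if environ is None else environ
    return {key: value for key, value in environ.items()
            if key.startswith(prefix)}
```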

ping @behlendorf, @jameshcorbett

@vsoch
Member Author

vsoch commented Sep 22, 2025

@mcfadden8 depending on when the demo is, I'm not sure we have a reasonable amount of time to consider filesystems for the demo - a run of LAMMPS that is triggered by the submission of a job is what we had scoped it to. Let us know more details of what you had in mind so we can discuss.

This changeset moves the Flux Operator MiniCluster to be a module,
operator, under flux_k8s. I have also cleaned up the organization
of assets, and better coupled the creation of the MiniCluster with
saving the name / namespace so they do not need to be provided again.
We will need to implement the function to see if a user is allowed
to request a MiniCluster, and work further on adding the additional
securityContext needed for production.

Signed-off-by: vsoch <[email protected]>
Comment on lines +111 to +112
"willow" "accelerator",
"algorithm",
missing comma
