feat: support for flux operator miniclusters #411
base: master
8ede735 to c194c80
This feature adds the ability to create a Flux Operator MiniCluster running across a subset of rabbit nodes when the rabbit.mpi directive is defined. By default, setting it to true (or anything) uses a default container base and interactive mode, and everything from there can be customized. In addition, we have a "flux hop" command that takes the same metadata, populates the RabbitMPI Job object, and creates the Flux MiniCluster using the same classes/logic but without requiring the HPE stuff and Workflow. This could be used more broadly, but it will likely be for testing or for fun. Signed-off-by: vsoch <[email protected]>
854f9c6 to 396d952
For our notes, here is the command that worked (for an interactive run) on hetchy. The reason we needed to ask for all 12 nodes was to get around fluxion scheduling and compute node to rabbit assignment.
flux alloc -N12 -Sdw=xfs_small -Srabbit.mpi.image="ghcr.io/converged-computing/lammps-reax:ubuntu2404-cxi" -Srabbit.mpi.workdir="/opt/lammps/examples/reaxff/HNS" -Srabbit.mpi.add_flux=false -Srabbit.mpi.nodes=2 -qparrypeak echo success
We need to test:
For the last, the workers typically retry, and it isn't clear why this is failing. It would have to be the case that they are able to connect and then something forces the exit (that is when they typically exit cleanly, which is what we are seeing).
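For reference, the -S options in the command above set jobspec attributes with dotted keys. Here is a minimal sketch of how such keys could fold into a nested dictionary. It assumes, as I understand the Flux --setattr behavior, that keys without an explicit prefix land under attributes.system and that values are parsed as JSON with a fallback to plain strings. The helper name parse_setattr is hypothetical and not part of this PR.

```python
import json


def parse_setattr(options):
    """Fold dotted KEY=VAL options into a nested dict (hypothetical helper)."""
    system = {}
    for option in options:
        key, _, raw = option.partition("=")
        try:
            # Assumption: values that parse as JSON keep their JSON type.
            value = json.loads(raw)
        except json.JSONDecodeError:
            # Anything else (image names, paths) stays a plain string.
            value = raw
        node = system
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    # Assumption: unprefixed keys are placed under attributes.system.
    return {"attributes": {"system": system}}


options = [
    "dw=xfs_small",
    "rabbit.mpi.image=ghcr.io/converged-computing/lammps-reax:ubuntu2404-cxi",
    "rabbit.mpi.workdir=/opt/lammps/examples/reaxff/HNS",
    "rabbit.mpi.add_flux=false",
    "rabbit.mpi.nodes=2",
]
jobspec = parse_setattr(options)
mpi = jobspec["attributes"]["system"]["rabbit"]["mpi"]
print(mpi["nodes"], mpi["add_flux"], mpi["image"])
```

Under those assumptions, add_flux arrives as a boolean and nodes as an integer, while the image and workdir stay strings.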
Does the
The file system paths are stored in environment variables available to the NNF user containers. This, along with the semantics for gfs2 path naming, is documented here: https://nearnodeflash.github.io/dev/guides/user-containers/readme/?h=user+container#putting-it-all-together
ping @behlendorf, @jameshcorbett
@mcfadden8 depending on when the demo is, I'm not sure we have enough time to consider filesystems for the demo. What we had scoped it to is a run of LAMMPS that is triggered by the submission of a job. Let us know more details of what you had in mind so we can discuss.
This changeset moves the Flux Operator MiniCluster into a module, operator, under flux_k8s. I have also cleaned up the organization of assets, and better coupled the creation of the MiniCluster with saving the name / namespace so they do not need to be provided again. We still need to implement the function that checks whether a user is allowed to request a MiniCluster, and work further on adding the additional securityContext needed for production. Signed-off-by: vsoch <[email protected]>
b942b32 to a2d4274
Signed-off-by: vsoch <[email protected]>
"willow" "accelerator", | ||
"algorithm", |
missing comma
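To spell out why the missing comma matters: Python joins adjacent string literals into a single string, so the list silently ends up one entry short instead of raising an error. A small sketch (the variable names are just for illustration):

```python
# Without the comma, the two adjacent literals become one list item.
nouns_buggy = [
    "willow" "accelerator",
    "algorithm",
]

# With the comma restored, each word is its own item.
nouns_fixed = [
    "willow",
    "accelerator",
    "algorithm",
]

print(nouns_buggy)  # ['willowaccelerator', 'algorithm']
print(len(nouns_buggy), len(nouns_fixed))  # 2 3
```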
This feature adds the ability to create a Flux Operator MiniCluster running across a subset of rabbit nodes when the rabbit.mpi directive is defined. For example, this is the minimum a user needs to do:
That does require the rabbit directives to get pushed through the process, but note that I'm separating the logic for the MiniCluster into attributes so it's easier to read and understand - the strings that Marty was showing me were really not intuitive. By default, setting rabbit.mpi to true (or anything) uses a default container base (the base we built with Flux and cxi on Ubuntu 24.04) and interactive mode, and everything from there can be customized. Here are more examples:
Notes
I'll include additional notes here.
flux hop
I added a flux hop command that is able to interact with the same generation classes, but without the requirement of the HPE / workflow operator stuff. This would mimic us manually creating a MiniCluster via CRD on the command line. It's just done with Python. Here is an example:
It likely won't be used for production given the permissions needed for that, but it will provide us with a means to test (and the command is pretty fun too). It was Marty's idea and I kind of love it. 🐰
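To make "creating a MiniCluster via CRD, just done with Python" a bit more concrete, here is a rough sketch that only builds the custom resource as a dictionary. It is not the code in this PR: build_minicluster and the name hop-example are hypothetical, and the apiVersion and field names (size, interactive, containers, workingDir) are my assumption of the Flux Operator MiniCluster schema and should be checked against the operator's CRD.

```python
import json


def build_minicluster(name, namespace, image, size, workdir=None, interactive=True):
    """Return a MiniCluster custom resource as a plain dict (illustrative only)."""
    container = {"image": image}
    if workdir:
        container["workingDir"] = workdir
    return {
        # Assumption: group/version and field names follow the Flux Operator CRD.
        "apiVersion": "flux-framework.org/v1alpha2",
        "kind": "MiniCluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "size": size,
            "interactive": interactive,
            "containers": [container],
        },
    }


crd = build_minicluster(
    name="hop-example",  # hypothetical name
    namespace="default",
    image="ghcr.io/converged-computing/lammps-reax:ubuntu2404-cxi",
    size=2,
    workdir="/opt/lammps/examples/reaxff/HNS",
)
print(json.dumps(crd, indent=2))
```

Keeping the object a plain dictionary means it can be printed and inspected without cluster access. Submitting it would be a separate step that needs the permissions mentioned above.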
TODO
I wrote TODO for all items we can discuss. I have opinions on most of them but I want to know what you think. Some of them are about defaults, and others about features. Don't feel like you need to read before the Hackathon, I can talk through most of them.
MiniCluster Types
As mentioned, we have two modes of operation:
- coral2_dws.py
- flux hop (primarily for testing or fun)
For the second, we require the rabbit node names since we can't get them from an actual job. The second class, RabbitMiniCluster, is based on the first and is customized to expect the Workflow CRD object and be able to get node names from Flux.
RabbitMPI
The RabbitMPI class is a wrapper around a jobspec that translates it into MiniCluster needs (e.g., What container to use? Do we add Flux? Should it be interactive?). I like this design because it means we can populate and generate MiniClusters in ways that don't require Flux jobs. We use the jobspec, but that's just a dictionary of attributes that can be created in another way (e.g., flux hop). I thought about removing the jobspec entirely but I don't think it's necessary - it just serves as a "standardized" data structure to derive metadata from.
Todo Items
These apply primarily if we move forward with adding this integration. It's just testing for now.
- RabbitMPI for what is currently exposed.
- flux_operator.py to be an actual module somewhere in there. I don't really like the style of "dump everything into one file" so I'd want to have like:
And since the top level module is flux_k8s we can probably just call it operator to avoid a dreaded underscore.
Apologies for the list of dumb names for the flux hop command - this is for fun, and the only piece I asked Gemini to help produce. I asked for a docker-like generation style with adjective and noun, and mentioned that I'd contribute to the set. I was horrified when it added a comment with my name to do that. I never told it my name. It claimed "statistical anomaly." 🙃 🤯 😨
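For anyone curious what the docker-like adjective and noun generation amounts to, here is a tiny sketch. The adjectives are made up for illustration, generate_name is a hypothetical helper rather than the function in this PR, and I join with a hyphen on the assumption that the result is used as a Kubernetes object name, where underscores are not allowed.

```python
import random

# Hypothetical word lists for illustration only.
ADJECTIVES = ["fuzzy", "hoppy", "speedy"]
NOUNS = ["willow", "accelerator", "algorithm"]


def generate_name(rng=random):
    """Return a name such as 'fuzzy-willow' from one adjective and one noun."""
    return f"{rng.choice(ADJECTIVES)}-{rng.choice(NOUNS)}"


# Passing a seeded generator makes the choice repeatable in tests.
name = generate_name(random.Random(0))
print(name)
```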
ping @jameshcorbett @mcfadden8 @milroy
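As a rough illustration of the RabbitMPI idea described above (a wrapper that answers MiniCluster questions from a jobspec dictionary), here is a minimal sketch. It is not the class in this PR: RabbitMPISketch, the placeholder default image, the interactive key, and the choice of defaults are all assumptions. Only image, workdir, add_flux, and nodes appear in the command earlier in this thread.

```python
class RabbitMPISketch:
    """Read MiniCluster settings from a jobspec-like dict (not the PR's class)."""

    # Placeholder only; the PR describes a Flux + cxi base on Ubuntu 24.04.
    default_image = "example.invalid/flux-cxi:ubuntu2404"

    def __init__(self, jobspec):
        system = jobspec.get("attributes", {}).get("system", {})
        directive = system.get("rabbit", {}).get("mpi")
        # Assumption: any value other than a missing or false directive enables it.
        self.enabled = directive is not None and directive is not False
        # A dict customizes the MiniCluster; a bare true just takes the defaults.
        self.options = directive if isinstance(directive, dict) else {}

    @property
    def image(self):
        return self.options.get("image", self.default_image)

    @property
    def add_flux(self):
        # Assumption: Flux is added unless explicitly turned off.
        return self.options.get("add_flux", True)

    @property
    def interactive(self):
        # Interactive mode is described as the default in this PR.
        return self.options.get("interactive", True)

    @property
    def nodes(self):
        return self.options.get("nodes")


minimal = RabbitMPISketch({"attributes": {"system": {"rabbit": {"mpi": True}}}})
custom = RabbitMPISketch(
    {
        "attributes": {
            "system": {
                "rabbit": {
                    "mpi": {
                        "image": "ghcr.io/converged-computing/lammps-reax:ubuntu2404-cxi",
                        "add_flux": False,
                        "nodes": 2,
                    }
                }
            }
        }
    }
)
print(minimal.enabled, minimal.image, minimal.interactive)
print(custom.image, custom.add_flux, custom.nodes)
```

Because the wrapper only needs a dictionary, the same object can be built from a real Flux jobspec or assembled by hand, which is the property that makes something like flux hop possible.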