Skip to content

Conversation

jacobtkeio
Copy link
Contributor

@jacobtkeio jacobtkeio commented Aug 7, 2025

This PR adds a MATCH_WITHOUT_ALLOCATING match option to sched-fluxion-resource, flux ion-resource, the C and C++ REAPI, and resource-query. A request for MATCH_WITHOUT_ALLOCATING will return resources that a jobspec would have been matched with under MATCH_ALLOCATE, but it will not actually allocate those resources.
A caveat is that MATCH_WITHOUT_ALLOCATING will return the soonest-available match, even into the future, while MATCH_ALLOCATE will return failure if no match is possible starting exactly at the requested match time.

Matching without allocating is a first step towards basic multi-cluster scheduling.
A more useful next step is to find and report the longest available matching resources in a cluster, for which the likelihood that another cluster has available matching resources within the same timeframe is highest. Then, users can broadcast requests for the longest available matching resources on many clusters (without actually allocating the requests) and then choose suitable times and resources on which to actually allocate their multi-cluster job.

@jacobtkeio jacobtkeio changed the title Add match_without_allocating match option [WIP] Add match_without_allocating match option Aug 7, 2025
@jacobtkeio jacobtkeio force-pushed the no_reserve branch 3 times, most recently from 0d42cf9 to 082f9b6 Compare August 14, 2025 21:17
@jacobtkeio jacobtkeio force-pushed the no_reserve branch 3 times, most recently from 79b16e8 to 5e05680 Compare August 22, 2025 21:38
@jacobtkeio jacobtkeio force-pushed the no_reserve branch 2 times, most recently from 9ec2a31 to ab90e65 Compare August 28, 2025 17:44
Problem: The heredoc in 'loading job-validator conf works' does not
ignore tabs and does not indent new lines, violating the semi-standard
format of sharness tests.

Make the heredoc ignore tabs and indent every line.
Problem: resource_match_opts.* has two related classes,
resource_opt_t and resource_prop_t, which both share
setters and getters for every option in the resource module.
resource_opt_t has a resource_prop_t member variable
and calls m_resource_prop.get_...() in its own
resource_opt_t::get_...() functions, and likewise for
its setters and is_set-ers. This causes half of
resource_match_opts.* to be boilerplate.

Solution: Make resource_opt_t inherit from rsrc_prop_t,
and replace all resource_opt_t function calls with the
inherited resource_prop_t implementations. Remove the
duplicated functions in resource_match_opts.*.
Problem: Checks for negative jobspec duration typically occur before
submission to the traverser, but some code paths do not perform this
check.

Check for negative jobspec durations in dfu_impl. This is cheap and
centralizes the check.
Problem: For now, match_without_allocating will consume a jobid. All
jobids must be queryable (by, say, flux ion-resource info JOBID), but a
match without an allocation is not strictly a job, nor does it have a
state like ALLOCATED or RESERVED.

Regardless, add a MATCHED state to the jobinfo enum.
This way, the jobid won't just disappear to the user, and asking for
information about the jobid will return a sensible MATCHED result.
Problem: In many places in resource/, we switch on a string to create an
appropriate match_op_t. This is prone to error because misspellings are
more likely to occur across so many duplicated strings, and any changes
to the match_op_t enums need to be mirrored correctly across many files.

Create a string_to_match_op function in match_op.h to centralize all
string -> match_op_t mapping.
Problem: run () has many strcmps for match commands instead of using
the string_to_match_op function. This is prone to error because these
strings need to be written identically in multiple places.

Pass a match_op_t into run () instead of a const char *. Convert the
strings to match_op_ts in run_match instead.
Problem: Both cmd_match and cmd_match_multi in command.cpp switch on the
input command to return a match_op_t. This functionality has since been
centralized in resource/policy/base/match_op.h, so it is much simpler to
call the string_to_match_op function instead of leaving in the switch
statements.

Replace the switch statements in cmd_match* with calls to string_to_match_op and
match_op_is_valid.
Problem: The traverser has no way to handle match_without_allocating
requests because there is no match_op_t value for them.

Create one.
Problem: Fluxion cannot search for matching resources without also
allocating them. However, it is useful to query for matches and
perform allocation separately, especially for scheduling across
several clusters.

Add a match_without_allocating match option to the DFU traverser's
run_match function. It returns matches without allocating them in
the planner. It only looks at the exact given time.
Problem: A single match-wo-alloc request is not very information-dense
because it is confined to a single start time. In a multicluster
scheduling scheme, such time-constrained match queries might not
provide enough information to be useful.

If a match-without-alloc request fails, try matching in the future
until we find the first available match and return it. In other words,
return the first-available resources on match-wo-alloc requests.
Problem: The match-without-allocating match option is accessible from
the resource module, but the resource module treats it as a normal
allocation request. Therefore, both match_without_allocating requests
and the job state of the requests' jobid will return ALLOCATED, which is
inaccurate.

Return and record a MATCHED state on match_without_allocating requests
to the resource module.
Problem: The match_wo_alloc match option is not accessible from flux
ion-resource.

Add a flux ion-resource match without_allocating command.
Problem: reapi_cli_t::match_allocate checks for run () != 0, but it only
exits with -1 if errno == ENOMEM. It should always exit if run ()
returns a non-zero result.

Return -1 when rq->traverser_run() returns a non-zero result regardless
of the value of errno
Problem: The match_wo_alloc match option is not accessible from the
C++ REAPI.

Add a match-without-allocating match option to the C++ REAPI.
Problem: The match_wo_alloc match option is not accessible from the C
REAPI.

Add a match-without-allocating match option to the C REAPI.
Problem: The match_wo_alloc match option is not accessible from the
resource-query utility.

Add a match without_allocating option to resource-query.
Problem: No tests cover the functionality of match-without-allocating.

Add t4016 to test match-without-allocating.
@jacobtkeio jacobtkeio force-pushed the no_reserve branch 3 times, most recently from af7c3c0 to 192624b Compare September 12, 2025 17:33
@jacobtkeio jacobtkeio changed the title [WIP] Add match_without_allocating match option Add match_without_allocating match option Sep 12, 2025
milroy and others added 3 commits September 19, 2025 09:57
Problem: REAPI does not have any allocation unit testing.

Add unit tests of initialization and allocation via Catch2.
Problem: No test exists for the MATCH_WITHOUT_ALLOCATING match_op_t in
the C REAPI

Add one
Problem: No test exists for the MATCH_WITHOUT_ALLOCATING match_op_t in
the C++ REAPI

Add one
Copy link

codecov bot commented Sep 19, 2025

Codecov Report

❌ Patch coverage is 91.25874% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.0%. Comparing base (e5028e9) to head (258d8d7).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
resource/reapi/bindings/c++/reapi_cli_impl.hpp 60.8% 9 Missing ⚠️
resource/reapi/test/reapi_cli_test_c.cpp 94.1% 5 Missing ⚠️
resource/reapi/test/reapi_cli_test_cxx.cpp 93.8% 5 Missing ⚠️
resource/modules/resource_match_opts.cpp 90.0% 3 Missing ⚠️
resource/policies/base/match_op.h 70.0% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #1387     +/-   ##
========================================
- Coverage    77.2%   77.0%   -0.3%     
========================================
  Files         114     117      +3     
  Lines       16320   16587    +267     
========================================
+ Hits        12609   12775    +166     
- Misses       3711    3812    +101     
Files with missing lines Coverage Δ
resource/jobinfo/jobinfo.cpp 76.6% <100.0%> (+1.6%) ⬆️
resource/modules/resource.cpp 74.5% <100.0%> (+<0.1%) ⬆️
resource/modules/resource_match.cpp 67.7% <100.0%> (-0.3%) ⬇️
resource/modules/resource_match_opts.hpp 100.0% <ø> (ø)
resource/reapi/bindings/c/reapi_cli.cpp 21.9% <ø> (ø)
resource/traversers/dfu.cpp 89.5% <100.0%> (+0.4%) ⬆️
resource/traversers/dfu_impl.hpp 95.0% <100.0%> (+0.4%) ⬆️
resource/traversers/dfu_impl_update.cpp 80.3% <100.0%> (+0.1%) ⬆️
resource/utilities/command.cpp 80.1% <100.0%> (-0.4%) ⬇️
resource/modules/resource_match_opts.cpp 89.4% <90.0%> (+4.8%) ⬆️
... and 4 more
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants