Skip to content

Conversation

jacobtkeio
Copy link
Contributor

This PR adds a MATCH_WITHOUT_ALLOCATING_EXTEND match option, which behaves identically to MATCH_WITHOUT_ALLOCATING except that it extends the durations of its matches as much as possible until the next scheduled job or the end of the resource graph.

This makes it much easier to implement a "longest" match policy in the future, which could significantly increase the chance of finding a simultaneous allocation across two or more clusters by increasing the surface area across which cluster-specific matches could overlap in time.

This PR depends on #1387.

jacobtkeio and others added 30 commits September 9, 2025 11:35
Problem: The heredoc in 'loading job-validator conf works' does not
ignore tabs and does not indent new lines, violating the semi-standard
format of sharness tests.

Make the heredoc ignore tabs and indent every line.
Problem: resource_match_opts.* has two related classes,
resource_opt_t and resource_prop_t, which both share
setters and getters for every option in the resource module.
resource_opt_t has a resource_prop_t member variable
and calls m_resource_prop.get_...() in its own
resource_opt_t::get_...() functions, and likewise for
its setters and is_set-ers. This causes half of
resource_match_opts.* to be boilerplate.

Solution: Make resource_opt_t inherit from rsrc_prop_t,
and replace all resource_opt_t function calls with the
inherited resource_prop_t implementations. Remove the
duplicated functions in resource_match_opts.*.
Problem: Checks for negative jobspec duration typically occur before
submission to the traverser, but some code paths do not perform this
check.

Check for negative jobspec durations in dfu_impl. This is cheap and
centralizes the check.
Problem: For now, match_without_allocating will consume a jobid. All
jobids must be queryable (by, say, flux ion-resource info JOBID), but a
match without an allocation is not strictly a job, nor does it have a
state like ALLOCATED or RESERVED.

Regardless, add a MATCHED state to the jobinfo enum.
This way, the jobid won't just disappear to the user, and asking for
information about the jobid will return a sensible MATCHED result.
Problem: In many places in resource/, we switch on a string to create an
appropriate match_op_t. This is prone to error because misspellings are
more likely to occur across so many duplicated strings, and any changes
to the match_op_t enums need to be mirrored correctly across many files.

Create a string_to_match_op function in match_op.h to centralize all
string -> match_op_t mapping.
Problem: run () has many strcmps for match commands instead of using
the string_to_match_op function. This is prone to error because these
strings need to be written identically in multiple places.

Pass a match_op_t into run () instead of a const char *. Convert the
strings to match_op_ts in run_match instead.
Problem: Both cmd_match and cmd_match_multi in command.cpp switch on the
input command to return a match_op_t. This functionality has since been
centralized in resource/policy/base/match_op.h, so it is much simpler to
call the string_to_match_op function instead of leaving in the switch
statements.

Replace the switch statements in cmd_match* with calls to string_to_match_op and
match_op_is_valid.
Problem: The traverser has no way to handle match_without_allocating
requests because there is no match_op_t value for them.

Create one.
Problem: Fluxion cannot search for matching resources without also
allocating them. However, it is useful to query for matches and
perform allocation separately, especially for scheduling across
several clusters.

Add a match_without_allocating match option to the DFU traverser's
run_match function. It returns matches without allocating them in
the planner. It only looks at the exact given time.
Problem: A single match-wo-alloc request is not very information-dense
because it is confined to a single start time. In a multicluster
scheduling scheme, such time-constrained match queries might not
provide enough information to be useful.

If a match-without-alloc request fails, try matching in the future
until we find the first available match and return it. In other words,
return the first-available resources on match-wo-alloc requests.
Problem: The match-without-allocating match option is accessible from
the resource module, but the resource module treats it as a normal
allocation request. Therefore, both match_without_allocating requests
and the job state of the requests' jobid will return ALLOCATED, which is
inaccurate.

Return and record a MATCHED state on match_without_allocating requests
to the resource module.
Problem: The match_wo_alloc match option is not accessible from flux
ion-resource.

Add a flux ion-resource match without_allocating command.
Problem: reapi_cli_t::match_allocate checks for run () != 0, but it only
exits with -1 if errno == ENOMEM. It should always exit if run ()
returns a non-zero result.

Return -1 when rq->traverser_run() returns a non-zero result regardless
of the value of errno
Problem: The match_wo_alloc match option is not accessible from the
C++ REAPI.

Add a match-without-allocating match option to the C++ REAPI.
Problem: The match_wo_alloc match option is not accessible from the C
REAPI.

Add a match-without-allocating match option to the C REAPI.
Problem: The match_wo_alloc match option is not accessible from the
resource-query utility.

Add a match without_allocating option to resource-query.
Problem: No tests cover the functionality of match-without-allocating.

Add t4016 to test match-without-allocating.
Problem: REAPI does not have any allocation unit testing.

Add unit tests of initialization and allocation via Catch2.
Problem: No test exists for the MATCH_WITHOUT_ALLOCATING match_op_t in
the C REAPI

Add one
Problem: No test exists for the MATCH_WITHOUT_ALLOCATING match_op_t in
the C++ REAPI

Add one
Problem: In all traversals, the output is loaded into a single
match_writers_t. It can only emit the output once, and it can only
output to a single resource format. This is limiting because we cannot
alter or reformat a set of matched resources after the traversal has
finished.
Concretely, this prevents us from responding to match RPCs (ctx->writer)
while also accessing matched resources in the graph (vertex_writer).

Make run() and find() accept std::vectors of match_writers_ts, and add
wrappers to keep support for calls with single match_writer_ts.
Problem: The planner interface has no way to query the start time of the
next job, which is useful for reasoning about future allocations.

Add a planner_unavail_time_first function that returns the start time of
the next job or -1 if there is none.
Problem: A single match_wo_alloc request is not very information-dense
because it is confined to one execution timespan. In a multicluster
scheduling scheme, such time-constrained match queries might not
provide enough information to be useful.

Add a match option that, after matching without allocation, searches
ahead and reports the full amount of time those resources could be used
before the next job starts or the graph ends. This makes each
match_wo_alloc_extend more useful by giving the scheduler options about
when it could allocate those resources past the initial execution
timespan.
Problem: Matching without allocating with duration extension is
implemented in resource_match and the traverser, but it is not
accessible through flux ion-resource.

Make it accessible via a without_allocating_extend subcommand.
Problem: Matching without allocating with duration extension is
implemented in resource_match and the traverser, but it is not
handled correctly in the C++ REAPI.

Handle it in match_allocate(cmd="without_allocating_extend").
Problem: Matching without allocating with duration extension is
implemented in resource_match and the traverser, but it is not
advertised in the C REAPI.

Add its description to the C REAPI's match functions.
Problem: Matching without allocating with duration extension is
implemented in resource-match and the traverser, but it is not handled
correctly in resource_query.

Add match without_allocating_extend into the rq help text and ensure its
resources are returned as MATCHED rather than ALLOCATED or RESERVED.
Problem: No test exists for match_without_allocate_extend's ability to
extend the duration of matched resources until their next allocation or
the end of the graph.

Add a testcase to t4014 that compares the expiration of equivalent
match_allocate and match_without_allocating_extend requests to see if
and how match_without_allocating_extend actually extends its match's
duration.
Problem: No test exists for the MATCH_WITHOUT_ALLOCATING_EXTEND match
option in the C REAPI.

Add one.
Problem: No test exists for the MATCH_WITHOUT_ALLOCATING_EXTEND match
option in the C++ REAPI.

Add one.
Copy link

codecov bot commented Sep 12, 2025

Codecov Report

❌ Patch coverage is 91.17175% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.3%. Comparing base (e5028e9) to head (a6ed1f5).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
resource/reapi/bindings/c++/reapi_cli_impl.hpp 62.5% 9 Missing ⚠️
resource/traversers/dfu_impl.cpp 65.3% 9 Missing ⚠️
resource/writers/match_writers.cpp 35.7% 9 Missing ⚠️
resource/planner/c/planner_c_interface.cpp 73.9% 6 Missing ⚠️
resource/policies/base/match_op.h 61.5% 5 Missing ⚠️
resource/traversers/dfu.cpp 92.1% 4 Missing ⚠️
resource/traversers/dfu_impl_update.cpp 81.8% 4 Missing ⚠️
resource/modules/resource_match_opts.cpp 90.0% 3 Missing ⚠️
resource/reapi/test/reapi_cli_test_c.cpp 98.2% 3 Missing ⚠️
resource/reapi/test/reapi_cli_test_cxx.cpp 98.5% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #1405     +/-   ##
========================================
+ Coverage    77.2%   77.3%   +0.1%     
========================================
  Files         114     117      +3     
  Lines       16320   16893    +573     
========================================
+ Hits        12609   13071    +462     
- Misses       3711    3822    +111     
Files with missing lines Coverage Δ
resource/jobinfo/jobinfo.cpp 76.6% <100.0%> (+1.6%) ⬆️
resource/modules/resource.cpp 74.5% <100.0%> (+<0.1%) ⬆️
resource/modules/resource_match.cpp 67.8% <100.0%> (-0.1%) ⬇️
resource/modules/resource_match_opts.hpp 100.0% <ø> (ø)
resource/reapi/bindings/c/reapi_cli.cpp 24.1% <ø> (ø)
resource/traversers/dfu_impl.hpp 95.0% <100.0%> (+0.4%) ⬆️
resource/utilities/command.cpp 80.1% <100.0%> (-0.3%) ⬇️
resource/writers/match_writers.hpp 70.0% <100.0%> (+0.7%) ⬆️
resource/modules/resource_match_opts.cpp 89.4% <90.0%> (+4.8%) ⬆️
resource/reapi/test/reapi_cli_test_c.cpp 98.2% <98.2%> (ø)
... and 8 more
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants