-
Notifications
You must be signed in to change notification settings - Fork 47
[WIP] Add MATCH_WITHOUT_ALLOCATING_EXTEND match option #1405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
jacobtkeio
wants to merge
30
commits into
flux-framework:master
Choose a base branch
from
jacobtkeio:match_extend
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Problem: The heredoc in 'loading job-validator conf works' does not ignore tabs and does not indent new lines, violating the semi-standard format of sharness tests. Make the heredoc ignore tabs and indent every line.
Problem: resource_match_opts.* has two related classes, resource_opt_t and resource_prop_t, which both share setters and getters for every option in the resource module. resource_opt_t has a resource_prop_t member variable and calls m_resource_prop.get_...() in its own resource_opt_t::get_...() functions, and likewise for its setters and is_set-ers. This causes half of resource_match_opts.* to be boilerplate. Solution: Make resource_opt_t inherit from rsrc_prop_t, and replace all resource_opt_t function calls with the inherited resource_prop_t implementations. Remove the duplicated functions in resource_match_opts.*.
Problem: Checks for negative jobspec duration typically occur before submission to the traverser, but some code paths do not perform this check. Check for negative jobspec durations in dfu_impl. This is cheap and centralizes the check.
Problem: For now, match_without_allocating will consume a jobid. All jobids must be queryable (by, say, flux ion-resource info JOBID), but a match without an allocation is not strictly a job, nor does it have a state like ALLOCATED or RESERVED. Regardless, add a MATCHED state to the jobinfo enum. This way, the jobid won't just disappear to the user, and asking for information about the jobid will return a sensible MATCHED result.
Problem: In many places in resource/, we switch on a string to create an appropriate match_op_t. This is prone to error because misspellings are more likely to occur across so many duplicated strings, and any changes to the match_op_t enums need to be mirrored correctly across many files. Create a string_to_match_op function in match_op.h to centralize all string -> match_op_t mapping.
Problem: run () has many strcmps for match commands instead of using the string_to_match_op function. This is prone to error because these strings need to be written identically in multiple places. Pass a match_op_t into run () instead of a const char *. Convert the strings to match_op_ts in run_match instead.
Problem: Both cmd_match and cmd_match_multi in command.cpp switch on the input command to return a match_op_t. This functionality has since been centralized in resource/policy/base/match_op.h, so it is much simpler to call the string_to_match_op function instead of leaving in the switch statements. Replace the switch statements in cmd_match* with calls to string_to_match_op and match_op_is_valid.
Problem: The traverser has no way to handle match_without_allocating requests because there is no match_op_t value for them. Create one.
Problem: Fluxion cannot search for matching resources without also allocating them. However, it is useful to query for matches and perform allocation separately, especially for scheduling across several clusters. Add a match_without_allocating match option to the DFU traverser's run_match function. It returns matches without allocating them in the planner. It only looks at the exact given time.
Problem: A single match-wo-alloc request is not very information-dense because it is confined to a single start time. In a multicluster scheduling scheme, such time-constrained match queries might not provide enough information to be useful. If a match-without-alloc request fails, try matching in the future until we find the first available match and return it. In other words, return the first-available resources on match-wo-alloc requests.
Problem: The match-without-allocating match option is accessible from the resource module, but the resource module treats it as a normal allocation request. Therefore, both match_without_allocating requests and the job state of the requests' jobid will return ALLOCATED, which is inaccurate. Return and record a MATCHED state on match_without_allocating requests to the resource module.
Problem: The match_wo_alloc match option is not accessible from flux ion-resource. Add a flux ion-resource match without_allocating command.
Problem: reapi_cli_t::match_allocate checks for run () != 0, but it only exits with -1 if errno == ENOMEM. It should always exit if run () returns a non-zero result. Return -1 when rq->traverser_run() returns a non-zero result regardless of the value of errno
Problem: The match_wo_alloc match option is not accessible from the C++ REAPI. Add a match-without-allocating match option to the C++ REAPI.
Problem: The match_wo_alloc match option is not accessible from the C REAPI. Add a match-without-allocating match option to the C REAPI.
Problem: The match_wo_alloc match option is not accessible from the resource-query utility. Add a match without_allocating option to resource-query.
Problem: No tests cover the functionality of match-without-allocating. Add t4016 to test match-without-allocating.
Problem: REAPI does not have any allocation unit testing. Add unit tests of initialization and allocation via Catch2.
Problem: No test exists for the MATCH_WITHOUT_ALLOCATING match_op_t in the C REAPI Add one
Problem: No test exists for the MATCH_WITHOUT_ALLOCATING match_op_t in the C++ REAPI Add one
Problem: In all traversals, the output is loaded into a single match_writers_t. It can only emit the output once, and it can only output to a single resource format. This is limiting because we cannot alter or reformat a set of matched resources after the traversal has finished. Concretely, this prevents us from responding to match RPCs (ctx->writer) while also accessing matched resources in the graph (vertex_writer). Make run() and find() accept std::vectors of match_writers_ts, and add wrappers to keep support for calls with single match_writer_ts.
Problem: The planner interface has no way to query the start time of the next job, which is useful for reasoning about future allocations. Add a planner_unavail_time_first function that returns the start time of the next job or -1 if there is none.
Problem: A single match_wo_alloc request is not very information-dense because it is confined to one execution timespan. In a multicluster scheduling scheme, such time-constrained match queries might not provide enough information to be useful. Add a match option that, after matching without allocation, searches ahead and reports the full amount of time those resources could be used before the next job starts or the graph ends. This makes each match_wo_alloc_extend more useful by giving the scheduler options about when it could allocate those resources past the initial execution timespan.
Problem: Matching without allocating with duration extension is implemented in resource_match and the traverser, but it is not accessible through flux ion-resource. Make it accessible via a without_allocating_extend subcommand.
Problem: Matching without allocating with duration extension is implemented in resource_match and the traverser, but it is not handled correctly in the C++ REAPI. Handle it in match_allocate(cmd="without_allocating_extend").
Problem: Matching without allocating with duration extension is implemented in resource_match and the traverser, but it is not advertised in the C REAPI. Add its description to the C REAPI's match functions.
Problem: Matching without allocating with duration extension is implemented in resource-match and the traverser, but it is not handled correctly in resource_query. Add match without_allocating_extend into the rq help text and ensure its resources are returned as MATCHED rather than ALLOCATED or RESERVED.
Problem: No test exists for match_without_allocate_extend's ability to extend the duration of matched resources until their next allocation or the end of the graph. Add a testcase to t4014 that compares the expiration of equivalent match_allocate and match_without_allocating_extend requests to see if and how match_without_allocating_extend actually extends its match's duration.
Problem: No test exists for the MATCH_WITHOUT_ALLOCATING_EXTEND match option in the C REAPI. Add one.
Problem: No test exists for the MATCH_WITHOUT_ALLOCATING_EXTEND match option in the C++ REAPI. Add one.
485c434
to
a6ed1f5
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #1405 +/- ##
========================================
+ Coverage 77.2% 77.3% +0.1%
========================================
Files 114 117 +3
Lines 16320 16893 +573
========================================
+ Hits 12609 13071 +462
- Misses 3711 3822 +111
🚀 New features to boost your workflow:
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR adds a MATCH_WITHOUT_ALLOCATING_EXTEND match option, which behaves identically to MATCH_WITHOUT_ALLOCATING except that it extends the durations of its matches as much as possible until the next scheduled job or the end of the resource graph.
This makes it much easier to implement a "longest" match policy in the future, which could significantly increase the chance of finding a simultaneous allocation across two or more clusters by increasing the surface area across which cluster-specific matches could overlap in time.
This PR depends on #1387.