-
Notifications
You must be signed in to change notification settings - Fork 47
Add match_without_allocating match option #1387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
jacobtkeio
wants to merge
20
commits into
flux-framework:master
Choose a base branch
from
jacobtkeio:no_reserve
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
0d42cf9
to
082f9b6
Compare
79b16e8
to
5e05680
Compare
9ec2a31
to
ab90e65
Compare
Problem: The heredoc in 'loading job-validator conf works' does not ignore tabs and does not indent new lines, violating the semi-standard format of sharness tests. Make the heredoc ignore tabs and indent every line.
Problem: resource_match_opts.* has two related classes, resource_opt_t and resource_prop_t, which both share setters and getters for every option in the resource module. resource_opt_t has a resource_prop_t member variable and calls m_resource_prop.get_...() in its own resource_opt_t::get_...() functions, and likewise for its setters and is_set-ers. This causes half of resource_match_opts.* to be boilerplate. Solution: Make resource_opt_t inherit from rsrc_prop_t, and replace all resource_opt_t function calls with the inherited resource_prop_t implementations. Remove the duplicated functions in resource_match_opts.*.
Problem: Checks for negative jobspec duration typically occur before submission to the traverser, but some code paths do not perform this check. Check for negative jobspec durations in dfu_impl. This is cheap and centralizes the check.
Problem: For now, match_without_allocating will consume a jobid. All jobids must be queryable (by, say, flux ion-resource info JOBID), but a match without an allocation is not strictly a job, nor does it have a state like ALLOCATED or RESERVED. Regardless, add a MATCHED state to the jobinfo enum. This way, the jobid won't just disappear to the user, and asking for information about the jobid will return a sensible MATCHED result.
Problem: In many places in resource/, we switch on a string to create an appropriate match_op_t. This is prone to error because misspellings are more likely to occur across so many duplicated strings, and any changes to the match_op_t enums need to be mirrored correctly across many files. Create a string_to_match_op function in match_op.h to centralize all string -> match_op_t mapping.
Problem: run () has many strcmps for match commands instead of using the string_to_match_op function. This is prone to error because these strings need to be written identically in multiple places. Pass a match_op_t into run () instead of a const char *. Convert the strings to match_op_ts in run_match instead.
Problem: Both cmd_match and cmd_match_multi in command.cpp switch on the input command to return a match_op_t. This functionality has since been centralized in resource/policy/base/match_op.h, so it is much simpler to call the string_to_match_op function instead of leaving in the switch statements. Replace the switch statements in cmd_match* with calls to string_to_match_op and match_op_is_valid.
Problem: The traverser has no way to handle match_without_allocating requests because there is no match_op_t value for them. Create one.
Problem: Fluxion cannot search for matching resources without also allocating them. However, it is useful to query for matches and perform allocation separately, especially for scheduling across several clusters. Add a match_without_allocating match option to the DFU traverser's run_match function. It returns matches without allocating them in the planner. It only looks at the exact given time.
Problem: A single match-wo-alloc request is not very information-dense because it is confined to a single start time. In a multicluster scheduling scheme, such time-constrained match queries might not provide enough information to be useful. If a match-without-alloc request fails, try matching in the future until we find the first available match and return it. In other words, return the first-available resources on match-wo-alloc requests.
Problem: The match-without-allocating match option is accessible from the resource module, but the resource module treats it as a normal allocation request. Therefore, both match_without_allocating requests and the job state of the requests' jobid will return ALLOCATED, which is inaccurate. Return and record a MATCHED state on match_without_allocating requests to the resource module.
Problem: The match_wo_alloc match option is not accessible from flux ion-resource. Add a flux ion-resource match without_allocating command.
Problem: reapi_cli_t::match_allocate checks for run () != 0, but it only exits with -1 if errno == ENOMEM. It should always exit if run () returns a non-zero result. Return -1 when rq->traverser_run() returns a non-zero result regardless of the value of errno
Problem: The match_wo_alloc match option is not accessible from the C++ REAPI. Add a match-without-allocating match option to the C++ REAPI.
Problem: The match_wo_alloc match option is not accessible from the C REAPI. Add a match-without-allocating match option to the C REAPI.
Problem: The match_wo_alloc match option is not accessible from the resource-query utility. Add a match without_allocating option to resource-query.
Problem: No tests cover the functionality of match-without-allocating. Add t4016 to test match-without-allocating.
af7c3c0
to
192624b
Compare
Problem: REAPI does not have any allocation unit testing. Add unit tests of initialization and allocation via Catch2.
Problem: No test exists for the MATCH_WITHOUT_ALLOCATING match_op_t in the C REAPI Add one
Problem: No test exists for the MATCH_WITHOUT_ALLOCATING match_op_t in the C++ REAPI Add one
192624b
to
258d8d7
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #1387 +/- ##
========================================
- Coverage 77.2% 77.0% -0.3%
========================================
Files 114 117 +3
Lines 16320 16587 +267
========================================
+ Hits 12609 12775 +166
- Misses 3711 3812 +101
🚀 New features to boost your workflow:
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR adds a MATCH_WITHOUT_ALLOCATING match option to sched-fluxion-resource, flux ion-resource, the C and C++ REAPI, and resource-query. A request for MATCH_WITHOUT_ALLOCATING will return resources that a jobspec would have been matched with under MATCH_ALLOCATE, but it will not actually allocate those resources.
A caveat is that MATCH_WITHOUT_ALLOCATING will return the soonest-available match, even into the future, while MATCH_ALLOCATE will return failure if no match is possible starting exactly at the requested match time.
Matching without allocating is a first step towards basic multi-cluster scheduling.
A more useful next step is to find and report the longest available matching resources in a cluster, for which the likelihood that another cluster has available matching resources within the same timeframe is highest. Then, users can broadcast requests for the longest available matching resources on many clusters (without actually allocating the requests) and then choose suitable times and resources on which to actually allocate their multi-cluster job.