-
Notifications
You must be signed in to change notification settings - Fork 47
Add support for growing allocations #1340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🍔
resource/modules/resource_match.cpp
Outdated
goto error; | ||
} | ||
} else { | ||
if (strcmp ("grow_allocate", cmd) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to solidify my understanding here:
- When there is a call to the match request callback, the only condition (match op) this is allowed for is
grow_allocate
- In the cast that the jobid exists but it's not
grow_allocate
, that is an error case. Only that one supports an existing id. - In the case we don't have a job id and it is
grow_allocate
that's another error because we require one! - Shrinking an allocation is done via partial cancel (thus not in this function)
One case I'm interested to test is if it's possible to shrink down to 0 and then go back up, and also what happens if you try to grow or shrink in unreasonable ways (where those ways are TBA!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right on all four items.
One case I'm interested to test is if it's possible to shrink down to 0 and then go back up
It should be possible to do that in sched
but not in core
.
MATCH_ALLOCATE, | ||
MATCH_ALLOCATE_W_SATISFIABILITY, | ||
MATCH_ALLOCATE_ORELSE_RESERVE, | ||
MATCH_GROW_ALLOCATION, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion for naming consistency:
MATCH_GROW_ALLOCATION, | |
MATCH_GROW_ALLOCATE, |
I shouldn't be reviewing this early, I've read that word less than 10 times and am getting semantic satiation. "ALLOCATE" looks like some kind of fruit name to me right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I waffled on this and am back to preferring MATCH_GROW_ALLOCATION
. My current thinking is that the pattern is verb_verb_{noun/verb/conjunction}, and GROW_ALLOCATION
is clearer.
// only the initial time matters for allocate | ||
const int64_t target_time = op == match_op_t::MATCH_ALLOCATE ? meta.at : graph_end - 1; | ||
const int64_t target_time = | ||
(op == match_op_t::MATCH_ALLOCATE || op == match_op_t::MATCH_GROW_ALLOCATION) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this syntax say "if the statement is true (if it's allocate or grow) default to the meta.at, otherwise the graph end -1 (or is it the other way around)? I know this syntax exists in JavaScript and I always get it wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if that is the case, it's saying that if we grow or match allocate, we want the time to be the job metadata "at" which I assume is the time matched at? Just to check - for grow do we have a new at time for "at" set here in the job metadata (otherwise it would be the previous at associated with the jobid when the first allocation was done)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for grow do we have a new at time for "at" set here
That's correct, the at
time is relative to when the job request (grow or allocate) is submitted.
// no chance, don't even try | ||
if (op == match_op_t::MATCH_ALLOCATE_ORELSE_RESERVE || op == match_op_t::MATCH_ALLOCATE) { | ||
if (op == match_op_t::MATCH_ALLOCATE_ORELSE_RESERVE || op == match_op_t::MATCH_ALLOCATE | ||
|| op == match_op_t::MATCH_GROW_ALLOCATION) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EEEEEEEEEEEEBUSY continues doing nodey things ⌨️
break; | ||
} | ||
case match_op_t::MATCH_ALLOCATE: | ||
case match_op_t::MATCH_GROW_ALLOCATION: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand this syntax - does this mean do nothing for match allocate, or that match allocate is the same as match grow allocate? I would guess the latter since one line was just added (so we want EBUSY for both cases).
In that case, this is saying if no resources are available, just return busy. I think this answers my question about reserve - we can't reserve with an update. It's a yes / no, and right now, if we don't have it ask again later!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this mean do nothing for match allocate
Kind of, yes. It's fall-through syntax, see: https://en.cppreference.com/w/cpp/language/switch.html or https://learn.microsoft.com/en-us/cpp/cpp/switch-statement-cpp?view=msvc-170
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be extra clear, maybe add the standard fallthrough annotation here. That way if we turn on fallthrough warnings it will not come back and bug us later.
https://en.cppreference.com/w/cpp/language/attributes/fallthrough
case match_op_t::MATCH_GROW_ALLOCATION: | |
[[fallthrough]]; | |
case match_op_t::MATCH_GROW_ALLOCATION: |
} else { | ||
jobid = static_cast<int64_t> (std::strtoll (args[3].c_str (), NULL, 10)); | ||
if (!ctx->jobs.contains (jobid)) { | ||
std::cerr << "ERROR: nonexistent jobid " << jobid << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is (I think) a third check for existence - I am guessing this is done because the state might change, and better to be safe than sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This existence check is on a different code path. Currently, resource-query
isn't a client of all the C++ API functions but it's going to transition to being a full client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh maybe this is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, my browser didn't update the thread and I missed your comment - gotcha.
eff17d2
to
0b51ab9
Compare
Problem: adding support for growing an allocation requires a new match_op. Add match_op_t::MATCH_GROW_ALLOCATION.
Problem: adding support for growing an allocation requires a new match_op. Add match_op_t::MATCH_GROW_ALLOCATION to the traverser. While the new match operation behaves identically to MATCH_ALLOCATE in the traverser, creating a new operation is clearer in intent.
Problem: adding support for growing an allocation requires a new match command in the resource module. Add the grow_allocation command to the module and add error handling for existent or nonexistent jobids.
Problem: adding support for growing an allocation requires a new match operation in the REAPI CLI. Add the MATCH_GROW_ALLOCATION operation to the CLI and add error handling for nonexistent jobids.
Problem: adding support for growing an allocation requires a new match operation in the C REAPI CLI. Add the MATCH_GROW_ALLOCATION operation to the C CLI and add error handling for nonexistent jobids.
Problem: the new MATCH_GROW_ALLOCATION isn't documented in the C REAPI module. Add an entry for MATCH_GROW_ALLOCATION.
Problem: adding support for growing an allocation requires handing the grow_allocation command in resource-query. Add the grow_allocation command to resource-query and add error handling for nonexistent jobids.
Problem: adding support for growing an allocation requires handing the grow_allocation command in rq2. Add the grow_allocation command to rq2 and add error handling for nonexistent jobids.
Problem: growing an allocation is untested in the sharness testsuite. Add tests to the existing grow sharness test.
Problem: growing an allocation does not have any unit testing. Add unit tests of initialization and allocation growth via Catch2.
Problem: shrinking an allocation does not have any unit testing. Add unit tests of initialization and allocation shrink via Catch2.
Problem: growing an allocation is not supported in flux ion-resource. Add the ability to send match RPCs to grow an existing allocation.
Problem: growing an allocation via flux ion-resource is untested. Add tests to grow an existing allocation via flux ion-resource.
I've added Catch2-based unit testing and improved testsuite tests by adding support for allocation growth in That said, there are two issues that are open for discussion:
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1340 +/- ##
========================================
- Coverage 76.4% 76.2% -0.2%
========================================
Files 112 115 +3
Lines 16700 17048 +348
========================================
+ Hits 12765 13002 +237
- Misses 3935 4046 +111
🚀 New features to boost your workflow:
|
I'm good with As to the timing, I think we need to have something in free that does a cancel on any in-flight grow requests. Speaking of that actually, I haven't searched for it but this fails if the job matching its ID is in a state other than "allocated/running" right? Stacking grows on a pending job sounds like a recipe for strange, strange cases. |
This PR adds
grow_allocate
functionality, which attempts to allocate new resources specified by a jobspec to an existing jobid. The PR adds necessary functionality to the traverser, resource module, and REAPI, and adds testsuite tests for correct behavior.This PR is WIPpending client testing in FluxQueue, Fluence, and additional testsuite tests.