-
Notifications
You must be signed in to change notification settings - Fork 47
[WIP] Change allocation and reservation behavior of implicit slot exclusivity #1408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
76dd71a
to
9be2a92
Compare
e25bef6
to
fb6c232
Compare
I rebased the PR off the latest |
@milroy It looks like the new expected output is correct. Please feel free to update this test |
fb6c232
to
43c4ac7
Compare
43c4ac7
to
83b958b
Compare
Problem: adding and resetting exclusive resource types currently involves checking insertion return to determine if an out-of-memory event occurred. C++ dictates that insertions will throw `bad_alloc` in that case. Remove the checks which increase complexity unnecessarily.
Problem: there is unexpected and unintentional behavior of the implicit exclusivity of resources specified below a slot in a jobspec. In particular, while the vertices are allocated or reserved and their metadata entered appropriately, their counts are not propagated to parent aggregate filters. The missing propagation is due to exclusive being set to false in `dfu_impl_t::prune()`. This behavior translates to aggregate filters tracking fewer than the actual allocated or reserved counts. This condition negates the benefit of aggregate filters and leads to searching far more of the graph that necessary. Add a jobspec traversal in prime_jobspec. The traversal sets all resources under each slot as exclusive. It also enables extension to allowing slots to contain resources requested non-exclusively.
Problem: t3001 expected output doesn't include sub-slot resources allocated implicitly. Update the expected output to include the full set of allocated resources.
Problem: t3002 expected output doesn't include sub-slot resources allocated implicitly. Update the expected output to include the full set of allocated resources.
Problem: t3003 expected output doesn't include sub-slot resources allocated implicitly. Update the expected output to include the full set of allocated resources.
Problem: t3004 expected output doesn't include sub-slot resources allocated implicitly. Update the expected output to include the full set of allocated resources.
Problem: t3005 expected output doesn't include sub-slot resources allocated implicitly. Update the expected output to include the full set of allocated resources.
Problem: t3006 expected output doesn't include sub-slot resources allocated implicitly. Update the expected output to include the full set of allocated resources.
Problem: t3007 expected output doesn't include sub-slot resources allocated implicitly. Update the expected output to include the full set of allocated resources.
Problem: t3008 expected output doesn't include sub-slot resources allocated implicitly. Update the expected output to include the full set of allocated resources.
Problem: t3009 expected output doesn't include sub-slot resources allocated implicitly. Update the expected output to include the full set of allocated resources.
Problem: t3014 expected output doesn't include sub-slot resources allocated implicitly. Update the expected output to include the full set of allocated resources.
Problem: t3024 expected output doesn't include sub-slot resources allocated implicitly. Update the expected output to include the full set of allocated resources.
Problem: t3031 expected output doesn't include sub-slot resources allocated implicitly. Update the expected output to include the full set of allocated resources.
Problem: t3035 expected output doesn't include sub-slot resources allocated implicitly. Furthermore, the aggregate filter counts do not accurately reflect the occupied resources. Update the expected output to include the full set of allocated resources, and update jobspecs and commands to test for desired behavior. That includes checking for jobid tags and adjusting the slot location for some jobspecs.
Problem: t3038 expected output doesn't include sub-slot resources allocated implicitly. Update the expected output to include the full set of allocated resources.
83b958b
to
5ade75c
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1408 +/- ##
======================================
Coverage 77.6% 77.6%
======================================
Files 116 116
Lines 16694 16700 +6
======================================
+ Hits 12956 12963 +7
+ Misses 3738 3737 -1
🚀 New features to boost your workflow:
|
Problem summary
PR #1406 identified the unexpected and unintentional behavior of the implicit exclusivity of resources specified below a
slot
in a jobspec. In particular, while the vertices are allocated or reserved and their metadata entered appropriately, their counts are not propagated to parent aggregate filters. The missing propagation is due toexclusive
being set tofalse
here:flux-sched/resource/traversers/dfu_impl.cpp
Line 238 in 8e1718d
which translates to
dfu
being empty here:flux-sched/resource/traversers/dfu_impl_update.cpp
Line 75 in 8e1718d
hence the aggregate filter isn't updated.
Beyond presenting difficulties for partial cancel, this behavior translates to aggregate filters tracking fewer than the actual allocated or reserved counts. This condition negates the benefit of aggregate filters and leads to searching far more of the graph that necessary.
Solution
This PR introduces a jobspec traversal in
prime_jobspec
. The traversal sets all resources under each slot as exclusive. It also enables extension to allowing slots to contain resources requested non-exclusively. This PR also changes a large number of expected output. I've reviewed the output and made changes to commands as needed (e.g., changingfind jobid-alloc=1 agfilter=t
tofind jobid-span=1 agfilter=t
since higher-level vertices do not need to be tagged with an allocation corresponding of lower-level vertices).The extent of the changes means this PR should be WIP to allow further analysis. This PR is based on PR #1406.