Skip to content

Conversation

milroy
Copy link
Member

@milroy milroy commented Sep 26, 2025

Problem summary

PR #1406 identified the unexpected and unintentional behavior of the implicit exclusivity of resources specified below a slot in a jobspec. In particular, while the vertices are allocated or reserved and their metadata entered appropriately, their counts are not propagated to parent aggregate filters. The missing propagation is due to exclusive being set to false here:

if (resource.type != slot_rt && (rc = by_excl (meta, s, u, exclusive, resource)) == -1)

which translates to dfu being empty here:
if (subtree_plan && !dfu.empty ()) {

hence the aggregate filter isn't updated.

Beyond presenting difficulties for partial cancel, this behavior translates to aggregate filters tracking fewer than the actual allocated or reserved counts. This condition negates the benefit of aggregate filters and leads to searching far more of the graph that necessary.

Solution

This PR introduces a jobspec traversal in prime_jobspec. The traversal sets all resources under each slot as exclusive. It also enables extension to allowing slots to contain resources requested non-exclusively. This PR also changes a large number of expected output. I've reviewed the output and made changes to commands as needed (e.g., changing find jobid-alloc=1 agfilter=t to find jobid-span=1 agfilter=t since higher-level vertices do not need to be tagged with an allocation corresponding of lower-level vertices).

The extent of the changes means this PR should be WIP to allow further analysis. This PR is based on PR #1406.

@milroy milroy force-pushed the implicit-exclusivity branch 2 times, most recently from 76dd71a to 9be2a92 Compare September 29, 2025 06:41
@milroy milroy force-pushed the implicit-exclusivity branch 2 times, most recently from e25bef6 to fb6c232 Compare October 2, 2025 22:59
@milroy
Copy link
Member Author

milroy commented Oct 2, 2025

I rebased the PR off the latest master, and as expected one of the t3038-resource-flexible.t tests failed. @zekemorton, can you look at the output and let me know if the new output is correct?

@zekemorton
Copy link
Collaborator

@milroy It looks like the new expected output is correct. Please feel free to update this test

milroy added 16 commits October 8, 2025 12:10
Problem: adding and resetting exclusive resource types currently
involves checking insertion return to determine if an out-of-memory
event occurred. C++ dictates that insertions will throw `bad_alloc` in
that case.

Remove the checks which increase complexity unnecessarily.
Problem: there is unexpected and unintentional behavior of the implicit
exclusivity of resources specified below a slot in a jobspec. In
particular, while the vertices are allocated or reserved and their
metadata entered appropriately, their counts are not propagated to
parent aggregate filters. The missing propagation is due to exclusive
being set to false in `dfu_impl_t::prune()`.

This behavior translates to aggregate filters tracking fewer than the
actual allocated or reserved counts. This condition negates the benefit
of aggregate filters and leads to searching far more of the graph that
necessary.

Add a jobspec traversal in prime_jobspec. The traversal sets all
resources under each slot as exclusive. It also enables extension to
allowing slots to contain resources requested non-exclusively.
Problem: t3001 expected output doesn't include sub-slot resources
allocated implicitly.

Update the expected output to include the full set of allocated
resources.
Problem: t3002 expected output doesn't include sub-slot resources
allocated implicitly.

Update the expected output to include the full set of allocated
resources.
Problem: t3003 expected output doesn't include sub-slot resources
allocated implicitly.

Update the expected output to include the full set of allocated
resources.
Problem: t3004 expected output doesn't include sub-slot resources
allocated implicitly.

Update the expected output to include the full set of allocated
resources.
Problem: t3005 expected output doesn't include sub-slot resources
allocated implicitly.

Update the expected output to include the full set of allocated
resources.
Problem: t3006 expected output doesn't include sub-slot resources
allocated implicitly.

Update the expected output to include the full set of allocated
resources.
Problem: t3007 expected output doesn't include sub-slot resources
allocated implicitly.

Update the expected output to include the full set of allocated
resources.
Problem: t3008 expected output doesn't include sub-slot resources
allocated implicitly.

Update the expected output to include the full set of allocated
resources.
Problem: t3009 expected output doesn't include sub-slot resources
allocated implicitly.

Update the expected output to include the full set of allocated
resources.
Problem: t3014 expected output doesn't include sub-slot resources
allocated implicitly.

Update the expected output to include the full set of allocated
resources.
Problem: t3024 expected output doesn't include sub-slot resources
allocated implicitly.

Update the expected output to include the full set of allocated
resources.
Problem: t3031 expected output doesn't include sub-slot resources
allocated implicitly.

Update the expected output to include the full set of allocated
resources.
Problem: t3035 expected output doesn't include sub-slot resources
allocated implicitly. Furthermore, the aggregate filter counts do not
accurately reflect the occupied resources.

Update the expected output to include the full set of allocated
resources, and update jobspecs and commands to test for desired
behavior. That includes checking for jobid tags and adjusting the slot
location for some jobspecs.
Problem: t3038 expected output doesn't include sub-slot resources
allocated implicitly.

Update the expected output to include the full set of allocated
resources.
@milroy milroy force-pushed the implicit-exclusivity branch from 83b958b to 5ade75c Compare October 8, 2025 19:10
Copy link

codecov bot commented Oct 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.6%. Comparing base (9ab5b4d) to head (a20d82e).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@          Coverage Diff           @@
##           master   #1408   +/-   ##
======================================
  Coverage    77.6%   77.6%           
======================================
  Files         116     116           
  Lines       16694   16700    +6     
======================================
+ Hits        12956   12963    +7     
+ Misses       3738    3737    -1     
Files with missing lines Coverage Δ
resource/policies/base/matcher.cpp 73.4% <100.0%> (+0.7%) ⬆️
resource/traversers/dfu.cpp 90.1% <100.0%> (+<0.1%) ⬆️
resource/traversers/dfu_flexible.cpp 92.1% <100.0%> (+0.8%) ⬆️
resource/traversers/dfu_flexible.hpp 93.7% <ø> (ø)
resource/traversers/dfu_impl.cpp 87.3% <100.0%> (+0.1%) ⬆️
resource/traversers/dfu_impl.hpp 94.5% <ø> (ø)
resource/traversers/dfu_impl_update.cpp 79.1% <ø> (-0.6%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants