
Conversation

@trws
Member

trws commented Aug 19, 2025

problem: recent PRs have added things that require more recent flux-core versions but have not added any check on the version being found

solution: add the required version to the pkg_check_modules call in cmake/FindFluxCore.cmake

Related to #1390
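
For reference, a minimal sketch of the kind of change described here, assuming a hypothetical minimum of 0.76.0 (the exact version to require is still being settled in this PR, and the prefix name is illustrative):

    # Sketch only: the 0.76.0 minimum is an assumption for illustration,
    # not necessarily the version this PR pins.
    find_package(PkgConfig REQUIRED)
    pkg_check_modules(FLUX_CORE REQUIRED flux-core>=0.76.0)

With a version constraint in the module spec, configuration fails with an error naming the unsatisfied requirement instead of letting the build proceed against a flux-core that is too old.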

trws requested a review from garlick August 19, 2025 17:37

codecov bot commented Aug 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (master@5c04074). Learn more about missing BASE report.
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##             master   #1391   +/-   ##
========================================
  Coverage          ?   77.2%           
========================================
  Files             ?     114           
  Lines             ?   16320           
  Branches          ?       0           
========================================
  Hits              ?   12609           
  Misses            ?    3711           
  Partials          ?       0           

@grondo
Contributor

grondo commented Aug 19, 2025

Hm, I didn't think the intent of any recent PR was to bump the required version of flux-core (yet). Do we know what exactly caused the incompatibility? It might be nice to fix that and allow flux-sched to still be built alongside the deployed version of core.

@trws
Member Author

trws commented Aug 19, 2025

It may actually pass; I misread how the flags were being added in #1390. If so, this version should change, but we should be tracking a working target version this way so that users get an appropriate message when trying to build against one that's too old.

@garlick
Member

garlick commented Aug 19, 2025

#1390 just sets a broker attribute in some tests. That broker attribute doesn't exist in previous releases but setting an unknown attribute isn't treated as an error, for better or worse.

@grondo
Contributor

grondo commented Aug 19, 2025

I am getting errors running make check on a TOSS 4 system with flux-core v0.76.0. However, I went back to the latest flux-sched tag and still got the same errors, so my feeling is that this is a Fluxion testsuite error rather than a dependence on a particular version of flux-core. I haven't run down the source of the errors yet, though. (Actually, I wonder if anyone else sees these failures, or perhaps it is user error on my part.)

@trws
Member Author

trws commented Aug 19, 2025 via email

@trws
Member Author

trws commented Aug 19, 2025

Were the errors like these? https://github.com/flux-framework/flux-sched/actions/runs/17078065102/job/48424613295#step:6:2084

A couple of different PRs hit those errors for a while earlier today, then they seemed to resolve on their own. I’m not 100% sure why…

@grondo
Contributor

grondo commented Aug 19, 2025

It doesn't appear to be similar, though I may need to do more testing. Here's an example failure from t1007-recovery-full.t:

expecting success: 
    load_resource match-format=rv1 policy=high &&
    load_qmanager

Aug 19 21:44:57.102922 UTC 2025 sched-fluxion-resource.info[0]: version 0.45.0
Aug 19 21:44:57.103778 UTC 2025 sched-fluxion-resource.info[0]: populate_resource_db: loaded resources from core's resource.acquire
Aug 19 21:44:57.135515 UTC 2025 sched-fluxion-qmanager.info[0]: version 0.45.0
ok 3 - recovery: loading flux-sched modules works (rv1)

expecting success: 
    jobid1=$(flux job submit basic.json) &&
    jobid2=$(flux job submit basic.json) &&
    jobid3=$(flux job submit basic.json) &&
    jobid4=$(flux job submit basic.json) &&
    jobid5=$(flux job submit basic.json) &&
    jobid6=$(flux job submit basic.json) &&
    jobid7=$(flux job submit basic.json) &&
    flux job wait-event -t 10 ${jobid4} start &&
    flux job wait-event -t 10 ${jobid7} submit

1755639897.500147 start
1755639897.639151 submit userid=6885 urgency=16 flags=0 version=1
ok 4 - recovery: submit to occupy resources fully (rv1)

expecting success: 
    remove_qmanager &&
    remove_resource &&
    flux cancel ${jobid1} &&
    flux job wait-event -t 10 ${jobid1} release

flux-job: wait-event timeout on event 'release'
not ok 5 - recovery: cancel one running job without fluxion
#	
#	    remove_qmanager &&
#	    remove_resource &&
#	    flux cancel ${jobid1} &&
#	    flux job wait-event -t 10 ${jobid1} release
#	

expecting success: 
    load_resource match-format=rv1 policy=high &&
    load_qmanager_sync &&
    test_must_fail flux ion-resource info ${jobid1} &&
    flux ion-resource info ${jobid2} | grep "ALLOCATED" &&
    flux ion-resource info ${jobid3} | grep "ALLOCATED" &&
    flux ion-resource info ${jobid4} | grep "ALLOCATED" &&
    flux job wait-event -t 10 ${jobid5} start &&
    test_expect_code 3 flux ion-resource info ${jobid6}

Aug 19 21:45:07.987805 UTC 2025 sched-fluxion-resource.info[0]: version 0.45.0
Aug 19 21:45:07.988870 UTC 2025 sched-fluxion-resource.info[0]: populate_resource_db: loaded resources from core's resource.acquire
Aug 19 21:45:08.018220 UTC 2025 sched-fluxion-qmanager.info[0]: version 0.45.0
{
 "queues": {
  "default": {
   "policy": "fcfs",
   "queue_depth": 32,
   "max_queue_depth": 1000000,
   "queue_parameters": {},
   "policy_parameters": {},
   "action_counts": {
    "pending": 3,
    "running": 4,
    "reserved": 0,
    "rejected": 0,
    "complete": 0,
    "cancelled": 0,
    "reprioritized": 0
   },
   "pending_queues": {
    "pending": [
     "fppqqGB",
     "frFs93V",
     "fsZUWR5"
    ],
    "pending_provisional": [],
    "blocked": []
   },
   "scheduled_queues": {
    "running": [
     "fkTKx5D",
     "fmxoDhZ",
     "fiyLfjD",
     "foLrYvB"
    ],
    "rejected": [],
    "canceled": []
   }
  }
 }
}
JOBID                STATUS               AT                   OVERHEAD (Secs)     
27548188672          ALLOCATED            2025-08-19T21:44:57  0.000239722         
test_must_fail: command succeeded: flux ion-resource info fiyLfjD
not ok 6 - recovery: works when both modules restart (rv1)
#	
#	    load_resource match-format=rv1 policy=high &&
#	    load_qmanager_sync &&
#	    test_must_fail flux ion-resource info ${jobid1} &&
#	    flux ion-resource info ${jobid2} | grep "ALLOCATED" &&
#	    flux ion-resource info ${jobid3} | grep "ALLOCATED" &&
#	    flux ion-resource info ${jobid4} | grep "ALLOCATED" &&
#	    flux job wait-event -t 10 ${jobid5} start &&
#	    test_expect_code 3 flux ion-resource info ${jobid6}
#	


(with more cascading failures as the test progresses)

@trws
Member Author

trws commented Aug 19, 2025

Huh... where were you testing? I just built and ran master on dane and everything passed on the first try. There must be something going on, but I'm having trouble tracking it down.

@grondo
Contributor

grondo commented Aug 19, 2025

Hm, this was tuolumne. I was also wondering if it is something in my environment. Sorry I haven't had a chance to run this down yet.
