Skip to content

Conversation

@dwsutherland
Copy link
Member

@dwsutherland dwsutherland commented Nov 12, 2025

closes #7070

From element

David Sutherland
12:30
The only trouble states are failed/submit-failed and succeeded (which would be a pointless total with this change, as it would always be zero)..

David Sutherland
13:14
The other question would be; what do we do with family states? (which selects the most important state of it's tasks)
Should they represent n=0 too?

Maybe represent n=0 state if an active task(s) exist, otherwise n>0

This is what's done here, n=0 tasks have precedence with family state totals, workflow state totals.

n0-state-totals

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Covered by existing tests.
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@dwsutherland dwsutherland added this to the 8.6.1 milestone Nov 12, 2025
@dwsutherland dwsutherland self-assigned this Nov 12, 2025
@dwsutherland dwsutherland changed the title other & test fixes focus family and workflow state totals about n=0 tasks Nov 12, 2025
@dwsutherland
Copy link
Member Author

Although state totals are tested.. A specific test would be to ensure failed state totals are zero in the presence of a handled failed state (which would be out of n=0)..
As can be demonstrated manually:

[scheduler]
    UTC mode = True
    allow implicit tasks = True
[scheduling]
    initial cycle point = 20100101T00
    final cycle point = 20200101T00
    [[graph]]
        R1 = "prep => foo"
        PT12H = """
foo[-PT12H] => foo => bar
fiz:failed => bar
"""
[runtime]
    [[root]]
        script = sleep 10
    [[FOO]]
    [[foo]]
        inherit = FOO
    [[fiz]]
        post-script = false
image

@oliver-sanders
Copy link
Member

The other question would be; what do we do with family states? (which selects the most important state of it's tasks)

In #7070 we were only thinking about the top-level workflow state counts as these are what we use in the workflows view (cylc-ui sidebar) where we think restricting the counts to n=0 would improve the usefulness of the state indicators.

The aggregate family states are probably best left as they are (i.e, the aggregate family state reflects the state of all the tasks it contains within the current n-window). This way the tree view makes more sense, we also want to bring offline data (which will lie outside of the scheduler's n-window) into the GUI sometime which will probably also require aggregate state to represent all relevant tasks contained within the store.

I appreciate that the aggregate family states are awkwardly related to the top-level workflow state counts due to the implementation which makes this difference a bit fun :(

@dwsutherland
Copy link
Member Author

dwsutherland commented Nov 12, 2025

I appreciate that the aggregate family states are awkwardly related to the top-level workflow state counts due to the implementation which makes this difference a bit fun :(

Yeah, this.. I think I'm gonna need another field, or internal dict, if I can't use the same.

@dwsutherland
Copy link
Member Author

dwsutherland commented Nov 13, 2025

I appreciate that the aggregate family states are awkwardly related to the top-level workflow state counts due to the implementation which makes this difference a bit fun :(

Yeah, this.. I think I'm gonna need another field, or internal dict, if I can't use the same.

Actually, I think it's fine the way it is implemented here..
Because, let's think about this:

  • if the family contains only n>0 states (child families & tasks) then it will be the aggregate of all states
  • if the family contains only n=0 states, then it will be an aggregate of those states
  • if the family contains a mix of n=0 and n>0, then the n=0 states represented have higher priority and usually do anyway from extract_group_state,
      ordered_states = [
          TASK_STATUS_SUBMIT_FAILED,
          TASK_STATUS_FAILED,
          TASK_STATUS_EXPIRED,
          TASK_STATUS_RUNNING,
          TASK_STATUS_SUBMITTED,
          TASK_STATUS_PREPARING,
          TASK_STATUS_WAITING,
          TASK_STATUS_SUCCEEDED
      ]
    
    with the exception/inconsistency being handled failed/submit-failed that will represent n>0 families if present. i.e.
image

But I think this is acceptable..
Because all it says is n=0 is higher priority than n>0 (which includes handled failures) and then, within those, ordered_states priority is followed..
i.e. handled failures are only represented by the family when succeeded (n>0) are the only others present (since all other states are n=0).
(obviously all unhandled failures would take priority in both n>=0, which is what we want)

@oliver-sanders
Copy link
Member

oliver-sanders commented Nov 13, 2025

Because all it says is n=0 is higher priority than n>0 (which includes handled failures) and then, within those, ordered_states priority is followed..

It is often the case that n=0 states trump n!=0 states, however, this is not true in general.

All states can be present in n=0. All but preparing, submitted, running can be n!=0.

So if n=0 contains a succeeded task and n=1 contains a failed task, the family status would come out as succeeded.

@dwsutherland
Copy link
Member Author

dwsutherland commented Nov 14, 2025

It is often the case that n=0 states trump n!=0 states, however, this is not true in general.

I think it's reasonable to redefine it as true.. It says; an active family hasn't failed if there are handled failures amongst it's members.

@dwsutherland
Copy link
Member Author

@oliver-sanders - Ok I've changed the family/group state to be extracted from all..
image

This doesn't cause a problem in that aggregate because the group state is always determined from the state totals, which is generated from sub-family state-totals (not the group/family state) and task states.

@oliver-sanders
Copy link
Member

Ok I've changed the family/group state to be extracted from all..

Thanks, sorry for the confusion.

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - but could you put comments in the code to explain the state total meanings:

  • task totals n=0 only (these are the important states for monitoring at full-workflow summary)
  • family totals n-window (so that family totals in workflow views - e.g. treeview - reflect the members shown in the view)

@dwsutherland
Copy link
Member Author

dwsutherland commented Nov 21, 2025

family totals n-window (so that family totals in workflow views - e.g. treeview - reflect the members shown in the view)

Family totals are n=0 also (because the workflow state totals are a collection of all family totals), we don't actually use these in any other way.

The family state (however), is determined from all child task/family states n>=0.

@wxtim
Copy link
Member

wxtim commented Nov 25, 2025

This definately needs an update to Cylc docs. I will do that myself if I have time (but I have a big review that I would really like to finish.

@oliver-sanders
Copy link
Member

oliver-sanders commented Nov 25, 2025

This definately needs an update to Cylc docs. I will do that myself if I have time (but I have a big review that I would really like to finish.

We'll be putting in a changes entry for the workflows sidebar refresh, we'll mention this change in that entry. I.e, doesn't need to be done with this PR.

Comment on lines 2171 to 2174
State totals of families reflect zero n-window (n=0), if no n=0
children (tasks/families) exist then the totals include the states of
all children. Family group state, however, is determined from all child
states via the n>=0 state totals.
Copy link
Member

@oliver-sanders oliver-sanders Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, but it's a bit confusing having the state totals mean different things depending on whether the family is n=0 or not.

We don't actually need to compute the parent family state from the state totals as we can derive it from the child states contained within it instead.

I put together a small change which:

  • Makes family state totals reflect n=0 in all cases.
  • Computes the parent family state from the child states.

dwsutherland#25

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Computes the parent family state from the child states.

We couldn't do this before, because child states weren't homogenous (WRT what they represented).. but can now.. merged

@wxtim
Copy link
Member

wxtim commented Nov 25, 2025

Now reviewing 8.6.x...oliver-sanders:cylc-flow:ds-n0-totals-i7070 - i.e. this PR if Oliver's PR is merged in.

I think that Oliver's change makes more sense.

oliver-sanders and others added 3 commits November 26, 2025 11:12
* Family state totals now always reflect the n=0 irrespective of whether
  the family is itself in the n=0 window or not.
* Replace state counter with a state set.
Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@oliver-sanders oliver-sanders merged commit 45e5e25 into cylc:8.6.x Nov 27, 2025
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants