
PU-level scheduling in resource & differing behavior from sched-simple #624

@SteVwonder

Description

When running with sched-simple, the behavior when specifying `-c1` to `flux mini` appears to be to run the process on a PU.

❯ FLUX_QMANAGER_RC_NOOP=t FLUX_RESOURCE_RC_NOOP=t ./bin/flux start

❯ flux module list
Module                   Size Digest  Idle  S Service
userdb                1122616 E537E35    9  S 
aggregator            1141360 4319017    9  S 
cron                  1202976 AC1B9B5    0  S 
kvs                   1558376 D2EDB0A    0  S 
job-exec              1276224 93DC36A    9  S 
connector-local       1110920 A097C9C    0  R 
job-manager           1332792 8B529A1    9  S 
sched-simple          1241920 55B0BE9    9  S sched
kvs-watch             1299296 2D970AF    9  S 
barrier               1124544 C1742F5    9  S 
job-info              1357552 5B9B170    9  S 
job-ingest            1219136 4C12AA0    9  S 
content-sqlite        1130384 DFA6333    9  S content-backing

❯ flux hwloc info
1 Machine, 36 Cores, 72 PUs

❯ ~/Repositories/flux-framework/flux-core/t/ingest/submitbench -r 72 <(flux mini submit -c1 -n1 --dry-run sleep 100)
<snip>

❯ flux jobs -o '{name} {state}' | uniq -c
      1 NAME STATE
     72 sleep RUN

That does not appear to be the case with resource:

❯ flux start

❯ flux module list
Module                   Size Digest  Idle  S Service
kvs-watch             1299296 2D970AF    6  S 
job-manager           1332792 8B529A1    6  S 
aggregator            1141360 4319017   30  S 
kvs                   1558376 D2EDB0A    0  S 
content-sqlite        1130384 DFA6333    6  S content-backing
userdb                1122616 E537E35   30  S 
job-exec              1276224 93DC36A    6  S 
resource             18210776 3BF88C6    6  S 
cron                  1202976 AC1B9B5    0  S 
connector-local       1110920 A097C9C    0  R 
qmanager              1088552 73E97AA    6  S sched
job-ingest            1219136 4C12AA0   25  S 
barrier               1124544 C1742F5    0  S 
job-info              1357552 5B9B170    6  S 


❯ flux hwloc info
1 Machine, 36 Cores, 72 PUs

❯ ~/Repositories/flux-framework/flux-core/t/ingest/submitbench -r 72 <(flux mini submit -c1 -n1 --dry-run sleep 100)
<snip>

❯ flux jobs -o '{name} {state}' | uniq -c
      1 NAME STATE
     36 sleep SCHED
     36 sleep RUN

It seems wrong to me that the same jobspec shows such different behavior under the two schedulers. Do we have a way in flux-sched to enable PU-level scheduling? I'm wondering if this is something that needs to be handled at the flux-mini level. Ultimately, I'm not sure I have many intelligent thoughts on this right now, but I wanted to at least document it.
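
For context, the jobspec that `flux mini submit -c1 -n1 --dry-run` produces requests a slot containing one `core` resource, roughly like the abbreviated sketch below (hand-written from the jobspec V1 format for illustration, not copied from an actual `--dry-run`):

```yaml
# Abbreviated "resources" section of a V1 jobspec for -n1 -c1.
# Sketch only; field values are illustrative.
resources:
  - type: slot
    count: 1
    label: task
    with:
      - type: core
        count: 1
```

Since the jobspec only ever says `core`, each scheduler is apparently free to decide whether a `core` vertex maps to a physical core or to a PU, which would explain the divergence.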
