[Bug] When deviceshare.AscendHAMiVNPUEnable is turned on, only one type of NPU can be scheduled #4778

@kiritoxkiriko

Description

If a cluster has more than one type of NPU, e.g. both Ascend910B3 and Ascend910B4 in the same cluster, only workloads requesting one NPU type get scheduled; the pods requesting the other NPU type stay Pending forever.

Steps to reproduce the issue

  1. Prepare a cluster with more than one NPU type. In my case, I have two nodes:

    • NodeA: Ascend910B3 * 8
    • NodeB: Ascend910B4 * 8

    Use the Volcano scheduler built from the master branch, and turn on vNPU HAMi mode according to this doc (a minimal scheduler ConfigMap sketch is shown after this list).

  2. Prepare two workloads:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: npu-test-deployment-A
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: npu-test
      template:
        metadata:
          labels:
            app: npu-test
        spec:
          containers:
          - name: finetune-training-container
            image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
            imagePullPolicy: Always
            args: [ "sleep", "infinity" ]
            resources:
              limits:
                huawei.com/Ascend910B3: 1
                huawei.com/Ascend910B3-memory: 32768
          restartPolicy: Always
          schedulerName: volcano
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: npu-test-deployment-B
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: npu-test
      template:
        metadata:
          labels:
            app: npu-test
        spec:
          containers:
          - name: finetune-training-container
            image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
            imagePullPolicy: Always
            args: [ "sleep", "infinity" ]
            resources:
              limits:
                huawei.com/Ascend910B4: 1
                huawei.com/Ascend910B4-memory: 32768
          restartPolicy: Always
          schedulerName: volcano

    These two deployments have the same spec except for the card type.

  3. Apply these YAMLs. Only npu-test-deployment-B's pod (which requests Ascend910B4) stays Pending forever.
    Checking the corresponding podgroup's events shows these warnings:

    Type     Reason         Age                     From     Message
    ----     ------         ----                    ----     -------
    Normal   Unschedulable  24m (x2 over 24m)       volcano  resource in cluster is overused: overused huawei.com/Ascend910B4-memory
    Warning  Unschedulable  4m39s (x1186 over 24m)  volcano  1/1 tasks in gang unschedulable: pod group is not ready, 1 Pending, 1 minAvailable; Pending: 1 Unschedulable
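
For reference, here is roughly the scheduler ConfigMap I used in step 1. This is a minimal sketch: the actions and tiers follow Volcano's common defaults and the doc linked above, and the only change is the deviceshare.AscendHAMiVNPUEnable argument from the title, so the exact plugin list and names may differ in your installation.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: volcano-scheduler-configmap
      namespace: volcano-system
    data:
      volcano-scheduler.conf: |
        actions: "enqueue, allocate, backfill"
        tiers:
        - plugins:
          - name: priority
          - name: gang
          - name: conformance
        - plugins:
          - name: drf
          - name: deviceshare
            arguments:
              # enable HAMi-style Ascend vNPU sharing (option from the issue title)
              deviceshare.AscendHAMiVNPUEnable: true
          - name: predicates
          - name: proportion
          - name: nodeorder
          - name: binpack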

Describe the results you received and expected

Received: the workload using the other NPU type (Ascend910B4) stays Pending forever. Expected: both workloads are scheduled onto the nodes with their matching NPU type.

Also, the overused warning on the podgroup should not appear, since huawei.com/Ascend910B4-memory should be in the ignore list.

What version of Volcano are you using?

latest (built from the master branch)

Any other relevant information

No response
