Commit 759be1c

[docs] Update ascend910b-support.md (#816)
Signed-off-by: windsonsea <[email protected]>
1 parent 1e23508 commit 759be1c

2 files changed: +64 -43 lines changed

docs/ascend910b-support.md

Lines changed: 26 additions & 18 deletions
@@ -1,13 +1,12 @@
-## Introduction
+# Introduction to huawei.com/Ascend910 support

-**We now support huawei.com/Ascend910 by implementing most of the device-sharing features available for NVIDIA GPUs**, including:
+**HAMi now supports huawei.com/Ascend910 by implementing most of the device-sharing features available for NVIDIA GPUs**, including:

-***NPU sharing***: Each task can allocate a portion of an Ascend NPU instead of a whole NPU card, so an NPU can be shared among multiple tasks.
+* **_NPU sharing_**: Each task can allocate a portion of an Ascend NPU instead of a whole NPU card, so an NPU can be shared among multiple tasks.

-***Device Memory Control***: Ascend NPUs can be allocated a specific device memory size, with a guarantee that usage does not exceed that boundary.
-
-***Device Core Control***: Ascend NPUs can be allocated a specific number of compute cores, with a guarantee that usage does not exceed that boundary.
+* **_Device Memory Control_**: Ascend NPUs can be allocated a specific device memory size, with a guarantee that usage does not exceed that boundary.

+* **_Device Core Control_**: Ascend NPUs can be allocated a specific number of compute cores, with a guarantee that usage does not exceed that boundary.

 ## Prerequisites

@@ -20,29 +19,33 @@
 * Install the chart using Helm; see the 'enabling vGPU support in kubernetes' section [here](https://github.com/Project-HAMi/HAMi#enabling-vgpu-support-in-kubernetes)

 * Label the Ascend-910B node with the following command
-```
+
+```bash
 kubectl label node {ascend-node} accelerator=huawei-Ascend910
 ```

 * Install [Ascend docker runtime](https://gitee.com/ascend/ascend-docker-runtime)

 * Download the YAML for Ascend-vgpu-device-plugin from the HAMi project [here](https://github.com/Project-HAMi/ascend-device-plugin/blob/master/build/ascendplugin-910-hami.yaml) and deploy it

-```
+```bash
 wget https://raw.githubusercontent.com/Project-HAMi/ascend-device-plugin/master/build/ascendplugin-910-hami.yaml
 kubectl apply -f ascendplugin-910-hami.yaml
 ```

 ## Custom ascend share configuration
+
 HAMi currently has a [built-in share configuration](https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/templates/scheduler/device-configmap.yaml) for ascend.

 You can customize the ascend share configuration by following the steps below:

 <details>
 <summary>customize ascend config</summary>

-### Create a new directory files in hami charts, the directory structure is as follows
-
+### Create a new directory in hami charts
+
+The directory structure is as follows:
+
 ```bash
 tree -L 1
 .
@@ -52,11 +55,13 @@ You can customize the ascend share configuration by following the steps below:
 └── values.yaml
 ```

-### Create the device-config.yaml file, the content is as follows
+### Create device-config.yaml
+
+The content is as follows:

 ```yaml
 vnpus:
-- chipName: 910B
+- chipName: 910B
   commonWord: Ascend910A
   resourceName: huawei.com/Ascend910A
   resourceMemoryName: huawei.com/Ascend910A-memory
@@ -76,7 +81,7 @@ You can customize the ascend share configuration by following the steps below:
     - name: vir16
       memory: 17476
       aiCore: 16
-- chipName: 910B3
+- chipName: 910B3
   commonWord: Ascend910B
   resourceName: huawei.com/Ascend910B
   resourceMemoryName: huawei.com/Ascend910B-memory
@@ -93,7 +98,7 @@ You can customize the ascend share configuration by following the steps below:
       memory: 32768
       aiCore: 10
       aiCPU: 3
-- chipName: 310P3
+- chipName: 310P3
   commonWord: Ascend310P
   resourceName: huawei.com/Ascend310P
   resourceMemoryName: huawei.com/Ascend310P-memory
@@ -115,16 +120,19 @@ You can customize the ascend share configuration by following the steps below:
       aiCore: 4
       aiCPU: 4
 ```
-### Helm installation and updates will be based on the configuration in this file, overwriting the built-in configuration of Helm
-</details>

+### Install and update with Helm
+
+Helm installation and updates will be based on the configuration in this file, overwriting the built-in configuration of Helm.
+
+</details>

 ## Running Ascend jobs

 Ascend 910Bs can now be requested by a container
 using the `huawei.com/ascend910` and `huawei.com/ascend910-memory` resource types:

-```
+```yaml
 apiVersion: v1
 kind: Pod
 metadata:
@@ -146,4 +154,4 @@ spec:

 1. Ascend-910B-sharing in init containers is not supported.

-2. `huawei.com/Ascend910-memory` only works when `huawei.com/Ascend910=1`.
+1. `huawei.com/Ascend910-memory` only works when `huawei.com/Ascend910=1`.
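
The last note can be turned into a quick check before a manifest is submitted. The following is a minimal sketch and not part of HAMi: `check_ascend_request` is a hypothetical helper that only inspects the two numbers you plan to put under `limits` and never contacts the cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a HAMi or kubectl command).
# Usage: check_ascend_request <npu_count> [memory]
# Mirrors the note that huawei.com/Ascend910-memory only works
# when huawei.com/Ascend910=1.
check_ascend_request() {
  local count="$1" memory="${2:-}"
  if [ -z "$memory" ]; then
    # No memory limit given: nothing to validate.
    echo "ok: request for $count NPU(s) without a memory limit"
  elif [ "$count" -eq 1 ]; then
    echo "ok: request for 1 NPU with memory $memory"
  else
    echo "warning: huawei.com/Ascend910-memory only works when huawei.com/Ascend910=1 (requested $count)"
    return 1
  fi
}
```

For example, `check_ascend_request 1 2000` prints an `ok` line, while `check_ascend_request 2 2000` prints the warning and returns a non-zero status.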

docs/ascend910b-support_cn.md

Lines changed: 38 additions & 25 deletions
@@ -1,46 +1,51 @@
-## Introduction
+# Introduction to huawei.com/Ascend910 support

-This component supports sharing Huawei Ascend 910B devices and provides the following vGPU-like sharing features:
+HAMi supports sharing Huawei Ascend 910B devices and provides the following vGPU-like sharing features:

-***NPU sharing***: Each task can occupy only part of a card, and multiple tasks can share one card
+* **_NPU sharing_**: Each task can occupy only part of a card, and multiple tasks can share one card

-***Device memory limit***: You can now allocate an NPU by device memory size (for example, 3000M), and this component ensures that the memory a task uses does not exceed the allocated amount
+* **_Device memory limit_**: You can now allocate an NPU by device memory size (for example, 3000M), and this component ensures that the memory a task uses does not exceed the allocated amount

-***Compute limit***: You can now allocate NPU compute as a percentage, and this component ensures that the compute a task uses does not exceed the allocated amount
+* **_Compute limit_**: You can now allocate NPU compute as a percentage, and this component ensures that the compute a task uses does not exceed the allocated amount

 ## Node requirements

 * Ascend docker runtime
 * driver version > 24.1.rc1
 * Ascend device type: 910B,910B3,310P

-## Enabling NPU sharing
+## Enabling NPU sharing

-* Deploy this component with helm, following [the section on enabling vGPU support in the main documentation](https://github.com/Project-HAMi/HAMi/blob/master/README_cn.md#kubernetes开启vgpu支持)
+* Deploy this component with helm, following [the section on enabling vGPU support in the main documentation](https://github.com/Project-HAMi/HAMi/blob/master/README_cn.md#kubernetes开启vgpu支持)

-* Use the following command to label the node that hosts the Ascend 910B
-```
+* Use the following command to label the node that hosts the Ascend 910B
+
+```bash
 kubectl label node {ascend-node} accelerator=huawei-Ascend910
 ```

 * Deploy [Ascend docker runtime](https://gitee.com/ascend/ascend-docker-runtime)

-* Get [ascend-device-plugin](https://github.com/Project-HAMi/ascend-device-plugin/blob/master/build/ascendplugin-910-hami.yaml) from the HAMi project and deploy it
+* Get [ascend-device-plugin](https://github.com/Project-HAMi/ascend-device-plugin/blob/master/build/ascendplugin-910-hami.yaml) from the HAMi project and deploy it

-```
+```bash
 wget https://raw.githubusercontent.com/Project-HAMi/ascend-device-plugin/master/build/ascendplugin-910-hami.yaml
 kubectl apply -f ascendplugin-910-hami.yaml
 ```

 ## Customizing NPU virtualization parameters
+
 HAMi currently has a built-in [virtualization configuration file](https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/templates/scheduler/device-configmap.yaml) for NPUs.

 HAMi also supports customizing the virtualization parameters as follows:
+
 <details>
 <summary>Custom configuration</summary>

-### Create a files directory in the HAMi charts; the resulting directory structure should be as follows
-
+### Create a files directory in the HAMi charts
+
+The resulting directory structure should be as follows:
+
 ```bash
 tree -L 1
 .
@@ -50,11 +55,13 @@ HAMi currently has a built-in [virtualization configuration file](https://github.com/Projec
 └── values.yaml
 ```

-### Create the device-config.yaml file in the files directory; the configuration is as follows and can be adjusted as needed
+### Create device-config.yaml in the files directory
+
+The configuration file is as follows and can be adjusted as needed:

 ```yaml
 vnpus:
-- chipName: 910B
+- chipName: 910B
   commonWord: Ascend910A
   resourceName: huawei.com/Ascend910A
   resourceMemoryName: huawei.com/Ascend910A-memory
@@ -74,7 +81,7 @@ HAMi currently has a built-in [virtualization configuration file](https://github.com/Projec
     - name: vir16
       memory: 17476
       aiCore: 16
-- chipName: 910B3
+- chipName: 910B3
   commonWord: Ascend910B
   resourceName: huawei.com/Ascend910B
   resourceMemoryName: huawei.com/Ascend910B-memory
@@ -91,7 +98,7 @@ HAMi currently has a built-in [virtualization configuration file](https://github.com/Projec
       memory: 32768
       aiCore: 10
       aiCPU: 3
-- chipName: 310P3
+- chipName: 310P3
   commonWord: Ascend310P
   resourceName: huawei.com/Ascend310P
   resourceMemoryName: huawei.com/Ascend310P-memory
@@ -113,13 +120,19 @@ HAMi currently has a built-in [virtualization configuration file](https://github.com/Projec
       aiCore: 4
       aiCPU: 4
 ```
-### Helm installation and updates will be based on this configuration file, overriding the default configuration file
+
+### Helm installation and updates
+
+Helm installation and updates will be based on this configuration file, overriding the default configuration file
+
 </details>

+## Running NPU jobs

-## Running NPU jobs
+Using the `huawei.com/ascend910` and `huawei.com/ascend910-memory` resource types,
+a container can now request Ascend 910Bs:

-```
+```yaml
 apiVersion: v1
 kind: Pod
 metadata:
@@ -131,14 +144,14 @@ spec:
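       command: ["bash", "-c", "sleep 86400"]
       resources:
         limits:
-          huawei.com/Ascend910: 1 # requesting 1 vGPUs
-          huawei.com/Ascend910-memory: 2000 # requesting 2000m device memory
+          huawei.com/Ascend910: 1 # requesting 1 vGPU
+          huawei.com/Ascend910-memory: 2000 # requesting 2000m of device memory
 ```

 ## Notes

-1. Currently, Ascend910B devices support only 2 slicing granularities, 1/4 card and 1/2 card; the allocated device memory is automatically aligned to the nearest granularity above the requested amount
+1. Currently, Ascend910B devices support only 2 slicing granularities, 1/4 card and 1/2 card; the allocated device memory is automatically aligned to the nearest granularity above the requested amount

-2. NPU sharing cannot be used in init containers
+2. NPU sharing cannot be used in init containers

-3. Only tasks that request a single NPU can specify a value for the device memory `Ascend910-memory`; if more than one NPU is requested, every requested NPU is allocated as a whole card
+3. Only tasks that request a single NPU can specify a value for the device memory `Ascend910-memory`; if more than one NPU is requested, every requested NPU is allocated as a whole card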
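
Note 1 says that a memory request is rounded up to the nearest supported slice. A rough sketch of that rounding is shown here under stated assumptions: `align_memory` and `TEMPLATES_MB` are hypothetical names, the slice sizes are illustrative, and only 32768 appears in the configuration shown in this diff (16384 is an assumed smaller slice). It is not HAMi's scheduler code.

```bash
#!/usr/bin/env bash
# Illustrative only: slice sizes in MB, listed in ascending order.
# 32768 is taken from the 910B3 template in this diff; 16384 is an assumption.
TEMPLATES_MB="16384 32768"

# Print the smallest slice that can hold the requested memory (MB).
# Returns non-zero when the request is larger than every slice.
align_memory() {
  local requested="$1" size
  for size in $TEMPLATES_MB; do
    if [ "$requested" -le "$size" ]; then
      echo "$size"
      return 0
    fi
  done
  return 1
}
```

With these assumed sizes, a request of 2000 would be served by the 16384 slice, and a request of 16385 by the 32768 slice.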
