Commit ec8cb01
Moe bf16 ep (InternLM#4144)
* refactor pytorch.nn.moe
* add ep support
* fix tp
* support blocked fp8 moe with split_size<world_size
* unit test allow both fa3 and fa
* add singleton
* singleton and ctxmgrbase
* comment
* add static
* remove chunk
* remove forward dptp
* bound check
* remove monkey patch
* rename kernel
1 parent dc28b85, commit ec8cb01
File tree
32 files changed, +2601 -2234 lines changed
- lmdeploy/pytorch
  - backends
    - cuda
      - moe
    - default
    - dlinfer
  - devices
  - kernels/cuda
  - nn
    - moe
  - third_party/deep_gemm
  - weight_loader
- tests/pytorch
  - engine
  - kernel