Make wheel build manual CI job and diffusers test fix (#529)
- Allow the wheel build and release to be triggered manually, without depending on test status
(nmm-sandbox tests sometimes fail because Slurm machines are unavailable)
Signed-off-by: Keval Morabia <[email protected]>
CHANGELOG.rst (2 additions, 9 deletions)
@@ -1,13 +1,5 @@
 Model Optimizer Changelog (Linux)
 =================================
-0.41 (2025-12-xx)
-^^^^^^^^^^^^^^^^^
-
-**Deprecations**
-
-**New Features**
-- Add FP8/NVFP4 KV cache quantization support for Megatron Core models.
-
 
 0.40 (2025-12-xx)
 ^^^^^^^^^^^^^^^^^
@@ -20,8 +12,9 @@ Model Optimizer Changelog (Linux)
 
 - Add MoE (e.g. Qwen3-30B-A3B) pruning support for ``num_moe_experts``, ``moe_ffn_hidden_size`` and ``moe_shared_expert_intermediate_size`` parameters in Minitron pruning (``mcore_minitron``).
 - Add ``specdec_bench`` example to benchmark speculative decoding performance. See `examples/specdec_bench/README.md <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/specdec_bench#speculative-decoding-benchmark>`_ for more details.
+- Add FP8/NVFP4 KV cache quantization support for Megatron Core models.