@danielkorzekwa
## What does this PR do?

Adds the compress tutorial (PoC) and the compress CLI app.

danielkorzekwa and others added 30 commits October 27, 2025 11:50

using MIP-based NAS search algorithm.
…ntal/ folder to not be run by CICD yet.

Signed-off-by: Daniel Korzekwa <[email protected]>
@danielkorzekwa danielkorzekwa requested review from a team as code owners November 3, 2025 13:56
@danielkorzekwa danielkorzekwa requested review from kevalmorabia97 and removed request for a team November 3, 2025 13:56
@codecov

codecov bot commented Nov 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.40%. Comparing base (1c12fd8) to head (25b4aed).

Additional details and impacted files

```
@@                Coverage Diff                @@
##           feature/compress     #492   +/-   ##
=================================================
  Coverage             73.40%   73.40%           
=================================================
  Files                   180      180           
  Lines                 18127    18127           
=================================================
  Hits                  13306    13306           
  Misses                 4821     4821           
```


## What does this PR do?

**Type of change:** 
Documentation

**Overview:** 
Updated the tutorial with more details on how to choose the required
config parameters and added MMLU evaluation.

---------

Signed-off-by: Liana Mikaelyan <[email protected]>
@LianaMikael LianaMikael requested a review from a team as a code owner November 4, 2025 10:03
@@ -0,0 +1,64 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
**Contributor:**

Why does this need to be its own file?

**Author:**

It is how it was designed; any suggestions?

**Collaborator:**

We can rename modelopt/torch/_compress/dataset/prepare_dataset.py to modelopt/torch/_compress/utils/dataset_utils.py and later unify it with https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/modelopt/torch/utils/dataset_utils.py

We already have nemotron-post-training-dataset-v2 supported in modelopt/torch/utils/dataset_utils.py, so ideally we should be able to just use that.
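For reference, reusing it could look roughly like this (import path and signature assumed from the PTQ examples, so treat it as a sketch rather than a verified call):

```python
from transformers import AutoTokenizer

# Assumption: get_dataset_dataloader lives in modelopt.torch.utils.dataset_utils
# and accepts these parameters, as in the PTQ examples; verify before relying on it.
from modelopt.torch.utils.dataset_utils import get_dataset_dataloader

tokenizer = AutoTokenizer.from_pretrained("path/to/model")
dataloader = get_dataset_dataloader(
    dataset_name="nemotron-post-training-dataset-v2",  # already supported there
    tokenizer=tokenizer,
    batch_size=8,
    num_samples=512,
)
```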

**Contributor:**

It seems to be made for the Nemotron post-training dataset rather than being generic. Which file even uses this?

@@ -0,0 +1,41 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
**Contributor:**

@kevalmorabia97 Would we want to adopt such timestamped logger for ModelOpt globally?

**Author:**

The logging approach should be unified across modelopt. I made a small one for now; we should use a standard Python logging library. Created internal NVIDIA modelopt issue: issues/40

```python
config={}, # this is not used as the search space is defined in the hydra config
)

print(timestamped("Compress Progress 8/8: compression pipeline completed (multi-gpu)"))
```
**Collaborator:**

How about we just make the print part of the function: def print_timestamped(...)?
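Something like this minimal sketch (print_timestamped is a hypothetical helper name, not an existing modelopt API):

```python
import datetime


def print_timestamped(message: str) -> None:
    """Print a message prefixed with the current wall-clock time."""
    now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f"[{now}] {message}")


# Call sites then shrink from print(timestamped(...)) to:
print_timestamped("Compress Progress 8/8: compression pipeline completed (multi-gpu)")
```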

```python
)

# mip_and_realize_models (distributed processing)
# TODO: How to make it part of mnt.search() api, similarly to run_full_compress() API
```
**Collaborator:**

I think this can be improved once everything is self-contained in modelopt. We don't need a separate function for mip_only. We can re-run the same run_full_compress, but internally, for each sub-step, it should check whether a checkpoint already exists and skip that step.

This generic solution will also help in other cases where the whole compress pipeline takes too long and we want to resume from some intermediate step.
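A rough sketch of that checkpoint-and-skip idea (step names and the per-step runner are hypothetical, not taken from this PR):

```python
from pathlib import Path

# Hypothetical sub-steps of the compress pipeline; the real names live in the PR.
STEPS = ["prune_and_score", "build_library", "mip_and_realize_models"]


def run_full_compress(workdir: str, run_step) -> None:
    """Run each sub-step, skipping any whose completion marker already exists.

    Re-running the same entry point then doubles as resume: finished steps are
    no-ops, and execution continues from the first step without a marker.
    """
    for step in STEPS:
        marker = Path(workdir) / f"{step}.done"
        if marker.exists():
            print(f"Skipping {step}: found {marker}")
            continue
        run_step(step, workdir)  # assumed callable that executes one sub-step
        marker.touch()  # mark the step complete only after it succeeds
```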


```python
    return datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")


def timestamped(message: str) -> str:
```
**Collaborator:**

This can be changed to print_timestamped() and moved to https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/modelopt/torch/utils/logging.py

Ideally, configuring the standard Python logger to do this would be better than a custom timestamped message. See this example of using logging: https://stackoverflow.com/a/44175370

**Contributor:**

@danielkorzekwa Can all the print(timestamped(...)) calls be refactored as logging.info(...), with the standard logger adjusted later?
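For reference, the standard library already supports this; configuring the root logger once makes every logging.info(...) call timestamped (per the Stack Overflow answer linked above):

```python
import logging

# One-time setup at the CLI entry point; the datefmt matches the custom
# "%Y-%m-%d %H:%M:%S" helper so the output format stays the same.
logging.basicConfig(
    format="%(asctime)s %(levelname)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    level=logging.INFO,
)

logging.info("Compress Progress 8/8: compression pipeline completed (multi-gpu)")
```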

The supported modifications are:

- `ffn_intermediate_size`: different FFN intermediate sizes
- `attention op/noop`: complete removal of attention layers
**@kevalmorabia97 (Collaborator), Nov 7, 2025:**

Didn't we decide to keep the PoC to just FFN pruning, with no attention module replacement?


**_NOTE:_**
How to choose `intermediate_size_list`?
The list specifies the candidate FFN sizes we wish to search over. It is recommended to choose several pruning sizes (e.g., 15%, 20%, 30% of the original). Note that the values must be hardware-friendly (divisible by a suitable power of two) to avoid issues with tensor operations in subsequent steps.
**Collaborator:**

Let's recommend divisible by 64 instead? The FFN value is 14336, so having only multiples of 64 in the search space should be more than enough, no?
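To make that concrete, a small sketch of building such a candidate list (the prune fractions are illustrative; 14336 is the base size discussed above):

```python
BASE_FFN = 14336  # original FFN intermediate size


def round_to_multiple(value: float, multiple: int = 64) -> int:
    """Round a candidate size to the nearest hardware-friendly multiple."""
    return int(round(value / multiple)) * multiple


# Prune by e.g. 15%, 20%, 30%: keep 85%, 80%, 70% of the original width.
intermediate_size_list = [BASE_FFN] + [
    round_to_multiple(BASE_FFN * keep) for keep in (0.85, 0.80, 0.70)
]
print(intermediate_size_list)  # [14336, 12160, 11456, 10048]
```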


```bash
...
block_0: attention gqa_4 ffn intermediate_14336
```
**Collaborator:**

GQA4 will only work with TP4 if training in Megatron-fw. Maybe for deployment too, but I don't know for sure. Should we remove GQA pruning from the search space?

```bash
...
block_0: attention gqa_4 ffn intermediate_14336
block_1: attention gqa_4 ffn intermediate_14336
```
**Collaborator:**

Why is no FFN being pruned here? Is it because we use a memory target and attention takes more memory, so it's pruned first by Puzzle?

```bash
lm_eval --model hf \
    --model_args pretrained=path/to/model,dtype=bfloat16,trust_remote_code=true,parallelize=True \
    --tasks mmlu_humanities \
    ...
```
**Collaborator:**

Why mmlu_humanities instead of the generic mmlu?

**Collaborator:**

Why do we need a Dockerfile? If Puzzle is self-contained in modelopt and its dependencies are in setup.py, then installing modelopt will install everything needed, and users can just use a TRT-LLM docker image without any custom Dockerfile or docker build step.

**@kevalmorabia97 (Collaborator):**

A bunch of code quality checks are also failing.
