Conversation

@ajrasane
Contributor

What does this PR do?

Type of change:
Example update

Overview:

  • Optimize the benchmarking function in the diffusers example
python diffusion_trt.py --model flux-dev --benchmark --model-dtype BFloat16 --skip-image --torch

Testing

Backbone-only inference latency (BFloat16):
  Average: 139.48 ms
  P50: 139.36 ms
  P95: 141.13 ms
  P99: 141.35 ms
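
For reference, percentile stats like these can be derived from per-iteration timings with a nearest-rank percentile. A minimal sketch (the `latencies_ms` sample values are illustrative, not the PR's actual measurements):

```python
# Compute average and percentile latency statistics from a list of
# per-iteration timings in milliseconds.
def latency_stats(latencies_ms):
    s = sorted(latencies_ms)

    def pct(p):
        # Nearest-rank percentile over the sorted samples.
        idx = min(len(s) - 1, max(0, int(round(p / 100.0 * len(s))) - 1))
        return s[idx]

    return {
        "avg": sum(s) / len(s),
        "p50": pct(50),
        "p95": pct(95),
        "p99": pct(99),
    }

# Illustrative sample only.
latencies_ms = [138.9, 139.2, 139.4, 139.5, 140.8, 141.2]
print(latency_stats(latencies_ms))
```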

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

@ajrasane ajrasane requested a review from a team as a code owner October 31, 2025 01:18
@ajrasane ajrasane self-assigned this Oct 31, 2025
@ajrasane ajrasane requested a review from cjluo-nv October 31, 2025 01:18
@codecov

codecov bot commented Oct 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.36%. Comparing base (ca94c96) to head (e4453f7).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #487   +/-   ##
=======================================
  Coverage   74.36%   74.36%           
=======================================
  Files         181      182    +1     
  Lines       18192    18216   +24     
=======================================
+ Hits        13529    13547   +18     
- Misses       4663     4669    +6     

☔ View full report in Codecov by Sentry.

@kevalmorabia97
Collaborator

Please make sure to run the internal GitLab diffusers CI/CD tests to verify they don't break with this change.

@ajrasane ajrasane force-pushed the ajrasane/benchmark_diffusers branch from 89f6c25 to 1aafbbc Compare November 7, 2025 19:28
Signed-off-by: ajrasane <[email protected]>
@ajrasane ajrasane force-pushed the ajrasane/benchmark_diffusers branch from 094aa94 to 646458a Compare November 7, 2025 20:05
```python
def forward_hook(_module, _input, _output):
    ...

_ = backbone(**dummy_inputs_dict)
end_event.record()
torch.cuda.synchronize()
```
Collaborator

I don't think you need to call sync here.

Contributor Author

The synchronization call is needed; otherwise we run into this error:

RuntimeError: Both events must be completed before calculating elapsed time.
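
That error is raised when `Event.elapsed_time` is read before both CUDA events have actually completed on the GPU, since `record()` only enqueues the event on the stream. A minimal sketch of the timing pattern, assuming a callable `fn` standing in for the backbone forward pass (names here are illustrative, not the PR's actual code):

```python
import torch


def benchmark_gpu_ms(fn, iters=10):
    """Time a GPU callable with CUDA events; returns per-iteration latency in ms."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    latencies = []
    for _ in range(iters):
        start.record()
        fn()
        end.record()
        # record() only enqueues the events; both must have completed
        # before elapsed_time() is valid, hence the synchronize here.
        torch.cuda.synchronize()
        latencies.append(start.elapsed_time(end))
    return latencies


if torch.cuda.is_available():
    x = torch.randn(1024, 1024, device="cuda")
    print(benchmark_gpu_ms(lambda: x @ x))
```

An alternative to a device-wide `torch.cuda.synchronize()` would be `end.synchronize()`, which waits only for that event, but either one satisfies the "both events must be completed" requirement.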

@ajrasane ajrasane enabled auto-merge (squash) November 10, 2025 11:09
@ajrasane ajrasane disabled auto-merge November 10, 2025 11:22
@ajrasane ajrasane requested a review from a team as a code owner November 10, 2025 11:33
@ajrasane ajrasane requested a review from i-riyad November 10, 2025 11:33
@ajrasane ajrasane enabled auto-merge (squash) November 10, 2025 11:34
@ajrasane ajrasane merged commit e74a468 into main Nov 10, 2025
26 checks passed
@ajrasane ajrasane deleted the ajrasane/benchmark_diffusers branch November 10, 2025 13:06
kevalmorabia97 pushed a commit that referenced this pull request Nov 10, 2025
mxinO pushed a commit that referenced this pull request Nov 11, 2025
Signed-off-by: mxin <[email protected]>