From 144cdb74b8b014489d12c63d4a9014196b955086 Mon Sep 17 00:00:00 2001
From: binliu
Date: Wed, 4 Sep 2024 15:01:37 +0000
Subject: [PATCH 1/5] update model inference benchmark for vista3d

Signed-off-by: binliu
---
 models/vista3d/docs/README.md | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/models/vista3d/docs/README.md b/models/vista3d/docs/README.md
index 741dbcb1..cf9fd862 100644
--- a/models/vista3d/docs/README.md
+++ b/models/vista3d/docs/README.md
@@ -44,6 +44,31 @@ In Evaluation Mode: Segmentation

 #### Validation Accuracy

+#### TensorRT speedup
+The `vista3d` bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16 |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 577.00 | 91.90 | 353.69 | 60.02 | 6.28 | 1.63 | 9.58 | 1.53 |
+| end2end | - | - | - | - | - | - | - | - |
+
+Where:
+- `model computation` means timing the model's inference on a random input, excluding preprocessing and postprocessing.
+- `end2end` means running the bundle end-to-end with the TensorRT based model.
+- `torch_fp32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_fp32` and `trt_fp16` are for the TensorRT based models converted in the corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of the corresponding models versus the PyTorch float32 model.
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+This result is benchmarked under:
+ - TensorRT: 10.3.0+cuda12.6
+ - Torch-TensorRT Version: 2.5.0
+ - CPU Architecture: x86-64
+ - OS: Ubuntu 20.04
+ - Python version: 3.10.12
+ - CUDA version: 12.6
+ - GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands

 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.

From cbadb6867d6c64a0dd73f78dc030a6634830dedc Mon Sep 17 00:00:00 2001
From: binliu
Date: Thu, 5 Sep 2024 15:53:14 +0000
Subject: [PATCH 2/5] add vista2d model benchmark

Signed-off-by: binliu
---
 models/vista2d/docs/README.md | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/models/vista2d/docs/README.md b/models/vista2d/docs/README.md
index a3a902e4..ca33a487 100644
--- a/models/vista2d/docs/README.md
+++ b/models/vista2d/docs/README.md
@@ -74,6 +74,31 @@ Please note that the data used in this config file is: "/cellpose_dataset/test/0
 python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"
 ```

+#### TensorRT speedup
+The `vista2d` bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16 |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 90.11 | 39.68 | 71.70 | 17.32 | 2.27 | 1.26 | 5.20 | 2.29 |
+| end2end | - | - | - | - | - | - | - | - |
+
+Where:
+- `model computation` means timing the model's inference on a random input, excluding preprocessing and postprocessing.
+- `end2end` means running the bundle end-to-end with the TensorRT based model.
+- `torch_fp32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_fp32` and `trt_fp16` are for the TensorRT based models converted in the corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of the corresponding models versus the PyTorch float32 model.
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+This result is benchmarked under:
+ - TensorRT: 10.3.0+cuda12.6
+ - Torch-TensorRT Version: 2.5.0
+ - CPU Architecture: x86-64
+ - OS: Ubuntu 20.04
+ - Python version: 3.10.12
+ - CUDA version: 12.6
+ - GPU models and configuration: A100 80G
+
 ### Execute multi-GPU inference
 ```bash
 torchrun --nproc_per_node=gpu -m monai.bundle run_workflow "scripts.workflow.VistaCell" --config_file configs/hyper_parameters.yaml --mode infer --pretrained_ckpt_name model.pt

From 23a73d652970c8b23a3cf31b89fbc3ae993a3c06 Mon Sep 17 00:00:00 2001
From: binliu
Date: Thu, 5 Sep 2024 15:57:34 +0000
Subject: [PATCH 3/5] add swin model benchmark

Signed-off-by: binliu
---
 .../docs/README.md | 25 +++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/models/swin_unetr_btcv_segmentation/docs/README.md b/models/swin_unetr_btcv_segmentation/docs/README.md
index ae25b6c9..828f705c 100644
--- a/models/swin_unetr_btcv_segmentation/docs/README.md
+++ b/models/swin_unetr_btcv_segmentation/docs/README.md
@@ -71,6 +71,31 @@ Dice score was used for evaluating the performance of the model. This model achi

 ![A graph showing the validation mean Dice for 5000 epochs.](https://developer.download.nvidia.com/assets/Clara/Images/monai_swin_unetr_btcv_segmentation_val_dice_v2.png)

+#### TensorRT speedup
+The `swin_unetr` bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16 |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 503.10 | 123.77 | 229.85 | 42.87 | 4.06 | 2.19 | 11.74 | 2.89 |
+| end2end | - | - | - | - | - | - | - | - |
+
+Where:
+- `model computation` means timing the model's inference on a random input, excluding preprocessing and postprocessing.
+- `end2end` means running the bundle end-to-end with the TensorRT based model.
+- `torch_fp32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_fp32` and `trt_fp16` are for the TensorRT based models converted in the corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of the corresponding models versus the PyTorch float32 model.
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+This result is benchmarked under:
+ - TensorRT: 10.3.0+cuda12.6
+ - Torch-TensorRT Version: 2.5.0
+ - CPU Architecture: x86-64
+ - OS: Ubuntu 20.04
+ - Python version: 3.10.12
+ - CUDA version: 12.6
+ - GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands

 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
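The speedup columns in the tables above are plain latency ratios: each value divides the PyTorch float32 time by the variant's time, except `amp vs fp16`, which divides the PyTorch amp time by the TensorRT float16 time. A minimal sketch (illustrative only, not part of any bundle) reproducing the `swin_unetr` `model computation` row:

```python
def speedup(baseline_ms: float, variant_ms: float) -> float:
    """Speedup ratio of a variant relative to a baseline latency, rounded as in the tables."""
    return round(baseline_ms / variant_ms, 2)

# Measured latencies from the swin_unetr `model computation` row above (ms).
torch_fp32, torch_amp, trt_fp32, trt_fp16 = 503.1, 123.77, 229.85, 42.87

print(speedup(torch_fp32, torch_amp))  # speedup amp  -> 4.06
print(speedup(torch_fp32, trt_fp32))   # speedup fp32 -> 2.19
print(speedup(torch_fp32, trt_fp16))   # speedup fp16 -> 11.74
print(speedup(torch_amp, trt_fp16))    # amp vs fp16  -> 2.89
```

The same arithmetic reproduces the other bundles' rows from their measured times.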
From cad8db5ef3ca2b0e2a067a13053d26e338d5ece7 Mon Sep 17 00:00:00 2001
From: binliu
Date: Thu, 5 Sep 2024 16:01:12 +0000
Subject: [PATCH 4/5] add model benchmark for pathology_nuclei_classification

Signed-off-by: binliu
---
 models/pathology_nuclei_classification/docs/README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/models/pathology_nuclei_classification/docs/README.md b/models/pathology_nuclei_classification/docs/README.md
index 125c875a..ea1ab985 100644
--- a/models/pathology_nuclei_classification/docs/README.md
+++ b/models/pathology_nuclei_classification/docs/README.md
@@ -144,7 +144,7 @@ This bundle supports acceleration with TensorRT. The table below displays the sp

 | method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16|
 | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
-| model computation | 9.99 | 14.14 | 4.62 | 2.37 | 0.71 | 2.16 | 4.22 | 5.97 |
+| model computation | 12.06 | 20.57 | 3.23 | 1.48 | 0.59 | 3.73 | 8.15 | 13.90 |
 | end2end | 412.95 | 408.88 | 351.64 | 286.85 | 1.01 | 1.17 | 1.44 | 1.43 |

 Where:
 - `model computation` means the speedup ratio of model's inference with a random input without preprocessing and postprocessing
 - `end2end` means run the bundle end-to-end with the TensorRT based model.
 - `torch_fp32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
 - `trt_fp32` and `trt_fp16` are for the TensorRT based models converted in corresponding precision.
 - `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
 - `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
 This result is benchmarked under:
- - TensorRT: 8.6.1+cuda12.0
- - Torch-TensorRT Version: 1.4.0
+ - TensorRT: 10.3.0+cuda12.6
+ - Torch-TensorRT Version: 2.5.0
  - CPU Architecture: x86-64
  - OS: ubuntu 20.04
- - Python version:3.8.10
- - CUDA version: 12.1
+ - Python version: 3.10.12
+ - CUDA version: 12.6
  - GPU models and configuration: A100 80G

 ## MONAI Bundle Commands

From 7e4d8bffa128314720cb956c0c87e7ed2790d0a8 Mon Sep 17 00:00:00 2001
From: binliu
Date: Thu, 5 Sep 2024 16:05:01 +0000
Subject: [PATCH 5/5] add model benchmark for pathology_nuclei_segmentation_classification

Signed-off-by: binliu
---
 .../docs/README.md | 25 +++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/models/pathology_nuclei_segmentation_classification/docs/README.md b/models/pathology_nuclei_segmentation_classification/docs/README.md
index c9dd9b83..a9987423 100644
--- a/models/pathology_nuclei_segmentation_classification/docs/README.md
+++ b/models/pathology_nuclei_segmentation_classification/docs/README.md
@@ -93,6 +93,31 @@ stage2:

 ![A graph showing the validation mean dice over 50 epochs in stage2](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_segmentation_classification_val_stage1_v2.png)

+#### TensorRT speedup
+This bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16 |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 27.15 | 20.14 | 19.54 | 5.63 | 1.35 | 1.39 | 4.82 | 3.58 |
+| end2end | - | - | - | - | - | - | - | - |
+
+Where:
+- `model computation` means timing the model's inference on a random input, excluding preprocessing and postprocessing.
+- `end2end` means running the bundle end-to-end with the TensorRT based model.
+- `torch_fp32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_fp32` and `trt_fp16` are for the TensorRT based models converted in the corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of the corresponding models versus the PyTorch float32 model.
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+This result is benchmarked under:
+ - TensorRT: 10.3.0+cuda12.6
+ - Torch-TensorRT Version: 2.5.0
+ - CPU Architecture: x86-64
+ - OS: Ubuntu 20.04
+ - Python version: 3.10.12
+ - CUDA version: 12.6
+ - GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands

 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
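For reference, the `model computation` figures in these patches come from timing the bare forward pass on a random input, separately from any preprocessing or postprocessing. A generic, framework-agnostic sketch of such a measurement (a hypothetical helper, not the script used to produce these numbers; a real GPU benchmark would additionally synchronize the device around each timed call):

```python
import time
from statistics import median

def latency_ms(fn, warmup: int = 3, iters: int = 10) -> float:
    """Median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are discarded (JIT compilation, caches, allocator)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)

# Example with a cheap CPU-bound stand-in for a model forward pass:
print(latency_ms(lambda: sum(range(10_000))))
```

The median over several iterations is used rather than a single run so that one slow outlier does not skew the reported latency.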