Project-MONAI · binliunls · Sep 4, 2024 · Sep 5, 2024 · Sep 5, 2024 · Sep 5, 2024
diff --git a/models/pathology_nuclei_classification/docs/README.md b/models/pathology_nuclei_classification/docs/README.md
@@ -144,7 +144,7 @@ This bundle supports acceleration with TensorRT. The table below displays the sp
 
 | method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16|
 | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
-| model computation | 9.99 | 14.14 | 4.62 | 2.37 | 0.71 | 2.16 | 4.22 | 5.97 |
+| model computation | 12.06 | 20.57 | 3.23 | 1.48 | 0.59 | 3.73 | 8.15 | 13.90 |
 | end2end | 412.95 | 408.88 | 351.64 | 286.85 | 1.01 | 1.17 | 1.44 | 1.43 |
 
 Where:
@@ -156,12 +156,12 @@ Where:
 - `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
 
 This result is benchmarked under:
- - TensorRT: 8.6.1+cuda12.0
- - Torch-TensorRT Version: 1.4.0
+ - TensorRT: 10.3.0+cuda12.6
+ - Torch-TensorRT Version: 2.5.0
  - CPU Architecture: x86-64
  - OS: ubuntu 20.04
- - Python version:3.8.10
- - CUDA version: 12.1
+ - Python version: 3.10.12
+ - CUDA version: 12.6
  - GPU models and configuration: A100 80G
 
 ## MONAI Bundle Commands

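The `model computation` row above times only the forward pass on a random input. A minimal sketch of how such a latency could be collected is shown below; it is not the script used for this PR. The helper name `mean_latency_ms` and the `warmup`, `iters`, and `sync` parameters are hypothetical. For GPU models, `sync` would be a device synchronization call (for example `torch.cuda.synchronize`), since otherwise only the kernel launch is timed.

```python
import statistics
import time
from typing import Callable, Optional


def mean_latency_ms(
    fn: Callable[[], object],
    warmup: int = 5,
    iters: int = 20,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock latency of fn() in milliseconds.

    Hypothetical helper: `fn` wraps one forward pass on a fixed random input,
    and `sync` (if given) blocks until pending device work has finished.
    """
    if iters <= 0:
        raise ValueError("iters must be positive")

    # Warm-up runs are executed but not timed (caches, lazy initialization).
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(iters):
        if sync is not None:
            sync()  # make sure earlier work does not leak into this sample
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work launched by fn() to complete
        samples.append((time.perf_counter() - start) * 1000.0)

    return statistics.fmean(samples)
```

With PyTorch this might be called as `mean_latency_ms(lambda: model(x), sync=torch.cuda.synchronize)`, where `model` and `x` are placeholders for the loaded network and a random input tensor.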
diff --git a/models/pathology_nuclei_segmentation_classification/docs/README.md b/models/pathology_nuclei_segmentation_classification/docs/README.md
@@ -93,6 +93,31 @@ stage2:
 
 ![A graph showing the validation mean dice over 50 epochs in stage2](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_segmentation_classification_val_stage1_v2.png)
 
+#### TensorRT speedup
+This bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16|
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 27.15 | 20.14 | 19.54 | 5.63 | 1.35 | 1.39 | 4.82 | 3.58 |
+| end2end | - | - | - | - | - | - | - | - |
+
+Where:
+- `model computation` is the latency and speedup of the model's inference alone, run on a random input without preprocessing or postprocessing.
+- `end2end` means running the bundle end-to-end with the TensorRT-based model.
+- `torch_fp32` and `torch_amp` are the PyTorch models without and with `amp` mode, respectively.
+- `trt_fp32` and `trt_fp16` are the TensorRT-based models converted at the corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of the corresponding models versus the PyTorch float32 model.
+- `amp vs fp16` is the speedup ratio of the TensorRT float16 model over the PyTorch amp model.
+
+This result is benchmarked under:
+ - TensorRT: 10.3.0+cuda12.6
+ - Torch-TensorRT Version: 2.5.0
+ - CPU Architecture: x86-64
+ - OS: Ubuntu 20.04
+ - Python version: 3.10.12
+ - CUDA version: 12.6
+ - GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands
 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
 

diff --git a/models/swin_unetr_btcv_segmentation/docs/README.md b/models/swin_unetr_btcv_segmentation/docs/README.md
@@ -71,6 +71,31 @@ Dice score was used for evaluating the performance of the model. This model achi
 
 ![A graph showing the validation mean Dice for 5000 epochs.](https://developer.download.nvidia.com/assets/Clara/Images/monai_swin_unetr_btcv_segmentation_val_dice_v2.png)
 
+#### TensorRT speedup
+The `swin_unetr` bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16|
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 503.1 | 123.77 | 229.85 | 42.87 | 4.06 | 2.19 | 11.74 | 2.89 |
+| end2end | - | - | - | - | - | - | - | - |
+
+Where:
+- `model computation` is the latency and speedup of the model's inference alone, run on a random input without preprocessing or postprocessing.
+- `end2end` means running the bundle end-to-end with the TensorRT-based model.
+- `torch_fp32` and `torch_amp` are the PyTorch models without and with `amp` mode, respectively.
+- `trt_fp32` and `trt_fp16` are the TensorRT-based models converted at the corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of the corresponding models versus the PyTorch float32 model.
+- `amp vs fp16` is the speedup ratio of the TensorRT float16 model over the PyTorch amp model.
+
+This result is benchmarked under:
+ - TensorRT: 10.3.0+cuda12.6
+ - Torch-TensorRT Version: 2.5.0
+ - CPU Architecture: x86-64
+ - OS: Ubuntu 20.04
+ - Python version: 3.10.12
+ - CUDA version: 12.6
+ - GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands
 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
 

diff --git a/models/vista2d/docs/README.md b/models/vista2d/docs/README.md
@@ -74,6 +74,31 @@ Please note that the data used in this config file is: "/cellpose_dataset/test/0
 python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"
 ```
 
+#### TensorRT speedup
+The `vista2d` bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16|
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 90.11 | 39.68 | 71.7 | 17.32 | 2.27 | 1.26 | 5.20 | 2.29 |
+| end2end | - | - | - | - | - | - | - | - |
+
+Where:
+- `model computation` is the latency and speedup of the model's inference alone, run on a random input without preprocessing or postprocessing.
+- `end2end` means running the bundle end-to-end with the TensorRT-based model.
+- `torch_fp32` and `torch_amp` are the PyTorch models without and with `amp` mode, respectively.
+- `trt_fp32` and `trt_fp16` are the TensorRT-based models converted at the corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of the corresponding models versus the PyTorch float32 model.
+- `amp vs fp16` is the speedup ratio of the TensorRT float16 model over the PyTorch amp model.
+
+This result is benchmarked under:
+ - TensorRT: 10.3.0+cuda12.6
+ - Torch-TensorRT Version: 2.5.0
+ - CPU Architecture: x86-64
+ - OS: Ubuntu 20.04
+ - Python version: 3.10.12
+ - CUDA version: 12.6
+ - GPU models and configuration: A100 80G
+
 ### Execute multi-GPU inference
 ```bash
 torchrun --nproc_per_node=gpu -m monai.bundle run_workflow "scripts.workflow.VistaCell" --config_file configs/hyper_parameters.yaml --mode infer --pretrained_ckpt_name model.pt

diff --git a/models/vista3d/docs/README.md b/models/vista3d/docs/README.md
@@ -44,6 +44,31 @@ In Evaluation Mode: Segmentation
 
 #### Validation Accuracy
 
+#### TensorRT speedup
+The `vista3d` bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16|
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 577.00 | 91.90 | 353.69 | 60.02 | 6.28 | 1.63 | 9.61 | 1.53 |
+| end2end | - | - | - | - | - | - | - | - |
+
+Where:
+- `model computation` is the latency and speedup of the model's inference alone, run on a random input without preprocessing or postprocessing.
+- `end2end` means running the bundle end-to-end with the TensorRT-based model.
+- `torch_fp32` and `torch_amp` are the PyTorch models without and with `amp` mode, respectively.
+- `trt_fp32` and `trt_fp16` are the TensorRT-based models converted at the corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of the corresponding models versus the PyTorch float32 model.
+- `amp vs fp16` is the speedup ratio of the TensorRT float16 model over the PyTorch amp model.
+
+This result is benchmarked under:
+ - TensorRT: 10.3.0+cuda12.6
+ - Torch-TensorRT Version: 2.5.0
+ - CPU Architecture: x86-64
+ - OS: Ubuntu 20.04
+ - Python version: 3.10.12
+ - CUDA version: 12.6
+ - GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands
 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
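
The ratio columns in the tables above follow directly from the four latency columns, so they can be recomputed as a consistency check. The sketch below assumes the ratios are rounded to two decimals, as they appear in the tables; the function name `speedup_ratios` is hypothetical and not part of any bundle.

```python
from typing import Dict


def speedup_ratios(
    torch_fp32: float, torch_amp: float, trt_fp32: float, trt_fp16: float
) -> Dict[str, float]:
    """Recompute the ratio columns from latencies given in milliseconds.

    Assumption: each ratio is rounded to two decimals, matching the tables.
    """
    return {
        # The first three ratios are measured against the PyTorch float32 model.
        "speedup amp": round(torch_fp32 / torch_amp, 2),
        "speedup fp32": round(torch_fp32 / trt_fp32, 2),
        "speedup fp16": round(torch_fp32 / trt_fp16, 2),
        # The last ratio compares PyTorch amp against TensorRT float16.
        "amp vs fp16": round(torch_amp / trt_fp16, 2),
    }


# Latencies from the updated `model computation` row of
# pathology_nuclei_classification.
ratios = speedup_ratios(12.06, 20.57, 3.23, 1.48)
```

A ratio below 1, such as `speedup amp` for this row, means the variant was slower than the PyTorch float32 baseline.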