From 144cdb74b8b014489d12c63d4a9014196b955086 Mon Sep 17 00:00:00 2001
From: binliu
Date: Wed, 4 Sep 2024 15:01:37 +0000
Subject: [PATCH 1/5] update model inference benchmark for vista3d

Signed-off-by: binliu
---
 models/vista3d/docs/README.md | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/models/vista3d/docs/README.md b/models/vista3d/docs/README.md
index 741dbcb1..cf9fd862 100644
--- a/models/vista3d/docs/README.md
+++ b/models/vista3d/docs/README.md
@@ -44,6 +44,31 @@ In Evaluation Mode: Segmentation

 #### Validation Accuracy

+#### TensorRT speedup
+The `vista3d` bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16 |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 577.00 | 91.90 | 353.69 | 60.02 | 6.28 | 1.63 | 9.58 | 1.53 |
+| end2end | - | - | - | - | - | - | - | - |
+
+Where:
+- `model computation` means timing the model's inference on a random input, excluding preprocessing and postprocessing.
+- `end2end` means running the bundle end-to-end with the TensorRT based model.
+- `torch_fp32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_fp32` and `trt_fp16` are for the TensorRT based models converted in the corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of the corresponding models versus the PyTorch float32 model.
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+This result is benchmarked under:
+ - TensorRT: 10.3.0+cuda12.6
+ - Torch-TensorRT Version: 2.5.0
+ - CPU Architecture: x86-64
+ - OS: Ubuntu 20.04
+ - Python version: 3.10.12
+ - CUDA version: 12.6
+ - GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands

 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.

From cbadb6867d6c64a0dd73f78dc030a6634830dedc Mon Sep 17 00:00:00 2001
From: binliu
Date: Thu, 5 Sep 2024 15:53:14 +0000
Subject: [PATCH 2/5] add vista2d model benchmark

Signed-off-by: binliu
---
 models/vista2d/docs/README.md | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/models/vista2d/docs/README.md b/models/vista2d/docs/README.md
index a3a902e4..ca33a487 100644
--- a/models/vista2d/docs/README.md
+++ b/models/vista2d/docs/README.md
@@ -74,6 +74,31 @@ Please note that the data used in this config file is: "/cellpose_dataset/test/0
 python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"
 ```

+#### TensorRT speedup
+The `vista2d` bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16 |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 90.11 | 39.68 | 71.70 | 17.32 | 2.27 | 1.26 | 5.20 | 2.29 |
+| end2end | - | - | - | - | - | - | - | - |
+
+Where:
+- `model computation` means timing the model's inference on a random input, excluding preprocessing and postprocessing.
+- `end2end` means running the bundle end-to-end with the TensorRT based model.
+- `torch_fp32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_fp32` and `trt_fp16` are for the TensorRT based models converted in the corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of the corresponding models versus the PyTorch float32 model.
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+This result is benchmarked under:
+ - TensorRT: 10.3.0+cuda12.6
+ - Torch-TensorRT Version: 2.5.0
+ - CPU Architecture: x86-64
+ - OS: Ubuntu 20.04
+ - Python version: 3.10.12
+ - CUDA version: 12.6
+ - GPU models and configuration: A100 80G
+
 ### Execute multi-GPU inference
 ```bash
 torchrun --nproc_per_node=gpu -m monai.bundle run_workflow "scripts.workflow.VistaCell" --config_file configs/hyper_parameters.yaml --mode infer --pretrained_ckpt_name model.pt

From 23a73d652970c8b23a3cf31b89fbc3ae993a3c06 Mon Sep 17 00:00:00 2001
From: binliu
Date: Thu, 5 Sep 2024 15:57:34 +0000
Subject: [PATCH 3/5] add swin model benchmark

Signed-off-by: binliu
---
 .../docs/README.md | 25 +++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/models/swin_unetr_btcv_segmentation/docs/README.md b/models/swin_unetr_btcv_segmentation/docs/README.md
index ae25b6c9..828f705c 100644
--- a/models/swin_unetr_btcv_segmentation/docs/README.md
+++ b/models/swin_unetr_btcv_segmentation/docs/README.md
@@ -71,6 +71,31 @@ Dice score was used for evaluating the performance of the model. This model achi

 ![A graph showing the validation mean Dice for 5000 epochs.](https://developer.download.nvidia.com/assets/Clara/Images/monai_swin_unetr_btcv_segmentation_val_dice_v2.png)

+#### TensorRT speedup
+The `swin_unetr` bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16 |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 503.10 | 123.77 | 229.85 | 42.87 | 4.06 | 2.19 | 11.74 | 2.89 |
+| end2end | - | - | - | - | - | - | - | - |
+
+Where:
+- `model computation` means timing the model's inference on a random input, excluding preprocessing and postprocessing.
+- `end2end` means running the bundle end-to-end with the TensorRT based model.
+- `torch_fp32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_fp32` and `trt_fp16` are for the TensorRT based models converted in the corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of the corresponding models versus the PyTorch float32 model.
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+This result is benchmarked under:
+ - TensorRT: 10.3.0+cuda12.6
+ - Torch-TensorRT Version: 2.5.0
+ - CPU Architecture: x86-64
+ - OS: Ubuntu 20.04
+ - Python version: 3.10.12
+ - CUDA version: 12.6
+ - GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands

 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
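The speedup columns in the tables above are plain latency ratios: each value divides the PyTorch float32 time by the variant's time, except `amp vs fp16`, which divides the PyTorch amp time by the TensorRT float16 time. A minimal sketch (illustrative only, not part of any bundle) reproducing the `swin_unetr` `model computation` row:

```python
def speedup(baseline_ms: float, variant_ms: float) -> float:
    """Speedup ratio of a variant relative to a baseline latency, rounded as in the tables."""
    return round(baseline_ms / variant_ms, 2)

# Measured latencies from the swin_unetr `model computation` row above (ms).
torch_fp32, torch_amp, trt_fp32, trt_fp16 = 503.1, 123.77, 229.85, 42.87

print(speedup(torch_fp32, torch_amp))  # speedup amp  -> 4.06
print(speedup(torch_fp32, trt_fp32))   # speedup fp32 -> 2.19
print(speedup(torch_fp32, trt_fp16))   # speedup fp16 -> 11.74
print(speedup(torch_amp, trt_fp16))    # amp vs fp16  -> 2.89
```

The same arithmetic reproduces the other bundles' rows from their measured times.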
From cad8db5ef3ca2b0e2a067a13053d26e338d5ece7 Mon Sep 17 00:00:00 2001
From: binliu
Date: Thu, 5 Sep 2024 16:01:12 +0000
Subject: [PATCH 4/5] add model benchmark for pathology_nuclei_classification

Signed-off-by: binliu
---
 models/pathology_nuclei_classification/docs/README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/models/pathology_nuclei_classification/docs/README.md b/models/pathology_nuclei_classification/docs/README.md
index 125c875a..ea1ab985 100644
--- a/models/pathology_nuclei_classification/docs/README.md
+++ b/models/pathology_nuclei_classification/docs/README.md
@@ -144,7 +144,7 @@ This bundle supports acceleration with TensorRT. The table below displays the sp

 | method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16|
 | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
-| model computation | 9.99 | 14.14 | 4.62 | 2.37 | 0.71 | 2.16 | 4.22 | 5.97 |
+| model computation | 12.06 | 20.57 | 3.23 | 1.48 | 0.59 | 3.73 | 8.15 | 13.90 |
 | end2end | 412.95 | 408.88 | 351.64 | 286.85 | 1.01 | 1.17 | 1.44 | 1.43 |

 Where:
 - `model computation` means the speedup ratio of model's inference with a random input without preprocessing and postprocessing
 - `end2end` means run the bundle end-to-end with the TensorRT based model.
 - `torch_fp32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
 - `trt_fp32` and `trt_fp16` are for the TensorRT based models converted in corresponding precision.
 - `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
 - `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
 This result is benchmarked under:
- - TensorRT: 8.6.1+cuda12.0
- - Torch-TensorRT Version: 1.4.0
+ - TensorRT: 10.3.0+cuda12.6
+ - Torch-TensorRT Version: 2.5.0
  - CPU Architecture: x86-64
  - OS: ubuntu 20.04
- - Python version:3.8.10
- - CUDA version: 12.1
+ - Python version: 3.10.12
+ - CUDA version: 12.6
  - GPU models and configuration: A100 80G

 ## MONAI Bundle Commands

From 7e4d8bffa128314720cb956c0c87e7ed2790d0a8 Mon Sep 17 00:00:00 2001
From: binliu
Date: Thu, 5 Sep 2024 16:05:01 +0000
Subject: [PATCH 5/5] add model benchmark for pathology_nuclei_segmentation_classification

Signed-off-by: binliu
---
 .../docs/README.md | 25 +++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/models/pathology_nuclei_segmentation_classification/docs/README.md b/models/pathology_nuclei_segmentation_classification/docs/README.md
index c9dd9b83..a9987423 100644
--- a/models/pathology_nuclei_segmentation_classification/docs/README.md
+++ b/models/pathology_nuclei_segmentation_classification/docs/README.md
@@ -93,6 +93,31 @@ stage2:

 ![A graph showing the validation mean dice over 50 epochs in stage2](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_segmentation_classification_val_stage1_v2.png)

+#### TensorRT speedup
+This bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16 |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 27.15 | 20.14 | 19.54 | 5.63 | 1.35 | 1.39 | 4.82 | 3.58 |
+| end2end | - | - | - | - | - | - | - | - |
+
+Where:
+- `model computation` means timing the model's inference on a random input, excluding preprocessing and postprocessing.
+- `end2end` means running the bundle end-to-end with the TensorRT based model.
+- `torch_fp32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_fp32` and `trt_fp16` are for the TensorRT based models converted in the corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of the corresponding models versus the PyTorch float32 model.
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+This result is benchmarked under:
+ - TensorRT: 10.3.0+cuda12.6
+ - Torch-TensorRT Version: 2.5.0
+ - CPU Architecture: x86-64
+ - OS: Ubuntu 20.04
+ - Python version: 3.10.12
+ - CUDA version: 12.6
+ - GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands

 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
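For reference, the `model computation` figures in these patches come from timing the bare forward pass on a random input, separately from any preprocessing or postprocessing. A generic, framework-agnostic sketch of such a measurement (a hypothetical helper, not the script used to produce these numbers; a real GPU benchmark would additionally synchronize the device around each timed call):

```python
import time
from statistics import median

def latency_ms(fn, warmup: int = 3, iters: int = 10) -> float:
    """Median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are discarded (JIT compilation, caches, allocator)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)

# Example with a cheap CPU-bound stand-in for a model forward pass:
print(latency_ms(lambda: sum(range(10_000))))
```

The median over several iterations is used rather than a single run so that one slow outlier does not skew the reported latency.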