Low ViT Performance Gain on Jetson Thor Using FP8 vs FP16 #4599

@bowCine89

Description

Hello,

According to the documentation, enabling FP8 operations requires some ONNX surgery (inserting Q/DQ nodes at specific locations) to trigger the correct MHA (Multi-Head Attention) fusion in combination with FP8 precision.
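For reference, here is what a Q/DQ pair does numerically. The following is a minimal pure-Python sketch of per-tensor FP8 fake quantization. It is not TensorRT's or Model Optimizer's implementation. It assumes the E4M3 format (4 exponent bits with bias 7, 3 mantissa bits, no infinities, maximum finite value 448) and a scale derived from the tensor's absolute maximum. Ties are broken toward the smaller magnitude, a simplification of round-to-nearest-even. All function names are hypothetical.

```python
# Hypothetical sketch: simulate an FP8 (E4M3) QuantizeLinear -> DequantizeLinear pair.
# Assumptions: E4M3 "FN" encoding (max finite 448), per-tensor amax scale,
# ties broken toward the smaller magnitude (not IEEE round-to-nearest-even).

def e4m3_values():
    """Return all non-negative finite E4M3 values in ascending order."""
    vals = set()
    for m in range(8):  # subnormals: exponent field 0
        vals.add((m / 8) * 2.0 ** -6)
    for e in range(1, 16):  # normals: exponent field 1..15, bias 7
        for m in range(8):
            if e == 15 and m == 7:
                continue  # this bit pattern encodes NaN in E4M3FN
            vals.add((1 + m / 8) * 2.0 ** (e - 7))
    return sorted(vals)

E4M3_GRID = e4m3_values()
E4M3_MAX = E4M3_GRID[-1]  # 448.0

def quantize_e4m3(x, scale):
    """Q: divide by the scale, clamp to the E4M3 range, snap to the nearest grid value."""
    y = max(-E4M3_MAX, min(E4M3_MAX, x / scale))
    mag = min(E4M3_GRID, key=lambda v: abs(v - abs(y)))  # first (smaller) value wins ties
    return mag if y >= 0 else -mag

def dequantize(q, scale):
    """DQ: multiply back by the scale."""
    return q * scale

def qdq(xs):
    """Fake-quantize a list with a per-tensor amax scale, as a Q/DQ pair would."""
    amax = max(abs(x) for x in xs)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [dequantize(quantize_e4m3(x, scale), scale) for x in xs]

if __name__ == "__main__":
    data = [0.013, -0.5, 1.7, 3.14159]
    for x, y in zip(data, qdq(data)):
        print(f"{x:+.5f} -> {y:+.5f}")
```

The rounding error printed for small values shows why the placement of Q/DQ nodes matters: every Q/DQ pair adds this kind of precision loss, and TensorRT can only run a layer in FP8 where the graph provides these pairs.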

However, the performance improvement over FP16 is modest for the ViT-Base model (~20% latency reduction) and even worse for the EfficientSAM encoder, which shows essentially no gain.
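A ~20% latency reduction is the same as a 1.25x speedup (1 / 0.8). The sketch below makes that conversion explicit. It also adds an Amdahl-style estimate, under the hypothetical assumption that only part of the end-to-end latency runs in layers that benefit from FP8 while the rest runs as fast as in the FP16 engine. No measured numbers from this issue are used, and the function names are illustrative.

```python
# Hypothetical helpers relating latency reduction, speedup and an Amdahl-style bound.

def latency_reduction(t_fp16_ms, t_fp8_ms):
    """Fractional latency reduction of the FP8 engine relative to FP16."""
    return 1.0 - t_fp8_ms / t_fp16_ms

def speedup(t_fp16_ms, t_fp8_ms):
    """End-to-end speedup factor (FP16 time / FP8 time)."""
    return t_fp16_ms / t_fp8_ms

def amdahl_speedup(accelerated_fraction, layer_speedup):
    """Overall speedup when only `accelerated_fraction` of the FP16 time
    is sped up by `layer_speedup` and the remainder is unchanged (assumption)."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / layer_speedup)
```

For example, `amdahl_speedup(0.5, 2.0)` returns about 1.33: if half of the FP16 time were spent in layers that ran twice as fast in FP8, the whole engine would still be only about 1.33x faster.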

Judging from the TensorRT profiling and layer information, FP8 does appear to be in use (although some tactic names are quite cryptic, especially gmm_mha_v2_#weirdbitstream).
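To check which layers actually run with FP8 I/O and which tactics they selected, you can filter the exported layer information with a short script. This is a sketch that assumes the JSON layout written by trtexec with `--exportLayerInfo` and `--profilingVerbosity=detailed`. That layout is taken to have a top-level `"Layers"` list whose entries carry `"Name"`, `"Inputs"`/`"Outputs"` with a `"Format/Datatype"` string, and `"TacticName"`. These key names are assumptions and may need adjusting for TensorRT 10.13.

```python
import json
import sys

def fp8_layers(layer_info):
    """Return (layer name, tactic name) for layers whose I/O tensors report an FP8 format.

    Assumed layout (verify against your file):
    {"Layers": [{"Name": ..., "Inputs": [{"Format/Datatype": ...}], "Outputs": [...],
                 "TacticName": ...}, ...]}
    """
    hits = []
    for layer in layer_info.get("Layers", []):
        if not isinstance(layer, dict):
            continue  # without detailed verbosity, layers may be plain name strings
        tensors = layer.get("Inputs", []) + layer.get("Outputs", [])
        if any(isinstance(t, dict) and "FP8" in str(t.get("Format/Datatype", ""))
               for t in tensors):
            hits.append((layer.get("Name", "?"), layer.get("TacticName", "?")))
    return hits

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 fp8_layers.py <layer-info .json>   (script name is hypothetical)
    with open(sys.argv[1]) as f:
        info = json.load(f)
    for name, tactic in fp8_layers(info):
        print(f"{name}: {tactic}")
```

Comparing this list with the ONNX graph makes it easier to spot attention blocks where the expected FP8 MHA fusion did not happen.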

Environment

  • TensorRT Version: 10.13.3
  • NVIDIA GPU: Thor (Jetson DevKit)
  • NVIDIA Driver Version: 580.00
  • CUDA Version: 13

Relevant Files

Steps To Reproduce

Model Optimizer -> commit

ViT-Base FP8 ONNX generation:
python3 -m modelopt.onnx.quantization --onnx_path=./vit_base_patch8_224_Opset17.onnx --quantize_mode=fp8 --output_path=./vitb_fp8.onnx

EfficientSAM-S FP8 ONNX generation:
python3 -m modelopt.onnx.quantization --onnx_path=./efficientsam_s_encoder.onnx --quantize_mode=fp8 --output_path=./sam_s_fp8.onnx

ViT-Base FP8 engine generation:
trtexec --stronglyTyped --onnx=./vitb_fp8.onnx --saveEngine=./vitb_fp8.engine

EfficientSAM-S FP8 engine generation:
trtexec --stronglyTyped --onnx=./sam_s_fp8.onnx --saveEngine=./sam_s_fp8.engine
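The FP16 baseline commands are not listed above. The following hypothetical helper only assembles trtexec command lines and does not run them. It covers an FP16 baseline built from the original ONNX files with `--fp16` and the FP8 strongly typed engines. Both get `--profilingVerbosity=detailed`, `--separateProfileRun`, `--exportProfile` and `--exportLayerInfo`, so the two precisions produce comparable profile and layer-info files. The FP16 engine names and output JSON names are made up for illustration.

```python
import shlex

# Hypothetical model table: (original ONNX, FP8 ONNX from modelopt, FP16 engine, FP8 engine).
# The FP16 engine names are invented for illustration.
MODELS = {
    "vit_base_patch8_224_Opset17": (
        "./vit_base_patch8_224_Opset17.onnx", "./vitb_fp8.onnx",
        "./vitb_fp16.engine", "./vitb_fp8.engine",
    ),
    "efficientsam_s_encoder": (
        "./efficientsam_s_encoder.onnx", "./sam_s_fp8.onnx",
        "./sam_s_fp16.engine", "./sam_s_fp8.engine",
    ),
}

def trtexec_cmd(onnx_path, engine_path, tag, fp8):
    """Build one trtexec command line as an argument list (nothing is executed)."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    # FP8: tensor types come from the Q/DQ-annotated graph; FP16: allow FP16 kernels.
    cmd.append("--stronglyTyped" if fp8 else "--fp16")
    cmd += [
        "--profilingVerbosity=detailed",   # keep tactic details in the layer info
        "--separateProfileRun",            # profile in a separate pass from the timing run
        f"--exportProfile={tag}.profile.json",
        f"--exportLayerInfo={tag}.layers.json",
    ]
    return cmd

def all_commands():
    """FP16 baseline and FP8 engine command for every model, in table order."""
    cmds = []
    for name, (onnx_orig, onnx_fp8, eng_fp16, eng_fp8) in MODELS.items():
        cmds.append(trtexec_cmd(onnx_orig, eng_fp16, f"{name}_fp16", fp8=False))
        cmds.append(trtexec_cmd(onnx_fp8, eng_fp8, f"{name}_fp8", fp8=True))
    return cmds

if __name__ == "__main__":
    for cmd in all_commands():
        print(shlex.join(cmd))
```

Each list can be executed with `subprocess.run(cmd, check=True)`. Whether this matches how the attached FP16 files were actually produced is an assumption.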

TensorRT Layer Info and Profiles

vit_base_patch8_224_Opset17_fp8.json
vit_base_patch8_224_Opset17_fp8.profile.txt
vit_base_patch8_224_Opset17_fp16.json
vit_base_patch8_224_Opset17_fp16.profile.txt
efficientsam_s_encoder_fp8.json
efficientsam_s_encoder_fp8.profile.txt
efficientsam_s_encoder_fp16.json
efficientsam_s_encoder_fp16.profile.txt
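To compare the attached profiles, a small script can rank layers by average time. It can also measure how much of the total falls into layers whose names match a keyword, for example the attention blocks. It assumes the trtexec `--exportProfile` layout: a JSON list whose first entry is `{"count": N}`, followed by one object per layer with `"name"` and `"averageMs"`. The keyword matching is only a heuristic, since fused layer names differ between the FP16 and FP8 engines.

```python
import json
import sys

def layer_entries(profile):
    """Drop the leading {"count": N} record and keep per-layer entries."""
    return [e for e in profile if isinstance(e, dict) and "name" in e]

def top_layers(profile, k=5):
    """Return the k layers with the largest average time."""
    return sorted(layer_entries(profile),
                  key=lambda e: e.get("averageMs", 0.0), reverse=True)[:k]

def share_matching(profile, keyword):
    """Return (total ms, ms in layers whose name contains keyword, fraction of total)."""
    layers = layer_entries(profile)
    total = sum(e.get("averageMs", 0.0) for e in layers)
    matched = sum(e.get("averageMs", 0.0) for e in layers
                  if keyword.lower() in e["name"].lower())
    return total, matched, (matched / total if total else 0.0)

if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python3 profile_summary.py <exported profile .json> <name keyword>
    with open(sys.argv[1]) as f:
        prof = json.load(f)
    for e in top_layers(prof):
        print(f"{e.get('averageMs', 0.0):8.3f} ms  {e['name']}")
    total, matched, frac = share_matching(prof, sys.argv[2])
    print(f"total {total:.3f} ms, '{sys.argv[2]}' layers {matched:.3f} ms ({frac:.1%})")
```

Running it on the FP16 and FP8 profiles of the same model shows whether the attention layers, which FP8 is meant to accelerate, take up enough of the runtime for FP8 to matter.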

Labels: Module:ONNX, Module:Performance, Module:Quantization
