
🐛 [Bug] Exporting engine with hardware_compatible does not create hardware-compatible engine #3941

@olokobayusuf

Description


Bug Description

from tensorrt import Logger, Runtime
from torch import randn
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights
from torch_tensorrt import convert_method_to_trt_engine

# Create model
weights = MobileNet_V2_Weights.DEFAULT
model = mobilenet_v2(weights=weights).eval()
example_input = randn(1, 3, 224, 224)

# Create TRT engine
engine_bytes = convert_method_to_trt_engine(
    model,
    ir="dynamo",
    inputs=[example_input],
    version_compatible=True,
    hardware_compatible=True,
    require_full_compilation=True
)

# Check hardware compat
logger = Logger(Logger.WARNING)
runtime = Runtime(logger)
engine = runtime.deserialize_cuda_engine(engine_bytes)
print("Hardware compat level:", engine.hardware_compatibility_level)
# prints: Hardware compat level: HardwareCompatibilityLevel.NONE

I am running on an A100 (sm_80). I can also see from the torch-tensorrt logs that the hardware_compatible flag is being passed through to C++ correctly:

CompilationSettings(
    enabled_precisions={<dtype.f32: 7>},
    workspace_size=1073741824,
    min_block_size=5,
    torch_executed_ops=set(),
    pass_through_build_failures=False,
    max_aux_streams=None,
    version_compatible=True,
    optimization_level=3,
    use_python_runtime=False,
    truncate_double=False,
    use_fast_partitioner=True,
    enable_experimental_decompositions=False,
    device=Device(type=DeviceType.GPU, gpu_id=0),
    require_full_compilation=True,
    disable_tf32=False,
    assume_dynamic_shape_support=False,
    sparse_weights=False,
    engine_capability=<EngineCapability.STANDARD: 1>,
    num_avg_timing_iters=1,
    dla_sram_size=1048576,
    dla_local_dram_size=1073741824,
    dla_global_dram_size=536870912,
    dryrun=False,
    hardware_compatible=True,
    timing_cache_path='/tmp/torch_tensorrt_engine_cache/timing_cache.bin',
    lazy_engine_init=False,
    cache_built_engines=False,
    reuse_cached_engines=False,
    use_explicit_typing=False,
    use_fp32_acc=False,
    refit_identical_engine_weights=False,
    strip_engine_weights=False,
    immutable_weights=True,
    enable_weight_streaming=False,
    enable_cross_compile_for_windows=False,
    tiling_optimization_level='none',
    l2_limit_for_tiling=-1,
    use_distributed_mode_trace=False,
    offload_module_to_cpu=False
)
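For what it's worth, the GPU itself should not be the limiting factor: AMPERE_PLUS targets Ampere (compute capability 8.0 / sm_80) and newer, and the A100 is exactly sm_80. A quick sanity check (the helper below is hypothetical, just encoding that threshold; it is not part of torch-tensorrt):

```python
# Hypothetical helper: AMPERE_PLUS engines target Ampere
# (compute capability 8.0 / sm_80) and newer architectures.

def supports_ampere_plus(major: int, minor: int) -> bool:
    """Return True if a GPU with the given compute capability is
    Ampere or newer (sm_80+)."""
    return (major, minor) >= (8, 0)

print(supports_ampere_plus(8, 0))  # A100 (sm_80) -> True
print(supports_ampere_plus(7, 5))  # T4 (sm_75)   -> False
```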

Any ideas?

To Reproduce

Steps to reproduce the behavior:

  1. Run the Python script above.

Expected behavior

The engine's hardware compatibility level is HardwareCompatibilityLevel.AMPERE_PLUS.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 2.9.0
  • PyTorch Version (e.g. 1.0): 2.9.0
  • CPU Architecture: x86_64
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives: No
  • Python version: 3.12
  • CUDA version: 12.8
  • GPU models and configuration: Nvidia A100
  • Any other relevant information: None
