Bug Description
from tensorrt import Logger, Runtime
from torch import randn
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights
from torch_tensorrt import convert_method_to_trt_engine
# Create model
weights = MobileNet_V2_Weights.DEFAULT
model = mobilenet_v2(weights=weights).eval()
example_input = randn(1, 3, 224, 224)
# Create TRT engine
engine_bytes = convert_method_to_trt_engine(
    model,
    ir="dynamo",
    inputs=[example_input],
    version_compatible=True,
    hardware_compatible=True,
    require_full_compilation=True,
)
# Check hardware compat
logger = Logger(Logger.WARNING)
runtime = Runtime(logger)
engine = runtime.deserialize_cuda_engine(engine_bytes)
print("Hardware compat level:", engine.hardware_compatibility_level)
# prints: Hardware compat level: HardwareCompatibilityLevel.NONEI am running on an A100 (sm_80). I also see that the correct hardware_compatible flag is being passed to C++, from the torch-tensorrt logger:
CompilationSettings(
enabled_precisions={<dtype.f32: 7>},
workspace_size=1073741824,
min_block_size=5,
torch_executed_ops=set(),
pass_through_build_failures=False,
max_aux_streams=None,
version_compatible=True,
optimization_level=3,
use_python_runtime=False,
truncate_double=False,
use_fast_partitioner=True,
enable_experimental_decompositions=False,
device=Device(type=DeviceType.GPU, gpu_id=0),
require_full_compilation=True,
disable_tf32=False,
assume_dynamic_shape_support=False,
sparse_weights=False,
engine_capability=<EngineCapability.STANDARD: 1>,
num_avg_timing_iters=1,
dla_sram_size=1048576,
dla_local_dram_size=1073741824,
dla_global_dram_size=536870912,
dryrun=False,
hardware_compatible=True,
timing_cache_path='/tmp/torch_tensorrt_engine_cache/timing_cache.bin',
lazy_engine_init=False,
cache_built_engines=False,
reuse_cached_engines=False,
use_explicit_typing=False,
use_fp32_acc=False,
refit_identical_engine_weights=False,
strip_engine_weights=False,
immutable_weights=True,
enable_weight_streaming=False,
enable_cross_compile_for_windows=False,
tiling_optimization_level='none',
l2_limit_for_tiling=-1,
use_distributed_mode_trace=False,
offload_module_to_cpu=False
)

Any ideas?
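For completeness, here is how I surfaced the CompilationSettings dump above — a minimal sketch, assuming the dynamo frontend emits it through Python's standard logging under the torch_tensorrt namespace:

import logging

import torch_tensorrt  # noqa: F401  # importing registers its loggers

# Assumption: torch-tensorrt's dynamo frontend logs CompilationSettings at
# DEBUG level via Python's standard logging; raising the level makes the
# dump above appear during convert_method_to_trt_engine().
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("torch_tensorrt").setLevel(logging.DEBUG)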
To Reproduce
Steps to reproduce the behavior:
- Run the Python script above
Expected behavior
The engine's hardware compatibility level should be AMPERE_PLUS.
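To help isolate whether this is a Torch-TensorRT or a TensorRT issue, here is a minimal sketch that sets the flag through the raw TensorRT builder API on a trivial identity network (assumes TensorRT >= 8.6, where HardwareCompatibilityLevel was introduced, and the explicit-batch default of TensorRT 10):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)  # explicit batch (default in TRT 10)

# Trivial identity network so the builder has something to compile
x = network.add_input("x", trt.float32, (1, 3, 224, 224))
identity = network.add_identity(x)
network.mark_output(identity.get_output(0))

config = builder.create_builder_config()
config.hardware_compatibility_level = trt.HardwareCompatibilityLevel.AMPERE_PLUS

plan = builder.build_serialized_network(network, config)
engine = trt.Runtime(logger).deserialize_cuda_engine(plan)
print("Hardware compat level:", engine.hardware_compatibility_level)
# If this prints AMPERE_PLUS, TensorRT honors the flag and the problem is
# likely in how torch-tensorrt forwards hardware_compatible to the builder.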
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- Torch-TensorRT Version (e.g. 1.0.0): 2.9.0
- PyTorch Version (e.g. 1.0): 2.9.0
- CPU Architecture: x86_64
- OS (e.g., Linux): Linux
- How you installed PyTorch (conda, pip, libtorch, source): pip
- Build command you used (if compiling from source):
- Are you using local sources or building from archives: No
- Python version: 3.12
- CUDA version: 12.8
- GPU models and configuration: Nvidia A100
- Any other relevant information: None