
Error: Unexpected error from cudaGetDeviceCount() #1411

@3niPhantom

Description


Search before asking

  • I have searched the jetson-containers issues and found no similar feature requests.

jetson-containers Component

Packages

Bug

Hi,

I am trying to build an image from jetson-containers with the following command:

jetson-containers build --name=test_combo_image realsense nanoowl nanosam

During the testing phase of the build, I get the following error:

┌───────────────────────────────────────────────────────────────────────┐
│ > TESTING  test_combo_image:r36.4.tegra-aarch64-cu126-22.04-torch2trt │
└───────────────────────────────────────────────────────────────────────┘

docker run -t --rm --network=host --privileged --runtime=nvidia \
  --volume /ssd/repos/jetson-containers/packages/pytorch/torch2trt:/test \
  --volume /ssd/repos/jetson-containers/data:/data \
  test_combo_image:r36.4.tegra-aarch64-cu126-22.04-torch2trt \
    /bin/bash -c 'python3 /test/test.py'


testing torch2trt...
/usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Traceback (most recent call last):
  File "/test/test.py", line 9, in <module>
    model = alexnet(pretrained=True).eval().cuda()
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1082, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 928, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 928, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 955, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1082, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 412, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 34: CUDA driver is a stub library
[12:25:51] ===================================================================================== 
[12:25:51] ===================================================================================== 
[12:25:51] 💣 `jetson-containers build` failed after 75.1 seconds (1.3 minutes) 
[12:25:51] Error: Command 'docker run -t --rm --network=host --privileged --runtime=nvidia   --volume /ssd/repos/jetson-containers/packages/pytorch/torch2trt:/test   --volume /ssd/repos/jetson-containers/data:/data   test_combo_image:r36.4.tegra-aarch64-cu126-22.04-torch2trt     /bin/bash -c 'python3 /test/test.py' 2>&1 | tee /ssd/repos/jetson-containers/logs/20250906_122435/test/19-1_test_combo_image_r36.4.tegra-aarch64-cu126-22.04-torch2trt_test.py.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1. 
[12:25:51] ===================================================================================== 
[12:25:51] ===================================================================================== 
[12:25:51] Failed building:  realsense, nanoowl, nanosam

Traceback (most recent call last):
  File "/ssd/repos/jetson-containers/jetson_containers/build.py", line 129, in <module>
    build_container(**vars(args))
  File "/ssd/repos/jetson-containers/jetson_containers/container.py", line 246, in build_container
    test_container(container_name, pkg, simulate, build_idx=idx)
  File "/ssd/repos/jetson-containers/jetson_containers/container.py", line 456, in test_container
    status = subprocess.run(cmd.replace(_NEWLINE_, ' '), executable='/bin/bash', shell=True, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'docker run -t --rm --network=host --privileged --runtime=nvidia   --volume /ssd/repos/jetson-containers/packages/pytorch/torch2trt:/test   --volume /ssd/repos/jetson-containers/data:/data   test_combo_image:r36.4.tegra-aarch64-cu126-22.04-torch2trt     /bin/bash -c 'python3 /test/test.py' 2>&1 | tee /ssd/repos/jetson-containers/logs/20250906_122435/test/19-1_test_combo_image_r36.4.tegra-aarch64-cu126-22.04-torch2trt_test.py.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1

I suspect a driver/library mismatch, since the error reports that the CUDA driver loaded inside the container is a stub library (Error 34). Can anyone help me out? Thanks!
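A minimal diagnostic sketch, assuming (not confirmed by the log above) that the stub gets picked up because a CUDA `stubs` directory, which holds the toolkit's link-time-only `libcuda.so`, is on the library search path inside the container. The helper name `stub_dirs_on_path` is hypothetical; it only inspects `LD_LIBRARY_PATH` and does not check `ldconfig` caches or symlinks.

```python
import os
from typing import List


def stub_dirs_on_path(ld_library_path: str) -> List[str]:
    """Return entries of an LD_LIBRARY_PATH-style string whose last component is 'stubs'.

    Hypothetical helper. Assumption: a directory such as
    /usr/local/cuda/lib64/stubs on the search path can make the loader
    resolve the stub libcuda instead of the real Jetson driver.
    """
    entries = [p for p in ld_library_path.split(os.pathsep) if p]
    # normpath strips a trailing slash so ".../stubs/" is detected too
    return [p for p in entries if os.path.basename(os.path.normpath(p)) == "stubs"]


if __name__ == "__main__":
    # Run inside the container, e.g. with the same docker run command as the test
    suspects = stub_dirs_on_path(os.environ.get("LD_LIBRARY_PATH", ""))
    if suspects:
        print("Stub directories on LD_LIBRARY_PATH:", suspects)
    else:
        print("No stub directories found on LD_LIBRARY_PATH")
```

An empty result does not rule out the stub hypothesis, since a stale `libcuda.so.1` symlink or an `ld.so.conf.d` entry could have the same effect.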

Environment

  • dpkg -l | grep jetpack shows no output.
  • Output of uname -r:
    • 5.15.148-tegra
  • Output of apt show nvidia-jetpack:
    • Package: nvidia-jetpack
      Version: 6.2.1+b38
      Priority: standard
      Section: metapackages
      Source: nvidia-jetpack (6.2.1)
      Maintainer: NVIDIA Corporation
      Installed-Size: 199 kB
      Depends: nvidia-jetpack-runtime (= 6.2.1+b38), nvidia-jetpack-dev (= 6.2.1+b38)
      Homepage: http://developer.nvidia.com/jetson
      Download-Size: 29.3 kB
      APT-Sources: https://repo.download.nvidia.com/jetson/common r36.4/main arm64 Packages
      Description: NVIDIA Jetpack Meta Package

Also note:

┌───────────────────────┬────────────────────────┐
│ L4T_VERSION   36.4.4  │ JETPACK_VERSION  6.2.1 │
│ CUDA_VERSION  12.6    │ PYTHON_VERSION   3.10  │
│ SYSTEM_ARCH   aarch64 │ LSB_RELEASE      22.04 │
└───────────────────────┴────────────────────────┘
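To rule out a plain version mismatch, the image tag from the test banner can be cross-checked against the host versions in the table. This is a hypothetical sketch: the tag layout `r<L4T>.tegra-aarch64-cu<CUDA>-...` is inferred from the single tag in this log, not taken from jetson-containers documentation, and `parse_tag` / `tag_matches_host` are illustrative names.

```python
import re
from typing import Optional, Tuple

# Assumed tag layout, inferred from "r36.4.tegra-aarch64-cu126-22.04-torch2trt"
TAG_RE = re.compile(r"r(?P<l4t>\d+\.\d+)\.tegra-aarch64-cu(?P<cuda>\d+)-")


def parse_tag(tag: str) -> Optional[Tuple[str, str]]:
    """Extract (L4T major.minor, CUDA version) from an image tag, or None."""
    m = TAG_RE.search(tag)
    if m is None:
        return None
    cuda = m.group("cuda")  # e.g. "126"
    # Assumes the last digit is the CUDA minor version: "126" -> "12.6"
    return m.group("l4t"), f"{cuda[:-1]}.{cuda[-1]}"


def tag_matches_host(tag: str, host_l4t: str, host_cuda: str) -> bool:
    """True if the image tag's L4T/CUDA versions agree with the host's."""
    parsed = parse_tag(tag)
    if parsed is None:
        return False
    tag_l4t, tag_cuda = parsed
    # A host on L4T 36.4.4 is treated as matching an image built for r36.4
    l4t_ok = host_l4t == tag_l4t or host_l4t.startswith(tag_l4t + ".")
    return l4t_ok and host_cuda == tag_cuda


if __name__ == "__main__":
    tag = "test_combo_image:r36.4.tegra-aarch64-cu126-22.04-torch2trt"
    print(tag_matches_host(tag, "36.4.4", "12.6"))
```

With the values in this report, the check passes, which would point away from an L4T/CUDA version mismatch and toward how `libcuda` is resolved at runtime.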

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Metadata


Assignees: No one assigned
Labels: bug (Something isn't working)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
