Conversation

Contributor

@oraluben oraluben commented Dec 2, 2025

Resolves #292

Summary

This PR refactors the CUBIN launcher utilities to make them more flexible, modernize the API, and improve CMake integration. It introduces new macros for loading CUBINs from byte arrays (enabling embedding via C++23 #embed or the CUDA Toolkit's bin2c) and provides updated CMake utilities that better align with standard CMake practices.

Key Changes

C++ API & Headers

  • New Macro: TVM_FFI_LOAD_LIBRARY_FROM_BYTES(name, imageBytes): Allows registering a CUBIN module from a raw byte array (e.g., unsigned char[]). This supports modern embedding techniques like C++23 #embed (or compiler extensions) and traditional header-based embedding (e.g. via bin2c).
  • Header Refactoring:
    • include/tvm/ffi/extra/cuda/unify_api.h: specific low-level CUDA Driver/Runtime API wrappers moved here. This header introduces a unified abstraction layer over CUDA Driver and Runtime APIs. It provides:
      • Unified Handles: LibraryHandle, KernelHandle, StreamHandle, ResultHandle, etc. that map to their respective Driver (e.g., CUfunction) or Runtime (e.g., cudaKernel_t) counterparts.
      • Unified Error Checking: TVM_FFI_CHECK_CUDA_ERROR macro that handles return codes from either API.
      • Seamless Switching: Users can select the underlying API by defining TVM_FFI_CUBIN_LAUNCHER_USE_DRIVER_API before including the header (a true value selects the Driver API, any other value selects the Runtime API). If the macro is not defined, a default is chosen automatically based on the CUDA version.
    • include/tvm/ffi/extra/cuda/base.h: dim3 struct definition moved here.
    • include/tvm/ffi/extra/cuda/cubin_launcher.h: simplified to focus on CubinModule and CubinKernel.
  • API Update: CubinModule constructor now accepts const unsigned char* to support byte arrays.
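
A minimal sketch of the compile-time switching pattern described above, for readers who want to see the shape of it without the CUDA Toolkit installed. Everything here is a stand-in: the `mock_driver` and `mock_runtime` namespaces, the `DEMO_USE_DRIVER_API` switch, and `DEMO_CHECK_CUDA_ERROR` are hypothetical names that only mirror the roles of the real Driver/Runtime types, TVM_FFI_CUBIN_LAUNCHER_USE_DRIVER_API, and TVM_FFI_CHECK_CUDA_ERROR. It is not the code in this PR.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>

// Stand-ins for the two CUDA APIs (assumption: real code would use the
// CUDA Driver and Runtime headers instead of these mock namespaces).
namespace mock_driver {
struct FunctionRec;
using Function = FunctionRec*;  // plays the role of CUfunction
using Result = int;             // plays the role of CUresult
constexpr Result kSuccess = 0;
inline std::string ErrorName(Result r) { return "DRIVER_ERROR_" + std::to_string(r); }
}  // namespace mock_driver

namespace mock_runtime {
struct KernelRec;
using Kernel = KernelRec*;  // plays the role of cudaKernel_t
using Result = int;         // plays the role of cudaError_t
constexpr Result kSuccess = 0;
inline std::string ErrorName(Result r) { return "RUNTIME_ERROR_" + std::to_string(r); }
}  // namespace mock_runtime

// Hypothetical switch; define it to 1 before this point to pick the driver mock.
#ifndef DEMO_USE_DRIVER_API
#define DEMO_USE_DRIVER_API 0
#endif

namespace demo_api {
#if DEMO_USE_DRIVER_API
namespace backend = mock_driver;
using KernelHandle = mock_driver::Function;
#else
namespace backend = mock_runtime;
using KernelHandle = mock_runtime::Kernel;
#endif
using ResultHandle = backend::Result;

// One error check that works for whichever backend was selected.
inline void CheckResult(ResultHandle result, const char* expr) {
  assert(expr != nullptr);
  if (result != backend::kSuccess) {
    throw std::runtime_error(std::string(expr) + " failed: " + backend::ErrorName(result));
  }
}
}  // namespace demo_api

#define DEMO_CHECK_CUDA_ERROR(expr) ::demo_api::CheckResult((expr), #expr)

// Pretend API call: returns success or an arbitrary non-zero error code.
inline demo_api::ResultHandle FakeLaunch(bool ok) { return ok ? 0 : 700; }
```

Calling code would write `DEMO_CHECK_CUDA_ERROR(FakeLaunch(true));` and only ever name `demo_api::KernelHandle`, so flipping the switch changes the backend without touching call sites.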

CMake Utilities (cmake/Utils/EmbedCubin.cmake)

  • Updated Approach: Transitioned from imperative command-based utilities (old tvm_ffi_generate_cubin/tvm_ffi_embed_cubin) to a declarative, target-based approach that is more idiomatic to modern CMake.
  • New Functions:
    • add_tvm_ffi_cubin: Compatibility wrapper for compiling CUDA to CUBIN.
    • add_tvm_ffi_fatbin: Compatibility wrapper for compiling CUDA to FATBIN.
    • tvm_ffi_embed_bin_into: Helper to embed CUBIN/FATBIN into an existing object target (linking approach). It leverages PRE_LINK commands to inject binary data into object files just before linking. While using PRE_LINK to modify object files is non-trivial, it proves to be the most reliable method for deep integration with CMake's dependency graph and target system.
  • Removed: Obsolete functions tvm_ffi_generate_cubin and tvm_ffi_embed_cubin.

Examples

  • Restructured: examples/cubin_launcher/embedded_cubin/ is now split into sub-examples demonstrating different embedding techniques:
    • embed_with_tvm_ffi: Uses the CMake/object-linking approach with TVM_FFI_EMBED_CUBIN.
    • cpp_embed: Uses #embed (with C++ 23) with TVM_FFI_LOAD_LIBRARY_FROM_BYTES.
    • include_bin2c: Shows embedding via bin2c style (header file).
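
To make the byte-array route concrete, here is a small sketch of what a bin2c-style header boils down to: a plain `unsigned char` array whose address is handed to something that accepts `const unsigned char*`. The array contents, `LooksLikeElf`, and `DemoModule` are hypothetical illustrations, not part of tvm-ffi. The sanity check assumes the image is a CUBIN, which is an ELF file; a FATBIN uses a different container, so this check would not apply to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a bin2c-generated array. A real header would hold
// the full image; this one only carries the ELF magic plus a few filler bytes.
static const unsigned char kDemoImage[] = {0x7f, 'E', 'L', 'F', 0x02, 0x01, 0x01, 0x00};

// Returns true when the buffer starts with the 4-byte ELF magic.
inline bool LooksLikeElf(const unsigned char* image, std::size_t size) {
  static const unsigned char kMagic[4] = {0x7f, 'E', 'L', 'F'};
  return image != nullptr && size >= sizeof(kMagic) &&
         std::memcmp(image, kMagic, sizeof(kMagic)) == 0;
}

// Hypothetical module type that, like the updated constructor described above,
// takes a pointer to the embedded bytes. It does not copy or own the data.
class DemoModule {
 public:
  explicit DemoModule(const unsigned char* image) : image_(image) {
    assert(image != nullptr);
  }
  const unsigned char* image() const { return image_; }

 private:
  const unsigned char* image_;
};
```

The same array could be populated by `#embed` where the compiler supports it; either way the consumer only sees a pointer to static storage, so the image must outlive the module that refers to it.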

Documentation

  • Updated docs/guides/cubin_launcher.rst to document the new CMake functions, the TVM_FFI_LOAD_LIBRARY_FROM_BYTES macro, and the recommended embedding approaches.

Test status

I ran the cubin launcher examples (2 migrated, 2 new) in a CUDA 13.1 environment. The #embed example used clang-20 as the host compiler.

I've confirmed that both the Runtime and Driver API paths work: the correct symbols from each API are used, and the kernels run as expected.

Notably, a build compiled against cudart 13.1 also works with the cudart 12.8 that ships with torch.

@oraluben oraluben force-pushed the embed-cubin-v2 branch 2 times, most recently from b23444a to bc91c5a on December 2, 2025 06:20
@oraluben oraluben force-pushed the embed-cubin-v2 branch 3 times, most recently from f562ab3 to b1feeb9 on December 17, 2025 00:56
@oraluben oraluben marked this pull request as ready for review December 17, 2025 02:46
@oraluben oraluben changed the title from "[WIP] Refactor cubin launcher" to "[Enhancement] Refactor cubin launcher" Dec 17, 2025
@oraluben
Contributor Author

This is ready for review @junrushao @yaoyaoding

Contributor

@yaoyaoding yaoyaoding left a comment

Thanks @oraluben !

The CMake update looks great to me! I also learned something new about CMake.

My major concern is with unify_api.h (see the comment below).

I also left some minor comments.

@oraluben
Contributor Author

I asked Gemini about the best practice for indicating an "internal-only" header, and it suggested using a "detail" subfolder and defining the internal functions in a "details" namespace. Maybe we can create a "detail" subfolder like /include/tvm/ffi/extra/cuda/detail and define the compatibility layer in a "details" namespace.

My Gemini also prefers "detail", but I do think "internal" would be better :)

Currently we have tvm/ffi/extra/cuda/internal/unified_api.h and a new namespace tvm::ffi::cuda_api for all of those internal types and implementations.

I believe all comments have been resolved, PTAL again @yaoyaoding

cc @junrushao

@junrushao
Member

I’d love to delegate this PR to Yaoyao and just let me know when it’s ready to merge 🫡

Contributor

@yaoyaoding yaoyaoding left a comment

The PR looks good to me now, thanks @oraluben !

cc @junrushao

@junrushao
Member

Thank you both! This is a huge amount of work 🫡

@junrushao junrushao merged commit b16f11f into apache:main Dec 24, 2025
7 checks passed
@oraluben oraluben deleted the embed-cubin-v2 branch December 24, 2025 23:17

Development

Successfully merging this pull request may close these issues.

[Proposal] Refactor cubin launcher

3 participants