[Enhancement] Refactor cubin launcher #300

oraluben · 2025-12-02T05:08:51Z

Resolves #292

Summary

This PR refactors the CUBIN launcher utilities to make them more flexible, modernize the API, and improve CMake integration. It introduces new macros for loading CUBINs from byte arrays (enabling C++23 #embed or the CUDA Toolkit's bin2c) and provides updated CMake utilities that better align with standard CMake practices.

Key Changes

C++ API & Headers

New Macro: TVM_FFI_LOAD_LIBRARY_FROM_BYTES(name, imageBytes): Allows registering a CUBIN module from a raw byte array (e.g., unsigned char[]). This supports modern embedding techniques like C++23 #embed (or compiler extensions) and traditional header-based embedding (e.g. via bin2c).
Header Refactoring:
- include/tvm/ffi/extra/cuda/unify_api.h: specific low-level CUDA Driver/Runtime API wrappers moved here. This header introduces a unified abstraction layer over CUDA Driver and Runtime APIs. It provides:
  - Unified Handles: LibraryHandle, KernelHandle, StreamHandle, ResultHandle, etc. that map to their respective Driver (e.g., CUfunction) or Runtime (e.g., cudaKernel_t) counterparts.
  - Unified Error Checking: TVM_FFI_CHECK_CUDA_ERROR macro that handles return codes from either API.
  - Seamless Switching: Users can select the underlying API by defining TVM_FFI_CUBIN_LAUNCHER_USE_DRIVER_API before inclusion (a true value selects the Driver API; otherwise the Runtime API is used). If the macro is not defined, a default is selected automatically based on the CUDA version.
- include/tvm/ffi/extra/cuda/base.h: dim3 struct definition moved here.
- include/tvm/ffi/extra/cuda/cubin_launcher.h: simplified to focus on CubinModule and CubinKernel.
API Update: CubinModule constructor now accepts const unsigned char* to support byte arrays.
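
The unified-handle and error-checking idea described above can be sketched roughly as follows. This is not the code from unify_api.h: the `mock_driver` and `mock_runtime` namespaces, the `SKETCH_USE_DRIVER_API` and `SKETCH_CHECK_CUDA_ERROR` macros, and the `unified` namespace are all hypothetical stand-ins, chosen so the sketch compiles without the CUDA Toolkit. It only illustrates how one macro can pick a backend at compile time while callers see a single set of handle names and one error check.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins so the sketch builds without CUDA headers.
// In real code these roles are played by the Driver API (CUresult, CUfunction)
// and the Runtime API (cudaError_t, cudaKernel_t).
namespace mock_driver {
using Result = int;
struct Kernel { int id; };
constexpr Result kSuccess = 0;
inline std::string ErrorName(Result r) { return "DRIVER_ERROR_" + std::to_string(r); }
}  // namespace mock_driver

namespace mock_runtime {
using Result = int;
struct Kernel { int id; };
constexpr Result kSuccess = 0;
inline std::string ErrorName(Result r) { return "RUNTIME_ERROR_" + std::to_string(r); }
}  // namespace mock_runtime

// Hypothetical switch, modeled on TVM_FFI_CUBIN_LAUNCHER_USE_DRIVER_API.
// Defaulting to 0 (Runtime) is an assumption made for this sketch only.
#ifndef SKETCH_USE_DRIVER_API
#define SKETCH_USE_DRIVER_API 0
#endif

namespace unified {
#if SKETCH_USE_DRIVER_API
namespace backend = mock_driver;
#else
namespace backend = mock_runtime;
#endif

// Callers use these names regardless of which backend was selected.
using ResultHandle = backend::Result;
using KernelHandle = backend::Kernel;

// Throws if the call did not succeed, naming the failing expression.
inline void CheckError(ResultHandle r, const char* expr) {
  if (r != backend::kSuccess) {
    throw std::runtime_error(std::string(expr) + " failed: " + backend::ErrorName(r));
  }
}
}  // namespace unified

// One error-check macro that works for whichever backend is active.
#define SKETCH_CHECK_CUDA_ERROR(expr) ::unified::CheckError((expr), #expr)
```

Code written against `unified::KernelHandle` and `SKETCH_CHECK_CUDA_ERROR` does not need to change when the switch macro is flipped, which is the property the real header aims for.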

CMake Utilities (`cmake/Utils/EmbedCubin.cmake`)

Updated Approach: Transitioned from imperative command-based utilities (old tvm_ffi_generate_cubin/tvm_ffi_embed_cubin) to a declarative, target-based approach that is more idiomatic to modern CMake.
New Functions:
- add_tvm_ffi_cubin: Compatibility wrapper for compiling CUDA to CUBIN.
- add_tvm_ffi_fatbin: Compatibility wrapper for compiling CUDA to FATBIN.
- tvm_ffi_embed_bin_into: Helper to embed CUBIN/FATBIN into an existing object target (linking approach). It leverages PRE_LINK commands to inject binary data into object files just before linking. While using PRE_LINK to modify object files is non-trivial, it proves to be the most reliable method for deep integration with CMake's dependency graph and target system.
Removed: Obsolete functions tvm_ffi_generate_cubin and tvm_ffi_embed_cubin.

Examples

Restructured: examples/cubin_launcher/embedded_cubin/ is now split into sub-examples demonstrating different embedding techniques:
- embed_with_tvm_ffi: Uses the CMake/object-linking approach with TVM_FFI_EMBED_CUBIN.
- cpp_embed: Uses #embed (with C++ 23) with TVM_FFI_LOAD_LIBRARY_FROM_BYTES.
- include_bin2c: Shows embedding via bin2c style (header file).
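
As a rough illustration of what the cpp_embed and include_bin2c variants have in common, the sketch below shows a binary image living in an `unsigned char` array and being passed around as `const unsigned char*`. It deliberately does not use TVM_FFI_LOAD_LIBRARY_FROM_BYTES or CubinModule, since those need the tvm-ffi headers; `kImage` and `LooksLikeElf` are hypothetical names, and the four bytes are a placeholder rather than a real CUBIN. The check relies on CUBIN files being ELF images; a FATBIN uses a different container format and would not pass it.

```cpp
#include <cassert>
#include <cstddef>

// In a real build this array would be generated, either by bin2c into a
// header, or with C++23 #embed, along the lines of:
//   static const unsigned char kImage[] = {
//   #embed "kernel.cubin"
//   };
// Here it only holds the ELF magic as placeholder content.
static const unsigned char kImage[] = {0x7f, 'E', 'L', 'F'};

// Hypothetical sanity check a loader could run on the raw bytes
// before handing them to the CUDA library-loading call.
inline bool LooksLikeElf(const unsigned char* data, std::size_t size) {
  return size >= 4 && data[0] == 0x7f && data[1] == 'E' &&
         data[2] == 'L' && data[3] == 'F';
}
```

Because the array is an ordinary object with static storage, the pointer stays valid for the lifetime of the program, so a loader taking `const unsigned char*` can keep referring to it without copying.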

Documentation

Updated docs/guides/cubin_launcher.rst to document the new CMake functions, the TVM_FFI_LOAD_LIBRARY_FROM_BYTES macro, and the recommended embedding approaches.

Test status

I ran the cubin launcher examples (2 migrated, 2 new) in a CUDA 13.1 environment. For the #embed example, I used clang-20 as the host compiler.

I've confirmed that both the Runtime and Driver APIs work, that the correct symbols from each API are used, and that the kernels work as expected.

Notably, a build compiled against cudart 13.1 works with the cudart 12.8 that comes with torch.

oraluben · 2025-12-17T09:04:01Z

This is ready for review @junrushao @yaoyaoding

yaoyaoding

Thanks @oraluben !

The CMake update looks great to me! I also learned something new about CMake.

My major concern is with unify_api.h (see the comment below).

I also left some minor comments.

cmake/Utils/EmbedCubin.cmake

docs/guides/cubin_launcher.rst

examples/cubin_launcher/dynamic_cubin/src/lib_dynamic.cc

include/tvm/ffi/extra/cuda/unify_api.h


oraluben · 2025-12-23T04:44:00Z

I asked Gemini about the best practice for indicating an "internal-only" header, and it suggested using a "detail" subfolder and defining the internal functions in a "details" namespace. Maybe we can create a "detail" subfolder like /include/tvm/ffi/extra/cuda/detail and define the compatibility layer in a "details" namespace.

My gemini also prefers "detail", but I do think "internal" would be better :)

Currently we have tvm/ffi/extra/cuda/internal/unified_api.h and a new namespace tvm::ffi::cuda_api for all those internal types and implementations.

I believe all comments have been resolved, PTAL again @yaoyaoding

cc @junrushao

junrushao · 2025-12-23T05:51:17Z

I’d love to delegate this PR to Yaoyao and just let me know when it’s ready to merge 🫡

yaoyaoding

The PR looks good to me now, thanks @oraluben !

cc @junrushao

junrushao · 2025-12-24T17:19:44Z

Thank you both! This is a huge amount of work 🫡

oraluben force-pushed the embed-cubin-v2 branch 2 times, most recently from b23444a to bc91c5a Compare December 2, 2025 06:20

oraluben force-pushed the embed-cubin-v2 branch 3 times, most recently from f562ab3 to b1feeb9 Compare December 17, 2025 00:56

oraluben marked this pull request as ready for review December 17, 2025 02:46

oraluben and others added 18 commits December 17, 2025 11:31

[cmake] Use object library to compile cuda files to cubin/fatbin

b182ccd

check for cudart version

c997be8

tmp

0aa4b0e

[1/n] Use unify API

cbacfcb

[2/n] Remove all rt api

6a21adb

[3/3] Unify rt and driver api

b553c7e

upd namespace

6bffd5e

fix version check

848b8e2

adapt dynamic example

8bea855

update

2125287

embed example

b8b246c

update cmake doc

2427cc4

update cmake doc

e9546c1

upd

fb01e61

rename macro

871f686

Add an example with cpp's resource inclusion

556629b

add example for bin2c

d6bb52a

cleanup

463c3bf

oraluben force-pushed the embed-cubin-v2 branch from 12b7f54 to 463c3bf Compare December 17, 2025 03:31

oraluben added 5 commits December 17, 2025 11:59

update doc

617a686

lint

1ca425c

Refactor CUBIN embedding macros to use TVM_FFI_LOAD_LIBRARY_FROM_BYTE…

30c11c6

…S; update CMake targets for fatbin generation.

doc

5378347

vibe documenting

5b13cbf

oraluben added 2 commits December 17, 2025 15:56

lint

03819ef

add doc for TVM_FFI_LOAD_LIBRARY_FROM_BYTES

a28bf1b

oraluben changed the title ~~[WIP] Refactor cubin launcher~~ [Enhancement] Refactor cubin launcher Dec 17, 2025

ci

3a3c3ad

upd

acdc148

yaoyaoding requested changes Dec 18, 2025

oraluben and others added 2 commits December 18, 2025 17:34

use proper name for cuda result type

71a0a85

Use a better signature for tvm_ffi_embed_bin_into

4134d13

Co-authored-by: Yaoyao Ding <[email protected]>

oraluben force-pushed the embed-cubin-v2 branch from 865c9ee to 4134d13 Compare December 18, 2025 09:45

oraluben added 2 commits December 19, 2025 09:37

Rename TVM_FFI_LOAD_LIBRARY_FROM_BYTES to `TVM_FFI_EMBED_CUBIN_FROM…

8cda1f2

…_BYTES`

Remove INTERMEDIATE_FILE arg

6dbe317

oraluben force-pushed the embed-cubin-v2 branch from 3098fce to 6dbe317 Compare December 19, 2025 01:55

oraluben added 6 commits December 19, 2025 11:50

Move copy util to a separate file

8db2de4

Merge branch 'main' into embed-cubin-v2

48c3c0c

Set CMAKE_CUDA_RUNTIME_LIBRARY to Shared when no default value and …

81ae232

…add doc:wq

move unified api to new location and namespace

6a98732

Fix issues found by gemini

11253ec

doc update from gemini

2d177b6

yaoyaoding approved these changes Dec 24, 2025

junrushao merged commit b16f11f into apache:main Dec 24, 2025
7 checks passed

oraluben deleted the embed-cubin-v2 branch December 24, 2025 23:17
