[Enhancement] Refactor cubin launcher #300
…S; update CMake targets for fatbin generation.
This is ready for review @junrushao @yaoyaoding
yaoyaoding
left a comment
Thanks @oraluben!
The CMake update looks great to me! I also learnt something new about CMake.
My major concern is with unify_api.h (see the comment below).
I also left some minor comments.
Co-authored-by: Yaoyao Ding <[email protected]>
My Gemini also prefers "detail", but I do think "internal" would be better :) Currently we have
I believe all comments have been resolved, PTAL again @yaoyaoding cc @junrushao
I’d love to delegate this PR to Yaoyao and just let me know when it’s ready to merge 🫡
yaoyaoding
left a comment
The PR looks good to me now, thanks @oraluben !
cc @junrushao
Thank you both! This is a huge amount of work 🫡
Resolves #292
Summary
This PR refactors the CUBIN launcher utilities to improve flexibility, modernization, and CMake integration. It introduces new macros for loading CUBINs from byte arrays (enabling C++23 #embed support or bin2c in CUDA Toolkit) and provides updated CMake utilities that better align with standard CMake practices.

Key Changes

C++ API & Headers
- TVM_FFI_LOAD_LIBRARY_FROM_BYTES(name, imageBytes): Allows registering a CUBIN module from a raw byte array (e.g., unsigned char[]). This supports modern embedding techniques like C++23 #embed (or compiler extensions) and traditional header-based embedding (e.g. via bin2c).
- include/tvm/ffi/extra/cuda/unify_api.h: specific low-level CUDA Driver/Runtime API wrappers moved here. This header introduces a unified abstraction layer over CUDA Driver and Runtime APIs. It provides:
  - LibraryHandle, KernelHandle, StreamHandle, ResultHandle, etc. that map to their respective Driver (e.g., CUfunction) or Runtime (e.g., cudaKernel_t) counterparts.
  - TVM_FFI_CHECK_CUDA_ERROR macro that handles return codes from either API.
  - API selection via TVM_FFI_CUBIN_LAUNCHER_USE_DRIVER_API (defined to a true value for Driver, otherwise for Runtime) before inclusion, with automatic default selection based on CUDA version.
- include/tvm/ffi/extra/cuda/base.h: dim3 struct definition moved here.
- include/tvm/ffi/extra/cuda/cubin_launcher.h: simplified to focus on CubinModule and CubinKernel.
- CubinModule constructor now accepts const unsigned char* to support byte arrays.

CMake Utilities (cmake/Utils/EmbedCubin.cmake)
- Moved from the previous approach (tvm_ffi_generate_cubin / tvm_ffi_embed_cubin) to a declarative, target-based approach that is more idiomatic to modern CMake.
- add_tvm_ffi_cubin: Compatibility wrapper for compiling CUDA to CUBIN.
- add_tvm_ffi_fatbin: Compatibility wrapper for compiling CUDA to FATBIN.
- tvm_ffi_embed_bin_into: Helper to embed CUBIN/FATBIN into an existing object target (linking approach). It leverages PRE_LINK commands to inject binary data into object files just before linking. While using PRE_LINK to modify object files is non-trivial, it proves to be the most reliable method for deep integration with CMake's dependency graph and target system.
- tvm_ffi_generate_cubin and tvm_ffi_embed_cubin.

Examples
- examples/cubin_launcher/embedded_cubin/ is now split into sub-examples demonstrating different embedding techniques:
  - embed_with_tvm_ffi: Uses the CMake/object-linking approach with TVM_FFI_EMBED_CUBIN.
  - cpp_embed: Uses #embed (with C++ 23) with TVM_FFI_LOAD_LIBRARY_FROM_BYTES.
  - include_bin2c: Shows embedding via bin2c style (header file).

Documentation
- docs/guides/cubin_launcher.rst: updated to document the new CMake functions, the TVM_FFI_LOAD_LIBRARY_FROM_BYTES macro, and the recommended embedding approaches.

Test status
I ran the cubin launcher examples (2 migrated, 2 new) in a CUDA 13.1 environment. For the #embed one, I used clang-20 as the host compiler.
I've confirmed that both the Runtime and Driver APIs work, that the correct symbols from each API are used, and that the kernels work as expected.
Notably, a build compiled against cudart 13.1 also works with the cudart 12.8 that comes with torch.
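
For readers unfamiliar with the pattern behind the unified Driver/Runtime layer described under "C++ API & Headers", here is a minimal sketch of compile-time API selection. It is not the contents of unify_api.h: the `mock_driver` and `mock_runtime` namespaces, the `SKETCH_USE_DRIVER_API` macro, and the `unified::` helpers are all hypothetical stand-ins chosen so the example builds without the CUDA Toolkit.

```cpp
#include <stdexcept>
#include <string>

// Mock stand-ins for the two CUDA APIs so the sketch builds without the CUDA
// Toolkit. Every name in this file is hypothetical, not tvm-ffi or CUDA API.
namespace mock_driver {
enum Result { kSuccess = 0, kNotFound = 500 };
struct Function { const char* name = nullptr; };
inline Result GetFunction(Function* out, const char* name) {
  if (name == nullptr) return kNotFound;
  out->name = name;
  return kSuccess;
}
}  // namespace mock_driver

namespace mock_runtime {
enum Error { kOk = 0, kSymbolNotFound = 500 };
struct Kernel { const char* name = nullptr; };
inline Error GetKernel(Kernel* out, const char* name) {
  if (name == nullptr) return kSymbolNotFound;
  out->name = name;
  return kOk;
}
}  // namespace mock_runtime

// A user would define this before inclusion; the sketch defaults to "driver".
#ifndef SKETCH_USE_DRIVER_API
#define SKETCH_USE_DRIVER_API 1
#endif

namespace unified {
#if SKETCH_USE_DRIVER_API
using ResultHandle = mock_driver::Result;
using KernelHandle = mock_driver::Function;
constexpr ResultHandle kSuccess = mock_driver::kSuccess;
inline ResultHandle LoadKernel(KernelHandle* out, const char* name) {
  return mock_driver::GetFunction(out, name);
}
#else
using ResultHandle = mock_runtime::Error;
using KernelHandle = mock_runtime::Kernel;
constexpr ResultHandle kSuccess = mock_runtime::kOk;
inline ResultHandle LoadKernel(KernelHandle* out, const char* name) {
  return mock_runtime::GetKernel(out, name);
}
#endif

// One error check that works for whichever API was selected above.
inline void CheckError(ResultHandle result, const char* what) {
  if (result != kSuccess) {
    throw std::runtime_error(std::string(what) + " failed with code " +
                             std::to_string(static_cast<int>(result)));
  }
}
}  // namespace unified
```

Calling code only touches `unified::KernelHandle`, `unified::LoadKernel`, and `unified::CheckError`, so switching the macro changes the underlying API without touching call sites. That is the property the review discussion around unify_api.h is concerned with.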
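
The byte-array loading path mentioned for TVM_FFI_LOAD_LIBRARY_FROM_BYTES can be pictured with a similarly hedged sketch. `SketchModule`, `SKETCH_LOAD_FROM_BYTES`, and `kDemoImage` are hypothetical names, and the macro body is an assumption about the general shape of such a helper (a lazily constructed, function-local static), not the actual tvm-ffi definition. The demo bytes only mimic an ELF magic number and are not a real CUBIN.

```cpp
// Hypothetical stand-in for a module constructed from embedded image bytes.
class SketchModule {
 public:
  explicit SketchModule(const unsigned char* image) : image_(image) {}

  // Checks the first four bytes against the ELF magic number.
  bool LooksLikeElf() const {
    return image_[0] == 0x7f && image_[1] == 'E' && image_[2] == 'L' &&
           image_[3] == 'F';
  }

 private:
  const unsigned char* image_;
};

// Assumed shape of a "load from bytes" macro: it defines an accessor that
// constructs the module once, on first use, from the given byte array.
#define SKETCH_LOAD_FROM_BYTES(name, image_bytes) \
  static SketchModule& SketchModule_##name() {    \
    static SketchModule mod(image_bytes);         \
    return mod;                                   \
  }

// In a real build this array would come from a generated header (bin2c
// style) or an #embed directive; here it is a hand-written placeholder.
static const unsigned char kDemoImage[] = {0x7f, 'E', 'L', 'F', 0x02, 0x01};

SKETCH_LOAD_FROM_BYTES(demo, kDemoImage)
```

Because the accessor takes a plain `const unsigned char*`, the same macro call works whether the bytes come from a bin2c-style header or from an `#embed` directive inside the array initializer.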