Skip to content

Conversation

derek-gerstmann
Copy link
Contributor

@derek-gerstmann derek-gerstmann commented Sep 2, 2025

Rework bundle_static to extract all object files from all static libs and do a partial link to resolve all internal references.

Added new tools/bundle_static.sh script for MacOS and Linux, which replaces the calls to libtool and mir in BundleStatic.cmake.

and do a partial link to resolve all internal reference.
@alexreinking
Copy link
Member

alexreinking commented Sep 3, 2025

(I have detailed this experiment in other forums, but I am reproducing it here for open-source visibility)

Consider the following source files:

// a.c
extern int b();
int a() { return b() + 1; }
// b.c
int b () { return 41; }
// main.c
#include <stdio.h>
extern int a();
int main() { printf("%d\n", a()); }

Clearly, main.c depends on a.c, which in turn depends on b.c.

Let's first try compiling them separately as two static libraries:

$ cc -o a.o -c a.c
$ libtool -static -o liba.a a.o
$ cc -o b.o -c b.c
$ libtool -static -o libb.a b.o

If we look at the output of nm we see that a.o requires a symbol _b and b.o has no dependencies:

$ nm -u liba.a

a.o:
_b
$ nm -u libb.a

b.o:

Sure enough, it is insufficient to link to just liba.a:

$ cc -o main-bad main.c liba.a
Undefined symbols for architecture arm64:
  "_b", referenced from:
      _a in liba.a[2](a.o)
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [main-bad] Error 1

But linking to liba.a and libb.a works, of course:

$ cc -o main-both main.c liba.a libb.a
$ ./main-both
42

Now we ask, "how can we bundle these static libraries together?" One option is to use libtool in the same way we do in the current BundleStatic implementation:

$ libtool -static -o libbundled.a liba.a libb.a

And we can see that this produces a library that links and runs:

$ cc -o main-bundled main.c libbundled.a
$ ./main-bundled
42

Another approach would be to unpack and repack the object files directly. We already have the object files lying around, so I'll just put them together directly:

$ libtool -static -o liball.a a.o b.o

Unsurprisingly, this links and runs, too:

$ cc -o main-all main.c liball.a
$ ./main-all
42

But now let's look at the output of nm -u on these two libraries:

$ nm -u libbundled.a 

a.o:
_b

b.o:
$ nm -u liball.a 

a.o:
_b

b.o:

The output is identical! And it looks like _b is undefined. Yet, clearly the linker does not have trouble resolving this symbol internally.

Therefore, I don't believe the build issue this PR means to solve has to do with "internal references". The purpose of libtool and ranlib (which libtool runs/replaces) is to build an internal index (.__SYMDEF) of symbols. I'm not sure that, semantically, internal references are an issue that can exist in the first place if those tools are used. So I'm not sure what's going on.

The proposed solution here is very complex and uses features (namely relocatable objects via ld -Ur) that have never been required by bundled libraries before (either in CMake or in the Makefile). Until we can identify the root cause of the failure, I am hesitant to replace a solution that I can test works with one that I can't test fixes a problem I can't observe.

Here are some additional details about my setup:

A Makefile to help reproduce the above
LIBTOOL ?= libtool

.PHONY: all
all: main-bundled main-all 

main-bundled: main.c libbundled.a
	$(CC) -o $@ $^

main-all: main.c liball.a
	$(CC) -o $@ $^

main-bad: main.c liba.a
	$(CC) -o $@ $^

main-both: main.c liba.a libb.a
	$(CC) -o $@ $^

%.o: %.c
	$(CC) -o $@ -c $^

lib%.a: %.o
	$(LIBTOOL) -static -o $@ $^

libbundled.a: liba.a libb.a
	$(LIBTOOL) -static -o $@ $^

liball.a: a.o b.o
	$(LIBTOOL) -static -o $@ $^

.PHONY: clean
clean:
	$(RM) liba.a libb.a a.o b.o libbundled.a liball.a main-both main-bundled main-all
Versions of the tools I'm using
$ xcodebuild -version
Xcode 16.4
Build version 16F6
$ nm -V
llvm-nm, compatible with GNU nm
Apple LLVM version 17.0.0
  Optimized build.
$ ld -v        
@(#)PROGRAM:ld PROJECT:ld-1167.5
BUILD 01:45:05 Apr 30 2025
configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em
will use ld-classic for: armv6 armv7 armv7s i386 armv6m armv7k armv7m armv7em
LTO support using: LLVM version 17.0.0 (static support for 29, runtime is 29)
TAPI support using: Apple TAPI version 17.0.0 (tapi-1700.0.3.5)
$ libtool -V
Apple Inc. version cctools-1024.3
$ cc --version
Apple clang version 17.0.0 (clang-1700.0.13.5)
Target: arm64-apple-darwin24.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Relatedly, we need to decide on an Xcode version support policy. On Linux, our policy is "whatever ships with the latest Ubuntu LTS" (13, but 14 is available). And we don't have a great policy for MSVC.

I propose that we require the newest Xcode that runs on the oldest supported macOS version. Apple is very aggressive about requiring developers to upgrade to new Xcode versions (anything published to the App Store must be built with the very newest version).

According to endoflife.date, macOS 13 is the oldest supported version.

According to Apple, Xcode 15.2 is the newest supported version of Xcode on macOS 13.

Therefore, I would like a reproducer that works on Xcode 15.2.


All this said, I want to get you unstuck. Would this patch work instead? If it does, I would be glad to merge it.

diff --git a/cmake/BundleStatic.cmake b/cmake/BundleStatic.cmake
index dec8a901c..fe91a2aec 100644
--- a/cmake/BundleStatic.cmake
+++ b/cmake/BundleStatic.cmake
@@ -74,6 +74,19 @@ function(bundle_static TARGET)
         VERBATIM
     )
 
+    # If the user has specified a custom bundler, use it.
+    set(BundleStatic_TOOL "" CACHE FILEPATH "A custom tool to use to bundle static libs")
+    mark_as_advanced(BundleStatic_TOOL)
+    if (BundleStatic_TOOL)
+        add_custom_command(
+            TARGET "${TARGET}" POST_BUILD
+            COMMAND "${BundleStatic_TOOL}" "$<TARGET_FILE:${TARGET}>" "$<TARGET_FILE:${TARGET}>.tmp" "${cmd}"
+            COMMAND_EXPAND_LISTS
+            VERBATIM
+        )
+        return()
+    endif ()
+
     # Finally merge everything together using the platform tool.
     find_program(LIB lib.exe HINTS "${CMAKE_AR}")
     if (WIN32 AND LIB)

Then you could set BundleStatic_TOOL to the script in this PR.

FWIW, I think of static-lib bundling as more of a hack than a portable primitive. Static libraries are toolchain artifacts, not standardized formats, so rebundling them is brittle and invites trouble. I'd rather treat it as an escape-hatch hook, as this patch does. If the real goal is dependency management, using Halide as a shared library is well supported everywhere and sidesteps these pitfalls cleanly.


CC @slomp @shoaibkamil @abadams

@derek-gerstmann
Copy link
Contributor Author

The root cause of the breakage for bundling LLVM into our static libs was the refactoring that was done here:
6dc2b3e

In particular, the config option Halide_BUNDLE_LLVM got removed, and it was renamed to Halide_BUNDLE_STATIC, and particular assumptions were made for MacOS and Linux build environments which were different than before. Since cmake silently ignores any unknown config option, and Halide_BUNDLE_STATIC is not tested in a build_bot configuration, this went unnoticed. The end result is that cmake builds would succeed, but the resulting libHalide.a would have undefined symbols for all of LLVM.

When I investigated this, and enabled the correct Halide_BUNDLE_STATIC option, on MacOS libtool now warns about duplicate symbols:

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning duplicate member name 'Analysis.cpp.o' from '/Users/dgerstmann/Workspace/OpenSource/LLVM/Builds/llvm-19.1.7-arm-64-osx/lib/libLLVMAnalysis.a(Analysis.cpp.o)' and '/Users/dgerstmann/Workspace/OpenSource/LLVM/Builds/llvm-19.1.7-arm-64-osx/lib/libLLVMCodeGen.a(Analysis.cpp.o)' /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning duplicate member name 'COFF.cpp.o' from '/Users/dgerstmann/Workspace/OpenSource/LLVM/Builds/llvm-19.1.7-arm-64-osx/lib/libLLVMBinaryFormat.a(COFF.cpp.o)' and '/Users/dgerstmann/Workspace/OpenSource/LLVM/Builds/llvm-19.1.7-arm-64-osx/lib/libLLVMJITLink.a(COFF.cpp.o)' /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning duplicate member name 'Core.cpp.o' from '/Users/dgerstmann/Workspace/OpenSource/LLVM/Builds/llvm-19.1.7-arm-64-osx/lib/libLLVMCore.a(Core.cpp.o)' and '/Users/dgerstmann/Workspace/OpenSource/LLVM/Builds/llvm-19.1.7-arm-64-osx/lib/libLLVMOrcJIT.a(Core.cpp.o)' /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning duplicate member name 'DXContainer.cpp.o' from '/Users/dgerstmann/Workspace/OpenSource/LLVM/Builds/llvm-19.1.7-arm-64-osx/lib/libLLVMObject.a(DXContainer.cpp.o)' and '/Users/dgerstmann/Workspace/OpenSource/LLVM/Builds/llvm-19.1.7-arm-64-osx/lib/libLLVMBinaryFormat.a(DXContainer.cpp.o)'

Since Libtool only uses the base name of objects for its members, this can lead to ambiguity and potential overwriting or incorrect linking within the resulting archive. Using Libtool also produces a libHalide.a that includes EVERYTHING from ALL llvm static libs, regardless of whether they are being used or not, and places the burden on the downstream linker to resolve all internal references, and strip all unused symbols.

On Linux, the current Halide_BUNDLE_STATIC does not support LLVM, and errors out immediately.

This PR reworks all the tooling for bundle_static so that there aren't any collisions between object files, and all internal references are resolved so that the final libHalide.a contains only the symbols required, reducing file size by >50MB, and reducing link times for downstream clients.

@abadams
Copy link
Member

abadams commented Sep 3, 2025

What should I do to repro the linux failure?

@derek-gerstmann
Copy link
Contributor Author

derek-gerstmann commented Sep 3, 2025

To reproduce the missing symbols for all of LLVM when building a static lib on main, you can use the following build options (compiling with clang on Linux/MacOS):

# save into build_options.cmake
set(CMAKE_BUILD_TYPE "Release" CACHE STRING "")
set(BUILD_SHARED_LIBS OFF CACHE BOOL "")
set(Halide_SHARED_LLVM OFF CACHE BOOL "")
set(Halide_ENABLE_EXCEPTIONS ON CACHE BOOL "")
set(Halide_ENABLE_RTTI ON CACHE BOOL "")
set(WITH_TESTS OFF CACHE BOOL "")
set(WITH_PYTHON_BINDINGS OFF CACHE BOOL "")
set(WITH_DOCS OFF CACHE BOOL "")
set(WITH_UTILS ON CACHE BOOL "")
set(WITH_TUTORIALS OFF CACHE BOOL "")

Then run cmake and configure using this file:

cmake -C build_options.cmake -S source -B build
cmake --build build --config Release

@derek-gerstmann
Copy link
Contributor Author

derek-gerstmann commented Sep 3, 2025

To reproduce the LibTool warnings for MacOS when bundling the llvm static libs on main, just add the Halide_BUNDLE_STATIC flag ) to the build options:

# save into build_options.cmake
set(CMAKE_BUILD_TYPE "Release" CACHE STRING "")
set(BUILD_SHARED_LIBS OFF CACHE BOOL "")
set(Halide_BUNDLE_STATIC ON CACHE BOOL "")
set(Halide_SHARED_LLVM OFF CACHE BOOL "")
set(Halide_ENABLE_EXCEPTIONS ON CACHE BOOL "")
set(Halide_ENABLE_RTTI ON CACHE BOOL "")
set(WITH_TESTS OFF CACHE BOOL "")
set(WITH_PYTHON_BINDINGS OFF CACHE BOOL "")
set(WITH_DOCS OFF CACHE BOOL "")
set(WITH_UTILS ON CACHE BOOL "")

Then run cmake and configure using this file:

cmake -C build_options.cmake -S source -B build
cmake --build build --config Release

This will link, but produces the warnings listed above. On Linux, this will error out during cmake config when compiling with clang.

@alexreinking
Copy link
Member

alexreinking commented Sep 3, 2025

In particular, the config option Halide_BUNDLE_LLVM got removed, and it was renamed to Halide_BUNDLE_STATIC, and particular assumptions were made for MacOS and Linux build environments which were different than before. Since cmake silently ignores any unknown config option, and Halide_BUNDLE_STATIC is not tested in a build_bot configuration, this went unnoticed. The end result is that cmake builds would succeed, but the resulting libHalide.a would have undefined symbols for all of LLVM.

Renaming the option was justified since the refactor enabled bundling more than just LLVM (we also bundle flatbuffers and wabt).

When I investigated this, and enabled the correct Halide_BUNDLE_STATIC option, on MacOS libtool now warns about duplicate symbols ... Since Libtool only uses the base name of objects for its members, this can lead to ambiguity and potential overwriting or incorrect linking within the resulting archive.

I see the duplicate object warnings, too. I believe we should take them seriously. However, I am not able to produce a real link error locally. I have included another experiment below.

On Linux, the current Halide_BUNDLE_STATIC does not support LLVM, and errors out immediately.

I am also unable to reproduce this. The same experiment below works on my Ubuntu 24.04 setup.

Using Libtool also produces a libHalide.a that includes EVERYTHING from ALL llvm static libs, regardless of whether they are being used or not, and places the burden on the downstream linker to resolve all internal references, and strip all unused symbols.

Aside: this was also the case with the old BundleStatic implementation. It unpacked every dependent LLVM static library and included the objects in libHalide.a.

This PR reworks all the tooling for bundle_static so that there aren't any collisions between object files, and all internal references are resolved so that the final libHalide.a contains only the symbols required, reducing file size by >50MB, and reducing link times for downstream clients.

That sounds great! However, it's not clear to me what (if any) additional toolchain constraints relocatable objects have. Are they like LTO where only the same toolchain can use them? The docs for ld suggest it works only for a subsequent link for the same ld.

None of these questions matter if we upstream the hook I proposed and you inject the script in your build process.


Here's how I attempted to reproduce issues with the existing BundleStatic implementation. This is done on the current main with LLVM 21.1.0 that I built myself (stripped down, no zlib/zstd/other external dependencies) and installed in ~/dev/_llvm/21.

Now here's a bash script:

#!/usr/bin/env bash

set -eo pipefail

rm -rf build/static build/jit build/aot

export CMAKE_GENERATOR=Ninja
export CMAKE_BUILD_TYPE=RelWithDebInfo
export Halide_ROOT=$PWD/build/local
export Halide_LLVM_ROOT=$HOME/dev/_llvm/21

cmake -S . -B build/static \
  -DCMAKE_INSTALL_PREFIX="$Halide_ROOT" \
  -DBUILD_SHARED_LIBS=NO \
  -DHalide_BUNDLE_STATIC=ON \
  -DWITH_AUTOSCHEDULERS=NO \
  -DWITH_TESTS=NO \
  -DWITH_TUTORIALS=NO \
  -DWITH_UTILS=NO

cmake --build build/static --target install --verbose

cmake -S test/integration/jit -B build/jit
cmake --build build/jit
ctest --test-dir build/jit --output-on-failure

cmake -S test/integration/aot -B build/aot
cmake --build build/aot
ctest --test-dir build/aot --output-on-failure

When I run this, I see no linker errors on either macOS or Ubuntu 24.04 (using system GCC 13.3.0). Both CTest runs succeed.

@derek-gerstmann
Copy link
Contributor Author

On Linux, the current Halide_BUNDLE_STATIC does not support LLVM, and errors out immediately.

I am also unable to reproduce this. The same experiment below works on my Ubuntu 24.04 setup.

Building on Linux with LLVM / Clang fails when Halide_BUNDLE_STATIC=ON:


[2025-09-02T20:55:49.417Z] [MB: 7101ms][DEBUG] [13]CMake Error at cmake/BundleStatic.cmake:123 (message):
[2025-09-02T20:55:49.417Z] [MB: 7102ms][DEBUG] [13]-- Configuring incomplete, errors occurred!
[2025-09-02T20:55:49.417Z] [MB: 7103ms][DEBUG] [13]bundle_static_libs not implemented for the present toolchain
[2025-09-02T20:55:49.417Z] [MB: 7105ms][DEBUG] [13]Call Stack (most recent call first):
[2025-09-02T20:55:49.417Z] [MB: 7106ms][DEBUG] [13]src/CMakeLists.txt:491 (bundle_static)

That sounds great! However, it's not clear to me what (if any) additional toolchain constraints relocatable objects have. Are they like LTO where only the same toolchain can use them? The docs for ld suggest it works only for a subsequent link for the same ld.

The usage for "partial linking" is at the final link stage for producing a lib/binary, so that object files are merged and linked based on internal references. The distributed lib/binary should have no inherent restrictions.

None of these questions matter if we upstream the hook I proposed and you inject the script in your build process.

This would work, but I'd prefer if we didn't settle on using an escape hatch, and instead have a tested build_bot config that uses Halide_BUNDLE_STATIC so we can catch issues immediately, rather than downstream.

@alexreinking
Copy link
Member

Building on Linux with LLVM / Clang fails when Halide_BUNDLE_STATIC=ON:

I just repeated by experiment, patching |LLVM into the check version_info MATCHES "GNU|LLVM" in BundleStatic.cmake. The experiment now succeeds with Clang 19 (as obtained from apt.llvm.org).

This would work, but I'd prefer if we didn't settle on using an escape hatch, and instead have a tested build_bot config that uses Halide_BUNDLE_STATIC so we can catch issues immediately, rather than downstream.

I agree that we should test this if we're going to support it. However, until we can figure out how to reproduce the errors you're seeing, we wouldn't have caught this issue immediately. An escape hatch lets us kick the can with a minimally invasive patch until we know how to test this.

@derek-gerstmann
Copy link
Contributor Author

derek-gerstmann commented Sep 3, 2025

This PR was intended to improve and consolidate the MacOS and Linux static bundling. The bash script was simply a means to an end and easier for me to test. We could do everything in cmake if preferred.

In it's current form, on MacOS, LibTool produces warnings which indicate we're doing something potentially bad, and the libHalide.a has unresolved symbols but contains all of LLVM so they can be resolved by the linker later. Downstream, linking the resulting lib appears to work. On Linux using LLVM/Clang, the toolchain check in build_static errors out so I haven't been able to verify if the libs still link, and they use a different mechanism for bundling (mri).

@alexreinking
Copy link
Member

on MacOS, LibTool produces warnings which indicate we're doing something potentially bad

I probed this with a minimal case: Apple's archives are insensitive to duplicate member names, so the warning points at potential ODR hazards, not unresolved symbols. The renaming step appears unnecessary with modern libtool.

#!/bin/bash

set -exo pipefail

mkdir -p a b

echo 'int foo() {return 1;}' | clang -c -x c -o a/dup.o -
echo 'int bar() {return 2;}' | clang -c -x c -o b/dup.o -

libtool -static -o libdup.a a/dup.o b/dup.o
nm libdup.a

cat >main.c <<HERE
#include <stdio.h>
extern int foo();
extern int bar();
int main () { printf("foo = %d\nbar = %d\n", foo(), bar()); }
HERE

cc -o main main.c libdup.a
./main

Here's the output of this script:

% ./test.sh 
+ mkdir -p a b
+ echo 'int foo() {return 1;}'
+ clang -c -x c -o a/dup.o -
+ echo 'int bar() {return 2;}'
+ clang -c -x c -o b/dup.o -
+ libtool -static -o libdup.a a/dup.o b/dup.o
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning duplicate member name 'dup.o' from 'a/dup.o(dup.o)' and 'b/dup.o(dup.o)'
+ nm libdup.a

dup.o:
0000000000000000 T _foo
0000000000000000 t ltmp0
0000000000000008 s ltmp1

dup.o:
0000000000000000 T _bar
0000000000000000 t ltmp0
0000000000000008 s ltmp1
+ cat
+ cc -o main main.c libdup.a
+ ./main
foo = 1
bar = 2

and the libHalide.a has unresolved symbols but contains all of LLVM so they can be resolved by the linker later.

I think "has unresolved symbols" continues to be a red herring. nm reporting that individual object members in the archive have undefined symbols is normal and expected for static libraries.

@derek-gerstmann
Copy link
Contributor Author

I think "has unresolved symbols" continues to be a red herring. nm reporting that individual object members in the archive have undefined symbols is normal and expected for static libraries.

The "unresolved symbols" indicate that object files haven't been linked together and still need to get resolved. LibTool just merges the archives, which means all of LLVM gets unnecessarily bundled into the libHalide.a that we distribute which increases binary size for our distribution. This also places the burden on the downstream consumer to resolve these unresolved symbols from within a library we provided, which doesn't seem ideal. It also means that linking is more expensive.

@alexreinking
Copy link
Member

The "unresolved symbols" indicate that object files haven't been linked together and still need to get resolved. LibTool just merges the archives,

Undefined symbols inside members are normal and only get resolved at the final link, so I don’t see that as a problem in itself.

which means all of LLVM gets unnecessarily bundled into the libHalide.a that we distribute which increases binary size for our distribution. This also places the burden on the downstream consumer to resolve these unresolved symbols from within a library we provided, which doesn't seem ideal. It also means that linking is more expensive.

It's a bit vacuous, but we can get smaller archives and faster linking by not shipping libHalide.a at all and advising downstreams to link to libHalide.so / libHalide.dylib. 🙂

But stepping back, I think I'm confused. Are we still trying to fix a linker error?

@derek-gerstmann
Copy link
Contributor Author

derek-gerstmann commented Sep 3, 2025

But stepping back, I think I'm confused. Are we still trying to fix a linker error?

On main: I can't test Linux when compiling LLVM/Clang since I hit the cmake error for mismatched toolchain. So please add your patch for version_info MATCHES "GNU|LLVM" so we can test this downstream. On MacOS, downstream linkage worked, but Libtool produced warnings.

This PR: Was an improvement to consolidate Linux & MacOS bundle_static which reduced binary size with a pre-linked lib.

@alexreinking
Copy link
Member

On main: I can't test Linux when compiling LLVM/Clang since I hit the cmake error for mismatched toolchain. So please add your patch for version_info MATCHES "GNU|LLVM" so we can test this downstream.

Done. See #8799

On MacOS, downstream linkage worked, but Libtool produced warnings.

Okay, so it's working. We've seen that the warnings libtool produces can be safely ignored.

This PR: Was an improvement to consolidate Linux & MacOS bundle_static which reduced binary size with a pre-linked lib.

If it's just an optimization, we should hold off merging until we have a testing plan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants