-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[bundle_static] Rework tooling to resolve internal references. #8796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
and do a partial link to resolve all internal reference.
(I have detailed this experiment in other forums, but I am reproducing it here for open-source visibility) Consider the following source files: // a.c
extern int b();
int a() { return b() + 1; } // b.c
int b () { return 41; } // main.c
#include <stdio.h>
extern int a();
int main() { printf("%d\n", a()); } Clearly, Let's first try compiling them separately as two static libraries: $ cc -o a.o -c a.c
$ libtool -static -o liba.a a.o
$ cc -o b.o -c b.c
$ libtool -static -o libb.a b.o If we look at the output of
Sure enough, it is insufficient to link to just $ cc -o main-bad main.c liba.a
Undefined symbols for architecture arm64:
"_b", referenced from:
_a in liba.a[2](a.o)
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [main-bad] Error 1 But linking to $ cc -o main-both main.c liba.a libb.a
$ ./main-both
42 Now we ask, "how can we bundle these static libraries together?" One option is to use $ libtool -static -o libbundled.a liba.a libb.a And we can see that this produces a library that links and runs: $ cc -o main-bundled main.c libbundled.a
$ ./main-bundled
42 Another approach would be to unpack and repack the object files directly. We already have the object files lying around, so I'll just put them together directly: $ libtool -static -o liball.a a.o b.o Unsurprisingly, this links and runs, too: $ cc -o main-all main.c liball.a
$ ./main-all
42 But now let's look at the output of $ nm -u libbundled.a
a.o:
_b
b.o:
$ nm -u liball.a
a.o:
_b
b.o: The output is identical! And it looks like Therefore, I don't believe the build issue this PR means to solve has to do with "internal references". The purpose of The proposed solution here is very complex and uses features (namely relocatable objects via Here are some additional details about my setup: A Makefile to help reproduce the aboveLIBTOOL ?= libtool
.PHONY: all
all: main-bundled main-all
main-bundled: main.c libbundled.a
$(CC) -o $@ $^
main-all: main.c liball.a
$(CC) -o $@ $^
main-bad: main.c liba.a
$(CC) -o $@ $^
main-both: main.c liba.a libb.a
$(CC) -o $@ $^
%.o: %.c
$(CC) -o $@ -c $^
lib%.a: %.o
$(LIBTOOL) -static -o $@ $^
libbundled.a: liba.a libb.a
$(LIBTOOL) -static -o $@ $^
liball.a: a.o b.o
$(LIBTOOL) -static -o $@ $^
.PHONY: clean
clean:
$(RM) liba.a libb.a a.o b.o libbundled.a liball.a main-both main-bundled main-all Versions of the tools I'm using$ xcodebuild -version
Xcode 16.4
Build version 16F6
$ nm -V
llvm-nm, compatible with GNU nm
Apple LLVM version 17.0.0
Optimized build.
$ ld -v
@(#)PROGRAM:ld PROJECT:ld-1167.5
BUILD 01:45:05 Apr 30 2025
configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em
will use ld-classic for: armv6 armv7 armv7s i386 armv6m armv7k armv7m armv7em
LTO support using: LLVM version 17.0.0 (static support for 29, runtime is 29)
TAPI support using: Apple TAPI version 17.0.0 (tapi-1700.0.3.5)
$ libtool -V
Apple Inc. version cctools-1024.3
$ cc --version
Apple clang version 17.0.0 (clang-1700.0.13.5)
Target: arm64-apple-darwin24.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin Relatedly, we need to decide on an Xcode version support policy. On Linux, our policy is "whatever ships with the latest Ubuntu LTS" (13, but 14 is available). And we don't have a great policy for MSVC. I propose that we require the newest Xcode that runs on the oldest supported macOS version. Apple is very aggressive about requiring developers to upgrade to new Xcode versions (anything published to the App Store must be built with the very newest version). According to endoflife.date, macOS 13 is the oldest supported version. According to Apple, Xcode 15.2 is the newest supported version of Xcode on macOS 13. Therefore, I would like a reproducer that works on Xcode 15.2. All this said, I want to get you unstuck. Would this patch work instead? If it does, I would be glad to merge it. diff --git a/cmake/BundleStatic.cmake b/cmake/BundleStatic.cmake
index dec8a901c..fe91a2aec 100644
--- a/cmake/BundleStatic.cmake
+++ b/cmake/BundleStatic.cmake
@@ -74,6 +74,19 @@ function(bundle_static TARGET)
VERBATIM
)
+ # If the user has specified a custom bundler, use it.
+ set(BundleStatic_TOOL "" CACHE FILEPATH "A custom tool to use to bundle static libs")
+ mark_as_advanced(BundleStatic_TOOL)
+ if (BundleStatic_TOOL)
+ add_custom_command(
+ TARGET "${TARGET}" POST_BUILD
+ COMMAND "${BundleStatic_TOOL}" "$<TARGET_FILE:${TARGET}>" "$<TARGET_FILE:${TARGET}>.tmp" "${cmd}"
+ COMMAND_EXPAND_LISTS
+ VERBATIM
+ )
+ return()
+ endif ()
+
# Finally merge everything together using the platform tool.
find_program(LIB lib.exe HINTS "${CMAKE_AR}")
if (WIN32 AND LIB) Then you could set FWIW, I think of static-lib bundling as more of a hack than a portable primitive. Static libraries are toolchain artifacts, not standardized formats, so rebundling them is brittle and invites trouble. I'd rather treat it as an escape-hatch hook, as this patch does. If the real goal is dependency management, using Halide as a shared library is well supported everywhere and sidesteps these pitfalls cleanly. |
The root cause of the breakage for bundling LLVM into our static libs was the refactoring that was done here: In particular, the config option When I investigated this, and enabled the correct
Since Libtool only uses the base name of objects for its members, this can lead to ambiguity and potential overwriting or incorrect linking within the resulting archive. Using Libtool also produces a libHalide.a that includes EVERYTHING from ALL llvm static libs, regardless of whether they are being used or not, and places the burden on the downstream linker to resolve all internal references, and strip all unused symbols. On Linux, the current This PR reworks all the tooling for |
What should I do to repro the linux failure? |
To reproduce the missing symbols for all of LLVM when building a static lib on main, you can use the following build options (compiling with clang on Linux/MacOS):
Then run cmake and configure using this file:
|
To reproduce the LibTool warnings for MacOS when bundling the llvm static libs on main, just add the
Then run cmake and configure using this file:
This will link, but produces the warnings listed above. On Linux, this will error out during cmake config when compiling with clang. |
Renaming the option was justified since the refactor enabled bundling more than just LLVM (we also bundle flatbuffers and wabt).
I see the duplicate object warnings, too. I believe we should take them seriously. However, I am not able to produce a real link error locally. I have included another experiment below.
I am also unable to reproduce this. The same experiment below works on my Ubuntu 24.04 setup.
Aside: this was also the case with the old BundleStatic implementation. It unpacked every dependent LLVM static library and included the objects in
That sounds great! However, it's not clear to me what (if any) additional toolchain constraints relocatable objects have. Are they like LTO where only the same toolchain can use them? The docs for None of these questions matter if we upstream the hook I proposed and you inject the script in your build process. Here's how I attempted to reproduce issues with the existing BundleStatic implementation. This is done on the current Now here's a bash script: #!/usr/bin/env bash
set -eo pipefail
rm -rf build/static build/jit build/aot
export CMAKE_GENERATOR=Ninja
export CMAKE_BUILD_TYPE=RelWithDebInfo
export Halide_ROOT=$PWD/build/local
export Halide_LLVM_ROOT=$HOME/dev/_llvm/21
cmake -S . -B build/static \
-DCMAKE_INSTALL_PREFIX="$Halide_ROOT" \
-DBUILD_SHARED_LIBS=NO \
-DHalide_BUNDLE_STATIC=ON \
-DWITH_AUTOSCHEDULERS=NO \
-DWITH_TESTS=NO \
-DWITH_TUTORIALS=NO \
-DWITH_UTILS=NO
cmake --build build/static --target install --verbose
cmake -S test/integration/jit -B build/jit
cmake --build build/jit
ctest --test-dir build/jit --output-on-failure
cmake -S test/integration/aot -B build/aot
cmake --build build/aot
ctest --test-dir build/aot --output-on-failure When I run this, I see no linker errors on either macOS or Ubuntu 24.04 (using system GCC 13.3.0). Both CTest runs succeed. |
Building on Linux with LLVM / Clang fails when
The usage for "partial linking" is at the final link stage for producing a lib/binary, so that object files are merged and linked based on internal references. The distributed lib/binary should have no inherent restrictions.
This would work, but I'd prefer if we didn't settle on using an escape hatch, and instead have a tested build_bot config that uses |
I just repeated by experiment, patching
I agree that we should test this if we're going to support it. However, until we can figure out how to reproduce the errors you're seeing, we wouldn't have caught this issue immediately. An escape hatch lets us kick the can with a minimally invasive patch until we know how to test this. |
This PR was intended to improve and consolidate the MacOS and Linux static bundling. The bash script was simply a means to an end and easier for me to test. We could do everything in cmake if preferred. In it's current form, on MacOS, LibTool produces warnings which indicate we're doing something potentially bad, and the libHalide.a has unresolved symbols but contains all of LLVM so they can be resolved by the linker later. Downstream, linking the resulting lib appears to work. On Linux using LLVM/Clang, the toolchain check in build_static errors out so I haven't been able to verify if the libs still link, and they use a different mechanism for bundling (mri). |
I probed this with a minimal case: Apple's archives are insensitive to duplicate member names, so the warning points at potential ODR hazards, not unresolved symbols. The renaming step appears unnecessary with modern #!/bin/bash
set -exo pipefail
mkdir -p a b
echo 'int foo() {return 1;}' | clang -c -x c -o a/dup.o -
echo 'int bar() {return 2;}' | clang -c -x c -o b/dup.o -
libtool -static -o libdup.a a/dup.o b/dup.o
nm libdup.a
cat >main.c <<HERE
#include <stdio.h>
extern int foo();
extern int bar();
int main () { printf("foo = %d\nbar = %d\n", foo(), bar()); }
HERE
cc -o main main.c libdup.a
./main Here's the output of this script: % ./test.sh
+ mkdir -p a b
+ echo 'int foo() {return 1;}'
+ clang -c -x c -o a/dup.o -
+ echo 'int bar() {return 2;}'
+ clang -c -x c -o b/dup.o -
+ libtool -static -o libdup.a a/dup.o b/dup.o
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning duplicate member name 'dup.o' from 'a/dup.o(dup.o)' and 'b/dup.o(dup.o)'
+ nm libdup.a
dup.o:
0000000000000000 T _foo
0000000000000000 t ltmp0
0000000000000008 s ltmp1
dup.o:
0000000000000000 T _bar
0000000000000000 t ltmp0
0000000000000008 s ltmp1
+ cat
+ cc -o main main.c libdup.a
+ ./main
foo = 1
bar = 2
I think "has unresolved symbols" continues to be a red herring. |
The "unresolved symbols" indicate that object files haven't been linked together and still need to get resolved. LibTool just merges the archives, which means all of LLVM gets unnecessarily bundled into the libHalide.a that we distribute which increases binary size for our distribution. This also places the burden on the downstream consumer to resolve these unresolved symbols from within a library we provided, which doesn't seem ideal. It also means that linking is more expensive. |
Undefined symbols inside members are normal and only get resolved at the final link, so I don’t see that as a problem in itself.
It's a bit vacuous, but we can get smaller archives and faster linking by not shipping But stepping back, I think I'm confused. Are we still trying to fix a linker error? |
On main: I can't test Linux when compiling LLVM/Clang since I hit the cmake error for mismatched toolchain. So please add your patch for This PR: Was an improvement to consolidate Linux & MacOS bundle_static which reduced binary size with a pre-linked lib. |
Done. See #8799
Okay, so it's working. We've seen that the warnings
If it's just an optimization, we should hold off merging until we have a testing plan. |
Rework bundle_static to extract all object files from all static libs and do a partial link to resolve all internal references.
Added new tools/bundle_static.sh script for MacOS and Linux, which replaces the calls to
libtool
andmir
in BundleStatic.cmake.