@imciner2
Contributor

These APIs are new in macOS 15, but they only let you choose between single-threaded and multi-threaded operation; you cannot specify the actual number of threads used in a multi-threaded setup.

There is a symbol called APPLE_NTHREADS in the Accelerate library, which might report how many threads it will use in multi-threaded mode. However, I can't find any information about it anywhere online (it is only listed in various symbol tables for the library). If someone with a macOS 15+ machine can examine that symbol to see what it does or looks like, we could potentially update the TODO in this PR.

This is the LBT implementation of JuliaLinearAlgebra/AppleAccelerate.jl#88. Fixes #115.

cc @ViralBShah @giordano

@imciner2 added the enhancement (New feature or request) label Oct 18, 2025
@ViralBShah
Member

How would I examine the value of such a variable?

@imciner2
Contributor Author

I think it might be a function. I was just talking with @giordano and he tried this on a machine he has with 16 cores and got this:

julia> @ccall AppleAccelerate.libacc.APPLE_NTHREADS()::Int
16

If you can give that a try and see what it says, that would be nice.

@ViralBShah
Member

ViralBShah commented Oct 20, 2025

I am on an M2 Max, and this is what I get:

julia> @ccall AppleAccelerate.libacc.APPLE_NTHREADS()::Int
12

julia> Sys.CPU_THREADS
8

Do we know when it was introduced?

@giordano
Collaborator

For the record, I tested on M4 Max

@imciner2
Contributor Author

> Do we know when it was introduced?

Nope, I haven't been able to find anything about it other than a bunch of tbd symbol lists on GitHub, and one person who wrote a stub function for it. It was basically a shot in the dark that we could even call it properly.

In LBT we have to look up the symbol in the symbol table anyway, so it should be safe to use regardless of the macOS version: call it if we find it, otherwise report a default value.
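
A minimal sketch of that lookup-then-fallback pattern in C, assuming the library handle comes from `dlopen` and that the symbol is a zero-argument function returning `int` (its real signature is undocumented, so that is a guess based on the Julia calls above). The helper name `thread_count_or_default` is made up for illustration and is not LBT's actual code:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Assumed signature of the undocumented symbol: int f(void). */
typedef int (*thread_count_fn)(void);

/*
 * Look up `symbol` in an already-opened library and call it if present.
 * Returns `fallback` when the handle is NULL, the symbol is missing,
 * or the reported value is not positive.
 */
int thread_count_or_default(void *handle, const char *symbol, int fallback) {
    if (handle == NULL || symbol == NULL) {
        return fallback;
    }
    void *addr = dlsym(handle, symbol);
    if (addr == NULL) {
        /* Library does not export the symbol: use the default. */
        return fallback;
    }
    int n = ((thread_count_fn)addr)();
    return n > 0 ? n : fallback;
}
```

Called as `thread_count_or_default(handle, "APPLE_NTHREADS", 2)`, this would return whatever Accelerate reports when the symbol exists, and 2 otherwise.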

@giordano
Collaborator

Uhm, we should be able to find out by bisecting the SDKs hosted on GitHub.

@giordano
Collaborator

giordano commented Oct 20, 2025

> I am on an M2 Max, and this is what I get:
>
> julia> @ccall AppleAccelerate.libacc.APPLE_NTHREADS()::Int
> 12
>
> julia> Sys.CPU_THREADS
> 8

I'd need to double check, but I seem to remember our Sys.CPU_THREADS shows only the performance cores. According to Wikipedia, the M2 Max has 8 performance cores + 4 efficiency ones, for a total of 12, which is what APPLE_NTHREADS seems to return. Similarly, the M4 Max has 12 performance cores + 4 efficiency ones, and I get

julia> Sys.CPU_THREADS
12

julia> @ccall "/System/Library/Frameworks/Accelerate.framework/Accelerate".APPLE_NTHREADS()::Int
16

@imciner2
Contributor Author

imciner2 commented Oct 20, 2025

> I am on an M2 Max, and this is what I get:
>
> julia> @ccall AppleAccelerate.libacc.APPLE_NTHREADS()::Int
> 12
>
> julia> Sys.CPU_THREADS
> 8
>
> Do we know when it was introduced?

According to Wikipedia, the M2 Max has 8 performance cores and 4 efficiency cores (although I think ARM calls them big/little). So Sys.CPU_THREADS seems to report the number of big cores, while APPLE_NTHREADS seems to report the total number of cores (both big and little).

@ViralBShah
Member

htop shows 12 cores.

@giordano
Collaborator

> Uhm, we should be able to find out by bisecting the SDKs hosted on GitHub.

% git clone --depth=1 https://github.com/nickfnblum/MacOSX-SDKs
% grep -r '_APPLE_NTHREADS' --include='*.tbd' MacOSX-SDKs 
MacOSX-SDKs/MacOSX11.3.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/vecLib.tbd:                       _APPLE_NTHREADS, _ATLU_DestroyThreadMemory, _BLASStateRelease, 
MacOSX-SDKs/MacOSX11.3.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.tbd:                       _APPLE_NTHREADS, _ATLU_DestroyThreadMemory, _BLASStateRelease, 
MacOSX-SDKs/MacOSX11.3.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate.tbd:                       _APPLE_NTHREADS, _ATLU_DestroyThreadMemory, _BLASStateRelease, 
MacOSX-SDKs/MacOSX11.1.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/vecLib.tbd:                       _APPLE_NTHREADS, _ATLU_DestroyThreadMemory, _BLASStateRelease, 
MacOSX-SDKs/MacOSX11.1.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.tbd:                       _APPLE_NTHREADS, _ATLU_DestroyThreadMemory, _BLASStateRelease, 
MacOSX-SDKs/MacOSX11.1.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate.tbd:                       _APPLE_NTHREADS, _ATLU_DestroyThreadMemory, _BLASStateRelease, 
MacOSX-SDKs/MacOSX11.0.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/vecLib.tbd:                       _APPLE_NTHREADS, _ATLU_DestroyThreadMemory, _BLASStateRelease, 
MacOSX-SDKs/MacOSX11.0.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.tbd:                       _APPLE_NTHREADS, _ATLU_DestroyThreadMemory, _BLASStateRelease, 
MacOSX-SDKs/MacOSX11.0.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate.tbd:                       _APPLE_NTHREADS, _ATLU_DestroyThreadMemory, _BLASStateRelease, 
MacOSX-SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/vecLib.tbd:                       _APPLE_NTHREADS, _ATLU_DestroyThreadMemory, _CAXPY, _CAXPY_, 
MacOSX-SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.tbd:                       _APPLE_NTHREADS, _ATLU_DestroyThreadMemory, _CAXPY, _CAXPY_, 
MacOSX-SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate.tbd:                       _APPLE_NTHREADS, _ATLU_DestroyThreadMemory, _CAXPY, _CAXPY_, 
MacOSX-SDKs/MacOSX10.14.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.tbd:                       _APL_sgemm_LU, _APL_sgemm_QR, _APL_strsm, _APPLE_NTHREADS, 
MacOSX-SDKs/MacOSX10.13.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.tbd:                       _APL_sgemm_LU, _APL_sgemm_QR, _APL_strsm, _APPLE_NTHREADS, 
MacOSX-SDKs/MacOSX10.12.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.tbd:                       _APL_sgemm_LU, _APL_sgemm_QR, _APL_strsm, _APPLE_NTHREADS, 
MacOSX-SDKs/MacOSX10.11.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.tbd:                       _APPLE_NTHREADS, _ATLU_DestroyThreadMemory, _CAXPY, 
MacOSX-SDKs/MacOSX10.10.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.tbd:                       _APPLE_NTHREADS, _ATLU_DestroyThreadMemory, _CAXPY, 

I'd say this was introduced in macOS 10.10.

@ViralBShah
Member

ViralBShah commented Oct 20, 2025

Mirrored APPLE_NTHREADS in AppleAccelerate: JuliaLinearAlgebra/AppleAccelerate.jl#89

src/threading.c (outdated)
    if(nthreads == ACCELERATE_BLAS_THREADING_MULTI_THREADED) {
        // This number is arbitrary right now, but greater than 1 to mean multi-threaded.
        // TODO: Can we guestimate the number of threads from the APPLE_NTHREADS symbol in accelerate?
        max_threads = 2;
Member

I went with APPLE_NTHREADS in AppleAccelerate.jl
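
One way the TODO in the snippet above could be resolved, sketched in C. The mode constants here are placeholders with made-up values (the real `ACCELERATE_BLAS_THREADING_*` definitions are not shown in this thread), and `reported_max_threads` is a hypothetical helper, not the code that was merged:

```c
/* Placeholder mode constants; the values are illustrative only. */
enum sketch_threading_mode {
    SKETCH_MULTI_THREADED = 0,
    SKETCH_SINGLE_THREADED = 1
};

/*
 * Translate Accelerate's on/off threading mode into a thread count.
 * `hint` is the value obtained from APPLE_NTHREADS, or <= 0 if the
 * symbol could not be found.
 */
int reported_max_threads(enum sketch_threading_mode mode, int hint) {
    if (mode == SKETCH_SINGLE_THREADED) {
        return 1;
    }
    /* Multi-threaded: prefer the hint, else keep the "greater than 1" placeholder. */
    return hint > 1 ? hint : 2;
}
```

With a hint of 12 (the M2 Max value reported earlier), the multi-threaded case would report 12 instead of the arbitrary 2.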

@imciner2 force-pushed the im/accelerate branch 2 times, most recently from 6e84867 to 6fee1ab, October 21, 2025 01:57
@imciner2
Contributor Author

Ok, added some tests for Apple Accelerate now (both loading the symbols and calling this threading API). This should be good for review and merge.

After merge, I want to hold off tagging until PRs #168 and #167 are merged, then we can do one release with all three items. (I think I am close to finishing those).

@ViralBShah
Member

ViralBShah commented Oct 21, 2025

What happens on macos intel, where both MKL (installed) and Accelerate are available?

@imciner2
Contributor Author

> What happens on macos intel, where both MKL (installed) and Accelerate are available?

This depends on whether someone loads the AppleAccelerate.jl or MKL.jl package. By default, the forwards for either library are not available until its package is loaded, so loading only MKL.jl would give only MKL.

Loading MKL.jl would clear all forwards, but if you load AppleAccelerate.jl later, it would replace the MKL forwards it can and leave the rest. The threading API would still call both the MKL threading functions and the Accelerate threading functions, though, because for those we only go by which libraries are loaded: each loaded library is searched for the threading symbol at the moment we want to use it, and we don't keep a symbol map of them.
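
A rough C sketch of that "search every loaded library at call time" behaviour, assuming the loaded libraries are available as an array of `dlopen` handles. The function name, the candidate-name list, and the `void f(int)` setter signature are all assumptions for illustration; the real setters differ per library and LBT's actual code may look quite different:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Assumed setter signature; real libraries may differ. */
typedef void (*set_threads_fn)(int);

/*
 * For each loaded library, try each candidate setter name and call the
 * first one found. Nothing is cached: every call searches again.
 * Returns the number of libraries whose setter was invoked.
 */
int set_threads_on_loaded(void *const *handles, size_t nhandles,
                          const char *const *names, size_t nnames,
                          int nthreads) {
    int calls = 0;
    for (size_t i = 0; i < nhandles; i++) {
        if (handles[i] == NULL) {
            continue; /* skip empty slots */
        }
        for (size_t j = 0; j < nnames; j++) {
            void *addr = dlsym(handles[i], names[j]);
            if (addr != NULL) {
                ((set_threads_fn)addr)(nthreads);
                calls++;
                break; /* one setter per library is enough */
            }
        }
    }
    return calls;
}
```

With both an MKL handle and an Accelerate handle in `handles`, both libraries would have their setter called, which matches the behaviour described above.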

Development

Successfully merging this pull request may close these issues.

Number of threads with Accelerate
