-
Notifications
You must be signed in to change notification settings - Fork 47
Metal 4 #612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Metal 4 #612
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #612 +/- ##
==========================================
- Coverage 80.76% 75.78% -4.99%
==========================================
Files 61 68 +7
Lines 2693 2907 +214
==========================================
+ Hits 2175 2203 +28
- Misses 518 704 +186 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
#default_storage = "private" | ||
|
||
# when false, always use metal 3 even on a metal 4 platform | ||
#force_metal3 = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about a api_version
preference, determined automatically if unset, but overridable by setting "3"
or "4"
?
I'd also use string values; literals have given me issues in the past (if there's a typo in the user-provided TOML, stuff breaks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted about the string values.
Metal 3 can be used alongside Metal 4, so maybe we should call it default_api_version
to make it clear that all it's doing is selecting the default API?
I think maybe that this preference should only affect whether global_queue
returns an MTLCommandQueue
or an MTL4CommandQueue
, and everything else that takes in a queue
should dispatch on the correct implementation based on which type was provided?
# command allocator | ||
# | ||
|
||
export MTL4CommandAllocator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How large is the Metal 4 API? Should we put it in a mtl4
top-level folder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of the Metal 4 API lives as new methods of Metal 3 Objects so maybe a mtl4 subfolder in lib/mtl would be more appropriate?
# Copy Operations (Blit functionality integrated into compute encoder in Metal 4) | ||
# function append_copy!(cce::MTL4ComputeCommandEncoder, dst::MTLBuffer, dstOffset::Integer, | ||
# src::MTLBuffer, srcOffset::Integer, size::Integer) | ||
# @objc [cce::id{MTL4ComputeCommandEncoder} copyFromBuffer:src::id{MTLBuffer} | ||
# sourceOffset:srcOffset::NSUInteger | ||
# toBuffer:dst::id{MTLBuffer} | ||
# destinationOffset:dstOffset::NSUInteger | ||
# size:size::NSUInteger]::Nothing | ||
# end | ||
|
||
# function append_copy!(cce::MTL4ComputeCommandEncoder, dst::MTLTexture, dstSlice::Integer, dstLevel::Integer, dstOrigin::MTLOrigin, | ||
# src::MTLBuffer, srcOffset::Integer, srcBytesPerRow::Integer, srcBytesPerImage::Integer, | ||
# size::MTLSize) | ||
# @objc [cce::id{MTL4ComputeCommandEncoder} copyFromBuffer:src::id{MTLBuffer} | ||
# sourceOffset:srcOffset::NSUInteger | ||
# sourceBytesPerRow:srcBytesPerRow::NSUInteger | ||
# sourceBytesPerImage:srcBytesPerImage::NSUInteger | ||
# sourceSize:size::MTLSize | ||
# toTexture:dst::id{MTLTexture} | ||
# destinationSlice:dstSlice::NSUInteger | ||
# destinationLevel:dstLevel::NSUInteger | ||
# destinationOrigin:dstOrigin::MTLOrigin]::Nothing | ||
# end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So these can be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No they exist with the new MTL4ComputerCommandEncoder
. I just wanted to lay the groundwork for custom kernels and then one piece of functionality using Metal 4 (unsafe_fill!
is what I landed on). This code hasn't been tested or anything but once synchronization is figured out I'll work on the rest of the Metal 4 MtlArray functionality.
|
||
@autoreleasepool function (kernel::HostKernel)(args...; groups=1, threads=1, | ||
queue=global_queue(device())) | ||
queue=use_metal4() ? global_queue4(device()) : global_queue(device())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
global_queue4
is weird, and I'd rather not have to do this dance everywhere. I'd rather we can use these regardless of the API version, so what about having global_queue(::MTLDevice)
return an appropriate object given the selected API? Given that preferences are determined at compile time, that should also be type stable.
src/compiler/execution.jl
Outdated
if !use_mtl4 | ||
roots = [kernel.f, args] | ||
MTL.on_completed(cmdbuf) do buf | ||
empty!(roots) | ||
foreach(free, argument_buffers) | ||
|
||
# Check for errors | ||
# XXX: we cannot do this nicely, e.g. throwing an `error` or reporting with `@error` | ||
# because we're not allowed to switch tasks from this contexts. | ||
if buf.status == MTL.MTLCommandBufferStatusError | ||
Core.println("ERROR: Failed to submit command buffer: $(buf.error.localizedDescription)") | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need an equivalent for this, or Julia can early-free resources.
Related:
Unlike the default behavior of MTLCommandBuffer, you may need to consider a resource’s retain count because each MTL4CommandBuffer instance doesn’t create strong references to resources. This is similar to creating an MTLCommandBuffer with the makeCommandBufferWithUnretainedReferences() method of an MTLCommandQueue.
cce = MTLComputeCommandEncoder(cmdbuf) | ||
if use_mtl4 | ||
allocator = MTL4CommandAllocator(device()) | ||
cmdbuf = MTL4CommandBuffer(device()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can reuse and repurpose each command buffer indefinitely by starting over, encoding new commands, and committing it again, instead of allocating a new buffer.
As a future TODO, we could probably use a task-local command buffer just like we have task-local queues now.
const use_metal4 = OncePerProcess{Bool}() do | ||
dev = device() | ||
force_metal3 = @load_preference("force_metal3", false) | ||
return functional() && !force_metal3 && supports_family(dev, MTL.MTLGPUFamilyMetal4) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd make this api_version::Int
then, parsing the api_version
String preferences here.
src/state.jl
Outdated
device!(dev::MTLDevice) = task_local_storage(:MTLDevice, dev) | ||
|
||
const global_queues = WeakKeyDict{MTLCommandQueue,Nothing}() | ||
const global_queues4 = WeakKeyDict{MTL4CommandQueue,Nothing}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the preference being compile-time determined, we don't need multiple of these.
If necessary (e.g. for type definitions or macro expansions), you can use @static if
. Otherwise a plain top-level if api_version == ...
should work.
Update as of 07/29 (also added to todo):
MTLSharedEvent
s (Synchronize usingMTLSharedEvent
s #633).To actually test the Metal 4 stuff, you need to be on the macOS 26 developer beta, and be using ObjectiveC.jl >
v3.4.2
For 8-bit integer fills, it works for smaller vectors, but not for bigger ones. I think it's because I can't figure out how to do synchronization like we do with Metal3.
For custom kernels, I don't know if the only issue is synchronization or if there's something else that's broken.
TODO:
MTLSharedEvent
s (Synchronize usingMTLSharedEvent
s #633)I'm kind of leaving this here for feedback/suggestions/help.