@christiangnrd
Member

Don't remove the file yet, to avoid a merge conflict with #627.

@github-actions
Contributor

github-actions bot commented Jul 20, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic main) to apply these changes.

Suggested changes:
diff --git a/perf/runbenchmarks.jl b/perf/runbenchmarks.jl
index ba5e0d40..1d7901c5 100644
--- a/perf/runbenchmarks.jl
+++ b/perf/runbenchmarks.jl
@@ -1,6 +1,6 @@
 # benchmark suite execution and codespeed submission
 using Pkg
-Pkg.add(url="https://github.com/christiangnrd/GPUArrays.jl", rev="akreduce")
+Pkg.add(url = "https://github.com/christiangnrd/GPUArrays.jl", rev = "akreduce")
 
 using Metal
 
diff --git a/test/runtests.jl b/test/runtests.jl
index 4ee51134..fb376e4f 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -6,7 +6,7 @@ import REPL
 using Test
 
 using Pkg
-Pkg.add(url="https://github.com/christiangnrd/GPUArrays.jl", rev="akreduce")
+Pkg.add(url = "https://github.com/christiangnrd/GPUArrays.jl", rev = "akreduce")
 
 # Quit without erroring if Metal loaded without issues on unsupported platforms
 if !Sys.isapple()

@christiangnrd
Member Author

If we leave the current mapreducedim! implementation in place, we can transition in two steps: first switch over once AK supports broadcasted reductions, then remove the implementations from this repo once AK supports >1 input dims.

@codecov

codecov bot commented Jul 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.63%. Comparing base (1942968) to head (c0eddd1).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #628   +/-   ##
=======================================
  Coverage   80.63%   80.63%           
=======================================
  Files          61       61           
  Lines        2722     2722           
=======================================
  Hits         2195     2195           
  Misses        527      527           

@github-actions github-actions bot left a comment

Metal Benchmarks

| Benchmark suite | Current: c0eddd1 | Previous: 1942968 | Ratio |
|---|---|---|---|
| latency/precompile | 9830015416 ns | 9844653958 ns | 1.00 |
| latency/ttfp | 3989128875 ns | 3972040229 ns | 1.00 |
| latency/import | 1281988208 ns | 1275530958.5 ns | 1.01 |
| integration/metaldevrt | 830312.5 ns | 828500 ns | 1.00 |
| integration/byval/slices=1 | 1532291.5 ns | 1536750 ns | 1.00 |
| integration/byval/slices=3 | 8864917 ns | 9632625 ns | 0.92 |
| integration/byval/reference | 1535333 ns | 1543583 ns | 0.99 |
| integration/byval/slices=2 | 2554083 ns | 2621958.5 ns | 0.97 |
| kernel/indexing | 582792 ns | 567792 ns | 1.03 |
| kernel/indexing_checked | 577208 ns | 569292 ns | 1.01 |
| kernel/launch | 9042 ns | 9208 ns | 0.98 |
| array/construct | 6125 ns | 6625 ns | 0.92 |
| array/broadcast | 579250 ns | 583375 ns | 0.99 |
| array/random/randn/Float32 | 821167 ns | 784333 ns | 1.05 |
| array/random/randn!/Float32 | 622625 ns | 623250 ns | 1.00 |
| array/random/rand!/Int64 | 555395.5 ns | 547458 ns | 1.01 |
| array/random/rand!/Float32 | 584125 ns | 585291 ns | 1.00 |
| array/random/rand/Int64 | 777375 ns | 771250 ns | 1.01 |
| array/random/rand/Float32 | 628375 ns | 622687 ns | 1.01 |
| array/accumulate/Int64/1d | 1261292 ns | 1277104.5 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 1800500 ns | 1868333 ns | 0.96 |
| array/accumulate/Int64/dims=2 | 2165958.5 ns | 2183625 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 11643104 ns | 11737104 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 9718917 ns | 9771416.5 ns | 0.99 |
| array/accumulate/Float32/1d | 1141375 ns | 1142833 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 1562333.5 ns | 1570458 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 1865875 ns | 1931625 ns | 0.97 |
| array/accumulate/Float32/dims=1L | 9890916.5 ns | 9864375 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 7298500 ns | 7308021 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 1077583 ns | 1373353.5 ns | 0.78 |
| array/reductions/reduce/Int64/dims=1 | 987500 ns | 1069291.5 ns | 0.92 |
| array/reductions/reduce/Int64/dims=2 | 935145.5 ns | 1193292 ns | 0.78 |
| array/reductions/reduce/Int64/dims=1L | 2350750 ns | 2113062.5 ns | 1.11 |
| array/reductions/reduce/Int64/dims=2L | 2815291 ns | 3456458 ns | 0.81 |
| array/reductions/reduce/Float32/1d | 1029750 ns | 971625 ns | 1.06 |
| array/reductions/reduce/Float32/dims=1 | 956125 ns | 808458 ns | 1.18 |
| array/reductions/reduce/Float32/dims=2 | 870375 ns | 768979 ns | 1.13 |
| array/reductions/reduce/Float32/dims=1L | 1659354.5 ns | 1739041 ns | 0.95 |
| array/reductions/reduce/Float32/dims=2L | 2781167 ns | 1772125 ns | 1.57 |
| array/reductions/mapreduce/Int64/1d | 1000375 ns | 1456146 ns | 0.69 |
| array/reductions/mapreduce/Int64/dims=1 | 936083 ns | 1074875 ns | 0.87 |
| array/reductions/mapreduce/Int64/dims=2 | 873500 ns | 1206417 ns | 0.72 |
| array/reductions/mapreduce/Int64/dims=1L | 2346562.5 ns | 2119292 ns | 1.11 |
| array/reductions/mapreduce/Int64/dims=2L | 2844729 ns | 3444375 ns | 0.83 |
| array/reductions/mapreduce/Float32/1d | 1045959 ns | 990792 ns | 1.06 |
| array/reductions/mapreduce/Float32/dims=1 | 947959 ns | 810062.5 ns | 1.17 |
| array/reductions/mapreduce/Float32/dims=2 | 868041.5 ns | 761104 ns | 1.14 |
| array/reductions/mapreduce/Float32/dims=1L | 1668167 ns | 1740812.5 ns | 0.96 |
| array/reductions/mapreduce/Float32/dims=2L | 2815354.5 ns | 1781292 ns | 1.58 |
| array/private/copyto!/gpu_to_gpu | 636791 ns | 651375 ns | 0.98 |
| array/private/copyto!/cpu_to_gpu | 795791 ns | 805542 ns | 0.99 |
| array/private/copyto!/gpu_to_cpu | 811292 ns | 817667 ns | 0.99 |
| array/private/iteration/findall/int | 1657000 ns | 1646500 ns | 1.01 |
| array/private/iteration/findall/bool | 1451937.5 ns | 1444584 ns | 1.01 |
| array/private/iteration/findfirst/int | 2074750 ns | 1754958.5 ns | 1.18 |
| array/private/iteration/findfirst/bool | 1635145.5 ns | 1703625 ns | 0.96 |
| array/private/iteration/scalar | 5542583.5 ns | 4772500 ns | 1.16 |
| array/private/iteration/logical | 2734958 ns | 2536917 ns | 1.08 |
| array/private/iteration/findmin/1d | 1870167 ns | 1815666 ns | 1.03 |
| array/private/iteration/findmin/2d | 1891583.5 ns | 1431750 ns | 1.32 |
| array/private/copy | 573791.5 ns | 538167 ns | 1.07 |
| array/shared/copyto!/gpu_to_gpu | 83750 ns | 86375 ns | 0.97 |
| array/shared/copyto!/cpu_to_gpu | 82625 ns | 86583 ns | 0.95 |
| array/shared/copyto!/gpu_to_cpu | 91458 ns | 84833 ns | 1.08 |
| array/shared/iteration/findall/int | 1643437.5 ns | 1609874.5 ns | 1.02 |
| array/shared/iteration/findall/bool | 1471812.5 ns | 1464354 ns | 1.01 |
| array/shared/iteration/findfirst/int | 1830375 ns | 1377750 ns | 1.33 |
| array/shared/iteration/findfirst/bool | 1385917 ns | 1319166 ns | 1.05 |
| array/shared/iteration/scalar | 206917 ns | 217500 ns | 0.95 |
| array/shared/iteration/logical | 2750042 ns | 2288708.5 ns | 1.20 |
| array/shared/iteration/findmin/1d | 1607895.5 ns | 1421750 ns | 1.13 |
| array/shared/iteration/findmin/2d | 1917291.5 ns | 1430854.5 ns | 1.34 |
| array/shared/copy | 251042 ns | 248666 ns | 1.01 |
| array/permutedims/4d | 2442208 ns | 2438438 ns | 1.00 |
| array/permutedims/2d | 1184291.5 ns | 1193250 ns | 0.99 |
| array/permutedims/3d | 1737625 ns | 1768458 ns | 0.98 |
| metal/synchronization/stream | 19667 ns | 19916 ns | 0.99 |
| metal/synchronization/context | 20292 ns | 20375 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
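
For readers of the benchmark table: the Ratio column appears to be Current divided by Previous, rounded to two decimals, so values below 1.00 indicate the PR is faster and values above 1.00 indicate a regression. The sketch below reproduces two of the reported ratios under that assumption; the `ratio` helper is hypothetical and is not part of github-action-benchmark.

```julia
# Hypothetical helper (an assumption, not the tool's own code):
# ratio of the current timing to the previous one, rounded to 2 decimals.
ratio(current_ns, previous_ns) = round(current_ns / previous_ns; digits = 2)

# Timings copied from the table above.
reduce_int64_1d   = ratio(1077583, 1373353.5)  # array/reductions/reduce/Int64/1d
reduce_f32_dims2L = ratio(2781167, 1772125)    # array/reductions/reduce/Float32/dims=2L

println("reduce/Int64/1d:        ", reduce_int64_1d)    # below 1: faster on this PR
println("reduce/Float32/dims=2L: ", reduce_f32_dims2L)  # above 1: slower on this PR
```

Under this reading, the Int64 reductions mostly improve while the Float32 `dims=2L` cases are the largest regressions in the table.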

@christiangnrd changed the title from "Switch to GPUArrays.jl reduction implementation" to "[Do not merge] Switch to GPUArrays.jl reduction implementation" on Jul 23, 2025

@maleadt
Member

maleadt commented Jul 29, 2025

> If we leave the current mapreducedim! implementation in place, we can transition in two steps: first switch over once AK supports broadcasted reductions, then remove the implementations from this repo once AK supports >1 input dims.

I think I'd rather we do it in one pass, because the change needs to be made across back-ends.

@maleadt
Member

maleadt commented Jul 30, 2025

In any case, despite some regressions, the overall performance seems better here than in CUDA.jl.
