
Conversation


@avik-pal avik-pal commented Aug 13, 2025

fixes #759

  • Jacobian support for GPU Arrays has been restored
  • ForwardDiff.gradient now supports GPU Arrays

cc @ChrisRackauckas @devmotion
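
For context, a minimal sketch of the restored usage (using JLArrays so it runs without a GPU; a CuArray should behave analogously - the function and values are purely illustrative):

using ForwardDiff, JLArrays

x = JLArray(rand(Float32, 4))

# gradient of a scalar-valued function of a GPU-style array;
# previously this hit a scalar-indexing error (#759)
g = ForwardDiff.gradient(v -> sum(abs2, v), x)

# Jacobian of an array-valued function; support for GPU arrays is likewise restored
J = ForwardDiff.jacobian(v -> v .^ 2, x)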


codecov bot commented Aug 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.05%. Comparing base (463e830) to head (5b8ffab).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #760      +/-   ##
==========================================
+ Coverage   89.79%   90.05%   +0.25%     
==========================================
  Files          11       12       +1     
  Lines        1039     1066      +27     
==========================================
+ Hits          933      960      +27     
  Misses        106      106              


# Collect the structural indices of `result`, skipping the first `offset` entries
# and keeping `chunksize` of them for this chunk.
idxs = collect(
    Iterators.drop(ForwardDiff.structural_eachindex(result), offset)
)[1:chunksize]
# Seed the selected entries via broadcasting so that GPU arrays avoid scalar indexing.
result[idxs] .= partial_fn.(Ref(dual), 1:chunksize)
Member

Does this not have an inference issue due to losing static information about size? I would think this needs to be ntuple unless it can prove things about size.

Member

It would still be type-stable, it would just have dynamism in the function that would slow it down a bit during the broadcast.

Author

Here the chunksize is already an Int, so I don't think we will get any benefit from using an ntuple.
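
For illustration (toy values, not code from this PR): with a runtime Int length, ntuple cannot recover static size information, so it offers no inference advantage over collecting into a Vector.

chunksize = 4                      # a runtime Int, not a compile-time constant
t = ntuple(i -> i, chunksize)      # tuple length is dynamic, so it is not inferred as NTuple{4, Int}
v = collect(1:chunksize)           # same dynamism, simpler code for the broadcast above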

ChrisRackauckas added a commit that referenced this pull request Aug 14, 2025
Noted in #759 (comment), GPU is completely untested in ForwardDiff.jl, so this sets up the Buildkite pipeline. I set up the backend and all, and just took a few tests from #760 to seed it. The point of this isn't really to be a comprehensive set of GPU tests, but rather to update this repo to have the standard tools the other repos have so GPU doesn't regress again/more.
@avik-pal avik-pal force-pushed the ap/gpu_arrays branch 3 times, most recently from 19e8423 to da2efb7 Compare August 16, 2025 00:34
@avik-pal avik-pal force-pushed the ap/gpu_arrays branch 2 times, most recently from 2536221 to 11540a0 Compare August 18, 2025 23:03
devmotion previously approved these changes Aug 19, 2025
@KristofferC (Collaborator) commented Aug 19, 2025

In #472, the seed! (etc.) functions were written in a generic (non-type-specific) way that should have supported GPU arrays. This PR adds further specializations of seed! for some specific types via extensions. But why does the previous approach not work anymore? That one seemed better, since it supports non-fast scalar indexing for arrays that are not GPU arrays.

Has it been properly explored whether the existing functions can be written in an alternative way that supports both fast and non-fast scalar-indexing arrays with the same generic code (which would avoid any new extensions)?

Edit: I had missed that #739 reverted some of #472.

@devmotion (Member)

Yes, on the master branch seeding is (again) performed without broadcasting. Depending on the structural array type, the set of indices is not readily available in an allocation-free, broadcastable form (e.g., the set of upper-triangular indices for UpperTriangular), and hence I reverted the code back to iterating over indices - without considering that it would break GPU compatibility.

If we want to avoid these allocations (and the broadcasting overhead) for non-GPU arrays, I don't immediately see how this issue could be solved by a generic implementation. Possibly the amount of code duplication could be reduced by introducing a helper function or branch that switches between broadcasting and iterating based on the type of the input array (presumably defaulting to iteration?), but even in this case it would be necessary to add an extension that ensures that GPU arrays use broadcasting.

Alternatively, we could default to using broadcasting (with the additional overhead of collecting the indices), and - as an additional optimization - only use iteration for a handful of selected base array types such as Array, UpperTriangular{_,<:Matrix}, etc. This wouldn't require an extension, but if we add such optimizations we would still need both an implementation with broadcasting and one without.

What are your thoughts @KristofferC?
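
For illustration, a rough sketch of the helper/branch idea described above. The trait, function names, and signatures here are hypothetical, not the code this PR actually adds (the PR works with ForwardDiff.structural_eachindex, as in the snippet earlier in this thread):

# Default: assume fast scalar indexing and iterate; an extension (or methods for
# specific GPU array types) would flip this to `false`.
fast_scalar_indexing(::AbstractArray) = true

function seed_chunk!(result, dual, partial_fn, offset, chunksize)
    inds = Iterators.take(Iterators.drop(eachindex(result), offset), chunksize)
    if fast_scalar_indexing(result)
        # CPU path: iterate over the indices directly, no intermediate allocations
        for (i, idx) in enumerate(inds)
            result[idx] = partial_fn(dual, i)
        end
    else
        # GPU path: materialize the indices and broadcast, avoiding scalar indexing
        idxs = collect(inds)
        result[idxs] .= partial_fn.(Ref(dual), 1:chunksize)
    end
    return result
end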

@avik-pal (Author)

bump on this

@avik-pal (Author) commented Sep 1, 2025

Testing this patch out with Lux.jl, it will still cause regressions in cases where we have a wrapper over a CuArray.

> only use iteration for a handful of selected base array types such as Array

This seems like a good solution that avoids regressions on use cases that were supported prior to #739. There's also ArrayInterface.fast_scalar_indexing, but I'm not sure how the maintainers feel about taking a dependency on ArrayInterface.
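
If the dependency were acceptable, the ArrayInterface route mentioned above could look roughly like this (a sketch under the assumption that fast_scalar_indexing already returns false for GPU array types via ArrayInterface's extensions):

using ArrayInterface

# route an array to the iterating (scalar-indexing) path only when the trait allows it
use_iteration(x::AbstractArray) = ArrayInterface.fast_scalar_indexing(typeof(x))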

@avik-pal avik-pal force-pushed the ap/gpu_arrays branch 2 times, most recently from d2e7730 to 6688848 Compare September 1, 2025 20:28
src/utils.jl Outdated
@@ -0,0 +1,15 @@
# overload for array types that
@inline supports_fast_scalar_indexing(::Array) = true
Member

Is the @inline needed? Did you encounter problems without it?

I think we might also want to extend this to:

Suggested change:
- @inline supports_fast_scalar_indexing(::Array) = true
+ @inline supports_fast_scalar_indexing(::StridedArray) = true

Author

StridedArray is too broad here

julia> SubArray{Float64, 2, JLArray{Float64, 2}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false} <: StridedArray
true
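
To spell out why that breadth is a problem (illustrative, runnable without a GPU):

using JLArrays

# A strided view of a GPU-style array is still a StridedArray, so a StridedArray method
# would route it to the fast-scalar-indexing (iteration) path even though scalar indexing
# on the underlying array is disallowed.
v = view(JLArray(rand(Float32, 4, 4)), 1:2, 1:2)
v isa StridedArray  # true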

Member

But is that only a problem with how JLArray is defined? Does it also cover views of CuArrays?

If StridedArray is problematic, another more generic alternative would be DenseArray.

@avik-pal avik-pal requested a review from devmotion September 1, 2025 21:32
@avik-pal (Author) commented Sep 4, 2025

bump on this

@devmotion (Member) left a comment

I was hoping we could achieve it with a bit less code but maybe that's not possible.

Can you also add tests of #739 and #743 for these new branches?

fix: project toml for julia pre 1.9

fix: support gradient + more test coverage

chore: relax version

chore: remove 1.6 support and bump min version to 1.10

fix: apply suggestions from code review

Co-authored-by: David Widmann <[email protected]>
fix: use a struct instead of closure

fix: sizecheck

chore: remove GPUArraysCore

Co-authored-by: David Widmann <[email protected]>
fix: revert _take

chore: remove 1.8 checks

chore: remove 0.1

Co-authored-by: David Widmann <[email protected]>
@avik-pal (Author)

> Can you also add tests of #739 and #743 for these new branches?

The GPUArray backends don't support broadcasting with wrapped arrays nicely, so those tests will mostly fail.

julia> using CUDA, LinearAlgebra

julia> x = LowerTriangular(cu(rand(Float32, 4, 4)))
4×4 LowerTriangular{Float32, CuArray{Float32, 2, CUDA.DeviceMemory}}:
 0.960887   ⋅         ⋅          ⋅
 0.316333  0.612238   ⋅          ⋅
 0.236091  0.209854  0.0883058   ⋅
 0.370694  0.732681  0.0111619  0.270063

julia> x[diagind(x)] .= 10

And they won't have unassigned elements.

julia> CuArray{Float32}(undef, 2, 3)
2×3 CuArray{Float32, 2, CUDA.DeviceMemory}:
 -5.81535f-36  1.25147f7  -2.50125f-11
 -1.98624      1.84662     1.95155

julia> JLArray{Float32}(undef, 3, 4)
3×4 JLArray{Float32, 2}:
 6.771f-42  6.771f-42  9.42f-43  0.0
 6.771f-42  6.771f-42  0.0       0.0
 6.771f-42  6.771f-42  0.0       4.5657f-41


Successfully merging this pull request may close these issues.

Scalar indexing error when computing Jacobian with a CUDA.CuArray