Skip to content

Conversation

@majosm
Copy link
Collaborator

@majosm majosm commented Jan 27, 2026

WIP

cc @lukeolson

@majosm
Copy link
Collaborator Author

majosm commented Jan 27, 2026

@inducer I added a loopy kernel fallback for eager (code), and it's working when the array being multiplied is 1D, but I'm running into some trouble with array strides for higher dimensions. I set up the input array like this:

    u1 = actx.np.where(actx.np.less_equal(x, (a+b)/2), 0*x + 1, 0*x)
    u2 = actx.np.where(actx.np.greater_equal(x, (a+b)/2), 0*x + 1, 0*x)
    # u = actx.freeze_thaw(u1)  # 1D
    u = actx.freeze_thaw(actx.np.stack([u1, u2]).T)  # 2D

The strides are no longer C-like after transposing. Lazy doesn't seem to care about this, but eager doesn't like it; it makes my 1D diffusion solver blow up instantly. If I add order="F" to the GlobalArg(...) calls for "array" and "out", then it works fine. Is there a way I can fix this in general?

Edit: I guess the strides of u are only non-C-like in the eager case.

@majosm
Copy link
Collaborator Author

majosm commented Jan 29, 2026

FWIW, using a dense matrix + actx.einsum instead shows the same behavior.

@majosm
Copy link
Collaborator Author

majosm commented Feb 2, 2026

Upon closer inspection, it looks like the kernels may actually be OK. The problem occurs later in the test driver, when I do this:

        u = actx.freeze_thaw(u + dt*compiled_rhs(t, u))

(here u has order='F' and dt*compiled_rhs(t, u) has order='C' in the eager case). They don't add together correctly in eager, which seems to be due to inducer/pyopencl#681. So I guess the solution is just "don't mix orders".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant