Skip to content

Conversation

@BioTurboNick
Copy link
Contributor

Found additional locations that could benefit from the changes in #1176 , but I haven't tested these as yet.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Feb 3, 2025

More to come. These two don't look all that much like an improvement... I wonder if it's because the wrapping to vec and then back to a matrix isn't that helpful. Can't create it as a vector because it's what's provided as input to the function...

EDIT 2/12/25: Determined I wasn't using the right inputs to test the new path. Using a 100x110 array to test, which triggers the truncation path:

using Random
Random.seed!(100)
A,tau = LAPACK.geqrf!(rand(elty,100,110))
function f(A, tau)
    for i = 1:100
        LAPACK.orgqr!(A,tau)
    end
end
@benchmark f(A, tau) seconds = 60

Random.seed!(100)
A,tau = LAPACK.geqlf!(rand(elty,100,110))
function f(A, tau)
    for i = 1:100
        LAPACK.orgql!(A,tau)
    end
end
@benchmark f(A, tau) seconds = 60

orgqr! Before:
image
After:
image

orgql! Before:
image
After:
image

(more to come)

@dkarrasch
Copy link
Member

Test failures may be real?

@ViralBShah
Copy link
Member

@BioTurboNick Looks like a nice PR but seems to break some tests.

@BioTurboNick
Copy link
Contributor Author

Thanks! I'm mystified why it would work on linux-i686 and macos-aarch64 but not linux-x86_64 or windows-x86_64, though it's suggestive.

Linux segfaults and isn't helpful in figuring out the cause, but Windows provides a ReadOnlyMemoryError that is converted to a Julia exception, with a nice stack trace.

The proximate cause is in reshape(resize!(Z, ldz * m[]), ldz, m[]) at the end of stegr!. Nearest I can figure is that m[] can sometimes be larger than the second dimension of Z or of the length of w, which would explain why the conditional was needed in the original code. Let's see if that fixes it...

But still, that should only produce garbage values in the extra part, not a segfault? And why only on x86_64?

@codecov
Copy link

codecov bot commented Feb 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.92%. Comparing base (443aa0f) to head (ee7ef2d).
Report is 21 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1190      +/-   ##
==========================================
+ Coverage   91.89%   91.92%   +0.02%     
==========================================
  Files          34       34              
  Lines       15360    15365       +5     
==========================================
+ Hits        14115    14124       +9     
+ Misses       1245     1241       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Feb 12, 2025

Oy. Either the debug code caused it to go away, or the change in nightly from DEV.33 to DEV.38 did it. EDIT: Nope, not fixed by the newer nightlies. It's a Heisenbug.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Feb 12, 2025

Finally, a minimal reproducer:

a = SymTridiagonal(rand(12), rand(11)) # SymTridiagonal([1.0129540363709792, 0.5125544740879742, 1.4404024510136622, 0.9645500736079791, 0.8349894041911953, 1.0404785847552251, 0.9418430974856163, 0.1401583600348122, 0.38426770335791244, 0.4551768643291618, 0.4200328536585094, 1.8593998829530265e-7], [0.40739613651353346, 0.10643163002740841, 0.6222058356713083, 0.33352809216796453, 0.3799591152214755, 0.21073828255014115, 0.8158852900425236, 0.1683998539581063, 0.5002702262460241, 0.6123852845929411, 0.7701699043551441])"
A = copy(a); LAPACK.stegr!('V', 'A', A.dv, A.ev, 0.0, 0.0, 0, 0)

It seems that this is a bug in Julia (1.11 and nightly)? Seems to be producing undefined behavior. With various configurations of the end code I get a Julia crash with EXCEPTION_ACCESS_VIOLATION, or a Julia-handled ReadOnlyMemoryError(). And this is happening when the size actually isn't changing.

It seems some necessary ingredients are 1) initiating as a vector*; 2) passing the vector into the ccall; 3) reshaping the vector into a matrix (resize call not necessary, though can independently trigger a similar error).

If the array is initiated as a matrix, and then later reshaped with reshape(resize!(vec(Z), ldz * Zm), ldz, Zm), there's no problem at all.

And the issue goes away entirely if the offending line is wrapped in try/catch.

*As far as C is concerned, an array is just a pointer to the start of a chunk of memory, so should make no difference on the C side of things?

@BioTurboNick
Copy link
Contributor Author

Well, that was dumb. Changed it to a vector, didn't change the line that relied on looking up its second dimension.

@dkarrasch
Copy link
Member

IIUC, you solved the issue, right? So, there is no need to catch anything in the tests? I wasn't able to run tests locally, so I removed the try-catch-block and added a simple test to cover what was already missed before your PR. If tests pass, then I think this should be ready to go.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Feb 18, 2025

Ah thanks, yeah good to go.

@dkarrasch dkarrasch merged commit 57785c7 into JuliaLang:master Feb 18, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants