-
-
Notifications
You must be signed in to change notification settings - Fork 34
Description
The current implementation of rank(::QRPivoted)
(see below) is very inefficient. I will submit a PR shortly to fix this.
More detailed benchmarking here: https://ajinkya-k.github.io/hidden/2024-11-28-allocations-runtime/
function rank(A::QRPivoted; atol::Real=0, rtol::Real=min(size(A)...) * eps(real(float(one(eltype(A.Q))))) * iszero(atol))
m = min(size(A)...)
m == 0 && return 0
tol = max(atol, rtol*abs(A.R[1,1]))
return something(findfirst(i -> abs(A.R[i,i]) <= tol, 1:m), m+1) - 1
end
This is because of the repeated calls to A.R
inside findfirst
. Each call recomputes the factor R, causing too many allocations and slowing down the method. This is irrelevant for small matrices but becomes problematic for large matrices (see examples below).
The following is a faster implementation, because it calls A.factors instead of A.R and only collects the diagonal elements, which is all we need.
This cuts down the runtime by many orders of magnitude for larger matrices.
function rankfast(A::QRPivoted; atol::Real=0, rtol::Real=min(size(A)...) * eps(real(float(one(eltype(A.Q))))) * iszero(atol))
m = min(size(A)...)
m == 0 && return 0
rs = diag(getfield(A, :factors))
tol = max(atol, rtol*abs(rs[1]))
return something(findfirst(abs.(rs) .<= tol), m+1) - 1
end
Here's a sample comparison for a matrix of size 1000 x 100 and rank 50.
Note that the current method takes milliseconds, while the faster one works in microseconds
@be rank($Mqr)
Benchmark: 42 samples with 1 evaluation
min 294.166 μs (157 allocs: 3.971 MiB)
median 397.667 μs (157 allocs: 3.971 MiB)
mean 2.323 ms (157 allocs: 3.971 MiB, 13.20% gc time)
max 74.098 ms (157 allocs: 3.971 MiB, 99.40% gc time)
julia> @be rankfast($Mqr)
Benchmark: 10820 samples with 58 evaluations
min 79.034 ns (6 allocs: 1.047 KiB)
median 96.259 ns (6 allocs: 1.047 KiB)
mean 134.789 ns (6 allocs: 1.047 KiB, 0.18% gc time)
max 62.378 μs (6 allocs: 1.047 KiB, 99.07% gc time)
Here are also some plots of run time and bytes allocated for matrices of size 1000 x 100 as the rank increases from 1 to 100:
**Note: ** Y axis is log-scaled