Skip to content

Triangular solve and inversions have poor performance #1405

@jpdoane

Description

@jpdoane

Solving and inverting triangular matrices seems to be implemented inefficiently:

n=8
X = randn(n, n)
L = LowerTriangular(randn(n, n))
U = L'

# These are mathematically equivalent but differ in performance
@assert (X'/U)' ≈ L\X  
@btime $L \ $X          # 2.492 μs (2 allocations: 592 bytes)
@btime ($X' / $U)'      # 409.151 ns (2 allocations: 592 bytes)

# This also occurs when inverting triangular matrics
I_F64 = Float64.(collect(I(n)))

@btime inv($L)          # 2.551 μs (2 allocations: 592 bytes)
@btime inv($U)'         # 2.492 μs (2 allocations: 592 bytes)
@btime $L \ I           # 2.427 μs (2 allocations: 592 bytes)
@btime (I / $U)'        # 2.496 μs (2 allocations: 592 bytes)
@btime $L \ $I_F64      # 2.418 μs (2 allocations: 592 bytes)
@btime ($I_F64 / $U)'   # 351.958 ns (2 allocations: 592 bytes)


julia> versioninfo()
Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 112 × Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, icelake-server)
Threads: 32 default, 0 interactive, 16 GC (on 112 virtual cores)

This seems to be due to different BLAS functions being called under the hood: LinearAlgebra.generic_mattridiv! calls trtrs, while LinearAlgebra.generic_trimatdiv! call trsm, which appears to be much slower for some reason. Oddly, this Stack Overflow comment suggests that trtrs typically wraps trsm in most implementations and thus shouldn't be any faster, so thats a bit odd...

Likelrelated to issue 636

Suggested fix is replacing calls to trsm with trtrs

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions