Switch from xPPTRF to xPOTRF to improve TurbSim speed on macOS #3123
Open
IrisMeasure wants to merge 1 commit into OpenFAST:dev from
bjonkman
reviewed
Dec 30, 2025
@@ -1412,127 +1412,185 @@ END SUBROUTINE LAPACK_SPOSV
!> Compute the Cholesky factorization of a real symmetric positive definite matrix A stored in packed format.
!! use LAPACK_PPTRF (nwtc_lapack::lapack_pptrf) instead of this specific function.
SUBROUTINE LAPACK_DPPTRF (UPLO, N, AP, ErrStat, ErrMsg)
Contributor
The routines in NWTC_LAPACK.f90 are named for the LAPACK routines they call. Since this change results in calling a different LAPACK routine, it seems like this should just be a new subroutine called `LAPACK_DPOTRF`. It is a little tricky, though, because of the data conversion from packed to full matrix storage here: the function inputs are different from those of the LAPACK routine. Thoughts @andrew-platt, @deslaughter?
Collaborator
I agree. We should add an interface for that. I haven't looked at the details though.
Collaborator
I agree that a new LAPACK routine should be added and used directly in TurbSim.
Feature or improvement description
This PR rewrites the subroutines LAPACK_DPPTRF and LAPACK_SPPTRF in NWTC_LAPACK.f90, replacing the packed storage Cholesky decomposition (xPPTRF) with the full storage Cholesky decomposition (xPOTRF). To ensure compatibility with existing callers, the subroutine signature remains unchanged by using an internal wrapper to handle the conversion between packed and full storage formats.
This change results in a substantial speed improvement for TurbSim on macOS, with minimal additional memory overhead.
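The packed-to-full round trip described above can be sketched in a few lines. This is only an illustration in plain Python, not the Fortran in this PR: the function names (`packed_index`, `unpack_upper`, `cholesky_upper`, `pack_upper`, `pptrf_via_potrf`) are hypothetical, `cholesky_upper` is a textbook stand-in for a call to xPOTRF, and the sketch assumes upper-triangle (`UPLO = 'U'`), column-major packed storage as defined by LAPACK.

```python
import math


def packed_index(i, j):
    """0-based position of A[i][j] (i <= j) in upper packed, column-major storage."""
    return i + j * (j + 1) // 2


def unpack_upper(ap, n):
    """Expand a packed upper triangle into a full n x n array (lower part left as 0)."""
    a = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j + 1):
            a[i][j] = ap[packed_index(i, j)]
    return a


def cholesky_upper(a, n):
    """Overwrite the upper triangle of a with U such that A = U^T U.

    Stand-in for xPOTRF with UPLO = 'U'; a real wrapper would call LAPACK here.
    """
    for j in range(n):
        for i in range(j + 1):
            s = a[i][j] - sum(a[k][i] * a[k][j] for k in range(i))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                a[j][j] = math.sqrt(s)
            else:
                a[i][j] = s / a[i][i]
    return a


def pack_upper(a, n):
    """Copy the upper triangle of a full array back into packed storage."""
    ap = [0.0] * (n * (n + 1) // 2)
    for j in range(n):
        for i in range(j + 1):
            ap[packed_index(i, j)] = a[i][j]
    return ap


def pptrf_via_potrf(ap, n):
    """Keep a packed-storage interface while factorizing in full storage."""
    full = unpack_upper(ap, n)      # temporary n*n work array
    cholesky_upper(full, n)         # full-storage factorization
    return pack_upper(full, n)      # caller still receives packed output


# A = [[4, 2, 2], [2, 5, 3], [2, 3, 6]] stored as its packed upper triangle.
print(pptrf_via_potrf([4.0, 2.0, 5.0, 2.0, 3.0, 6.0], 3))
```

The temporary full array holds n*n values instead of the n(n+1)/2 values of the packed form, which is the additional memory overhead the description refers to.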
Related issue, if one exists
#3120
Impacted areas of the software
TurbSim
Test results, if applicable
(1) macOS
I compiled TurbSim using GCC 15.2.0 with the following build flags:
I used both versions of TurbSim to generate (i) a 120-second .bts file on a 43 x 43 grid and (ii) a 600-second .bts file on a 23 x 23 grid. The performance results (on macOS 26.2, M4 Pro) are shown below. Coh2h() is the caller of LAPACK_xPPTRF, and all times are in seconds:
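(i) [Timing table not preserved in this copy; it included a row for Coh2h().]
(ii) [Timing table not preserved in this copy; it included a row for Coh2h().]
Furthermore, the .bts files produced by the two versions differ only in the metadata section, specifically at offset 0x42 ($n_{character}$) and the associated $Character_i$ bytes (typically version info and generation time), while the subsequent data sections are identical.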
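A byte-level comparison of this kind can be sketched as follows. This is a hypothetical helper, not the procedure actually used for this PR: `first_difference` and `common_suffix_length` are made-up names, and the byte strings in the demo are synthetic stand-ins rather than real .bts content. Because a differing description length shifts everything after it, the sketch compares the file tails from the end instead of offset by offset.

```python
def first_difference(a: bytes, b: bytes):
    """Return the first offset where a and b differ, or None if they are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) == len(b):
        return None
    return min(len(a), len(b))  # one input is a strict prefix of the other


def common_suffix_length(a: bytes, b: bytes) -> int:
    """Number of trailing bytes that a and b share."""
    count = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        count += 1
    return count


# Synthetic example: fixed header, a length byte, a description string, then data.
old = b"HDR" + bytes([5]) + b"v1.00" + b"DATA"
new = b"HDR" + bytes([6]) + b"v1.0.1" + b"DATA"

print(first_difference(old, new))      # offset of the differing length byte
print(common_suffix_length(old, new))  # size of the identical tail
```

For real files, the inputs would come from something like `open(path, "rb").read()`; the data sections match when the common suffix covers everything after the description string.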
(2) Windows
I compiled TurbSim using IFORT (from Intel oneAPI 2024.2.1) and IFX (from Intel oneAPI 2025.0.1) at the O2 optimization level. The performance results (on Windows 11 24H2, AMD 9950X) are shown below:
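(i) [Timing table not preserved in this copy; it included a row for Coh2h().]
(ii) [Timing table not preserved in this copy; it included a row for Coh2h().]
After switching to SPOTRF, TurbSim on Windows is at least no slower than before.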
Note that on Windows, the .bts files generated by the two versions of TurbSim (built with the same compiler) differ slightly. However, in terms of engineering accuracy, this difference is negligible.