Skip to content

Conversation

@jayhawk-commits
Copy link
Contributor

@jayhawk-commits jayhawk-commits commented Oct 9, 2025

@jayhawk-commits jayhawk-commits changed the title [DO NOT MERGE] Test gcc-toolset-13 update Update workflows to gcc-toolset-13 and fix OpenBLAS build Oct 10, 2025
@jayhawk-commits
Copy link
Contributor Author

Build is past third-party deps, so signs are promising. Windows job is failing at environment setup steps due to infra issues.

@jayhawk-commits
Copy link
Contributor Author

The previously failing tests for loading the OpenBLAS library from turning DYNAMIC_ARCH on pass in the CI run.

@jayhawk-commits jayhawk-commits marked this pull request as ready for review October 10, 2025 23:36
Copy link
Contributor

@HereThereBeDragons HereThereBeDragons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the failure of rocm sanitycheck on linux seem to be a problem how the docker container is started and what group access it has.

INFO     test_rocm_sanity:test_rocm_sanity.py:24 ++ Run [None]$ /__w/TheRock/TheRock/build/bin/rocminfo
ERROR    test_rocm_sanity:test_rocm_sanity.py:29 Command failed!
ERROR    test_rocm_sanity:test_rocm_sanity.py:30 command stdout:
ERROR    test_rocm_sanity:test_rocm_sanity.py:32 ROCk module version 6.12.12 is loaded
ERROR    test_rocm_sanity:test_rocm_sanity.py:32 Unable to open /dev/kfd read-write: Permission denied
ERROR    test_rocm_sanity:test_rocm_sanity.py:32 Failed to get user name to check for video group membership

-DC_LAPACK=ON
-DTARGET=CORE2
-DDYNAMIC_ARCH=ON
-DDYNAMIC_OLDER=ON
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need dynamic_older?

from the changelog:

Version 0.3.1
01-Jul-2018
...
x86_64:
retired some older targets of DYNAMIC_ARCH builds to a new option DYNAMIC_OLDER,
	  this affects PENRYN,DUNNINGTON,OPTERON,OPTERON_SSE3,BOBCAT,ATOM and NANO
	  (which will still be supported via the slower PRESCOTT kernels when this option is not set)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't comment specifically about need, but I picked for breadth of CPU coverage for potential users of TheRock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSVC seems to have issues with small matrix optimizations required to support older CPUs, so Windows build compatibility might dictate how far back the build flags can go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can compile this with [amd]clang instead of msvc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment block has me concerned about trying that. https://github.com/ROCm/TheRock/blob/main/core/CMakeLists.txt#L2C1-L11C10

I'll see if removing DYNAMIC_OLDER fixes the issue. If not, I will try HASWELL which should be a minimum baseline for x86_64 CPUs from Intel and AMD from the past decade. I think I remember one build recipe picking that one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, looks like MSVC's limitation with DYNAMIC_ARCH is known OpenMathLib/OpenBLAS#2826 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My latest set of build flags, which limits DYNAMIC_ARCH to Linux, while still leveraging a TARGET that should support almost all AMD and Intel CPUs from the past decade is able to compile on both our Linux and Windows workflows now.

@jayhawk-commits
Copy link
Contributor Author

the failure of rocm sanitycheck on linux seem to be a problem how the docker container is started and what group access it has.

INFO     test_rocm_sanity:test_rocm_sanity.py:24 ++ Run [None]$ /__w/TheRock/TheRock/build/bin/rocminfo
ERROR    test_rocm_sanity:test_rocm_sanity.py:29 Command failed!
ERROR    test_rocm_sanity:test_rocm_sanity.py:30 command stdout:
ERROR    test_rocm_sanity:test_rocm_sanity.py:32 ROCk module version 6.12.12 is loaded
ERROR    test_rocm_sanity:test_rocm_sanity.py:32 Unable to open /dev/kfd read-write: Permission denied
ERROR    test_rocm_sanity:test_rocm_sanity.py:32 Failed to get user name to check for video group membership

I believe there were issues late last week where it was not picking pods with GPUs.

@jayhawk-commits
Copy link
Contributor Author

the failure of rocm sanitycheck on linux seem to be a problem how the docker container is started and what group access it has.

INFO     test_rocm_sanity:test_rocm_sanity.py:24 ++ Run [None]$ /__w/TheRock/TheRock/build/bin/rocminfo
ERROR    test_rocm_sanity:test_rocm_sanity.py:29 Command failed!
ERROR    test_rocm_sanity:test_rocm_sanity.py:30 command stdout:
ERROR    test_rocm_sanity:test_rocm_sanity.py:32 ROCk module version 6.12.12 is loaded
ERROR    test_rocm_sanity:test_rocm_sanity.py:32 Unable to open /dev/kfd read-write: Permission denied
ERROR    test_rocm_sanity:test_rocm_sanity.py:32 Failed to get user name to check for video group membership

I believe there were issues late last week where it was not picking pods with GPUs.

Sanity test failures did not occur just now, with some of the infra issues fixed, when I just re-ran the failures.

@jayhawk-commits
Copy link
Contributor Author

Seeing error below with Windows builds with the new flags:

2025-10-14T14:21:39.8009716Z [therock-host-blas configure] CMake Error in B:/build/third-party/host-blas/source/kernel/CMakeLists.txt:
2025-10-14T14:21:39.8009916Z [therock-host-blas configure]   MSVC_RUNTIME_LIBRARY value 'MultiThreadedDLL' not known for this ASM
2025-10-14T14:21:39.8009989Z [therock-host-blas configure]   compiler.

@jayhawk-commits jayhawk-commits merged commit f8df463 into main Oct 17, 2025
79 of 81 checks passed
@jayhawk-commits jayhawk-commits deleted the users/jayhawk-commits/update-gcc-toolset-13 branch October 17, 2025 18:38
@github-project-automation github-project-automation bot moved this from TODO to Done in TheRock Triage Oct 17, 2025
marbre pushed a commit that referenced this pull request Oct 24, 2025
- Addresses #1877 
- #1746 introduced an issue in the pytorch workflows, which aren't
included in pre-submit CI, so the issue was missed then.
- Switching back to `C_LAPACK` for now as quickest solution.
- `CMAKE_CROSSCOMPILING` is not needed for our current use case.

# Test Plan
- Trigger the pytorch workflows to verify fortran issue is not present.
jharryma pushed a commit that referenced this pull request Nov 5, 2025
- Update workflows to use new manylinux docker image produced by #1745
- For Linux, turn on `DYNAMIC_ARCH` for OpenBLAS build to resolve #83.
- `DYNAMIC_ARCH` support does not exist for MSVC and multi-arch support
for Windows seems shoddy overall. [Reference
1](OpenMathLib/OpenBLAS#2826 (comment))
and [Reference 2](OpenMathLib/OpenBLAS#3985)
- Remove usage of f2c version of an older lapack as it is considered
fallback option.
[Reference](OpenMathLib/OpenBLAS#4318 (comment))
- Patch amd-llvm compiler to be able to find stdc++ from gcc-toolset-13
during link time.
jharryma pushed a commit that referenced this pull request Nov 5, 2025
- Addresses #1877 
- #1746 introduced an issue in the pytorch workflows, which aren't
included in pre-submit CI, so the issue was missed then.
- Switching back to `C_LAPACK` for now as quickest solution.
- `CMAKE_CROSSCOMPILING` is not needed for our current use case.

# Test Plan
- Trigger the pytorch workflows to verify fortran issue is not present.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

OpenBLAS with -DDYNAMIC_ARCH=ON produces illegal ELF files

5 participants