-
Notifications
You must be signed in to change notification settings - Fork 128
Update workflows to gcc-toolset-13 and fix OpenBLAS build #1746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update workflows to gcc-toolset-13 and fix OpenBLAS build #1746
Conversation
|
Build is past third-party deps, so signs are promising. Windows job is failing at environment setup steps due to infra issues. |
|
The previously failing tests for loading the OpenBLAS library from turning |
HereThereBeDragons
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the failure of rocm sanitycheck on linux seem to be a problem how the docker container is started and what group access it has.
INFO test_rocm_sanity:test_rocm_sanity.py:24 ++ Run [None]$ /__w/TheRock/TheRock/build/bin/rocminfo
ERROR test_rocm_sanity:test_rocm_sanity.py:29 Command failed!
ERROR test_rocm_sanity:test_rocm_sanity.py:30 command stdout:
ERROR test_rocm_sanity:test_rocm_sanity.py:32 ROCk module version 6.12.12 is loaded
ERROR test_rocm_sanity:test_rocm_sanity.py:32 Unable to open /dev/kfd read-write: Permission denied
ERROR test_rocm_sanity:test_rocm_sanity.py:32 Failed to get user name to check for video group membership
third-party/host-blas/CMakeLists.txt
Outdated
| -DC_LAPACK=ON | ||
| -DTARGET=CORE2 | ||
| -DDYNAMIC_ARCH=ON | ||
| -DDYNAMIC_OLDER=ON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need dynamic_older?
from the changelog:
Version 0.3.1
01-Jul-2018
...
x86_64:
retired some older targets of DYNAMIC_ARCH builds to a new option DYNAMIC_OLDER,
this affects PENRYN,DUNNINGTON,OPTERON,OPTERON_SSE3,BOBCAT,ATOM and NANO
(which will still be supported via the slower PRESCOTT kernels when this option is not set)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't comment specifically about need, but I picked for breadth of CPU coverage for potential users of TheRock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MSVC seems to have issues with small matrix optimizations required to support older CPUs, so Windows build compatibility might dictate how far back the build flags can go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can compile this with [amd]clang instead of msvc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment block has me concerned about trying that. https://github.com/ROCm/TheRock/blob/main/core/CMakeLists.txt#L2C1-L11C10
I'll see if removing DYNAMIC_OLDER fixes the issue. If not, I will try HASWELL which should be a minimum baseline for x86_64 CPUs from Intel and AMD from the past decade. I think I remember one build recipe picking that one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, looks like MSVC's limitation with DYNAMIC_ARCH is known OpenMathLib/OpenBLAS#2826 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My latest set of build flags, which limits DYNAMIC_ARCH to Linux, while still leveraging a TARGET that should support almost all AMD and Intel CPUs from the past decade is able to compile on both our Linux and Windows workflows now.
I believe there were issues late last week where it was not picking pods with GPUs. |
Sanity test failures did not occur just now, with some of the infra issues fixed, when I just re-ran the failures. |
|
Seeing error below with Windows builds with the new flags: |
- Addresses #1877 - #1746 introduced an issue in the pytorch workflows, which aren't included in pre-submit CI, so the issue was missed then. - Switching back to `C_LAPACK` for now as quickest solution. - `CMAKE_CROSSCOMPILING` is not needed for our current use case. # Test Plan - Trigger the pytorch workflows to verify fortran issue is not present.
- Update workflows to use new manylinux docker image produced by #1745 - For Linux, turn on `DYNAMIC_ARCH` for OpenBLAS build to resolve #83. - `DYNAMIC_ARCH` support does not exist for MSVC and multi-arch support for Windows seems shoddy overall. [Reference 1](OpenMathLib/OpenBLAS#2826 (comment)) and [Reference 2](OpenMathLib/OpenBLAS#3985) - Remove usage of f2c version of an older lapack as it is considered fallback option. [Reference](OpenMathLib/OpenBLAS#4318 (comment)) - Patch amd-llvm compiler to be able to find stdc++ from gcc-toolset-13 during link time.
- Addresses #1877 - #1746 introduced an issue in the pytorch workflows, which aren't included in pre-submit CI, so the issue was missed then. - Switching back to `C_LAPACK` for now as quickest solution. - `CMAKE_CROSSCOMPILING` is not needed for our current use case. # Test Plan - Trigger the pytorch workflows to verify fortran issue is not present.
DYNAMIC_ARCHfor OpenBLAS build to resolve OpenBLAS with -DDYNAMIC_ARCH=ON produces illegal ELF files #83.DYNAMIC_ARCHsupport does not exist for MSVC and multi-arch support for Windows seems shoddy overall. Reference 1 and Reference 2