Skip to content

Conversation

@PHHargrove
Copy link
Contributor

Background:

I conduct automated regression testing of Chapel using GASNet-EX for multi-locale support on a variety of systems.
Recently my testing on big-endian PPC64/Linux began failing (SEGV early in startup).
Based on the last-good/first-bad test dates, Brad Chamberlain identified Chapel's update of Qthreads from 1.20 to 1.21 as a likely cause.

I have confirmed that if I build the 1.21 release of Qthreads on a big-endian PPC64 it fails in make check, with a SEGV in nearly every test. However, make check passes fine with the 1.20 release.

This single-commit PR is necessary and sufficient to allow the hello_world test (among others) to PASS.
However, this is not sufficient to get a clean run of make check.
I, unfortunately, cannot devote additional time to addressing whatever else is wrong.
I hope to reach out via email to the Qthreads team (probably next week) to provide access to a BE PPC64 system.

Please note that I am making this contribution under the assumption that there is a desire to retain (restore, actually) support for BE PPC64/Linux. However, if the Qthreads team determines that dropping support for this target is the appropriate action, I will not lobby against such a decision.

Commits:

  • Restore fastcontext support for big-endian PPC64

    This commit restores support in fastcontext for the "function descriptor" of the 64-bit PowerPC ELF ABI, which is common to both AIX and Linux on big-endian PPC64. It appears to have been unintentionally removed along with AIX support in pull request Drop Various Rare OSs and Remove Unnecessary Config Checks #277 due to the (unfortunate) choice of QTHREAD_PPC_ABI_AIX to identify this ABI.

    Partially reverts 4f3031c

This commit restores support in fastcontext for the "function
descriptor" of the 64-bit PowerPC ELF ABI, which is common to both AIX
and Linux on big-endian PPC64.  It appears to have been
unintentionally removed along with AIX support in pull request sandialabs#277
due to the (unfortunate) choice of `QTHREAD_PPC_ABI_AIX` to identify
this ABI.

Partially reverts 4f3031c

Signed-off-by: Paul H. Hargrove <[email protected]>
@insertinterestingnamehere
Copy link
Collaborator

insertinterestingnamehere commented Oct 17, 2024

Thanks! That's a really interesting find! This patch looks great. I did double-check that this method of detecting endianness in the preprocessor works, and it seems to be fine for the compilers we support on power architectures.

The various CI failures that I'm seeing thus far are unrelated things that I'm still in the process of fixing. Don't worry about those.

Currently we technically support ppc, but it's not tested regularly and will likely be dropped some time in the not too far future. That's particularly true of big-endian configs. I'm happy to merge this, but I'm not sure if it's worthwhile devoting more of anyone's time to fixing any lingering test suite errors if the fix isn't obvious (especially if the tests are mostly passing now). There are lots of other relatively pressing bugs on other platforms that we do want to support long-term. If there's some kind of urgent need on your end I'm happy to look into it though.

@insertinterestingnamehere insertinterestingnamehere changed the base branch from main to release-1.22-pre October 17, 2024 19:55
@insertinterestingnamehere insertinterestingnamehere merged commit 5fca9c6 into sandialabs:release-1.22-pre Oct 17, 2024
222 of 261 checks passed
@bradcray
Copy link

bradcray commented Oct 17, 2024

@PHHargrove: Thanks for investigating this and submitting the fix!

@insertinterestingnamehere : From my / Chapel's perspective, I wouldn't consider anything here urgent. I'm not aware of Chapel users who are currently interested in or invested in BE systems, so am not deeply concerned about lack of support and think it's more a question for the Qthreads team about whether to invest more time here, based on your priorities and goals, than for us to say what you should do. Tellingly, we also don't do any BE testing. Interestingly, I also don't think we run make check (perhaps to our detriment), so if the problem were in the check itself rather than the library, we might just be fine (he said in an optimistic and possibly naive way); and even if that turned out not to be the case, Chapel users on such systems could opt out of Qthreads going forward (and we could invest more effort to automatically avoid Qthreads on BE systems if that was the case).

[edit: I've filed https://github.com/chapel-lang/chapel/issues/26109 as a placeholder so if, years from now, someone is hitting an issue on a BE system and searches our repo, we have some breadcrumbs back here.]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants