Provide parmmg with complete local vector including halos #228

stephankramer merged 9 commits into main
Conversation
joewallwork
left a comment
Thanks @stephankramer, I just have a suggestion and a small request, then we can merge this into #226. Once we do that, I'll open a PR to remove to_petsc_local_numbering from Firedrake.
We should try and come up with some useful unit tests but that can be done as a follow-up.
Force-pushed from c9fb638 to 1099564 (compare)
This changes to_petsc_local_numbering to take in a local vector (i.e. a sequential vector that includes halo DOFs) and return a local vector in the "natural ordering" of the DMPlex topological points, which is what the DMPlex ParMmg interface expects for the metric input vector. It should not change anything in serial, and appears to fix the warnings about an indefinite metric in the ParMmg tests.

Closes #181. Hopefully also fixes #215.
Force-pushed from 1099564 to 49a5595 (compare)
Hm, unfortunately this does not seem to fix #215.

Seems to have actually gotten worse! The timeout now seems to happen consistently. Investigating...
The hangs occur in the Python system exit, when a final PETSc garbage clean-up is underway. Using PETSc.garbage_view() to look at places where things could go out of sync, the serialised checkpointing looks like a potential culprit. I'm adding a barrier after the initial collective checkpoint write, because it's not clear to me that it is safe for rank 0 to immediately reopen the file for reading while other ranks may not yet have finished writing. I'm also adding an explicit gc.collect() after the diverging code path to ensure consistency.
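A minimal sketch of that synchronisation point, assuming an mpi4py-style communicator; the helper name is hypothetical, and in the real code this would sit right after the collective checkpoint write and after the diverging code path:

```python
import gc

def post_checkpoint_sync(comm=None):
    # Hypothetical helper: make all ranks rendezvous after a collective
    # checkpoint write, then force a garbage collection so that any
    # PETSc-managed temporaries are freed at the same point on every
    # rank, rather than whenever each rank's GC happens to run.
    if comm is not None:
        comm.Barrier()  # e.g. an mpi4py communicator
    return gc.collect()  # number of unreachable objects collected
```

The point of forcing gc.collect() on every rank at the same program point is that PETSc object destruction can involve collective operations, so ranks collecting at different times is exactly the kind of divergence that can deadlock at exit.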
Force-pushed from aed513c to ed0fe44 (compare)
Right, this seems to have worked. Unfortunately it is potentially a voodoo fix. As I described in #215, the hangs occur in the final PETSc garbage collection that is called on Python system exit (so after pytest has already finished, with all tests passing). Looking at historical Firedrake issues with this (https://github.com/firedrakeproject/firedrake/issues?q=is%3Aissue%20hang%20garbage), these failures can be notoriously hard and irregular to reproduce, as they depend on when, and in what order, the different processes decide to run garbage collection, which is very unpredictable and may depend on timing.

Indeed, trying to reproduce the hangs in the CI container, the issue never occurs when running any of the parallel tests individually. It only occurs after having already run all the serial tests (which I suppose affects the cache) and then running all (or a substantial subset) of the parallel tests. In my tests it seemed the serialised mmg3d test needed to be included for the hang to occur. So it may well be a fluke that the changes to to_petsc_local_numbering in this PR appear to make the CI fail consistently (when running all tests in order), as opposed to 50% of the time on main. By the same token, my "fix" to make the behaviour of the serialised checkpoints a bit more consistent - by adding a barrier and an explicit garbage collect call - might also be a fluke, and there could still be an issue somewhere in the parallel code paths.

I'll run the CI with this fix a few times and, if it now passes consistently, I propose to merge this with the fix as is. I still believe the changes to to_petsc_local_numbering are correct and needed for correct behaviour with parmmg3d. @joewallwork if you could have another look: I've addressed your previous review and expanded the numbering test to run in parallel.
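As a debugging aid for this kind of hang, the PETSc.garbage_view() inspection mentioned earlier can be wrapped like this; a sketch assuming a petsc4py recent enough (roughly 3.18+) to expose garbage_view, with a hypothetical helper name:

```python
def show_petsc_garbage():
    # Print the PETSc objects awaiting collective destruction on each
    # rank; ranks whose lists diverge are candidates for the
    # out-of-sync garbage clean-up that causes the hang at exit.
    # Assumes petsc4py exposes PETSc.garbage_view (petsc4py >= 3.18).
    try:
        from petsc4py import PETSc
    except ImportError:
        return False  # petsc4py not available in this environment
    PETSc.garbage_view()
    return True
```

Comparing this output across ranks just before system exit is one way to see whether some ranks still hold objects that others have already queued for collective destruction.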
CI has now passed four times in a row, which is better than main.
joewallwork
left a comment
Nice, this is great @stephankramer! Thanks for your work on this.
All of my comments and suggestions are minor, mostly to do with wording etc., and with consistency with the approach used elsewhere in the package.
    # Check that this is the case:
    owned = f.dat.data_ro.flatten()
    halos = f.dat.data_ro_with_halos[owned.size:].flatten()
    np.testing.assert_equal(owned, np.arange(owned.size) + rank_fraction)
Is there a reason you didn't use
    - np.testing.assert_equal(owned, np.arange(owned.size) + rank_fraction)
    + assert np.allclose(owned, np.arange(owned.size) + rank_fraction)
like we do elsewhere?
You get a much more informative message when the test fails. With np.allclose(...) the expression just evaluates to False and you just get a bare AssertionError; with np.testing it prints the actual and desired arrays (or part of them, if long) and some statistics on the differences.

Happy to change to np.allclose for consistency - or here, actually, np.all(a == b) would be the strict equivalent of np.testing.assert_equal - but np.testing is pretty standard in my view. Not a big fan of pytest.approx btw, it seems a little opaque.
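A quick standalone illustration of the difference, using a small toy array rather than the Function data from the test:

```python
import numpy as np

a = np.arange(5)
b = a.copy()
b[3] = 99  # introduce a single mismatching entry

# `assert np.allclose(a, b)` would raise a bare AssertionError with no
# detail. np.testing.assert_equal instead reports the mismatch count
# and prints both arrays:
try:
    np.testing.assert_equal(a, b)
except AssertionError as err:
    message = str(err)

print(message)
```

The printed message names both the "ACTUAL" and "DESIRED" arrays and summarises how many elements differ, which is what makes parallel test failures much easier to diagnose from CI logs.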
Fair enough. Happy to stick with this then.
Co-authored-by: Joe Wallwork <22053413+joewallwork@users.noreply.github.com>
joewallwork
left a comment
There was a problem hiding this comment.
Thanks for the updates, Stephan. Bar one typo this is good to go!
Ah, thanks for catching that - all addressed now.
Following on from #226
This changes to_petsc_local_numbering to take in a local vector (i.e. a sequential vector that includes halo DOFs) and return a local vector in the "natural ordering" of the DMPlex topological points, which is what the DMPlex ParMmg interface expects for the metric input vector. It should not change anything in serial.

Appears to fix the warnings about an indefinite metric in the ParMmg tests, which I think were caused by halo entries that were copied, out of bounds, from the global vector we were providing previously.
Closes #181
Hopefully also fixes #215
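The reordering this PR implements can be sketched as a plain permutation of the process-local (owned + halo) DOF vector. The permutation array plex_to_local below is hypothetical and stands in for whatever to_petsc_local_numbering derives from the DMPlex section; the helper name is likewise illustrative:

```python
import numpy as np

def to_plex_ordering(local_with_halos, plex_to_local):
    # local_with_halos: DOF values in the process-local ordering
    #   (owned DOFs first, then halo DOFs), as a NumPy array.
    # plex_to_local: hypothetical permutation; plex_to_local[p] is the
    #   local index of the DOF attached to DMPlex point p.
    # Returns the values in the "natural" DMPlex point ordering that
    # the DMPlex/ParMmg interface expects for the metric vector.
    return np.asarray(local_with_halos)[plex_to_local]

# Toy example: 3 owned + 2 halo DOFs and an arbitrary permutation.
vals = np.array([10.0, 11.0, 12.0, 20.0, 21.0])
perm = np.array([3, 0, 4, 1, 2])
reordered = to_plex_ordering(vals, perm)  # [20., 10., 21., 11., 12.]
```

The key point of the PR is that the input must be the full local vector: if only the owned (global) part is supplied, the halo slots of the permuted output are filled from out-of-bounds reads, which is consistent with the indefinite-metric warnings described above.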