Strategies for Optimizing GPU-based EOS & Opacity Table Lookups in AMReX #4744

lwJi · 2025-10-31T20:44:31Z

I'm working on an AMReX-based simulation where our physics kernels, running on the GPU, require frequent lookups from large, read-only Equation of State (EOS) and opacity tables (e.g., 2D or 3D tables of roughly 500 MB each).

Getting this right is critical for performance, and I'm facing the classic challenge of balancing memory management, data locality, and efficient computation. The two main hurdles are:

Efficient Loading: What is the best way to get these large tables into GPU memory so they are globally accessible to all GPU threads running AMReX kernels?
Fast Interpolation: Once the data is on the GPU, what is the most performant way to perform interpolation (e.g., bilinear, trilinear) inside a kernel launched with amrex::launch or amrex::MFIter without stalling the pipeline?

I'm looking for advice or best practices from the community on a few specific points:

Data Management: What is the recommended AMReX approach for this? Should the tables be stored in Gpu::ManagedVector, Gpu::DeviceVector, or perhaps allocated via amrex::The_Arena()? How do you handle initialization and copy-to-device?
Kernel Access: What's the cleanest way to pass this table data to a kernel? Should I be passing raw pointers, or is it better to wrap the table in a struct or Gpu::Device object that also contains its metadata (dims, min/max bounds, etc.)?
Texture Memory: Has anyone had success using GPU texture memory for this? It seems ideally suited for this problem (hardware-accelerated interpolation, optimized caching for spatial locality), but I'm unsure how to integrate it cleanly with AMReX's data paradigm. Is this a recommended approach?
Interpolation Code: Are there standard, GPU-optimized interpolation utilities within AMReX that I should be using, or is this typically a case of "roll your own" device-side functions?

Answered by WeiqunZhang

WeiqunZhang · 2025-10-31T22:28:55Z

WeiqunZhang
Oct 31, 2025
Maintainer

If you want texture memory, you will need to roll your own. Otherwise, you might want to have a look at amrex::TableData https://amrex-codes.github.io/amrex/doxygen/classamrex_1_1TableData.html#details.

If you run your jobs on a lot of nodes, having every process read 500 MB of data all at the same time might be bad for file system performance. Here is an example of limiting the number of processes doing I/O at the same time.

amrex/Tests/FillBoundaryComparison/main.cpp

Line 33 in 3c1f1c7

int nAtOnce = std::min(ParallelDescriptor::NProcs(), 32);
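One way to use such a cap is to let ranks read in successive waves of at most nAtOnce readers, separated by barriers. The sketch below illustrates that idea only and does not reproduce the linked test; read_in_waves is a hypothetical helper, and the file read and barrier are passed in as callables so the scheduling logic stays library-free. In an AMReX code the barrier could be amrex::ParallelDescriptor::Barrier().

```cpp
#include <algorithm>

// Hypothetical helper: every rank calls this collectively.
// Ranks are split into waves of at most `max_readers`; a rank reads only
// during its own wave, and all ranks synchronize after each wave.
template <class ReadFn, class BarrierFn>
void read_in_waves(int my_rank, int nprocs, int max_readers,
                   ReadFn read_table, BarrierFn barrier)
{
    // Same cap as in the line quoted above, guarded against zero.
    const int n_at_once = std::max(1, std::min(nprocs, max_readers));
    const int n_waves = (nprocs + n_at_once - 1) / n_at_once;  // ceiling division

    for (int w = 0; w < n_waves; ++w) {
        if (my_rank / n_at_once == w) {
            read_table();  // at most n_at_once ranks reach this per wave
        }
        barrier();         // the next wave starts only after this one is done
    }
}
```

With 70 ranks and a cap of 32, for example, this gives three waves (ranks 0-31, 32-63, 64-69), and each rank reads exactly once.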

6 replies

lwJi Nov 7, 2025
Author

@WeiqunZhang, could you comment on the strategy of reading the table on one process and then broadcasting it to the rest of the processes?

zingale Nov 7, 2025
Collaborator

We do this for our equation of state table:

https://github.com/AMReX-Astro/Microphysics/blob/main/EOS/helmholtz/actual_eos.H#L1418

WeiqunZhang Nov 7, 2025
Maintainer

Yes, it's a good solution. You don't have to do the parallel I/O way I showed you, given that it's a one-time cost.

lwJi Nov 7, 2025
Author

we do this for our equation of state table:

https://github.com/AMReX-Astro/Microphysics/blob/main/EOS/helmholtz/actual_eos.H#L1418

Thanks @zingale

lwJi Nov 7, 2025
Author

Yes, it's a good solution. You don't have to do the parallel I/O way I showed you, given that it's a one-time cost.

Thanks @WeiqunZhang
