
mini-ramses frontend#5380

Open
James11222 wants to merge 5 commits into yt-project:main from James11222:mini-ramses

Conversation

@James11222

PR Summary

This pull request adds a new, fully functional frontend for mini-ramses. With the assistance of GitHub Copilot, and building on Corentin's valuable insights, I was able to get a tested, fully functional (and, I think, just as fast as RAMSES) frontend working for mini-ramses. I tested it on cosmological and non-cosmological simulations, and it appears to behave identically to the RAMSES frontend. It currently reads all particle data (dark matter, star, sink) and AMR data (grav, hydro). I have not yet ensured compatibility with MHD or RT mini-ramses simulations.

Example plots made with fixed-resolution buffers and particle data show it working in action.

[Attached images: merger, cosmo_dmo]

Fixes Issue: #5215

PR Checklist

  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

Copilot AI and others added 5 commits February 17, 2026 14:59
…I/O handler, and fields

Key differences from RAMSES:
- Stream-based binary I/O (no Fortran record markers)
- info.txt has nfile first (not ncpu)
- File naming: amr.NNNNN, hydro.NNNNN (not amr_NNNNN.outNNNNN)
- Separate particle files per type (part, star, sink, tree, trac)
- Float32 primitive variables in hydro output
- Compact AMR format (ckey + refined per grid)
- Multi-integer Hilbert keys with different state diagrams
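The stream-based layout described above (plain binary, no Fortran record markers) can be sketched as follows. The header fields, their order, and the field-major cell layout here are illustrative assumptions for the demo, not the documented mini-ramses format:

```python
import io
import struct

import numpy as np

# Write a fake stream-binary file: plain little-endian values, no record
# markers. Two header ints, then float32 cell data, field-major per oct.
buf = io.BytesIO()
buf.write(struct.pack("<ii", 3, 6))       # hypothetical header: ndim=3, nvar=6
cells = np.arange(6 * 2**3, dtype="<f4")  # nvar x twotondim cells for one oct
buf.write(cells.tobytes())

# Read it back with fixed-size reads at known offsets (no marker skipping)
buf.seek(0)
ndim, nvar = struct.unpack("<ii", buf.read(8))
oct_data = np.frombuffer(buf.read(4 * nvar * 2**ndim), dtype="<f4")
oct_data = oct_data.reshape(nvar, 2**ndim)
print(ndim, nvar, oct_data.shape)
```

Because there are no record markers, every read size must be computed from the header, which is why the header layout matters so much here.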

Co-authored-by: James11222 <49885659+James11222@users.noreply.github.com>
…parameter

Co-authored-by: James11222 <49885659+James11222@users.noreply.github.com>
…d-reader

Add mini-ramses frontend reader
Copilot AI review requested due to automatic review settings February 17, 2026 23:09

Copilot AI left a comment


Pull request overview

This PR introduces a new mini_ramses frontend in yt, including dataset/index/IO plumbing, field definitions, and a synthetic-output test suite, and registers the frontend in yt.frontends.

Changes:

  • Added MiniRAMSESDataset/MiniRAMSESIndex/domain subset + file sanitizer for mini-ramses outputs.
  • Added an IO handler for fluid (hydro/gravity) and particle stream-binary reads, plus basic field definitions.
  • Added pytest-based synthetic-output tests and registered mini_ramses in the frontend package list.

Reviewed changes

Copilot reviewed 7 out of 9 changed files in this pull request and generated 15 comments.

Summary per file:

  • yt/frontends/mini_ramses/data_structures.py: Core dataset/index/octree construction, file validation, and fluid data extraction logic.
  • yt/frontends/mini_ramses/io.py: Fluid selection and particle IO routines for mini-ramses stream binary files.
  • yt/frontends/mini_ramses/fields.py: Declares known fluid/particle fields and a derived temperature field.
  • yt/frontends/mini_ramses/definitions.py: Filename patterns/regexes and field alias groupings for mini-ramses.
  • yt/frontends/mini_ramses/api.py: Exposes the frontend's public API symbols.
  • yt/frontends/mini_ramses/tests/test_mini_ramses.py: Synthetic mini-ramses output writer and basic dataset/sanitizer tests.
  • yt/frontends/mini_ramses/tests/__init__.py: Test package marker for the new frontend.
  • yt/frontends/__init__.py: Registers mini_ramses in the frontend namespace.



import os
import struct
import tempfile

Copilot AI Feb 17, 2026


tempfile is imported but never used, which will fail Ruff (F401) under the repo's lint configuration. Please remove the unused import (or use it if intended).

Suggested change:
- import tempfile

Comment on lines +199 to +203
def fill(self, fn, fields, selector, field_list):
"""Read and fill field data from a mini-ramses binary file.

Parameters
----------

Copilot AI Feb 17, 2026


This docstring block includes blank lines with trailing whitespace (e.g. immediately after the opening docstring). Ruff/pycodestyle will flag this as W293/W291. Please strip trailing whitespace throughout this file (running the repo formatter/linter should handle it).

Comment on lines +188 to +192
self, base_region, domain, ds, over_refine_factor=1, num_ghost_zones=0
):
super().__init__(
base_region, domain, ds, over_refine_factor, num_ghost_zones
)

Copilot AI Feb 17, 2026


MiniRAMSESDomainSubset.__init__ passes over_refine_factor positionally into OctreeSubset(..., num_zones=...). For RAMSES-style octs this should be num_zones=2 (2×2×2 cells per oct); passing 1 sets num_zones=1 and can break reshaping/ghost-zone logic. Consider renaming the parameter to num_zones (default 2) and/or passing by keyword to avoid this mismatch.
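A minimal sketch of the keyword-passing fix the comment describes. The stub class here only stands in for yt's real OctreeSubset, whose signature is assumed from the review:

```python
# Stub standing in for yt's OctreeSubset; the num_zones/num_ghost_zones
# parameter names follow the review comment and are assumptions.
class OctreeSubset:
    def __init__(self, base_region, domain, ds, num_zones=2, num_ghost_zones=0):
        self.nz = num_zones
        self.num_ghost_zones = num_ghost_zones


class MiniRAMSESDomainSubset(OctreeSubset):
    def __init__(self, base_region, domain, ds, num_zones=2, num_ghost_zones=0):
        # Pass both by keyword so a ghost-zone count can never be absorbed
        # into the num_zones slot positionally.
        super().__init__(
            base_region, domain, ds,
            num_zones=num_zones,
            num_ghost_zones=num_ghost_zones,
        )


subset = MiniRAMSESDomainSubset(None, None, None, num_ghost_zones=1)
print(subset.nz, subset.num_ghost_zones)
```

With keywords, num_zones keeps its default of 2 even when only a ghost-zone count is supplied.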

Comment on lines +20 to +23
# Separate fields by type
ftypes = {f[0] for f in fields}

for chunk in chunks:

Copilot AI Feb 17, 2026


There are blank lines containing whitespace in this block (e.g. after ftypes = ...). Ruff/pycodestyle will flag this as W293/W291. Please strip trailing whitespace (running the configured formatter/linter should fix it automatically).

self.base_region,
self._base_domain,
self.ds,
self._num_ghost_zones,

Copilot AI Feb 17, 2026


retrieve_ghost_zones passes self._num_ghost_zones as the 4th positional argument to MiniRAMSESDomainSubset(...), which maps to num_zones in OctreeSubset.__init__. This will incorrectly set num_zones to the previous ghost-zone count. Pass the existing self._num_zones/num_zones=2 and set num_ghost_zones=ngz separately.

Suggested change:
- self._num_ghost_zones,
+ self._num_zones,

Comment on lines +680 to +683
self.min_level = params.get("levelmin", 1)
self.max_level = params.get("levelmax", 1)
self.domain_dimensions = np.ones(3, dtype="int64") * 2 ** (
self.min_level

Copilot AI Feb 17, 2026


MiniRAMSESDataset sets min_level/max_level directly from info.txt and then uses domain_dimensions = 2**min_level. In the RAMSES frontend, ds.min_level is stored as levelmin - 1 and ds.max_level is stored relative to ds.min_level to keep octree level math consistent. Aligning to that convention will make ds.index.max_level, level stats, and cell-size calculations consistent across octree frontends.

Suggested change:
- self.min_level = params.get("levelmin", 1)
- self.max_level = params.get("levelmax", 1)
- self.domain_dimensions = np.ones(3, dtype="int64") * 2 ** (
-     self.min_level
+ levelmin = params.get("levelmin", 1)
+ levelmax = params.get("levelmax", 1)
+ # Align level bookkeeping with the RAMSES frontend convention:
+ # - store min_level as levelmin - 1
+ # - store max_level relative to min_level
+ self.min_level = levelmin - 1
+ self.max_level = levelmax - self.min_level
+ # The coarse grid has 2**levelmin cells per dimension. Since
+ # levelmin = self.min_level + 1, use (self.min_level + 1) here to keep
+ # domain_dimensions consistent with the RAMSES frontend.
+ self.domain_dimensions = np.ones(3, dtype="int64") * 2 ** (
+     self.min_level + 1

Comment on lines +199 to +203
def test_load_dataset(self, tmp_path):
outdir = _create_mini_ramses_output(str(tmp_path))
ds = MiniRAMSESDataset(outdir)
assert ds is not None
assert ds.dimensionality == 3

Copilot AI Feb 17, 2026


The tests validate dataset detection and basic attributes, but they never actually access a fluid/particle field through yt data objects to exercise the IO handler + selection plumbing end-to-end. Adding at least one assertion that ds.all_data()[('mini-ramses','Density')] (and a particle field like ('io','particle_mass')) can be read and returns an array of the expected size would significantly improve coverage of the new frontend.

twotondim = 2**ndim

with open(domain.amr_fn, "rb") as f:
ndim_file = struct.unpack("i", f.read(4))[0]

Copilot AI Feb 17, 2026


ndim_file is unpacked from the AMR header but never used. This will trigger Ruff F841, and also misses a cheap sanity check (e.g. verify it matches ds.dimensionality). Either validate it or unpack into _ to explicitly ignore it.

Suggested change:
  ndim_file = struct.unpack("i", f.read(4))[0]
+ if ndim_file != ndim:
+     mylog.warning(
+         "Dimensionality mismatch in AMR header for domain %s: "
+         "file reports %dD, dataset expects %dD.",
+         domain.domain_id,
+         ndim_file,
+         ndim,
+     )

)

# Read refinement map (twotondim int32 values)
refined = np.array(

Copilot AI Feb 17, 2026


refined is read for every grid but never used, which will fail Ruff (F841). If refinement info is intentionally ignored, unpack into _ (or delete the variable) to satisfy lint. Otherwise, consider using it for validation (e.g. ensure refinement flags are consistent with the octs present at the next level).

Suggested change:
- refined = np.array(
+ _refined = np.array(

Comment on lines +334 to +335
for icell in range(twotondim):
raw_data[fname][cell_offset + icell] = oct_data[fidx, icell]

Copilot AI Feb 17, 2026


The nested Python loops copying oct_data into raw_data (for fname... / for icell...) will be very slow for large outputs. This can be vectorized by slice-assigning oct_data[fidx, :] into raw_data[fname][cell_offset:cell_offset + twotondim], and ideally you should avoid reading/copying data for fields that weren't requested by the caller.

Suggested change:
- for icell in range(twotondim):
-     raw_data[fname][cell_offset + icell] = oct_data[fidx, icell]
+ # Vectorized copy: assign all cells for this field in one slice
+ raw_data[fname][cell_offset:cell_offset + twotondim] = oct_data[fidx, :]

Member

@cphyc left a comment


This is great! I'm on holiday, but ping me if I haven't reviewed this by early March; I personally really want this included and shipped with the next release of yt.
Also, do you have a small-ish dataset (<1GiB) you could share so that we could host it on https://yt-project.org/data/?

@James11222
Author

ya definitely, I can certainly get a nice small sample dataset to be hosted for you in the next week or so.

@cphyc added labels code frontends (Things related to specific frontends) and new feature (Something fun and new!) on Feb 18, 2026
Contributor

@chrishavlin left a comment


awesome! a couple of very quick notes on the new test file you added:

  1. super helpful to have the test itself mock up a dataset to use in the test. thank you!
  2. you'll want to exclude this test from the old nose testing framework that yt still uses for some of its tests: update the nose_ignores file at the yt root directory as well as tests/tests.yaml (add the filename to the end of tests.yaml, where it lists files to ignore)

Member

@cphyc left a comment


This is a great start!

However, I think there are a few pieces that would benefit from refactorisation so that the reader is more future-proof and flexible.

In addition, you're doing many small reads from the file plus struct.unpack calls, which could (and probably should) be replaced by np.fromfile reads for performance.

One final decision is related to https://github.com/yt-project/yt/pull/5380/changes#r2864451870: do you want to have one octree per domain, or one global octree (as you currently have)?

The former option is more efficient; the latter is easier and avoids a few minor plotting artefacts.
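The np.fromfile point above can be illustrated with a small self-contained comparison; the file name and int32 layout are made up for the demo:

```python
import os
import struct
import tempfile

import numpy as np

# Write 1000 little-endian int32 values to a throwaway file
path = os.path.join(tempfile.mkdtemp(), "amr.00001")
np.arange(1000, dtype="<i4").tofile(path)

# Many tiny reads + struct.unpack: one read() and one unpack per value
with open(path, "rb") as f:
    slow = [struct.unpack("<i", f.read(4))[0] for _ in range(1000)]

# One vectorized np.fromfile call: a single buffered read
with open(path, "rb") as f:
    fast = np.fromfile(f, dtype="<i4", count=1000)

assert (fast == np.asarray(slow)).all()
print(fast[:5])
```

Both approaches yield identical values; the vectorized read simply avoids per-value Python overhead, which dominates for large outputs.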

Comment on lines +111 to +113
self.amr_fn = os.path.join(basename, f"amr.{icpu_str}")
self.hydro_fn = os.path.join(basename, f"hydro.{icpu_str}")
self.grav_fn = os.path.join(basename, f"grav.{icpu_str}")
Member


This specific pattern makes it slightly more complicated to add support for rt later.

In the RAMSES dataset, I've used a "FieldFileHandler" (see https://github.com/yt-project/yt/blob/main/yt/frontends/ramses/field_handlers.py) that is specialised for hydro, grav and rt files.
This allows us to handle each file type separately and to factorize code.
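A hedged sketch of that handler pattern; the real FieldFileHandler API in yt/frontends/ramses/field_handlers.py differs, and the class and attribute names here are illustrative only:

```python
import os


class FieldFileHandler:
    """Base class: each subclass describes one file type (hydro, grav, rt...)."""

    ftype = None       # file prefix, e.g. "hydro"
    registry = []      # populated automatically as subclasses are defined

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        FieldFileHandler.registry.append(cls)

    def __init__(self, basename, icpu_str):
        self.fname = os.path.join(basename, f"{self.ftype}.{icpu_str}")

    def exists(self):
        return os.path.exists(self.fname)


class HydroFileHandler(FieldFileHandler):
    ftype = "hydro"


class GravFileHandler(FieldFileHandler):
    ftype = "grav"


# Adding RT support later is then one small subclass, not new plumbing:
handlers = [cls("output_00001", "00001") for cls in FieldFileHandler.registry]
print([h.fname for h in handlers])
```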


def _read_amr_header(self):
"""Read the AMR header from the stream-based binary file."""
if not os.path.exists(self.amr_fn):
Member


Is this case really useful? As in, can a dataset be read without any AMR file?

self._level_count = np.zeros(nlevels, dtype="int64")
for ilevel in range(levelmin - 1, nlevelmax):
idx = ilevel - (self.ds.min_level - 1)
if 0 <= idx < nlevels:
Member


Can this ever be false if levelmin and nlevelmax are properly set for the dataset? This condition seems like a tautology.

Comment on lines +156 to +160
self._level_offsets = np.zeros(nlevels, dtype="int64")
cumsum = 0
for i in range(nlevels):
self._level_offsets[i] = cumsum
cumsum += self._level_count[i]
Member


Suggested change:
- self._level_offsets = np.zeros(nlevels, dtype="int64")
- cumsum = 0
- for i in range(nlevels):
-     self._level_offsets[i] = cumsum
-     cumsum += self._level_count[i]
+ self._level_offsets = self._level_count.cumsum() - self._level_count
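The offsets here form an exclusive prefix sum: each level's offset is the total count of all previous levels. A quick check with made-up counts:

```python
import numpy as np

# Hypothetical per-level oct counts for five levels
level_count = np.array([8, 64, 512, 0, 128], dtype="int64")

# Exclusive prefix sum: subtracting the counts shifts the cumsum by one slot
level_offsets = level_count.cumsum() - level_count
print(level_offsets)  # [0, 8, 72, 584, 584]
```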

Comment on lines +162 to +178
def _read_hydro_header(self):
"""Read the hydro header to get nvar."""
if not os.path.exists(self.hydro_fn):
return 0
with open(self.hydro_fn, "rb") as f:
ndim = struct.unpack("i", f.read(4))[0]
nvar = struct.unpack("i", f.read(4))[0]
return nvar

def _read_grav_header(self):
"""Read the gravity header to get nvar."""
if not os.path.exists(self.grav_fn):
return 0
with open(self.grav_fn, "rb") as f:
ndim = struct.unpack("i", f.read(4))[0]
nvar = struct.unpack("i", f.read(4))[0]
return nvar
Member


Note: If you move the handling of hydro/grav to a class of its own, you'll be able to prevent the repetition here

Comment on lines +149 to +159
"particle_position_x": "pos_x",
"particle_position_y": "pos_y",
"particle_position_z": "pos_z",
"particle_velocity_x": "vel_x",
"particle_velocity_y": "vel_y",
"particle_velocity_z": "vel_z",
"particle_mass": "mass",
"particle_refinement_level": "level",
"particle_identity": "birth_id",
"particle_metallicity": "metallicity",
"particle_birth_time": "birth_date",
Member


These definitions (and the other mappings) belong in definitions.py.

return [], {}


def _detect_optional_particle_fields(header_fn, prefix):
Member


This is very similar to what you already used in _detect_particle_fields in data_structures.py. Could you refactor?
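One possible shape for that shared helper; the function name, arguments, and file layout here are hypothetical, sketched only to show how both call sites could converge on one routine:

```python
import os


def detect_optional_particle_fields(header_fn, base_fields, optional_files):
    """Hypothetical shared helper: return the base fields plus any optional
    fields whose backing file exists next to the header file."""
    dirname = os.path.dirname(header_fn)
    fields = list(base_fields)
    for field_name, file_name in optional_files.items():
        if os.path.exists(os.path.join(dirname, file_name)):
            fields.append(field_name)
    return fields


# With no optional files on disk, only the base fields come back:
found = detect_optional_particle_fields(
    "/nonexistent/output_00001/part_header.txt",
    ["mass", "pos_x", "pos_y", "pos_z"],
    {"metallicity": "metal.00001", "birth_date": "birth.00001"},
)
print(found)
```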


# Select the requested cells
for ftype, fname in fields:
if fname in all_data:
Member


I would expect _read_all_cell_data to promise to read everything, so it shouldn't be necessary to check here that fname is in all_data.

Comment on lines +324 to +337
for igrid in range(ncache):
# Each oct: nvar fields * twotondim cells * 4 bytes
# Layout is (nvar, twotondim) - field-major
oct_data = np.frombuffer(
f.read(4 * twotondim * f_nvar),
dtype="<f4"
).reshape(f_nvar, twotondim)

# Store each field's data for this oct's cells
for fname, fidx in field_indices.items():
for icell in range(twotondim):
raw_data[fname][cell_offset + icell] = oct_data[fidx, icell]

cell_offset += twotondim
Member


I'd suggest reading everything in one go.
Note that the code below is only valid if field_indices is sorted by value (so that the field names appear in the same order as they do on file).

                # Since we're reading everything, we can do it in one pass:
                oct_data_t = np.dtype([
                    (fname, "<f4", (twotondim,))
                    for fname in field_indices
                ])
                oct_data = np.fromfile(f, oct_data_t, ncache)

                # Store each field's data for this oct's cells
                for fname in field_indices:
                    raw_data[fname] = oct_data[fname].flatten()
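This structured-dtype read can be checked against the per-grid layout with synthetic data; the field names and little-endian float32 layout are assumptions taken from the quoted PR code:

```python
import numpy as np

twotondim, ncache = 8, 4
field_names = ["Density", "x-velocity"]  # assumed on-disk field order

# Synthetic file contents: per grid, (nvar, twotondim) float32, field-major
raw = np.arange(ncache * len(field_names) * twotondim, dtype="<f4")
raw = raw.reshape(ncache, len(field_names), twotondim)
data_bytes = raw.tobytes()

# One-pass read via a structured dtype: one subarray of twotondim cells
# per field, one record per grid
oct_data_t = np.dtype([(n, "<f4", (twotondim,)) for n in field_names])
oct_data = np.frombuffer(data_bytes, dtype=oct_data_t, count=ncache)

# Each named column recovers exactly that field's cells across all grids
dens = oct_data["Density"].ravel()
assert np.array_equal(dens, raw[:, 0, :].ravel())
print(dens[:4])
```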

root_nodes = sum(int(d.level_count[0]) for d in self.domains)
mylog.info("Allocating %s octs", domain_counts.sum())

self.oct_handler = RAMSESOctreeContainer(
Member


RAMSESOctreeContainer is a specialisation for RAMSES so that it can handle missing octs at levelmin. However, here, since you're adding everything into a single octree, you can probably use the (slightly more efficient) OctreeContainer.

Alternatively, you can use the same strategy as RAMSES and give each domain its own octree, in which case you'd need to keep RAMSESOctreeContainer. The main advantage of that approach is that, when you iterate over chunks (each of the subfiles), you only need to build the octree for that file and not the others. That will greatly simplify parallelisation (in a distant future!). Otherwise, every parallel task would need to read the entirety of the AMR files, which duplicates effort and kills performance.

The main disadvantage is that operations that require interaction with the oct (such as particle deposition) will only have access to the oct of their domain, so you'll have small artefacts at the boundary between domains (I think this is what we're seeing e.g. in #1573).


Labels

code frontends (Things related to specific frontends), new feature (Something fun and new!)
