Multiscale ome-zarr v3 writer with sharding support by AlanMWatson · Pull Request #89 · mesoSPIM/mesoSPIM-control

AlanMWatson · 2025-10-24T22:13:12Z

A built-in ome-zarr v3 writer for mesoSPIM.

Support:

On-the-fly assembly of multiscales during acquisition
High-performance multithreading keeps pace with data acquisition
Compression
Sharding
Chunk size can be adjusted for each multiscale level
2D downsampling for anisotropic data, then 3D downsampling once the multiscale levels converge to isotropic.
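A minimal sketch of how per-level downsampling factors could be planned for a "2D first, then 3D" scheme. The switching rule used here (keep halving only y and x while doubling the lateral spacing does not exceed the axial spacing, then halve all three axes) is an assumption for illustration; the PR description does not spell out the exact criterion, and `plan_pyramid` is a hypothetical name.

```python
def plan_pyramid(voxel_size_um, n_levels):
    """Return cumulative (z, y, x) downsampling factors for each pyramid level.

    Hypothetical helper: level 0 is full resolution. While the data are
    anisotropic, only y and x are halved (2D downsampling); once doubling the
    lateral spacing would exceed the axial spacing, all axes are halved (3D).
    """
    z_um, y_um, x_um = voxel_size_um
    fz, fy, fx = 1, 1, 1
    factors = [(fz, fy, fx)]
    for _ in range(1, n_levels):
        z_spacing = z_um * fz
        # Assumed criterion: stay 2D while halving y/x does not overshoot z.
        if 2 * y_um * fy <= z_spacing and 2 * x_um * fx <= z_spacing:
            fy, fx = fy * 2, fx * 2
        else:
            fz, fy, fx = fz * 2, fy * 2, fx * 2
        factors.append((fz, fy, fx))
    return factors


if __name__ == "__main__":
    # Example: 5 um z-steps with 1 um lateral pixels.
    for level, f in enumerate(plan_pyramid((5.0, 1.0, 1.0), 5)):
        print(level, f)
```

With 5 µm steps and 1 µm pixels, this plan gives factors (1, 1, 1), (1, 2, 2), (1, 4, 4), (2, 8, 8), (4, 16, 16): levels 1 and 2 are 2D-only, and from level 3 on the voxels are roughly isotropic, so all three axes are halved.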

Notes:

Each tile is stored in its own ome-zarr dataset, with a naming convention similar to the other data acquisition methods.
Defaults were tested on several systems using the demo config and long-duration scans (>3 hours).

nvladimus · 2025-10-27T12:53:04Z

Dear Alan,
Thank you for this PR, very impressive!! 💯
I managed to test it in demo-hardware mode with two tiles, and it worked nicely overall. I found a few issues so far:

There seems to be no transformation information (tile positions in the zarr.json file); each tile is written to its own ZARR file, with no connection between them.
The OME-NGFF validator found some errors, namely No dimension_names for dataset 0 (etc.).
The ReussMouseBrain-2x2Tiles-2Ch-2Arms dataset, which we can use as a reference for the BigStitcher flavor of meta-information, packs each tile as a separate ZARR inside a root ZARR folder: Reuss-MousBrain2x2tiles-2ch.ome.zarr/s0-t0.zarr/i (i=0, 1, 2, 3)

I will look into these issues more closely; I just wanted to give some quick feedback.


AlanMWatson · 2025-10-27T16:37:49Z

Thank you Nikita!

There seems to be no transformation information (tile positions in the zarr.json file); each tile is written to its own ZARR file, with no connection between them.

I’ve added stage coordinates to the metadata for each array/scale in the multiscale tile. I tested by loading individual tiles in Neuroglancer, and they show the correct relative placement in the acquisition grid.

the OME-NGFF validator found some errors, namely No dimension_names for dataset 0 (etc.)

This was nested incorrectly; I’ve fixed it.
ome-zarr-models validate now returns: “✅ Valid OME-Zarr”.

the ReussMouseBrain-2x2Tiles-2Ch-2Arms dataset, which we can use as a reference for BigStitcher flavor of meta-information, packs each tile as a separate ZARR inside a root ZARR folder: Reuss-MousBrain2x2tiles-2ch.ome.zarr/s0-t0.zarr/i (i=0, 1, 2, 3)

What do you think is the best strategy for representing the collection of tiles? It isn’t clear to me whether OME-Zarr defines a single recommended pattern for this, if it defines one at all. With the stage coordinates embedded as described above, the tiles now connect spatially. However, there is still no metadata that defines these tiles as a collection (i.e. as part of the same logical grid). We could write zarr.json group metadata in the root of the acquisition directory that points to the multiscale ome-zarr of each tile. Maybe something like this:

{
  "image_collection": {
    "version": "0.1-experimental",
    "images": [
      {"path": "r0_c0.ome.zarr"},
      {"path": "r0_c1.ome.zarr"},
      {"path": "r1_c0.ome.zarr"},
      {"path": "r1_c1.ome.zarr"}
    ]
  }
}
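A minimal sketch of writing such a root group with only the standard library. In Zarr v3, a group's zarr.json must carry `"zarr_format": 3` and `"node_type": "group"`, and user metadata belongs under `"attributes"`, so the experimental `image_collection` block is nested there. The `image_collection` key is the proposal above, not part of any released OME-Zarr spec, and `write_collection_group` is a hypothetical helper.

```python
import json
from pathlib import Path


def write_collection_group(root, tile_paths, version="0.1-experimental"):
    """Write a Zarr v3 group zarr.json listing per-tile ome-zarr images.

    Hypothetical helper; the "image_collection" schema is experimental.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    metadata = {
        "zarr_format": 3,
        "node_type": "group",
        # Non-core metadata must live under "attributes" in Zarr v3.
        "attributes": {
            "image_collection": {
                "version": version,
                "images": [{"path": p} for p in tile_paths],
            }
        },
    }
    (root / "zarr.json").write_text(json.dumps(metadata, indent=2))
    return metadata
```

For example, `write_collection_group("acquisition.ome.zarr", ["r0_c0.ome.zarr", "r0_c1.ome.zarr"])` would create the directory and a zarr.json that generic Zarr v3 readers can open as a group, while tools that don't know `image_collection` simply ignore it.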

I think BigStitcher will want to see a more detailed XML, which could be produced by mesoSPIM and dropped in the same place. It also needs to define "Translation to Regular Grid". Following this model, and perhaps more consistent with the reusability of the ome-zarr tile data, the translation coordinate transforms could be written to the root zarr.json and removed from the individual tiles. That might look something like this:


{
  "axes": [
    {"name": "z", "type": "space", "unit": "micrometer"},
    {"name": "y", "type": "space", "unit": "micrometer"},
    {"name": "x", "type": "space", "unit": "micrometer"}
  ],
  "image_collection": {
    "version": "0.1-experimental",
    "axes": ["z","y","x"],
    "unit": "micrometer",
    "images": [
      {
        "path": "r0_c0.zarr",
        "coordinateTransformations": [
          {"type": "translation", "translation": [0.0, 0.0, 0.0]}
        ]
      },
      {
        "path": "r0_c1.zarr",
        "coordinateTransformations": [
          {"type": "translation", "translation": [0.0, 0.0, 666.0]}
        ]
      },
      {
        "path": "r1_c0.zarr",
        "coordinateTransformations": [
          {"type": "translation", "translation": [0.0, 666.0, 0.0]}
        ]
      },
      {
        "path": "r1_c1.zarr",
        "coordinateTransformations": [
          {"type": "translation", "translation": [0.0, 666.0, 666.0]}
        ]
      }
    ]
  }
}
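The `images` list above follows a regular pattern, so it could be generated rather than written by hand. This sketch assumes a fixed stage step per axis and the `r{row}_c{col}.zarr` naming from the example; in practice the offsets would presumably come from the recorded stage positions. `grid_translations` is a hypothetical helper.

```python
def grid_translations(n_rows, n_cols, step_y_um, step_x_um, z_um=0.0):
    """Build image_collection entries with (z, y, x) translations for a tile grid.

    Hypothetical helper: assumes a regular grid with a fixed stage step per
    axis and row-major tile naming "r{row}_c{col}.zarr".
    """
    images = []
    for row in range(n_rows):
        for col in range(n_cols):
            images.append({
                "path": f"r{row}_c{col}.zarr",
                "coordinateTransformations": [
                    {
                        "type": "translation",
                        # Axis order matches "axes": ["z", "y", "x"].
                        "translation": [z_um, row * step_y_um, col * step_x_um],
                    }
                ],
            })
    return images
```

Calling `grid_translations(2, 2, 666.0, 666.0)` yields the four entries in the example, with translations `[0.0, 0.0, 0.0]`, `[0.0, 0.0, 666.0]`, `[0.0, 666.0, 0.0]` and `[0.0, 666.0, 666.0]`.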

nvladimus · 2025-10-28T09:46:16Z

Dear Alan,
Thanks for these updates! From what I can see in the ReussMouseBrain-2x2Tiles-2Ch-2Arms dataset, BigStitcher currently exports with the following ZARR structure:

root: dataset.ome.zarr group + dataset.xml
unique combinations of channels (488, 561, c=2), illuminations (left/right, i=2), and tiles (T=4) create N=c*i*T=16 setups: dataset.ome.zarr/s0-t0.zarr ... dataset.zarr/s15-t0.zarr. Parsing of these into the correct channel/illumination names and tile positions happens via the BigStitcher-specific dataset.xml file.
each setup, e.g. dataset.ome.zarr/s0-t0.zarr, is a group with children /0 .. /3, which are the downsampled levels of this setup, with the following meta-info (see below). I presume that the small pixel translations in the higher-order levels (>=1) are due to averaging offsets, so all pyramids are aligned across scales.
each setup always has two nested sub-folders named 0, e.g. dataset.ome.zarr/s0-t0.zarr/3/0/0/z/y/x; I presume /0/0 are placeholders for time and channel (always 0 in this version). @StephanPreibisch can you weigh in on this?
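The setup count N = c*i*T can be checked with a small enumeration. This sketch assigns setup IDs with `itertools.product`, so the tile index varies fastest, then illumination, then channel; that ordering is an assumption, since BigStitcher resolves the actual mapping through its dataset.xml. `enumerate_setups` is a hypothetical helper.

```python
from itertools import product


def enumerate_setups(channels, illuminations, tiles):
    """Map every (channel, illumination, tile) combination to a setup path.

    Hypothetical helper: product() varies the tile fastest, then the
    illumination, then the channel; the real ID order is defined by the XML.
    """
    setups = []
    combos = product(channels, illuminations, tiles)
    for setup_id, (channel, illumination, tile) in enumerate(combos):
        setups.append({
            "path": f"s{setup_id}-t0.zarr",  # single timepoint t0
            "channel": channel,
            "illumination": illumination,
            "tile": tile,
        })
    return setups
```

For two channels (488, 561), two illuminations (left, right) and four tiles, this returns 16 setups named s0-t0.zarr through s15-t0.zarr.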

I started a new branch, ome_zarr_writer, for testing, so you can see what I changed (I cannot push my commits directly into this PR). It currently writes the dataset.ome.zarr/s0-t0.zarr/0/c/z/y/x layout, but for some reason I cannot get rid of the c in this path. Is this c/ folder part of the Zarr 3 specification?

@StephanPreibisch do you expect any changes to this structure in the near future?
@m-albert is this compatible with multi-view stitcher?

# BigStitcher meta-info at the setup level, e.g. `dataset.zarr/s0-t0.zarr`
{
  "multiscales": [
    {
      "name": "/",
      "version": "0.4",
      "axes": [
        {
          "type": "time",
          "name": "t",
          "unit": "millisecond",
          "discrete": false
        },
        {
          "type": "channel",
          "name": "c",
          "discrete": false
        },
        {
          "type": "space",
          "name": "z",
          "unit": "micrometer",
          "discrete": false
        },
        {
          "type": "space",
          "name": "y",
          "unit": "micrometer",
          "discrete": false
        },
        {
          "type": "space",
          "name": "x",
          "unit": "micrometer",
          "discrete": false
        }
      ],
      "datasets": [
        {
          "path": "0",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 1.0, 1.0, 1.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 0.0, 0.0, 0.0],
              "type": "translation"
            }
          ]
        },
        {
          "path": "1",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 1.0, 2.0, 2.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 0.0, 0.5, 0.5],
              "type": "translation"
            }
          ]
        },
        {
          "path": "2",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 2.0, 4.0, 4.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 0.5, 1.5, 1.5],
              "type": "translation"
            }
          ]
        },
        {
          "path": "3",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 4.0, 8.0, 8.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 1.5, 3.5, 3.5],
              "type": "translation"
            }
          ]
        }
      ],
      "coordinateTransformations": [
        {
          "scale": [1.0, 1.0, 1.0, 1.0, 1.0],
          "type": "scale"
        }
      ],
      "basePath": "",
      "paths": ["0", "1", "2", "3"],
      "units": ["micrometer", "micrometer", "micrometer", "micrometer"]
    }
  ]
}

# BigStitcher meta-info for `dataset.zarr/s0-t0.zarr/0/` array level
{
  "shape": [1, 1, 615, 2048, 2048],
  "chunks": [1, 1, 64, 128, 128],
  "fill_value": 0,
  "dtype": ">u2",
  "filters": [],
  "dimension_separator": "/",
  "zarr_format": 2,
  "order": "C",
  "compressor": {
    "id": "zstd",
    "level": 3
  }
}
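The non-zero translations in the setup-level metadata above are consistent with block averaging. If level ℓ averages f_ℓ consecutive input voxels along an axis (indices 0 to f_ℓ−1), the center of the first output voxel sits at their mean position. In units of the level-0 voxel size s_0 this gives the expression below; it is a derivation that reproduces the listed numbers, not something confirmed from BigStitcher's source.

$$
t_\ell \;=\; \frac{s_0}{f_\ell}\sum_{k=0}^{f_\ell-1} k \;=\; \frac{f_\ell - 1}{2}\, s_0,
\qquad
f_\ell = 2 \Rightarrow 0.5,\quad f_\ell = 4 \Rightarrow 1.5,\quad f_\ell = 8 \Rightarrow 3.5 .
$$

With s_0 = 1.0 this matches the z, y, x translations of levels 1 to 3 (for example scale 4.0 with translation 1.5).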


AlanMWatson · 2025-10-28T17:13:01Z

Nikita,

It currently writes dataset.ome.zarr/s0-t0.zarr/0/c/z/y/x format

It is still writing each tile in a separate ...tile{}_dataset.ome.zarr/s{}-t{}.zarr/... structure. The tiles are already formatted as fully defined ome-zarrs. If we want to use this structure, I suggest that we nest all of the s{}-t{}.zarr named tiles in a single <some_name>.ome.zarr folder, which is what the BigStitcher example does.

for some reason I cannot get rid of c in this path. It looks like this c/ folder is part of Zarr 3 specification?

This is what I am finding as well: it comes from the Zarr v3 'chunk key namespace'.
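A small sketch of the two chunk key encodings defined in the Zarr v3 specification, which accounts for the extra c/ directory. The `default` encoding prefixes every chunk key with `c` (using `/` as the separator here), while the `v2` encoding reproduces the Zarr v2 layout without the prefix. `encode_chunk_key` is a hypothetical helper written from the spec, not a zarr-python function. It also shows why a Zarr v2 tczyx array with `"dimension_separator": "/"` gets keys starting with `0/0/`, consistent with the presumption above that those folders are the time and channel chunk indices.

```python
def encode_chunk_key(chunk_coords, encoding="default", separator="/"):
    """Return the store key for a chunk, following the Zarr v3 spec encodings.

    "default": "c" prefix, e.g. (0, 1, 2) -> "c/0/1/2" ("c" for 0-d arrays)
    "v2":      no prefix,  e.g. (0, 1, 2) -> "0/1/2"   ("0" for 0-d arrays)
    """
    parts = [str(i) for i in chunk_coords]
    if encoding == "default":
        return separator.join(["c", *parts])
    if encoding == "v2":
        return separator.join(parts) if parts else "0"
    raise ValueError(f"unknown chunk key encoding: {encoding!r}")
```

In zarr-python 3 the encoding is chosen when an array is created, via the `chunk_key_encoding` argument of `zarr.create_array` (check its documentation for the accepted forms); whether downstream readers such as BigStitcher accept the `c/` prefix is the open question in this thread.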

AlanMWatson · 2025-10-28T21:19:35Z

Nikita,

I suggest that if we want to use this structure, that we nest all of the s{}-t{}.zarr named tiles in a single <some_name>.ome.zarr folder - which is what the BigStitcher example is doing.

I pushed a fix that nests the multiscale tiles in a single directory. TODO: write zarr.json group metadata in this folder.

m-albert · 2025-10-29T11:19:16Z

Thanks for the ping @nvladimus!

First of all, this is amazing @AlanMWatson! It'll be super convenient if mesospim data is produced as OME-Zarr directly, certainly from the perspective of downstream processing and visualization.

@m-albert is this compatible with multi-view stitcher?

To process the mesoSPIM data you had uploaded using this notebook, the tile transformations had to be parsed from the BigStitcher XML file (which you probably produced using BigStitcher). This is because, while the OME-Zarr metadata of the tiles (the same as you copied above) does contain translation transforms, those don't contain the actual translations of the tiles (the non-zero entries there relate to the small offsets of the different resolution levels).

So I think it'd be useful to include the tile translations in the tile OME-Zarr metadata (propagated to the resolution levels, i.e. just adding the offsets). That way, tools that support OME-Zarr transformation metadata can place the tiles properly.

I.e. for a tile positioned at (z,y,x) = (100, 200, 300), the metadata would be:

      "datasets": [
        {
          "path": "0",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 1.0, 1.0, 1.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 100, 200,  300],
              "type": "translation"
            }
          ]
        },
        {
          "path": "1",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 1.0, 2.0, 2.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 100, 200.5, 300.5],
              "type": "translation"
            }
          ]
        },
    ...
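A minimal sketch of generating these per-level entries for one tile: each level's translation is the stage position plus the level's existing half-voxel offset, (f − 1) / 2 times the voxel size for a block-averaging factor f. The t and c axes get scale 1.0 and translation 0.0, and the scale transform comes before the translation, as in the metadata above. `multiscale_datasets` and its arguments are hypothetical names.

```python
def multiscale_datasets(stage_zyx_um, voxel_zyx_um, level_factors):
    """Build OME-Zarr "datasets" entries for one tile (axes t, c, z, y, x).

    Hypothetical helper: each level's translation is the tile's stage position
    plus the half-voxel offset (f - 1) / 2 * voxel size from block averaging.
    """
    datasets = []
    for level, factors in enumerate(level_factors):
        scale = [v * f for v, f in zip(voxel_zyx_um, factors)]
        translation = [
            p + (f - 1) / 2 * v
            for p, v, f in zip(stage_zyx_um, voxel_zyx_um, factors)
        ]
        datasets.append({
            "path": str(level),
            "coordinateTransformations": [
                {"type": "scale", "scale": [1.0, 1.0, *scale]},
                {"type": "translation", "translation": [0.0, 0.0, *translation]},
            ],
        })
    return datasets
```

For a tile at (z, y, x) = (100, 200, 300) µm with 1 µm voxels and level factors `[(1, 1, 1), (1, 2, 2)]`, this reproduces the two entries shown: translations `[0.0, 0.0, 100.0, 200.0, 300.0]` and `[0.0, 0.0, 100.0, 200.5, 300.5]`.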

It isn’t clear to me whether OME-Zarr defines a single, recommended pattern for this - if at all. With the stage coordinates embedded as described above, the tiles now connect spatially. However, there is still no metadata that defines these tiles as a collection (i.e. part of the same logical grid)...

@AlanMWatson Representing collections in OME-Zarr is under active development! It is about to become an "RFC" (Request for Comments), being worked on here: ome/ngff#343

I think BigStitcher will want to see a more detailed XML which could be produced by mesoSPIM and dropped in the same place. It needs to also define "Translation to Regular Grid". Following this model, and maybe more consistent with reusability of the ome-zarr tile data, the translation coordinate transforms could be written to the root zarr.json and removed from individual tiles. That might look something like this:

As far as I know, BigStitcher does not read pure OME-Zarr yet, only OME-Zarr in combination with its XML. Most likely BigStitcher ignores the tile transformations in the OME-Zarr files (to be tested), so adding them there should not affect loading datasets in BigStitcher.

nvladimus · 2025-10-29T11:41:03Z

Dear @AlanMWatson and @m-albert!
Thank you for the updates! I think we are close to a working solution. I looked up an exaSPIM data structure example: tiles are named e.g. tile_000000_ch_488.zarr, sit in the same .ome.zarr directory, and each of them has a tczyx axis structure. The parsing into tile positions, channel names and illuminations is done via a customized BigStitcher-style XML file. We just need to see whether BigStitcher chokes when there is a c/ subfolder in the path due to the ZARR v3 chunk key namespace.
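If mesoSPIM adopted a similar convention, names like tile_000000_ch_488.zarr could be produced and parsed by one pair of helpers. This sketch assumes a six-digit zero-padded tile index and an integer channel wavelength, modeled only on the single example name; illumination is not part of that name and would stay in the XML. `make_tile_name`, `parse_tile_name` and `TILE_NAME` are hypothetical.

```python
import re

# Hypothetical pattern modeled on the example name "tile_000000_ch_488.zarr".
TILE_NAME = re.compile(r"^tile_(?P<tile>\d{6})_ch_(?P<channel>\d+)\.zarr$")


def make_tile_name(tile_index, channel_nm):
    """Format a tile store name, zero-padding the tile index to six digits."""
    return f"tile_{tile_index:06d}_ch_{channel_nm}.zarr"


def parse_tile_name(name):
    """Return (tile_index, channel_nm) for a matching name, else None."""
    match = TILE_NAME.match(name)
    if match is None:
        return None
    return int(match["tile"]), int(match["channel"])
```

Keeping formatting and parsing side by side makes it easy to check that every name the writer produces can be read back, for example `parse_tile_name(make_tile_name(12, 561)) == (12, 561)`.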

AlanMWatson · 2025-10-29T12:32:46Z

Thanks @m-albert for the valuable feedback!

So I think it'd be useful to include the tile translations in the tile OME-Zarr metadata (propagated to the resolution levels, i.e. just adding the offsets). That way, tools that support OME-Zarr transformation metadata can place the tiles properly.

We are now including the stage positions in the coordinateTransformations, and I have tested it with multiview-stitcher and it works to automatically place the tiles on the grid!

@nvladimus ,

We can include t and c axes in all arrays by default. As I understand it, they are not required by the NGFF spec, but if they add a layer of compatibility, it may be a good idea. What do you think?
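A minimal sketch of what adding t and c axes means for the metadata of a zyx array: the two new axes go first (OME-Zarr orders time, then channel, then space), with a scale of 1.0 and a translation of 0.0 so the spatial placement is unchanged. The array itself would need two leading singleton dimensions, shape (1, 1, Z, Y, X). `add_tc_axes` is a hypothetical helper.

```python
def add_tc_axes(axes, scale, translation):
    """Prepend "t" and "c" axes to zyx OME-Zarr axes and transform vectors.

    Hypothetical helper: the new axes get scale 1.0 and translation 0.0,
    so the spatial placement of the data is unchanged.
    """
    new_axes = [
        {"name": "t", "type": "time"},
        {"name": "c", "type": "channel"},
        *axes,
    ]
    return new_axes, [1.0, 1.0, *scale], [0.0, 0.0, *translation]
```

For zyx axes with scale `[5.0, 1.0, 1.0]` and translation `[100.0, 200.0, 300.0]`, this returns t, c, z, y, x axes, scale `[1.0, 1.0, 5.0, 1.0, 1.0]` and translation `[0.0, 0.0, 100.0, 200.0, 300.0]`.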

As for BigStitcher, we should check whether it supports the v0.5 spec (Zarr v3). If so, the 'c' directory should be fine.

I did say that I would implement Zarr v2 (the 0.4 spec), but I haven't gotten to it yet.


AlanMWatson · 2025-10-31T18:54:55Z

Hi @nvladimus,

I noticed that the '*_meta.txt' files for both omezarr and h5 were not being written correctly with the latest version of this branch. Also, MAX IPs were not being saved. I pushed fixes and also consolidated the file naming into a single function to make it easier to manage.

nvladimus · 2025-11-07T14:45:24Z

I think I finally nailed the XML writer (it took longer than I expected). The XML header file is now generated for OME ZARR v0.4 datasets, for drag-and-drop compatibility with BigStitcher. Branch: ome_zarr_writer.

nvladimus · 2025-11-10T13:39:23Z

Hi, @AlanMWatson!
I am ready to merge this PR into the release/candidate-py12 branch. Given that you are working on big changes in the plugin branch, shall I still do it, or wait for your new plugin PR that will include this OME-ZARR PR?

AlanMWatson · 2025-11-10T15:25:43Z

Hi @nvladimus,

That's great! I think whether to merge is ultimately up to you, based on your timeframe for the release. The plugin branch is currently working as a proof of concept, and to your point, I will try to integrate the ome zarr writer asap and also reproduce a RAW writer. The branch needs some testing and discussion about the best way to structure these plugins for broad compatibility. Do you want me to submit it as a PR as is and continue development within the PR, or should I wait until it is more complete? Thanks!

nvladimus · 2025-11-11T08:42:22Z

Merged into branch release/candidate-py312, will start testing on mesoSPIM hardware.

AlanMWatson and others added 8 commits October 24, 2025 13:19

Working Implementation of multiscale ome-zarr writer

58d8090

async finalize and config notes

767a08d

Fix no replace with underscore in metadata file

2cefafa

Cleanup omezarr writer

1a5141d

cleanup

cbca098

Restore MesoSPIM.bat

3da4732

mem efficiency for downsample

e0defc3

serve local files for OME-ZARR validator

7c0f20a

AlanMWatson and others added 3 commits October 27, 2025 09:44

Add stage coordinates to ome-zarr coordinate transforms

1b5bf71

dimension_names for each array were nested in attributes. Move to top level.

c6271e1

writes tiles into single OME-ZARR file, but tiling is unresolved, all data is dumped together

7542324

Merge branch 'pr/89' into ome_zarr_writer

c5225b0

nvladimus and others added 4 commits October 28, 2025 11:46

save each setup (raw in acq table) into separate ZARR inside root ZARR

12a011b

drop 's' from the names of levels, for BigStitcher compatibility

887b916

Memory efficiency during down sampling

21dcfb5

Merge remote-tracking branch 'mesoSPIM/ome_zarr_writer' into ome_zarr_writer

93e3b0f

Nest multscale tiles in a single ome.zarr directory

8d0327f

establish zarr group and write metadata to

d24602d

nvladimus mentioned this pull request Oct 29, 2025

Which OME ZARR version is supported? JaneliaSciComp/BigStitcher#160

Open

AlanMWatson added 2 commits October 31, 2025 14:46

Restore metadata and MAX IP writing for omezarr, h5 and place file naming into 1 function

e316f73

Fix h5 naming

0e20e7f

AlanMWatson added 3 commits November 3, 2025 16:49

Support for ome-zarr 0.4 (zarr v2) and 0.5 (zarr v3).

576cd03

Add option to turn off multiscale generation and tweak config defaults

e6f6334

Simplify close method

e323257

nvladimus closed this Nov 11, 2025
