
Multiscale ome-zarr v3 writer with sharding support#89

Closed
AlanMWatson wants to merge 23 commits into mesoSPIM:master from CBI-PITT:ome_zarr_writer


@AlanMWatson
Contributor

A builtin ome-zarr v3 writer for mesoSPIM.

Support:

  • On-the-fly assembly of multiscales during acquisition
  • High-performance multithreading that keeps pace with data acquisition
  • Compression
  • Sharding
  • Chunksize can be adjusted with each multiscale
  • 2D downsampling for anisotropic data, then 3D downsampling once the multiscale levels converge to isotropic

Notes:

  • Each tile is stored in its own ome-zarr dataset with a similar naming convention to other data acquisition methods
  • Defaults were tested on several systems using the demo config and long-duration scans (>3 hours).
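
The "2D first, then 3D" downsampling rule listed above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function name `downsample_schedule` and the threshold `ratio` (downsample only in y/x while the z spacing is at least twice the in-plane spacing) are assumptions made for the example.

```python
def downsample_schedule(voxel_zyx, levels, ratio=2.0):
    """Return cumulative (z, y, x) downsample factors for each pyramid level.

    Hypothetical rule: while z spacing is >= `ratio` times the in-plane
    spacing, halve only y and x (2D); afterwards halve all three axes (3D).
    """
    z, y, x = voxel_zyx
    fz = fy = fx = 1
    factors = [(fz, fy, fx)]          # level 0 is the full-resolution data
    for _ in range(levels - 1):
        if z / max(y, x) >= ratio:
            # Still anisotropic: downsample in-plane only.
            fy, fx = fy * 2, fx * 2
            y, x = y * 2, x * 2
        else:
            # Close to isotropic: downsample all three axes.
            fz, fy, fx = fz * 2, fy * 2, fx * 2
            z, y, x = z * 2, y * 2, x * 2
        factors.append((fz, fy, fx))
    return factors


# Example: 4 um z-step with 1 um pixels, five pyramid levels.
schedule = downsample_schedule((4.0, 1.0, 1.0), levels=5)
print(schedule)
```

With these inputs the first two coarser levels shrink only y and x, and z starts to shrink once the voxel spacing is isotropic.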

@nvladimus
Member

Dear Alan,
Thank you for this PR, very impressive!! 💯
I managed to test it in demo-hardware mode with two tiles, and it worked nicely overall. I have found a few issues so far:

  • There seems to be no transformation information (tile positions in zarr.json file); each tile is written in its own ZARR file, with no connection between them.
  • the OME-NGFF validator found some errors, namely "No dimension_names for dataset 0" (etc)
  • the ReussMouseBrain-2x2Tiles-2Ch-2Arms dataset, which we can use as a reference for BigStitcher flavor of meta-information, packs each tile as a separate ZARR inside a root ZARR folder: Reuss-MousBrain2x2tiles-2ch.ome.zarr/s0-t0.zarr/i (i=0, 1, 2, 3)

I will look into these issues more closely, just wanted to give some quick feedback.

@AlanMWatson
Contributor Author

Thank you Nikita!

  • There seems to be no transformation information (tile positions in zarr.json file); each tile is written in its own ZARR file, with no connection between them.

I’ve added stage coordinates to the metadata for each array/scale in the multiscale tile. I tested by loading individual tiles in Neuroglancer, and they show the correct relative placement in the acquisition grid.

The dimension_names metadata flagged by the validator was nested incorrectly; I’ve fixed it.
ome-zarr-models validate now returns: “✅ Valid OME-Zarr”.

  • the ReussMouseBrain-2x2Tiles-2Ch-2Arms dataset, which we can use as a reference for BigStitcher flavor of meta-information, packs each tile as a separate ZARR inside a root ZARR folder: Reuss-MousBrain2x2tiles-2ch.ome.zarr/s0-t0.zarr/i (i=0, 1, 2, 3)

What do you think is the best strategy for representing the collection of tiles? It isn’t clear to me whether OME-Zarr defines a single, recommended pattern for this - if at all. With the stage coordinates embedded as described above, the tiles now connect spatially. However, there is still no metadata that defines these tiles as a collection (i.e. part of the same logical grid)... We could write zarr.json group data in the root of the acquisition directory that points to each multiscale ome-zarr for each tile. Maybe something like this:

{
  "image_collection": {
    "version": "0.1-experimental",
    "images": [
      {"path": "r0_c0.ome.zarr"},
      {"path": "r0_c1.ome.zarr"},
      {"path": "r1_c0.ome.zarr"},
      {"path": "r1_c1.ome.zarr"}
    ]
  }
}

I think BigStitcher will want to see a more detailed XML, which could be produced by mesoSPIM and dropped in the same place. It also needs to define "Translation to Regular Grid". Following this model, and maybe more consistent with reusability of the ome-zarr tile data, the translation coordinate transforms could be written to the root zarr.json and removed from individual tiles. That might look something like this:


{
  "axes": [
    {"name": "z", "type": "space", "unit": "micrometer"},
    {"name": "y", "type": "space", "unit": "micrometer"},
    {"name": "x", "type": "space", "unit": "micrometer"}
  ],
  "image_collection": {
    "version": "0.1-experimental",
    "axes": ["z","y","x"],
    "unit": "micrometer",
    "images": [
      {
        "path": "r0_c0.zarr",
        "coordinateTransformations": [
          {"type": "translation", "translation": [0.0, 0.0, 0.0]}
        ]
      },
      {
        "path": "r0_c1.zarr",
        "coordinateTransformations": [
          {"type": "translation", "translation": [0.0, 0.0, 666.0]}
        ]
      },
      {
        "path": "r1_c0.zarr",
        "coordinateTransformations": [
          {"type": "translation", "translation": [0.0, 666.0, 0.0]}
        ]
      },
      {
        "path": "r1_c1.zarr",
        "coordinateTransformations": [
          {"type": "translation", "translation": [0.0, 666.0, 666.0]}
        ]
      }
    ]
  }
}
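
A root-level document like the one above could be generated from the acquisition grid. The following is only a sketch: `image_collection` and its "0.1-experimental" version are the experimental layout proposed in this comment, not part of any OME-Zarr release, and the helper name `build_image_collection` and its parameters are hypothetical.

```python
import json


def build_image_collection(n_rows, n_cols, step_y_um, step_x_um):
    """Build the proposed (experimental) root metadata for a regular tile grid."""
    images = []
    for r in range(n_rows):
        for c in range(n_cols):
            images.append({
                "path": f"r{r}_c{c}.zarr",
                "coordinateTransformations": [
                    # Translation is given in (z, y, x) order, in micrometers.
                    {"type": "translation",
                     "translation": [0.0, r * step_y_um, c * step_x_um]}
                ],
            })
    return {
        "axes": [{"name": n, "type": "space", "unit": "micrometer"} for n in "zyx"],
        "image_collection": {
            "version": "0.1-experimental",
            "axes": ["z", "y", "x"],
            "unit": "micrometer",
            "images": images,
        },
    }


# Reproduces the 2x2 example with a 666 um step between tiles.
collection = build_image_collection(2, 2, step_y_um=666.0, step_x_um=666.0)
print(json.dumps(collection, indent=2))
```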

@nvladimus
Member

nvladimus commented Oct 28, 2025

Dear Alan,
Thanks for these updates! From what I can see in the ReussMouseBrain-2x2Tiles-2Ch-2Arms dataset, BigStitcher currently exports with the following ZARR structure:

  • root: dataset.ome.zarr group + dataset.xml
  • unique combinations of channels (488, 561; c=2), illuminations (left/right; i=2), and tiles (T=4) create N = c*i*T = 16 setups: dataset.ome.zarr/s0-t0.zarr, ..., dataset.ome.zarr/s15-t0.zarr. Parsing of these into the correct channel/illumination names and tile positions happens via the BigStitcher-specific dataset.xml file.
  • each setup, e.g. dataset.ome.zarr/s0-t0.zarr, is a group that has children /0 .. /3, which are the downsampled levels of this setup, with the following meta-info (see below). I presume that the small pixel translations in higher-order levels (>=1) are due to averaging offsets, so that all pyramids are aligned across scales.
  • each setup always has two nested zero-named sub-folders, e.g. dataset.ome.zarr/s0-t0.zarr/3/0/0/z/y/x; I presume /0/0 are placeholders for time and channel (always 0 in this version). @StephanPreibisch can you weigh in on this?
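
To make the N = c*i*T counting concrete, here is a small sketch that enumerates the setup folders. The iteration order (tile, then channel, then illumination) is an assumption for illustration only; in a real BigStitcher export the mapping from setup id to channel, illumination and tile is defined by dataset.xml. The function name `enumerate_setups` is hypothetical.

```python
from itertools import product


def enumerate_setups(channels, illuminations, n_tiles, root="dataset.ome.zarr"):
    """List one setup folder per (tile, channel, illumination) combination."""
    setups = []
    combos = product(range(n_tiles), channels, illuminations)
    for setup_id, (tile, channel, illumination) in enumerate(combos):
        setups.append({
            "path": f"{root}/s{setup_id}-t0.zarr",   # single time point t0
            "tile": tile,
            "channel": channel,
            "illumination": illumination,
        })
    return setups


setups = enumerate_setups(["488", "561"], ["left", "right"], n_tiles=4)
print(len(setups), setups[0]["path"], setups[-1]["path"])
```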

I started a new branch for testing ome_zarr_writer, so you can see what I changed, since I cannot push my commits directly into this PR. It currently writes the dataset.ome.zarr/s0-t0.zarr/0/c/z/y/x format, but for some reason I cannot get rid of c in this path. It looks like this c/ folder is part of the Zarr 3 specification?

@StephanPreibisch do you expect any changes to this structure in the near future?
@m-albert is this compatible with multi-view stitcher?

# BigStitcher meta-info on setup level, eg  `dataset.zarr/s0-t0.zarr` 
{
  "multiscales": [
    {
      "name": "/",
      "version": "0.4",
      "axes": [
        {
          "type": "time",
          "name": "t",
          "unit": "millisecond",
          "discrete": false
        },
        {
          "type": "channel",
          "name": "c",
          "discrete": false
        },
        {
          "type": "space",
          "name": "z",
          "unit": "micrometer",
          "discrete": false
        },
        {
          "type": "space",
          "name": "y",
          "unit": "micrometer",
          "discrete": false
        },
        {
          "type": "space",
          "name": "x",
          "unit": "micrometer",
          "discrete": false
        }
      ],
      "datasets": [
        {
          "path": "0",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 1.0, 1.0, 1.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 0.0, 0.0, 0.0],
              "type": "translation"
            }
          ]
        },
        {
          "path": "1",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 1.0, 2.0, 2.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 0.0, 0.5, 0.5],
              "type": "translation"
            }
          ]
        },
        {
          "path": "2",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 2.0, 4.0, 4.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 0.5, 1.5, 1.5],
              "type": "translation"
            }
          ]
        },
        {
          "path": "3",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 4.0, 8.0, 8.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 1.5, 3.5, 3.5],
              "type": "translation"
            }
          ]
        }
      ],
      "coordinateTransformations": [
        {
          "scale": [1.0, 1.0, 1.0, 1.0, 1.0],
          "type": "scale"
        }
      ],
      "basePath": "",
      "paths": ["0", "1", "2", "3"],
      "units": ["micrometer", "micrometer", "micrometer", "micrometer"]
    }
  ]
}
# BigStitcher meta-info for `dataset.zarr/s0-t0.zarr/0/` array level
{
  "shape": [1, 1, 615, 2048, 2048],
  "chunks": [1, 1, 64, 128, 128],
  "fill_value": 0,
  "dtype": ">u2",
  "filters": [],
  "dimension_separator": "/",
  "zarr_format": 2,
  "order": "C",
  "compressor": {
    "id": "zstd",
    "level": 3
  }
}
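
The per-level translations in the setup metadata above are consistent with the "averaging offsets" presumption: if s pixels are averaged into one, the centre of the coarse pixel sits at (s - 1) / 2 in full-resolution pixel coordinates. The check below is only a sketch over the values copied from that metadata; the helper name `half_pixel_offset` is made up for this example.

```python
# Scale and translation vectors copied from the setup-level metadata (t, c, z, y, x).
scales = [
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 4.0, 4.0],
    [1.0, 1.0, 4.0, 8.0, 8.0],
]
translations = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 1.5, 1.5],
    [0.0, 0.0, 1.5, 3.5, 3.5],
]


def half_pixel_offset(scale):
    """Centre shift, in level-0 pixels, caused by averaging `s` pixels per axis."""
    return [(s - 1.0) / 2.0 for s in scale]


matches = all(half_pixel_offset(s) == t for s, t in zip(scales, translations))
print("translations match (scale - 1) / 2:", matches)
```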

@AlanMWatson
Contributor Author

Nikita,

It currently writes dataset.ome.zarr/s0-t0.zarr/0/c/z/y/x format

It is still writing each tile in a separate ...tile{}_dataset.ome.zarr/s{}-t{}.zarr/... structure. The tiles are already formatted as fully defined ome-zarrs. I suggest that, if we want to use this structure, we nest all of the s{}-t{}.zarr named tiles in a single <some_name>.ome.zarr folder, which is what the BigStitcher example is doing.

for some reason I cannot get rid of c in this path. It looks like this c/ folder is part of Zarr 3 specification?

This is what I am finding as well: it appears to be the Zarr v3 'chunk key namespace'.
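
As far as I understand the Zarr v3 core specification, the default chunk key encoding prefixes every chunk key with "c" and joins the chunk indices with "/", whereas Zarr v2 stores use the bare indices joined by "." or "/". The spec also describes a "v2" chunk key encoding for v3 arrays that omits the prefix. The two helpers below are a plain-Python sketch of that rule with hypothetical names; they do not call the zarr library.

```python
def chunk_key_v3(coords, separator="/"):
    """Chunk key under the Zarr v3 'default' encoding: 'c' prefix plus indices."""
    return separator.join(["c", *map(str, coords)])


def chunk_key_v2(coords, separator="."):
    """Chunk key under the Zarr v2-style encoding: indices only."""
    return separator.join(map(str, coords)) if coords else "0"


# Chunk (t=0, c=0, z=3, y=1, x=2) of a 5D tczyx array.
coords = (0, 0, 3, 1, 2)
print(chunk_key_v3(coords))                  # key with the c/ prefix
print(chunk_key_v2(coords, separator="/"))   # key as in a nested v2 store
```

This is why a v3 array ends up with a c/ directory between the array folder and the chunk indices.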

@AlanMWatson
Contributor Author

Nikita,

I suggest that, if we want to use this structure, we nest all of the s{}-t{}.zarr named tiles in a single <some_name>.ome.zarr folder, which is what the BigStitcher example is doing.

I pushed a fix that will nest the multiscale tiles in a single directory. Todo: Write zarr.json group metadata in this folder.

@m-albert

Thanks for the ping @nvladimus!

First of all, this is amazing @AlanMWatson! It'll be super convenient if mesospim data is produced as OME-Zarr directly, certainly from the perspective of downstream processing and visualization.

@m-albert is this compatible with multi-view stitcher?

For processing the mesospim data (that you had uploaded) using this notebook, I had to parse the tile transformations from the bigstitcher-xml file (which you probably produced using bigstitcher). This is because, while the OME-Zarr metadata of the tiles (the same as you copied above) does contain translation transforms, those don't hold the actual translations of the tiles (the non-zero entries there relate to the small offsets of the different resolution levels).

So I think it'd be useful to include the tile translations in the tile OME-Zarr metadata (propagated to the resolution levels, i.e. just adding the offsets). That way, tools that support OME-Zarr transformation metadata can place the tiles properly.

I.e. for a tile positioned at (z,y,x) = (100, 200, 300), the metadata would be:

      "datasets": [
        {
          "path": "0",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 1.0, 1.0, 1.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 100, 200,  300],
              "type": "translation"
            }
          ]
        },
        {
          "path": "1",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 1.0, 2.0, 2.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 100, 200.5, 300.5],
              "type": "translation"
            }
          ]
        },
    ...
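
The propagation described above (tile position plus the per-level offset) can be sketched as follows. It assumes unit voxel size, as in the example, so that the level offsets and the tile position share the same units; with a physical voxel size the offsets would need to be multiplied by it. The function name `tile_datasets` is hypothetical.

```python
def tile_datasets(tile_zyx, level_scales_zyx):
    """Build the `datasets` entries for one tile, with tczyx transforms.

    tile_zyx: tile position (z, y, x).
    level_scales_zyx: one (z, y, x) scale tuple per resolution level.
    """
    datasets = []
    for level, scale in enumerate(level_scales_zyx):
        # Averaging s pixels shifts the pixel centre by (s - 1) / 2.
        offset = [(s - 1.0) / 2.0 for s in scale]
        translation = [0.0, 0.0] + [p + o for p, o in zip(tile_zyx, offset)]
        datasets.append({
            "path": str(level),
            "coordinateTransformations": [
                {"scale": [1.0, 1.0, *scale], "type": "scale"},
                {"translation": translation, "type": "translation"},
            ],
        })
    return datasets


datasets = tile_datasets((100.0, 200.0, 300.0),
                         [(1.0, 1.0, 1.0), (1.0, 2.0, 2.0)])
for entry in datasets:
    print(entry["path"], entry["coordinateTransformations"][1]["translation"])
```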

It isn’t clear to me whether OME-Zarr defines a single, recommended pattern for this - if at all. With the stage coordinates embedded as described above, the tiles now connect spatially. However, there is still no metadata that defines these tiles as a collection (i.e. part of the same logical grid)...

@AlanMWatson Representing collections in OME-Zarr is under active development! It's currently about to be an "RFC" (I think it stands for request for comments), worked on here: ome/ngff#343

I think BigStitcher will want to see a more detailed XML, which could be produced by mesoSPIM and dropped in the same place. It also needs to define "Translation to Regular Grid". Following this model, and maybe more consistent with reusability of the ome-zarr tile data, the translation coordinate transforms could be written to the root zarr.json and removed from individual tiles. That might look something like this:

As far as I know, bigstitcher does not read pure OME-Zarr yet but only in combination with xml. Most likely bigstitcher ignores the tiles transformations in the OME-Zarr files (to be tested), so adding these there should not impact loading datasets in bigstitcher.

@nvladimus
Member

Dear @AlanMWatson and @m-albert!
Thank you for the updates! I think we are close to a working solution. I looked up the exaSPIM data structure example: tiles are named e.g. tile_000000_ch_488.zarr, sit in the same .ome.zarr directory, and each of them has a tczyx axes structure. The parsing into tile positions, channel names and illuminations is done via a customized BigStitcher-style XML file. We just need to see if BigStitcher chokes when there is a c/ subfolder in the path due to the ZARR v3 chunk namespace.
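
For reference, a tile name of the form quoted above can be generated and parsed with a few lines of Python. This is a sketch based only on the single example name tile_000000_ch_488.zarr; the six-digit zero padding and the helper names are assumptions, not a documented exaSPIM convention.

```python
import re

# Pattern inferred from the one example name; treat it as an assumption.
TILE_PATTERN = re.compile(r"^tile_(\d+)_ch_(\d+)\.zarr$")


def tile_name(index, channel_nm):
    """Format a tile folder name, zero-padding the tile index to six digits."""
    return f"tile_{index:06d}_ch_{channel_nm}.zarr"


def parse_tile_name(name):
    """Return (tile index, channel wavelength) parsed from a tile folder name."""
    match = TILE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a tile folder name: {name!r}")
    return int(match.group(1)), int(match.group(2))


name = tile_name(0, 488)
print(name, parse_tile_name(name))
```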

@AlanMWatson
Contributor Author

Thanks @m-albert for the valuable feedback!

So I think it'd be useful to include the tile translations in the tile OME-Zarr metadata (propagated to the resolution levels, i.e. just adding the offsets). That way, tools that support OME-Zarr transformation metadata can place the tiles properly.

We are now including the stage positions in the coordinateTransformations, and I have tested it with multiview-stitcher and it works to automatically place the tiles on the grid!

@nvladimus ,

We can include t,c axes in all arrays by default. It should not be required by ngff spec, as I understand it, but if it adds a layer of compatibility then it may be a good idea. What do you think?

As for Bigstitcher, we should see if it supports the v0.5 spec (zarr v3). If so, the 'c' directory should be fine.

I did say that I would implement zarr v2 (0.4 spec) and did not get to it yet.

@AlanMWatson
Contributor Author

AlanMWatson commented Oct 31, 2025

Hi @nvladimus,

I noticed that '*_meta.txt' files for both omezarr and h5 were not writing correctly with the latest version of this branch. Also, MAX IPs were not being saved. I pushed fixes and also consolidated the file naming into a single function to make it easier to manage.

@nvladimus
Member

I think I finally nailed the XML writer (it took longer than I expected). The XML header file is now generated for the OME ZARR v0.4 dataset, for drag-n-drop compatibility with BigStitcher. Branch: ome_zarr_writer.

@nvladimus
Member

Hi, @AlanMWatson!
I am ready to merge this PR into the release/candidate-py312 branch. Given that you are working on big changes in the plugin branch, shall I still do it? Or wait for your new plugin PR that will include this OME-ZARR PR?

@AlanMWatson
Contributor Author

Hi @nvladimus,

That's great! I think ultimately whether to merge is up to you and based on your timeframe for the release. I think the plugin branch is currently working as a proof of concept, and to your point, I will try to integrate the ome zarr writer asap and also reproduce a RAW writer. The branch needs some testing and discussion about the best way to structure these plugins for broad compatibility. Do you want me to submit it as a PR as is and continue development within the PR, or should I wait until it is more complete? Thanks!

@nvladimus
Member

Merged into branch release/candidate-py312, will start testing on mesoSPIM hardware.

@nvladimus nvladimus closed this Nov 11, 2025