@@ -19,26 +19,57 @@ Xarray ``Dataset`` objects.
1919
2020Second, from Xarray's point of view, the key difference between
2121NetCDF and Zarr is that all NetCDF arrays have *dimension names* while Zarr
22- arrays do not. Therefore, in order to store NetCDF data in Zarr, Xarray must
23- somehow encode and decode the name of each array's dimensions.
24-
25- To accomplish this, Xarray developers decided to define a special Zarr array
26- attribute: ``_ARRAY_DIMENSIONS``. The value of this attribute is a list of
27- dimension names (strings), for example ``["time", "lon", "lat"]``. When writing
28- data to Zarr, Xarray sets this attribute on all variables based on the variable
29- dimensions. When reading a Zarr group, Xarray looks for this attribute on all
30- arrays, raising an error if it can't be found. The attribute is used to define
31- the variable dimension names and then removed from the attributes dictionary
32- returned to the user.
33-
34- Because of these choices, Xarray cannot read arbitrary array data, but only
35- Zarr data with valid ``_ARRAY_DIMENSIONS`` or
36- `NCZarr <https://docs.unidata.ucar.edu/nug/current/nczarr_head.html>`_ attributes
37- on each array (NCZarr dimension names are defined in the ``.zarray`` file).
38-
39- After decoding the ``_ARRAY_DIMENSIONS`` or NCZarr attribute and assigning the variable
40- dimensions, Xarray proceeds to [optionally] decode each variable using its
41- standard CF decoding machinery used for NetCDF data (see :py:func:`decode_cf`).
22+ arrays do not. For Zarr V2, Xarray uses an ad-hoc convention to encode and decode
23+ the names of each array's dimensions. Starting with Zarr V3, however, the
24+ ``dimension_names`` metadata field provides a formal convention for storing the
25+ NetCDF data model in Zarr.
26+
27+ Dimension Encoding in Zarr Formats
28+ -----------------------------------
29+
30+ Xarray encodes array dimensions differently depending on the Zarr format version:
31+
32+ **Zarr V2 Format:**
33+ Xarray uses a special Zarr array attribute: ``_ARRAY_DIMENSIONS``. The value of this
34+ attribute is a list of dimension names (strings), for example ``["time", "lon", "lat"]``.
35+ When writing data to Zarr V2, Xarray sets this attribute on all variables based on the
36+ variable dimensions. This attribute is visible when accessing arrays directly with
37+ zarr-python.
38+
39+ **Zarr V3 Format:**
40+ Xarray uses the native ``dimension_names`` field in the array metadata. This is part
41+ of the official Zarr V3 specification and is not stored as a regular attribute.
42+ When accessing arrays with zarr-python, this information is available in the array's
43+ metadata but not in the attributes dictionary.
44+
45+ When reading a Zarr group, Xarray looks for dimension information in the appropriate
46+ location based on the format version, raising an error if it can't be found. The
47+ dimension information is used to define the variable dimension names and then
48+ (for Zarr V2) removed from the attributes dictionary returned to the user.
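
The lookup described above can be sketched in plain Python. This is only an illustration under stated assumptions, not Xarray's implementation: ``resolve_dimension_names`` is a hypothetical helper, and the array metadata and attributes are modelled as ordinary dictionaries rather than zarr-python objects.

```python
def resolve_dimension_names(zarr_format, array_metadata, attributes):
    """Return (dimension_names, user_attributes) for one array.

    Hypothetical helper for illustration only; it is not part of Xarray
    or zarr-python, and both inputs are plain dictionaries.
    """
    if zarr_format == 2:
        # Zarr V2: the names are stored in a regular attribute.
        if "_ARRAY_DIMENSIONS" not in attributes:
            raise KeyError("Zarr V2 array is missing the _ARRAY_DIMENSIONS attribute")
        dims = tuple(attributes["_ARRAY_DIMENSIONS"])
        # Hide the bookkeeping attribute from the user-facing attributes.
        user_attrs = {k: v for k, v in attributes.items() if k != "_ARRAY_DIMENSIONS"}
        return dims, user_attrs
    if zarr_format == 3:
        # Zarr V3: the names are a field of the array metadata, not an attribute.
        names = array_metadata.get("dimension_names")
        if names is None:
            raise KeyError("Zarr V3 array metadata has no dimension_names field")
        return tuple(names), dict(attributes)
    raise ValueError(f"unsupported Zarr format: {zarr_format!r}")


v2_dims, v2_attrs = resolve_dimension_names(
    2, {}, {"_ARRAY_DIMENSIONS": ["time", "y", "x"], "units": "C"}
)
v3_dims, v3_attrs = resolve_dimension_names(
    3, {"dimension_names": ["time", "y", "x"]}, {"units": "C"}
)
print(v2_dims, v2_attrs)
print(v3_dims, v3_attrs)
```

In both cases the caller ends up with the same dimension names and the same user-facing attributes; only the place where the names were stored differs.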
49+
50+ CF Conventions
51+ --------------
52+
53+ Xarray uses its standard CF encoding/decoding functionality for handling metadata
54+ (see :py:func:`decode_cf`). This includes encoding concepts such as dimensions and
55+ coordinates. The ``coordinates`` attribute, which lists coordinate variables
56+ (e.g., ``"yc xc"`` for spatial coordinates), is one part of the broader CF conventions
57+ used to describe metadata in NetCDF and Zarr.
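
A CF-style ``coordinates`` attribute is a single whitespace-separated string of variable names. As a rough illustration of how such a value can be interpreted (``split_coordinates_attribute`` is a hypothetical helper written for this sketch, not the code behind ``decode_cf``):

```python
def split_coordinates_attribute(attributes):
    """Split a CF ``coordinates`` attribute into a list of variable names.

    Hypothetical helper for illustration only. Returns the coordinate names
    and a copy of the attributes without the ``coordinates`` entry.
    """
    remaining = dict(attributes)  # copy so the caller's dict is not mutated
    names = remaining.pop("coordinates", "").split()
    return names, remaining


coord_names, other_attrs = split_coordinates_attribute(
    {"units": "C", "coordinates": "yc xc"}
)
print(coord_names)
print(other_attrs)
```

A variable with no ``coordinates`` attribute simply yields an empty list of names.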
58+
59+ Compatibility and Reading
60+ -------------------------
61+
62+ Because of these encoding choices, Xarray cannot read arbitrary Zarr arrays, but only
63+ Zarr data with valid dimension metadata. Xarray supports:
64+
65+ - Zarr V2 arrays with ``_ARRAY_DIMENSIONS`` attributes
66+ - Zarr V3 arrays with ``dimension_names`` metadata
67+ - `NCZarr <https://docs.unidata.ucar.edu/nug/current/nczarr_head.html>`_ format
68+ (dimension names are defined in the ``.zarray`` file)
69+
70+ After decoding the dimension information and assigning the variable dimensions,
71+ Xarray optionally decodes each variable using the same CF decoding machinery
72+ that it uses for NetCDF data.
4273
4374Finally, it's worth noting that Xarray writes (and attempts to read)
4475"consolidated metadata" by default (the ``.zmetadata`` file), which is another
@@ -49,34 +80,63 @@ warning about poor performance when reading non-consolidated stores unless they
4980explicitly set ``consolidated=False``. See :ref:`io.zarr.consolidated_metadata`
5081for more details.
5182
52- As a concrete example, here we write a tutorial dataset to Zarr and then
53- re-open it directly with Zarr:
83+ Examples: Zarr Format Differences
84+ ----------------------------------
85+
86+ The following examples demonstrate how dimension and coordinate encoding differs
87+ between Zarr format versions. We'll use the same tutorial dataset but write it
88+ in different formats to show what users will see when accessing the files directly
89+ with zarr-python.
90+
91+ **Example 1: Zarr V2 Format**
5492
5593.. jupyter-execute::
5694
5795 import os
5896 import xarray as xr
5997 import zarr
6098
99+ # Load tutorial dataset and write as Zarr V2
61100 ds = xr.tutorial.load_dataset("rasm")
62- ds.to_zarr("rasm.zarr", mode="w", consolidated=False)
63- os.listdir("rasm.zarr")
101+ ds.to_zarr("rasm_v2.zarr", mode="w", consolidated=False, zarr_format=2)
102+
103+ # Open with zarr-python and examine attributes
104+ zgroup = zarr.open("rasm_v2.zarr")
105+ print("Zarr V2 - Tair attributes:")
106+ tair_attrs = dict(zgroup["Tair"].attrs)
107+ for key, value in tair_attrs.items():
108+     print(f" '{key}': {repr(value)}")
64109
65110.. jupyter-execute::
111+ :hide-code:
66112
67- zgroup = zarr.open("rasm.zarr")
68- zgroup.tree()
113+ import shutil
114+ shutil.rmtree("rasm_v2.zarr")
115+
116+ **Example 2: Zarr V3 Format**
69117
70118.. jupyter-execute::
71119
72- dict(zgroup["Tair"].attrs)
120+ # Write the same dataset as Zarr V3
121+ ds.to_zarr("rasm_v3.zarr", mode="w", consolidated=False, zarr_format=3)
122+
123+ # Open with zarr-python and examine attributes
124+ zgroup = zarr.open("rasm_v3.zarr")
125+ print("Zarr V3 - Tair attributes:")
126+ tair_attrs = dict(zgroup["Tair"].attrs)
127+ for key, value in tair_attrs.items():
128+     print(f" '{key}': {repr(value)}")
129+
130+ # For Zarr V3, dimension information is in metadata
131+ tair_array = zgroup["Tair"]
132+ print(f"\n Zarr V3 - dimension_names in metadata: {tair_array.metadata.dimension_names}")
73133
74134.. jupyter-execute::
75135 :hide-code:
76136
77137 import shutil
138+ shutil.rmtree("rasm_v3.zarr")
78139
79- shutil.rmtree("rasm.zarr")
80140
81141Chunk Key Encoding
82142------------------