What happened?
When reading and then writing a file with water-level observations downloaded from CMEMS, I get "UnicodeEncodeError: 'utf-8' codec can't encode characters in position 4-5: surrogates not allowed"
when using the new default engine (h5netcdf).
What did you expect to happen?
This was not the case before, when netcdf4 was the default engine, and I can easily work around it by setting the engine explicitly. However, I see the benefits of h5netcdf, and I would expect this file to pose no problem for that engine either. Hopefully this can be resolved, either in xarray or in h5netcdf.
Minimal Complete Verifiable Example
import xarray as xr

file_cmems = r"dfmtools_cmems_ssh_retrieve_data_temporary_file_6.nc"
ds = xr.open_dataset(file_cmems, engine='h5netcdf')
ds.to_netcdf("temp_file.nc")  # raises UnicodeEncodeError
File to test with: dfmtools_cmems_ssh_retrieve_data_temporary_file_6.zip
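As a possible interim workaround (beyond forcing engine='netcdf4'), the offending attribute strings could be sanitized before writing. The helper below is an illustrative sketch, not part of xarray: lone surrogate code points (which Python's strict UTF-8 encoder, and hence netCDF4's attribute writer, rejects) are replaced so that encoding succeeds. The function name and the sample string are made up for the example; in practice it would be applied to ds.attrs and to each variable's attrs before calling to_netcdf.

```python
def drop_surrogates(value):
    """Return a copy of string values with unencodable surrogate
    code points replaced ('?'), so strict UTF-8 encoding succeeds.
    Non-string values are returned unchanged."""
    if isinstance(value, str):
        # errors="replace" substitutes each unencodable character
        return value.encode("utf-8", errors="replace").decode("utf-8")
    return value

# A lone surrogate such as '\udce9' cannot be encoded as strict UTF-8:
attr = "Caf\udce9 data"
print(drop_surrogates(attr))  # -> "Caf? data"
```

Applying this to all attribute dicts (e.g. `ds.attrs = {k: drop_surrogates(v) for k, v in ds.attrs.items()}`) loses the original bytes, so it is only a stopgap until the engines agree on how to handle such attributes.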
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
UnicodeEncodeError Traceback (most recent call last)
File ~\AppData\Local\miniforge3\envs\dfm_tools_env\Lib\site-packages\spyder_kernels\customize\utils.py:209, in exec_encapsulate_locals(code_ast, globals, locals, exec_fun, filename)
207 if filename is None:
208 filename = "<stdin>"
--> 209 exec_fun(compile(code_ast, filename, "exec"), globals, None)
210 finally:
211 if use_locals_hack:
212 # Cleanup code
File c:\data\checkouts\dfm_tools\tests\untitled2.py:10
8 file_cmems = r"c:\Users\veenstra\Downloads\dfmtools_cmems_ssh_retrieve_data_temporary_file_6.nc"
9 ds = xr.open_dataset(file_cmems, engine='h5netcdf')
---> 10 ds.to_netcdf("temp_file.nc")
File ~\AppData\Local\miniforge3\envs\dfm_tools_env\Lib\site-packages\xarray\core\dataset.py:2102, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf, auto_complex)
2099 encoding = {}
2100 from xarray.backends.api import to_netcdf
-> 2102 return to_netcdf( # type: ignore[return-value] # mypy cannot resolve the overloads:(
2103 self,
2104 path,
2105 mode=mode,
2106 format=format,
2107 group=group,
2108 engine=engine,
2109 encoding=encoding,
2110 unlimited_dims=unlimited_dims,
2111 compute=compute,
2112 multifile=False,
2113 invalid_netcdf=invalid_netcdf,
2114 auto_complex=auto_complex,
2115 )
File ~\AppData\Local\miniforge3\envs\dfm_tools_env\Lib\site-packages\xarray\backends\api.py:2107, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf, auto_complex)
2102 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
2103 # to avoid this mess of conditionals
2104 try:
2105 # TODO: allow this work (setting up the file for writing array data)
2106 # to be parallelized with dask
-> 2107 dump_to_store(
2108 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
2109 )
2110 if autoclose:
2111 store.close()
File ~\AppData\Local\miniforge3\envs\dfm_tools_env\Lib\site-packages\xarray\backends\api.py:2157, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
2154 if encoder:
2155 variables, attrs = encoder(variables, attrs)
-> 2157 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
File ~\AppData\Local\miniforge3\envs\dfm_tools_env\Lib\site-packages\xarray\backends\common.py:527, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
523 writer = ArrayWriter()
525 variables, attributes = self.encode(variables, attributes)
--> 527 self.set_attributes(attributes)
528 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
529 self.set_variables(
530 variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
531 )
File ~\AppData\Local\miniforge3\envs\dfm_tools_env\Lib\site-packages\xarray\backends\common.py:544, in AbstractWritableDataStore.set_attributes(self, attributes)
534 """
535 This provides a centralized method to set the dataset attributes on the
536 data store.
(...)
541 Dictionary of key/value (attribute name / attribute) pairs
542 """
543 for k, v in attributes.items():
--> 544 self.set_attribute(k, v)
File ~\AppData\Local\miniforge3\envs\dfm_tools_env\Lib\site-packages\xarray\backends\netCDF4_.py:555, in NetCDF4DataStore.set_attribute(self, key, value)
553 self.ds.setncattr_string(key, value)
554 else:
--> 555 self.ds.setncattr(key, value)
File src\\netCDF4\\_netCDF4.pyx:3087, in netCDF4._netCDF4.Dataset.setncattr()
File src\\netCDF4\\_netCDF4.pyx:1858, in netCDF4._netCDF4._set_att()
File src\\netCDF4\\_netCDF4.pyx:6733, in netCDF4._netCDF4._strencode()
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 4-5: surrogates not allowed
Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.12.8 | packaged by conda-forge | (main, Dec 5 2024, 14:06:27) [MSC v.1942 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 11
machine: AMD64
processor: Intel64 Family 6 Model 154 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: ('Dutch_Netherlands', '1252')
libhdf5: 1.14.4
libnetcdf: 4.9.2
xarray: 2025.9.1
pandas: 2.2.3
numpy: 2.2.6
scipy: 1.15.1
netCDF4: 1.7.2
pydap: 3.5.3
h5netcdf: 1.5.0
h5py: 3.12.1
zarr: 2.18.4
cftime: 1.6.4.post1
nc_time_axis: None
iris: None
bottleneck: 1.4.2
dask: 2025.1.0
distributed: None
matplotlib: 3.10.0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.12.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.8.0
pip: 25.0
conda: None
pytest: 8.3.4
mypy: None
IPython: 8.31.0
sphinx: 8.1.3