
@sophiamaedler

I was trying to initialize a SpatialData object directly from S3, as is done in the tests here:

from upath import UPath
import spatialdata as sd
test = UPath("s3://spatialdata/spatialdata-sandbox/merfish.zarr", endpoint_url="https://s3.embl.de", anon=True)
sd.read_zarr(test)

This was failing with:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 4
      2 import spatialdata as sd
      3 test = UPath( "s3://spatialdata/spatialdata-sandbox/merfish.zarr", endpoint_url="https://s3.embl.de/", anon=True )
----> 4 sd.read_zarr(test)

File ~/src/spatialdata/_io/io_zarr.py:282, in read_zarr(store, selection, on_bad_files)
    272     attrs = None
    274 sdata = SpatialData(
    275     images=images,
    276     labels=labels,
   (...)    
    281 )
--> 282 sdata.path = _create_upath(_store)
    283 return sdata

File ~/src/spatialdata/_core/spatialdata.py:590, in SpatialData.path(self, value)
    588     self._path = value
    589 else:
--> 590     raise TypeError("Path must be `None`, a `str` or a `Path` object.")
    592 if not self.is_self_contained():
    593     logger.info(
    594         "The SpatialData object is not self-contained "
    595         "(i.e. it contains some elements that are Dask-backed "
    596         "from locations outside {self.path})."
    597     )

TypeError: Path must be `None`, a `str` or a `Path` object.
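The error comes from the type check in the `path` setter shown in the traceback, which only accepts `None`, a `str`, or a `Path`, so the `UPath` returned by `_create_upath` is rejected. A minimal sketch of a setter that also accepts a `UPath` could look like the following. `PathHolder` and `_PATH_TYPES` are hypothetical names used only for illustration; this is not the actual patch in this PR, and it assumes `universal_pathlib` is an optional dependency, widening the accepted types only when `upath` can be imported.

```python
from pathlib import Path

try:
    # Optional dependency (universal_pathlib); only used to widen the accepted types.
    from upath import UPath
    _PATH_TYPES = (str, Path, UPath)
except ImportError:
    _PATH_TYPES = (str, Path)


class PathHolder:
    """Hypothetical stand-in for the `path` property of SpatialData."""

    def __init__(self) -> None:
        self._path = None

    @property
    def path(self):
        return self._path

    @path.setter
    def path(self, value) -> None:
        # Accept None, plain strings, local Paths and, if upath is installed, remote UPaths.
        if value is None or isinstance(value, _PATH_TYPES):
            self._path = value
        else:
            raise TypeError("Path must be `None`, a `str`, a `Path` or a `UPath` object.")
```

Checking against a tuple of types keeps the setter a single `isinstance` call, and the `try`/`except ImportError` means local-only installs without `upath` keep the original behaviour.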

The changes in this PR fix the issue: the SpatialData object is now read successfully from S3.

The code now returns:

SpatialData object, with associated Zarr store: s3://spatialdata/spatialdata-sandbox/merfish.zarr
├── Images
│     └── 'rasterized': DataArray[cyx] (1, 522, 575)
├── Points
│     └── 'single_molecule': DataFrame with shape: (<Delayed>, 3) (2D points)
├── Shapes
│     ├── 'anatomical': GeoDataFrame shape: (6, 1) (2D shapes)
│     └── 'cells': GeoDataFrame shape: (2389, 2) (2D shapes)
└── Tables
      └── 'table': AnnData (2389, 268)
with coordinate systems:
    ▸ 'global', with elements:
        rasterized (Images), single_molecule (Points), anatomical (Shapes), cells (Shapes)
with the following Dask-backed elements not being self-contained:
    ▸ rasterized: [path/spatialdata/spatialdata-sandbox/merfish.zarr/images/rasterized]
    ▸ single_molecule: [path/spatialdata/spatialdata-sandbox/merfish.zarr/points/single_molecule/points.parquet/part.0.parquet]

@sophiamaedler
Author

The GitHub Actions checks are failing due to changes introduced in a previous commit (c514a0b).
I can take a look to see if I can figure out the problem.
