-
-
Notifications
You must be signed in to change notification settings - Fork 51
Description
(Edited by @m-albert)
In the presence of pyarrow, dask by default assumes dataframes of type object to be pyarrow strings (see dask/dask#10139 (comment)).
This creates problems revealed by failing tests (e.g. test_dask_image/test_ndmeasure/test_find_objects.py::test_3d_find_objects)
dask-image/dask_image/ndmeasure/_utils/_find_objects.py
Lines 68 to 70 in 67540af
| meta = dd.utils.make_meta([(i, object) for i in range(ndim)]) | |
| if isinstance(df1, Delayed): | |
| df1 = dd.from_delayed(df1, meta=meta) |
dd.from_delayed(df1, meta=meta).compute().dtypes
Working install:
0 object
1 object
2 object
dtype: object
Failing install:
0 string[pyarrow]
1 string[pyarrow]
2 string[pyarrow]
dtype: object
The failing test had come up when releasing v2023.08.0 in conda-forge/dask-image-feedstock#14.
@jakirkham found that pyarrow is installed with the conda distribution of dask, but not when installing over pip, where it just part of the [complete] target.
Also @jakirkham found that the above described conflicting behaviour can be turned off using the dask configuration.
He did this for the tests performed by the dask-image conda feedstock on v2023.08.0.