Skip to content

Conversation

@selmanozleyen
Copy link
Member

@selmanozleyen selmanozleyen commented Jun 30, 2025

Hi @timtreis @ilan-gold ,

I am working on this PR also and thought maybe it will be smart to have the code I use there supported here because they seem useful at first glance. So I wrote and tested subset_sdata_by_table_mask in this PR.

@LucaMarconato I can't ask for reviews other than @ilan-gold do you know why?


Some notes

  • For PointsModel (GeoDataFrame) it assumes the index is instance_id always
  • For Label2DModel the image is assumed to have the instance_ids as values themselves
  • For Label2DModel when the element is a xr.DataTree it assumes the keys are the different scales
  • I didn't use the relational operations to get the shapes and points for this reason because they assume the merge is on the element indices themselves
  • I didn't add support for AnnData because we would have to also document the return types and stuff in the case of AnnData input which doesn't seem reasonable to me if its going to make it 1-1 same as scanpy.pp.filter_cells

Code excerpt from tests to demonstrate the usage:

sdata = concatenate(
        {
            "labels": blobs_annotating_element("blobs_labels"),
            "shapes": blobs_annotating_element("blobs_circles"),
            "points": blobs_annotating_element("blobs_points"),
            "multiscale_labels": blobs_annotating_element("blobs_multiscale_labels"),
        },
        concatenate_tables=True,
    )
    third_elems = sdata.tables["table"].obs["instance_id"] == 3
    subset_sdata = subset_sdata_by_table_mask(sdata, "table", third_elems)

    labels_remaining_ids = set(np.unique(subset_sdata.labels["blobs_labels-labels"].data.compute())) - {0}
    assert labels_remaining_ids == {3}

    for scale in subset_sdata.labels["blobs_multiscale_labels-multiscale_labels"]:
        ms_labels_remaining_ids = set(
            np.unique(subset_sdata.labels["blobs_multiscale_labels-multiscale_labels"][scale].image.compute())
        ) - {0}
        assert ms_labels_remaining_ids == {3}

    points_remaining_ids = set(np.unique(subset_sdata.points["blobs_points-points"]["instance_id"].compute())) - {0}
    assert points_remaining_ids == {3}

    shapes_remaining_ids = set(np.unique(subset_sdata.shapes["blobs_circles-shapes"].index)) - {0}
    assert shapes_remaining_ids == {3}

@codecov
Copy link

codecov bot commented Jun 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.19%. Comparing base (8c3e518) to head (b21a0a1).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #946      +/-   ##
==========================================
+ Coverage   92.11%   92.19%   +0.08%     
==========================================
  Files          48       48              
  Lines        7440     7494      +54     
==========================================
+ Hits         6853     6909      +56     
+ Misses        587      585       -2     
Files with missing lines Coverage Δ
src/spatialdata/__init__.py 96.42% <ø> (ø)
src/spatialdata/_core/query/relational_query.py 92.78% <100.00%> (+1.28%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@timtreis timtreis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this PR make use of

and ?

from spatialdata.datasets import blobs_annotating_element


def test_filter_labels2dmodel_by_instance_ids():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parametrise, don't loop over inputs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the loop here is for the multiple scales. Or did you mean something else?

@selmanozleyen
Copy link
Member Author

selmanozleyen commented Jul 10, 2025

@timtreis About using def match_sdata_to_table( and the other. I noticed for shapes the index is assumed to be the instance_id but this doesn't match how the blobs are filled and would fail in the tests. It was documented that way so I assumed this was intentional but match_sdata_to_table wouldn't (and didn't) pass the current tests because it assumed the element index was the instance_id.

@selmanozleyen
Copy link
Member Author

@timtreis now I make use of match_element_to_table. Other than one comment all seems resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants