
Does rle_to_mask support UTF-8 encoded strings? #1952

@Kallinteris-Andreas


Search before asking

  • I have searched the Supervision issues and found no similar feature requests.

Question

Summary of the RLE formats supported by the COCO dataset, as generated by Gemini 2.5 Pro:

"""

Structure of the RLE Output

The output of the mask.encode() function is a dictionary with the following structure:

  • 'size': A list of two integers [height, width] representing the dimensions of the mask.
  • 'counts': This field contains the run-length encoded data. It can be one of two types:
    • A list of integers: This is an uncompressed RLE. The list represents the lengths of alternating runs of 0s and 1s. For example, [2, 3, 1, 1] for a mask [0, 0, 1, 1, 1, 0, 1] means two 0s, followed by three 1s, one 0, and one 1. If the first pixel is a 1, the RLE starts with a 0 count for the initial run of 0s.
    • A UTF-8 encoded string: This is a compressed RLE, which is more space-efficient. The pycocotools library handles the compression and decompression of this string.

"""
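To make the uncompressed list format concrete, here is a minimal pure-Python sketch of the alternating-run convention described in the summary, including the leading 0 count when the first pixel is a 1. The helper names `expand_runs` and `runs_from_pixels` are hypothetical and are not part of pycocotools or Supervision.

```python
from typing import List


def expand_runs(counts: List[int]) -> List[int]:
    """Expand alternating run lengths (0s first, then 1s, ...) into pixels."""
    pixels: List[int] = []
    for index, run in enumerate(counts):
        # Even positions are runs of 0s, odd positions are runs of 1s.
        pixels.extend([index % 2] * run)
    return pixels


def runs_from_pixels(pixels: List[int]) -> List[int]:
    """Encode a flat list of 0/1 pixels as alternating run lengths."""
    counts: List[int] = []
    current = 0  # the encoding always starts with a run of 0s
    run = 0
    for pixel in pixels:
        if pixel == current:
            run += 1
        else:
            counts.append(run)  # appends 0 if the first pixel is a 1
            current = pixel
            run = 1
    counts.append(run)
    return counts


print(expand_runs([2, 3, 1, 1]))    # [0, 0, 1, 1, 1, 0, 1]
print(runs_from_pixels([1, 1, 0]))  # [0, 2, 1]
```

This only covers a flattened sequence of pixels. For a 2D mask, COCO flattens column by column before counting runs.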

In particular, my dataset uses UTF-8 encoded strings, because that is what pycocotools outputs.

Does the sv.DetectionDataset.from_coco function support loading a dataset with UTF-8 encoded RLE segmentation masks?

From what I can tell, it calls rle_to_mask, the function that handles the conversion, and that function does not support this format:

def rle_to_mask(
    rle: npt.NDArray[np.int_] | list[int], resolution_wh: tuple[int, int]
) -> npt.NDArray[np.bool_]:
    """
    Converts run-length encoding (RLE) to a binary mask.

    Args:
        rle (Union[npt.NDArray[np.int_], List[int]]): The 1D RLE array, the format
            used in the COCO dataset (column-wise encoding, values of an array with
            even indices represent the number of pixels assigned as background,
            values of an array with odd indices represent the number of pixels
            assigned as foreground object).
        resolution_wh (Tuple[int, int]): The width (w) and height (h)
            of the desired binary mask.

    Returns:
        The generated 2D Boolean mask of shape `(h, w)`, where the foreground object is
            marked with `True`'s and the rest is filled with `False`'s.

    Raises:
        AssertionError: If the sum of pixels encoded in RLE differs from the
            number of pixels in the expected mask (computed based on resolution_wh).

    Examples:
        ```python
        import supervision as sv

        sv.rle_to_mask([5, 2, 2, 2, 5], (4, 4))
        # array([
        #     [False, False, False, False],
        #     [False, True,  True,  False],
        #     [False, True,  True,  False],
        #     [False, False, False, False],
        # ])
        ```
    """
    if isinstance(rle, list):
        rle = np.array(rle, dtype=int)

    width, height = resolution_wh

    assert width * height == np.sum(rle), (
        "the sum of the number of pixels in the RLE must be the same "
        "as the number of pixels in the expected mask"
    )

    zero_one_values = np.zeros(shape=(rle.size, 1), dtype=np.uint8)
    zero_one_values[1::2] = 1

    decoded_rle = np.repeat(zero_one_values, rle, axis=0)
    decoded_rle = np.append(
        decoded_rle, np.zeros(width * height - len(decoded_rle), dtype=np.uint8)
    )
    return decoded_rle.reshape((height, width), order="F")
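
For reference, the compressed string can in principle be expanded into the integer list that rle_to_mask already accepts. Below is a minimal pure-Python sketch, assuming the string follows the scheme used by `rleFrString` in the COCO API's `maskApi.c` (6 bits per character starting at ASCII 48, with runs after the third stored as deltas against the run two positions earlier). The function name `decode_compressed_counts` is hypothetical and is not part of Supervision or pycocotools.

```python
from typing import List, Union


def decode_compressed_counts(counts: Union[str, bytes]) -> List[int]:
    """Expand a COCO compressed RLE string into a list of run lengths."""
    if isinstance(counts, bytes):
        counts = counts.decode("ascii")

    runs: List[int] = []
    pos = 0
    while pos < len(counts):
        value = 0
        shift = 0
        more = True
        while more:
            chunk = ord(counts[pos]) - 48      # characters start at ASCII 48
            value |= (chunk & 0x1F) << shift   # low 5 bits carry data
            more = bool(chunk & 0x20)          # bit 5 flags a continuation
            pos += 1
            shift += 5
            if not more and (chunk & 0x10):
                value |= -1 << shift           # sign-extend negative deltas
        if len(runs) > 2:
            value += runs[-2]                  # later runs are stored as deltas
        runs.append(value)
    return runs


# "52203" is the compressed form of the docstring example [5, 2, 2, 2, 5].
print(decode_compressed_counts("52203"))  # [5, 2, 2, 2, 5]
```

The resulting list could then be passed to rle_to_mask. Note that the COCO 'size' field is [height, width], while resolution_wh expects (width, height), so the two values would need to be swapped.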


Labels: question (Further information is requested)
