
Difference between CoreML and PyTorch inference #2620

@onurtore

Description

Hi!

I am trying to understand the discrepancy between our CoreML and PyTorch inference. I see differences of at least 10% for specific heads and test cases, and I need to find the reason.

This is a screenshot of the model's input in the Netron app:

[Netron screenshot of the model input]

MUL and ADD are vectors. (The current CoreML does not support vectors for the bias operation, but I changed that; see issue #2619.)

The values for MUL and ADD are:
MUL: 0.01464911736547947, 0.015123673714697361, 0.015288766473531723
ADD: -1.8590586185455322, -1.7242575883865356, -1.5922027826309204
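
For reference, these look like the usual normalization constants folded into the model input, i.e. MUL = 1 / (255 * STD) and ADD = -MEAN / STD for the MEAN/STD values given further down. A quick check:

# My assumption: MUL/ADD are the per-channel normalization folded into the
# Core ML input, i.e. scale = 1 / (255 * STD) and bias = -MEAN / STD.
MEAN = [0.49767, 0.4471, 0.4084]
STD = [0.2677, 0.2593, 0.2565]

for c in range(3):
    print(1.0 / (255.0 * STD[c]), -MEAN[c] / STD[c])
# 0.0146491... -1.859058...  (matches MUL[0], ADD[0])
# 0.0151236... -1.724257...  (matches MUL[1], ADD[1])
# 0.0152887... -1.592202...  (matches MUL[2], ADD[2])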

Here is the code I use to run inference in the development environment:

from pathlib import Path
from typing import List

import cv2
import torch
import torch.nn as nn


def test_on_single_image(multihead_model: nn.Module, image_path: Path) -> List[float]:
    """
    Default method to test the model on a single image.
    """
    img_bgr = cv2.imread(str(image_path))
    if img_bgr is None:
        raise RuntimeError(f"Could not read image: {image_path}")
    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    img_rgb = cv2.resize(img_rgb, (224, 224), interpolation=cv2.INTER_LINEAR)
    # Convert to tensor and add batch dimension
    img_tensor = (
        torch.from_numpy(img_rgb).permute(2, 0, 1).unsqueeze(0)
    )  # Shape: 1,3,H,W
    img_tensor = img_tensor.float() / 255.0
    # Normalize per channel with the MEAN/STD values listed below
    for c in range(3):
        img_tensor[0, c, :, :] = (img_tensor[0, c, :, :] - MEAN[c]) / STD[c]

    outputs = multihead_model(img_tensor)
    return outputs

The values for MEAN and STD are:
MEAN = [0.49767, 0.4471, 0.4084]
STD = [0.2677, 0.2593, 0.2565]
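
To compare the two runtimes on the same file, I run something like the sketch below. The package name "model.mlpackage" and the input name "colorImage" are placeholders for my actual artifacts, and I resize with PIL on the Core ML side since, as far as I know, .predict() expects an image already at the model's input size:

# Rough side-by-side comparison sketch (runs only on macOS, since it calls the
# Core ML runtime). "model.mlpackage" and the input name are placeholders.
import coremltools as ct
import numpy as np
from PIL import Image


def compare_runtimes(torch_model, image_path):
    # Core ML side: hand over a resized PIL image; the MUL/ADD scale and bias
    # are applied inside the converted model.
    pil_img = Image.open(image_path).convert("RGB").resize((224, 224), Image.BILINEAR)
    mlmodel = ct.models.MLModel("model.mlpackage")
    coreml_out = mlmodel.predict({"colorImage": pil_img})

    # PyTorch side: reuse the preprocessing from test_on_single_image above.
    torch_out = test_on_single_image(torch_model, image_path)

    for name, value in coreml_out.items():
        print("coreml", name, np.asarray(value).ravel()[:5])
    print("torch ", torch_out)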

img_tensor[0, 0, 0, 0] = 1.2319, which matches colorImage__biased__ (1.2319052). However, the second value of colorImage__biased__ is 1.1586595, which differs from img_tensor[0, 0, 0, 1] = 1.0561.

I compared the ordering of colorImage__biased__ and img_tensor to check whether the discrepancy comes from a row/channel order mismatch, but it does not. The values of img_tensor and colorImage__biased__ at index [0, 1, 0, 1] are also different.

The only thing I believe could be different is the interpolation method used by CoreML. OpenCV uses cv2.INTER_LINEAR, but I do not know what Apple's inference pipeline uses. I tried several different interpolation algorithms, but the results are still different.
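
For example, to get a feel for how much the resize step alone can move the pixel values, I compared OpenCV's bilinear resize against PIL's (PIL here is just a stand-in for some other bilinear implementation; I still do not know what Apple's pipeline actually does):

# How much does the interpolation implementation alone change pixel values?
# "test.jpg" is a placeholder for the attached test image.
import cv2
import numpy as np
from PIL import Image

img_rgb = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)

cv_resized = cv2.resize(img_rgb, (224, 224), interpolation=cv2.INTER_LINEAR)
pil_resized = np.array(Image.fromarray(img_rgb).resize((224, 224), Image.BILINEAR))

diff = np.abs(cv_resized.astype(np.int16) - pil_resized.astype(np.int16))
print("max abs diff:", diff.max(), "mean abs diff:", diff.mean())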

What are the possible reasons for this discrepancy?

The input image used in the example is:
[attached test image]
