
Difference between CoreML and PyTorch inference #2620

@onurtore

Description

Hi!

I am trying to understand the discrepancy between our CoreML and PyTorch inference. I see differences of at least 10% for specific heads and test cases, and I need to find the reason.

This is a screenshot of the model's input in the Netron app:

[Netron screenshot of the model input]

MUL and ADD are vectors. (The current CoreML does not support vectors for the bias operation, but I changed that; see issue #2619.)

The values for MUL and ADD are:
MUL: 0.01464911736547947, 0.015123673714697361, 0.015288766473531723
ADD: -1.8590586185455322, -1.7242575883865356, -1.5922027826309204
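
For reference, these look like the usual normalization constants folded into the model input, i.e. MUL = 1 / (255 * STD) and ADD = -MEAN / STD for the MEAN/STD values given further down. A quick check:

# My assumption: MUL/ADD are the per-channel normalization folded into the
# Core ML input, i.e. scale = 1 / (255 * STD) and bias = -MEAN / STD.
MEAN = [0.49767, 0.4471, 0.4084]
STD = [0.2677, 0.2593, 0.2565]

for c in range(3):
    print(1.0 / (255.0 * STD[c]), -MEAN[c] / STD[c])
# 0.0146491... -1.859058...  (matches MUL[0], ADD[0])
# 0.0151236... -1.724257...  (matches MUL[1], ADD[1])
# 0.0152887... -1.592202...  (matches MUL[2], ADD[2])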

Here is the code I use to run inference in the development environment:

from pathlib import Path
from typing import List

import cv2
import torch
import torch.nn as nn


def test_on_single_image(multihead_model: nn.Module, image_path: Path) -> List[float]:
    """
    Default method to test the model on a single image.
    """
    img_bgr = cv2.imread(str(image_path))
    if img_bgr is None:
        raise RuntimeError(f"Could not read image: {image_path}")
    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    img_rgb = cv2.resize(img_rgb, (224, 224), interpolation=cv2.INTER_LINEAR)
    # Convert to tensor and add batch dimension
    img_tensor = (
        torch.from_numpy(img_rgb).permute(2, 0, 1).unsqueeze(0)
    )  # Shape: 1,3,H,W
    img_tensor = img_tensor.float() / 255.0
    # Normalize per channel with the MEAN/STD values listed below
    for c in range(3):
        img_tensor[0, c, :, :] = (img_tensor[0, c, :, :] - MEAN[c]) / STD[c]

    outputs = multihead_model(img_tensor)
    return outputs

The values for MEAN and STD are:
MEAN = [0.49767, 0.4471, 0.4084]
STD = [0.2677, 0.2593, 0.2565]
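
To compare the two runtimes on the same file, I run something like the sketch below. The package name "model.mlpackage" and the input name "colorImage" are placeholders for my actual artifacts, and I resize with PIL on the Core ML side since, as far as I know, .predict() expects an image already at the model's input size:

# Rough side-by-side comparison sketch (runs only on macOS, since it calls the
# Core ML runtime). "model.mlpackage" and the input name are placeholders.
import coremltools as ct
import numpy as np
from PIL import Image


def compare_runtimes(torch_model, image_path):
    # Core ML side: hand over a resized PIL image; the MUL/ADD scale and bias
    # are applied inside the converted model.
    pil_img = Image.open(image_path).convert("RGB").resize((224, 224), Image.BILINEAR)
    mlmodel = ct.models.MLModel("model.mlpackage")
    coreml_out = mlmodel.predict({"colorImage": pil_img})

    # PyTorch side: reuse the preprocessing from test_on_single_image above.
    torch_out = test_on_single_image(torch_model, image_path)

    for name, value in coreml_out.items():
        print("coreml", name, np.asarray(value).ravel()[:5])
    print("torch ", torch_out)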

img_tensor[0, 0, 0, 0] = 1.2319, which matches colorImage__biased__ (1.2319052). However, the second value of colorImage__biased__ is 1.1586595, which differs from img_tensor[0, 0, 0, 1] = 1.0561.

I compared the ordering of colorImage__biased__ and img_tensor to check whether the discrepancy comes from a row/channel order mismatch, but it does not. The values of img_tensor and colorImage__biased__ at index [0, 1, 0, 1] are also different.

The only thing I believe could be different is the interpolation method used by CoreML. OpenCV uses cv2.INTER_LINEAR, but I do not know what Apple's inference pipeline uses. I tried several different interpolation algorithms, but the results are still different.
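
For example, to get a feel for how much the resize step alone can move the pixel values, I compared OpenCV's bilinear resize against PIL's (PIL here is just a stand-in for some other bilinear implementation; I still do not know what Apple's pipeline actually does):

# How much does the interpolation implementation alone change pixel values?
# "test.jpg" is a placeholder for the attached test image.
import cv2
import numpy as np
from PIL import Image

img_rgb = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)

cv_resized = cv2.resize(img_rgb, (224, 224), interpolation=cv2.INTER_LINEAR)
pil_resized = np.array(Image.fromarray(img_rgb).resize((224, 224), Image.BILINEAR))

diff = np.abs(cv_resized.astype(np.int16) - pil_resized.astype(np.int16))
print("max abs diff:", diff.max(), "mean abs diff:", diff.mean())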

What are the possible reasons for this discrepancy?

The input image used in the example is:
[attached test image]
