Description
When using the `fuse_bn_into_conv` pass, the optimizer generates redundant nodes that change model behavior. I am aware of issue #133; however, (1) I observed non-crashing behavior that nonetheless differs from the original model, (2) models with constant nodes are also affected (not only initializer nodes), and (3) that issue provides no details about which models are affected or to what extent. I will therefore provide a thorough account of the bugs encountered here.
1. Extent Of Models Affected:
Classification and text models seemed to work quite well; however, some object detection models were affected. I used the ILSVRC 2017 object detection dataset for this effort and compared top-K results for vision models, calculating F1 scores (using IoU thresholds of 0.5 to 0.9), precision, and recall for object detection.
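For reference, the matching behind these scores can be sketched as follows. This is a simplified version in plain Python; the actual evaluation ran over the ILSVRC data with per-class matching, and the function names here are my own:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def f1_at_threshold(preds, gts, thr):
    """Greedy one-to-one matching of predictions to ground-truth boxes
    at a given IoU threshold; returns (precision, recall, f1)."""
    matched = set()
    tp = 0
    for p in preds:
        best, best_iou = None, thr
        for i, g in enumerate(gts):
            if i in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:
            matched.add(best)
            tp += 1
    prec = tp / len(preds) if preds else 0.0
    rec = tp / len(gts) if gts else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

Running this over the detections of the original and the optimized model at each threshold from 0.5 to 0.9 yields the score differences reported below.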
For classification models, EfficientNet-Lite4 (opset=11) was very slightly affected, with a top-10 label prediction difference of 1% between the original and the optimized model.
For object detection models, YOLOv3 (opset=11) presented differences in F1 score, precision, and recall of up to 1.5%, while Tiny YOLOv3-11 and SSD also showed small differences.
In addition, I found that SSD label accuracy is affected to a very small extent: top-1 label accuracy differs by 0.04% between the original and the optimized model when using the `fuse_bn_into_conv` pass.
2. Overall Behavior:
As mentioned in #133, this problem occurs quite commonly in the presence of initializers. More specifically, it occurs when the Conv node has a weight initializer but no bias initializer. When a bias initializer is also detected on the Conv node (i.e., both weight and bias initializers are present), the pass is skipped.
For example, consider node conv2d_10 in TinyYOLOv3 (opset=11).
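To make the observed condition concrete, here is a small sketch of the check that appears to determine whether a node is hit. The names and structure are my own simplification; in a real graph, `conv_inputs` would come from `node.input` and `initializer_names` from the names in `graph.initializer`:

```python
def conv_fusion_case(conv_inputs, initializer_names):
    """Classify a Conv node by how the pass was observed to treat it.

    conv_inputs: the node's input names (X, W, and optionally B).
    initializer_names: set of initializer names present in the graph.
    """
    weight_init = len(conv_inputs) > 1 and conv_inputs[1] in initializer_names
    bias_init = len(conv_inputs) > 2 and conv_inputs[2] in initializer_names
    if weight_init and bias_init:
        return "skipped"   # pass skips the node; it remains unaffected
    if weight_init:
        return "affected"  # observed bug: BN expanded into extra nodes
    return "other"
```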
This is the original node structure (the node is in the centre):
And this is how it is transformed using the pass:
Notice how it remains unaffected because the Conv node contains the bias initializer, yet it is still affected if it contains only a weight initializer. I observed the same behavior in other models (e.g., EfficientNet-Lite4). In particular, I observed there that a BatchNormalization node succeeding a Conv can also exhibit this behavior.
To my understanding, whenever the optimizer encounters a BatchNormalization node whose parameters are fixed, it should update the weights of the preceding Conv node and remove the BatchNormalization node. Instead, this bug causes the BatchNormalization formula to be expanded into multiple separate nodes.
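For clarity, the expected folding can be sketched per output channel. This is a minimal illustration using plain Python lists (the function name and layout are my own; the real pass operates on ONNX tensor protos):

```python
import math

def fold_bn_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold fixed BatchNormalization parameters into the preceding Conv.

    W: list of per-output-channel weight lists; b: per-channel bias
    (use zeros when the Conv has no bias initializer).
    Returns (W_fused, b_fused) such that
    BN(Conv(x; W, b)) == Conv(x; W_fused, b_fused).
    """
    W_fused, b_fused = [], []
    for c in range(len(gamma)):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        W_fused.append([w * scale for w in W[c]])
        b_fused.append((b[c] - mean[c]) * scale + beta[c])
    return W_fused, b_fused
```

Done this way, the BatchNormalization node disappears entirely; no extra Mul/Add/Sqrt nodes are needed, which is what makes the expanded-formula output of the pass look like a bug.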
The original EfficientNet-Lite4 on the left, and the optimized version on the right:
In addition, we discovered a duplicate-initializer issue associated with this pass, which we reported separately (#174).
Important Note: This bug report is part of a research project in which my collaborators and I searched extensively for issues in the ONNX optimizer. We found the tool to be quite robust across the large majority of models of different types, and we will report the full results soon. However, we considered it valuable to report all the bugs we found here. To the developers of the optimizer: keep up the fantastic work!