Description
When using the `fuse_bn_into_conv` pass, the optimizer generates redundant nodes that change model behavior. I am aware of issue #133; however, (1) I observed non-crashing behavior that nonetheless differs from the original model, (2) models with constant nodes are also affected (not only initializer nodes), and (3) that issue provides no details about which models are affected or to what extent. I will therefore provide a thorough account of the bugs encountered here.
1. Extent Of Models Affected:
Classification and text models seemed to work quite well; however, some object detection models were affected. I used the ILSVRC 2017 object detection dataset for this effort and compared top-K results for vision models, calculating F1 scores (using IoU thresholds of 0.5 to 0.9), precision, and recall for object detection.
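For reference, the matching behind these scores can be sketched as follows. This is a simplified version in plain Python; the actual evaluation ran over the ILSVRC data with per-class matching, and the function names here are my own:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def f1_at_threshold(preds, gts, thr):
    """Greedy one-to-one matching of predictions to ground-truth boxes
    at a given IoU threshold; returns (precision, recall, f1)."""
    matched = set()
    tp = 0
    for p in preds:
        best, best_iou = None, thr
        for i, g in enumerate(gts):
            if i in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:
            matched.add(best)
            tp += 1
    prec = tp / len(preds) if preds else 0.0
    rec = tp / len(gts) if gts else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

Running this over the detections of the original and the optimized model at each threshold from 0.5 to 0.9 yields the score differences reported below.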
For classification models, EfficientNet-Lite4 (opset=11) was very slightly affected, with a top-10 label prediction difference of 1% between the original and the optimized model.
For object detection models, YOLOv3 (opset=11) presented differences in F1 score, precision, and recall of up to 1.5%, while Tiny YOLOv3-11 and SSD also showed small differences.
In addition, I found that SSD label accuracy is affected to a very small extent: top-1 label accuracy differs by 0.04% between the original and the optimized model when using the `fuse_bn_into_conv` pass.
2. Overall Behavior:
As mentioned in #133, this problem occurs quite commonly in the presence of initializers. More specifically, it occurs when the Conv node has a weight initializer but no bias initializer. When a bias initializer is also detected on the Conv node (i.e., both weight and bias initializers are present), the pass is skipped.
For example, consider node conv2d_10 in TinyYOLOv3 (opset=11).
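To make the observed condition concrete, here is a small sketch of the check that appears to determine whether a node is hit. The names and structure are my own simplification; in a real graph, `conv_inputs` would come from `node.input` and `initializer_names` from the names in `graph.initializer`:

```python
def conv_fusion_case(conv_inputs, initializer_names):
    """Classify a Conv node by how the pass was observed to treat it.

    conv_inputs: the node's input names (X, W, and optionally B).
    initializer_names: set of initializer names present in the graph.
    """
    weight_init = len(conv_inputs) > 1 and conv_inputs[1] in initializer_names
    bias_init = len(conv_inputs) > 2 and conv_inputs[2] in initializer_names
    if weight_init and bias_init:
        return "skipped"   # pass skips the node; it remains unaffected
    if weight_init:
        return "affected"  # observed bug: BN expanded into extra nodes
    return "other"
```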
This is the original node structure (the node is in the centre):
And this is how it is transformed using the pass:
Notice how it remains unaffected because the Conv node contains the bias initializer, yet it is still affected if it contains only a weight initializer. I observed the same behavior in other models (e.g., EfficientNet-Lite4). In particular, I observed there that a BatchNormalization node succeeding a Conv can also exhibit this behavior.
To my understanding, whenever the optimizer encounters a BatchNormalization node whose parameters are fixed, it should update the weights of the preceding Conv node and remove the BatchNormalization node. Instead, this bug causes the BatchNormalization formula to be expanded into multiple separate nodes.
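For clarity, the expected folding can be sketched per output channel. This is a minimal illustration using plain Python lists (the function name and layout are my own; the real pass operates on ONNX tensor protos):

```python
import math

def fold_bn_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold fixed BatchNormalization parameters into the preceding Conv.

    W: list of per-output-channel weight lists; b: per-channel bias
    (use zeros when the Conv has no bias initializer).
    Returns (W_fused, b_fused) such that
    BN(Conv(x; W, b)) == Conv(x; W_fused, b_fused).
    """
    W_fused, b_fused = [], []
    for c in range(len(gamma)):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        W_fused.append([w * scale for w in W[c]])
        b_fused.append((b[c] - mean[c]) * scale + beta[c])
    return W_fused, b_fused
```

Done this way, the BatchNormalization node disappears entirely; no extra Mul/Add/Sqrt nodes are needed, which is what makes the expanded-formula output of the pass look like a bug.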
The original EfficientNet-Lite4 on the left, and the optimized version on the right:
In addition, we discovered a duplicate-initializer issue associated with this pass, which we reported separately (#174).
Important Note: This bug report is part of a research project in which my collaborators and I searched extensively for issues in the ONNX optimizer. We found the tool to be quite robust across the large majority of models of different types, and we will report the full results soon. However, we considered it valuable to report all the bugs we found here. To the developers of the optimizer: keep up the fantastic work!