Does INT8 Quantized Convolution Still Involve Floating-Point Operations? #4474

When using quantized::conv2d in my model, I noticed that the quantized convolution layer still stores its scale parameter as a floating-point value. My understanding is that this scale is used to requantize the accumulated GEMM output back to INT8. I would like to confirm:

Does the quantized convolution operator perform any floating-point computations internally, or is the entire operation carried out in pure INT8/INT32 arithmetic?
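To make the question concrete, here is a minimal NumPy sketch of the two ways I can imagine the requantization step being done. It is only an illustration of my understanding, not the operator's actual implementation: the function names, the example scales, and the symmetric INT8 output range are all assumptions of mine. The first variant multiplies the INT32 accumulator by a float multiplier (input_scale * weight_scale / output_scale); the second approximates the same multiplier with an integer mantissa and a right shift, so no floating-point math happens per element.

```python
import numpy as np

def requantize_float(acc_int32, multiplier, out_zero_point):
    """Hypothetical requantization using a floating-point multiply."""
    scaled = acc_int32.astype(np.float32) * np.float32(multiplier)
    q = np.rint(scaled).astype(np.int32) + out_zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def requantize_fixed_point(acc_int32, multiplier, out_zero_point):
    """Hypothetical integer-only variant: multiplier ~= m0 * 2**-shift."""
    # Split the multiplier once, ahead of time, into mantissa and exponent:
    # multiplier = mantissa * 2**exponent, with mantissa in [0.5, 1).
    mantissa, exponent = np.frexp(multiplier)
    m0 = int(round(float(mantissa) * (1 << 31)))  # 31-bit integer mantissa
    shift = 31 - int(exponent)                    # total right shift
    # Per-element work below uses only integer multiply, add, and shift.
    acc = acc_int32.astype(np.int64)
    rounded = (acc * m0 + (1 << (shift - 1))) >> shift  # round half up
    q = rounded + out_zero_point
    return np.clip(q, -128, 127).astype(np.int8)

# Assumed example scales, chosen only for illustration.
input_scale, weight_scale, output_scale = 0.02, 0.005, 0.05
multiplier = input_scale * weight_scale / output_scale

acc = np.array([-30000, -500, 0, 1234, 70000], dtype=np.int32)
out_float = requantize_float(acc, multiplier, out_zero_point=0)
out_fixed = requantize_fixed_point(acc, multiplier, out_zero_point=0)
print(out_float, out_fixed)
```

The two variants can differ by one unit on exact ties because the float path rounds half to even and the fixed-point path rounds half up. What I would like to know is which of these (if either) matches what the quantized convolution does internally.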
