Is it possible to use ONNX to speed up inference? If so, how do I call the encode function on the ONNX model?