Earlier this year I filed #10248 floating the possibility of developing a TensorFlow Lite ("TFLite") execution provider. In particular, I am interested in taking advantage of Google's Coral Edge TPU devices, which are only supported through a TFLite "delegate" (similar to an execution provider) provided by libedgetpu.
I am looking for input from the ONNX Runtime team on whether this idea sounds reasonable, and some advice on how it might be accomplished.
Here's my understanding:
Converting ONNX models to TFLite requires two Python tools: first onnx-tensorflow to convert the .onnx file to a TensorFlow .pb protobuf frozen graph, then tf.lite.TFLiteConverter.from_frozen_graph() (exposed as tf.compat.v1.lite.TFLiteConverter.from_frozen_graph() in TensorFlow 2.x) to convert that graph to a .tflite FlatBuffers file.
The execution provider would have to be able to invoke the Python interpreter. I do not know of any existing execution providers that execute subprocesses. How can this be done in C++?
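On the subprocess question, one option is popen(), sketched below under some assumptions: it is POSIX-only (Windows has _popen/_pclose instead), it goes through the shell, and the RunCommand helper and CommandResult struct are hypothetical names for illustration, not anything from ONNX Runtime.

```cpp
#include <array>
#include <cstdio>
#include <stdexcept>
#include <string>

// Result of running an external command (hypothetical helper type).
struct CommandResult {
  int exit_status;     // raw status returned by pclose(); 0 means a clean exit
  std::string output;  // everything the command wrote to stdout
};

// Runs `command` through the shell and captures its standard output.
// POSIX only: popen()/pclose() are not part of standard C++.
CommandResult RunCommand(const std::string& command) {
  FILE* pipe = popen(command.c_str(), "r");
  if (pipe == nullptr) {
    throw std::runtime_error("popen() failed for: " + command);
  }
  std::string output;
  std::array<char, 256> buffer{};
  // Read the child's stdout chunk by chunk until EOF.
  while (fgets(buffer.data(), static_cast<int>(buffer.size()), pipe) != nullptr) {
    output += buffer.data();
  }
  const int status = pclose(pipe);  // waits for the child to finish
  return {status, output};
}
```

An EP could call something like RunCommand("python3 convert.py model.onnx model.tflite"), where convert.py is a hypothetical wrapper around the two conversion steps. Because the command string goes through the shell, any file paths would need careful quoting.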
It looks like the CoreML provider does something similar with writing converted models out to disk and then loading them back in.
GetCapability() would examine each node in the graph to determine whether TFLite can execute it.
I don't know how to best answer this question. The definitive answer lies in what the two tools above are able to translate.
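As a rough sketch of the node-by-node check, the code below groups consecutive supported nodes into candidate subgraphs. It assumes a hand-maintained set of supported op type names and a simplified NodeInfo struct; both are hypothetical stand-ins, not ONNX Runtime or TFLite types, and the real answer still depends on what the converters can translate.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a graph node; not an ONNX Runtime type.
struct NodeInfo {
  std::size_t index;    // position of the node in the graph
  std::string op_type;  // ONNX operator name, e.g. "Conv"
};

// Walks the nodes in the given order (assumed topological) and groups
// runs of consecutive supported nodes into candidate subgraphs.
std::vector<std::vector<std::size_t>> GroupSupportedNodes(
    const std::vector<NodeInfo>& nodes,
    const std::unordered_set<std::string>& supported_ops) {
  std::vector<std::vector<std::size_t>> groups;
  std::vector<std::size_t> current;
  for (const NodeInfo& node : nodes) {
    if (supported_ops.count(node.op_type) > 0) {
      current.push_back(node.index);
    } else if (!current.empty()) {
      // An unsupported node ends the current run.
      groups.push_back(current);
      current.clear();
    }
  }
  if (!current.empty()) {
    groups.push_back(current);
  }
  return groups;
}
```

Each resulting group would be a candidate for one fused subgraph handed to Compile(). Checking only the op type is a simplification; attributes, input shapes, and data types would probably need checking too.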
Also, TFLite does its own graph partitioning to delegate subgraph execution to hardware devices like the Coral, using the GraphPartitionHelper. Possibly ORT would only want to give TFLite the subgraphs that the (hardware-accelerated) TFLite delegate can execute, so that ORT can assign the remaining subgraphs to other EPs. However, this would require some way of tracing TFLite subgraphs backward through the conversion process to the original ONNX nodes.
I asked over at the TensorFlow Forum whether there are APIs that could help.
Compile() would take each FusedNodeAndGraph and convert/load it into TFLite (tflite::FlatBufferModel::BuildFromFile()) and create a tflite::Interpreter from the model. Then it would generate a NodeComputeInfo for each one that populates the interpreter's input buffers, calls interpreter->Invoke(), then copies from the interpreter's output buffers to ORT's buffers.
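The copy-in, invoke, copy-out pattern for the compute function might look roughly like the sketch below. To keep it compilable without TFLite, it uses a FakeInterpreter stand-in that mimics three calls tflite::Interpreter offers (typed_input_tensor<T>(), Invoke(), typed_output_tensor<T>()); the stand-in, its doubling "model", and the RunSubgraph name are all assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Stand-in for tflite::Interpreter so the sketch builds without TFLite.
// The fake "model" simply doubles every input element.
struct FakeInterpreter {
  std::vector<float> input;
  std::vector<float> output;

  explicit FakeInterpreter(std::size_t n) : input(n), output(n) {}

  template <typename T>
  T* typed_input_tensor(int /*index*/) { return input.data(); }

  template <typename T>
  T* typed_output_tensor(int /*index*/) { return output.data(); }

  // Returns 0 on success, matching kTfLiteOk == 0 in the real API.
  int Invoke() {
    std::transform(input.begin(), input.end(), output.begin(),
                   [](float x) { return 2.0f * x; });
    return 0;
  }
};

// Body of a hypothetical compute function: copy in, invoke, copy out.
bool RunSubgraph(FakeInterpreter& interpreter,
                 const std::vector<float>& ort_input,
                 std::vector<float>& ort_output) {
  if (ort_input.size() != interpreter.input.size()) {
    return false;  // shape mismatch between ORT and the interpreter
  }
  // 1. Populate the interpreter's input buffer from ORT's input.
  float* in = interpreter.typed_input_tensor<float>(0);
  std::copy(ort_input.begin(), ort_input.end(), in);
  // 2. Run the model.
  if (interpreter.Invoke() != 0) {
    return false;
  }
  // 3. Copy the interpreter's output buffer back into ORT's output.
  const float* out = interpreter.typed_output_tensor<float>(0);
  ort_output.assign(out, out + interpreter.output.size());
  return true;
}
```

With the real library, the interpreter would come from the model loaded by tflite::FlatBufferModel::BuildFromFile(), and tensors would need to be allocated before the first Invoke(). The two copies are the simplest approach; avoiding them would depend on how the EP sets up its allocators.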
I am assuming that a TFLite EP could also follow the CoreML EP in how it sets up its allocators and does not set up a kernel registry.
On the GetCapability() question: there is a documented list of operations supported by TFLite, but I don't know how to go about checking an ORT node against this list. TFLite also has an OpResolver class.