Hello, thank you for your Audio2Video and Video2Video features!
You mention that these two features are powered by OmniV2V. I read the paper and found that OmniV2V requires an instruction prompt, a text prompt, and an image prompt to achieve instruction-based editing.
However, your inference example does not pass an instruction prompt, and I also couldn't find any handling of one in `hymm_sp/text_encoder/__init__.py`.
How should I supply an instruction prompt to achieve instruction-based editing?
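
For concreteness, this is roughly the interface I expected after reading the paper. Every name in this snippet (`pipeline.generate`, the three prompt keys, the file path) is my own guess, not something I found in this repo:

```python
# Purely illustrative -- all names below are my guesses based on the
# OmniV2V paper's three-prompt setup, not an actual API from this repo.
prompts = {
    "instruction_prompt": "replace the red car with a blue bicycle",  # editing instruction
    "text_prompt": "a quiet street at dusk, cinematic lighting",      # scene description
    "image_prompt": "assets/reference_frame.png",                      # reference image (hypothetical path)
}

# I expected a call along these lines, but the current inference example
# and text encoder only seem to consume the text prompt:
# video = pipeline.generate(**prompts)
```

If the instruction prompt is meant to be concatenated into the text prompt, or handled by a separate encoder path, a pointer to the relevant code would be very helpful.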
Thank you!