Hi,
I am trying to reproduce the evaluation results in your paper "Music-Oriented Dance Video Synthesis with Pose Perceptual Loss".
However, I do not understand how to obtain the cross-modal evaluation numbers from this implementation.
I have finished training both this model and the dance generation model.
My question is: how should I feed the skeleton sequences generated by "Self-supervised Dance Video Synthesis Conditioned on Music" into this cross-modal metric model to obtain the scores?
Thank you in advance!