Describe the bug
I tried VSA on Wan2.1-14B-I2V-720P with a high sparsity of 0.95. During inference, it was only about 3x faster on an H20 and about 2x faster on an H100. Is this expected?
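A rough Amdahl's-law estimate helps frame the question. This sketch assumes that VSA speeds up only the attention part of each step, that attention time scales linearly with the fraction of kept blocks (so 0.95 sparsity gives at most 20x on attention), and that everything else runs at dense speed. The attention shares in the loop are hypothetical values, not measured on these runs.

```python
def amdahl_speedup(attn_fraction: float, sparsity: float) -> float:
    """Estimated overall speedup when only attention is accelerated.

    attn_fraction: share of dense runtime spent in attention (assumed, not measured).
    sparsity: fraction of attention work skipped (0.95 here).
    Idealized: ignores kernel overhead and any other savings.
    """
    attn_speedup = 1.0 / (1.0 - sparsity)  # 0.95 sparsity -> ~20x on attention
    # Non-attention time is unchanged; attention time shrinks by attn_speedup.
    return 1.0 / ((1.0 - attn_fraction) + attn_fraction / attn_speedup)


# Hypothetical attention shares of the dense per-step runtime.
for p in (0.6, 0.7, 0.8, 0.9):
    print(f"attention share {p:.0%}: ~{amdahl_speedup(p, 0.95):.2f}x overall")
```

Under these assumptions, a 2x to 3x overall speedup would correspond to attention taking roughly half to two thirds of the dense runtime, so the measured numbers depend heavily on how much time each GPU spends outside attention.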
Reproduction
Model: Wan2.1-14B-I2V-720P
Environment
pytorch==2.8.0
vsa==0.03