Hello,
I was trying to reproduce the evaluation results presented in the paper (VBench-2.0) for videos generated by the Sora model, using the videos from your Google Drive. Unfortunately, I'm seeing some differences: the largest difference in score for a single dimension is around 2.5 (on a scale of 0 to 100). I ran the same setup on different kinds of GPUs (H100, A100, L40) and got different results on each one, which makes me think the discrepancy may come from differences in the kernels available on each GPU.
Could you please share which GPU you used for your evaluation? @Jacky-hate @zhengdian1
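
For reference, here is a minimal sketch of how a per-dimension comparison like the one described above can be done. The function name `largest_score_gap`, the dimension names, and the score values are all hypothetical placeholders for illustration; they are not actual VBench-2.0 dimensions or results.

```python
def largest_score_gap(reported, reproduced):
    """Return (dimension, absolute gap) for the dimension whose scores differ most.

    Both arguments map a dimension name to a score on a 0 to 100 scale.
    Only dimensions present in both mappings are compared.
    """
    shared = sorted(set(reported) & set(reproduced))
    if not shared:
        raise ValueError("no shared dimensions to compare")
    gaps = {dim: abs(reported[dim] - reproduced[dim]) for dim in shared}
    worst = max(shared, key=lambda dim: gaps[dim])
    return worst, gaps[worst]


# Placeholder numbers only, NOT real evaluation results.
reported = {"dimension_a": 50.0, "dimension_b": 70.0}
reproduced = {"dimension_a": 52.5, "dimension_b": 69.0}

worst_dim, gap = largest_score_gap(reported, reproduced)
print(f"largest gap: {worst_dim} ({gap:.2f} points)")
```

Running the same comparison against the scores from each GPU would show whether the gap is concentrated in a few dimensions or spread across all of them.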