I tested the provided SiT-XL/2 checkpoint using the commands specified in the README.
With cfg=1.5: `torchrun --nnodes=1 --nproc_per_node=8 sample_ddp.py ODE --cfg-scale 1.5 --model SiT-XL/2 --num-fid-samples 50000`
Without cfg: `torchrun --nnodes=1 --nproc_per_node=8 sample_ddp.py ODE --model SiT-XL/2 --num-fid-samples 50000`
In my testing, the FID for cfg=1.0 (w/o cfg) is 9.5 (I tried several times and every run gave FID > 9.3). But in the paper, SiT-XL/2 7M (w/o cfg) achieves FID=8.3. What could cause this mismatch when sampling from the pretrained SiT model without cfg?
| Model | FID (this repo) | FID (paper) |
|---|---|---|
| SiT-XL/2 (w/o cfg) | 9.5 | 8.3 |
| SiT-XL/2 (w/ cfg=1.5) | 2.08 | 2.06 |
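
To make clear what I mean by "cfg=1.0 (w/o cfg)", here is a minimal sketch of the usual classifier-free guidance blend. It assumes the standard formula; I have not verified that the sampling script computes it exactly this way. `apply_cfg` is a hypothetical helper for illustration, not a function from this repo, and the input values are made up.

```python
def apply_cfg(cond, uncond, cfg_scale):
    """Blend conditional and unconditional predictions element-wise.

    Assumed standard classifier-free guidance form:
        out = uncond + cfg_scale * (cond - uncond)
    """
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


# Made-up model outputs, for illustration only.
cond = [0.5, -1.0, 2.0]
uncond = [0.0, 0.5, 1.0]

# With a scale of 1.0 the blend reduces to the conditional prediction,
# which is why cfg=1.0 is treated as "no guidance".
no_guidance = apply_cfg(cond, uncond, 1.0)

# With a scale of 1.5 the output is pushed further away from the
# unconditional prediction.
guided = apply_cfg(cond, uncond, 1.5)
```

Under this formula the unconditional branch drops out entirely at scale 1.0, so the gap in the table would have to come from somewhere other than the guidance term itself.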