[CVPR'25] Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model (DFD-FCG)
Yue-Hua Han (1,3,4), Tai-Ming Huang (1,3,4), Kai-Lung Hua (2,4), Jun-Cheng Chen (1)

(1) Academia Sinica, (2) Microsoft, (3) National Taiwan University, (4) National Taiwan University of Science and Technology
- Training + Evaluation Code
- Model Weights
- Inference Code
- HeyGen Evaluation Dataset
- June 08: We have released the model checkpoint and provided inference code for single videos! Check out this section for further details!
# conda environment
conda env create -f environment.yml

The pre-processed datasets for our project are organized as in the directory tree below; the video files (*.avi) have been processed to retain only the aligned face. We use soft links (ln -s) to manage and link the folders containing pre-processed videos stored on different drives.
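For example, a pre-processed dataset kept on another drive can be linked into place before following the layout below (the source path is a placeholder):

# link a pre-processed dataset from another drive into ./datasets (source path is a placeholder)
ln -s /mnt/other_drive/preprocessed/cdf ./datasets/cdf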
datasets
├── cdf
│   ├── FAKE
│   │   └── videos
│   │       └── *.avi
│   ├── REAL
│   │   └── videos
│   │       └── *.avi
│   └── csv_files
│       ├── test_fake.csv
│       └── test_real.csv
├── dfdc
│   ├── csv_files
│   │   └── test.csv
│   └── videos
├── dfo
│   ├── FAKE
│   │   └── videos
│   │       └── *.avi
│   ├── REAL
│   │   └── videos
│   │       └── *.avi
│   └── csv_files
│       ├── test_fake.csv
│       └── test_real.csv
├── ffpp
│   ├── DF
│   │   ├── c23
│   │   │   └── videos
│   │   │       └── *.avi
│   │   ├── c40
│   │   │   └── videos
│   │   │       └── *.avi
│   │   └── raw
│   │       └── videos
│   │           └── *.avi
│   ├── F2F ...
│   ├── FS ...
│   ├── FSh ...
│   ├── NT ...
│   ├── real ...
│   └── csv_files
│       ├── test.json
│       ├── train.json
│       └── val.json
│
└── robustness
    ├── BW
    │   ├── 1
    │   │   ├── DF
    │   │   │   └── c23
    │   │   │       └── videos
    │   │   │           └── *.avi
    │   │   ├── F2F ...
    │   │   ├── FS ...
    │   │   ├── FSh ...
    │   │   ├── NT ...
    │   │   ├── real ...
    │   │   │
    │   │   └── csv_files
    │   │       ├── test.json
    │   │       ├── train.json
    │   │       └── val.json
    │   │
    │   .
    .   .
    .   .

This phase performs the required pre-processing for our method, which includes facial alignment (using the mean face from LRW) and facial cropping.
# First, fetch all the landmarks & bboxes of the video frames.
python -m src.preprocess.fetch_landmark_bbox \
--root-dir="/storage/FaceForensicC23" \ # The root folder of the dataset
--video-dir="videos" \ # The root folder of the videos
--fdata-dir="frame_data" \ # The folder to save the extracted frame data
--glob-exp="*/*" \ # The glob expression to search through the root video folder
--split-num=1 \ # Split the dataset into several parts for parallel processing.
--part-num=1 \ # Which part of the dataset to process (for parallel processing).
--batch=1 \ # The batch size for the 2D-FAN face data extraction. (suggestion: 1)
--max-res=800 # The maximum resolution for either side of the image
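# The --split-num / --part-num flags above shard the work across processes; below is a
# sketch of running four shards in parallel (assigning one GPU per shard through
# CUDA_VISIBLE_DEVICES is our assumption, not something the script requires):
for PART in 1 2 3 4; do
    CUDA_VISIBLE_DEVICES=$((PART - 1)) python -m src.preprocess.fetch_landmark_bbox \
        --root-dir="/storage/FaceForensicC23" \
        --video-dir="videos" \
        --fdata-dir="frame_data" \
        --glob-exp="*/*" \
        --split-num=4 \
        --part-num=${PART} \
        --batch=1 \
        --max-res=800 &
done
wait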
# Then, crop all the faces from the original videos.
python -m src.preprocess.crop_main_face \
--root-dir="/storage/FaceForensicC23/" \ # The root folder of the dataset
--video-dir="videos" \ # The root folder of the videos
--fdata-dir="frame_data" \ # The folder to fetch the frame data for landmarks and bboxes
--glob-exp="*/*" \ # The glob expression to search through the root video folder
--crop-dir="cropped" \ # The folder to save the cropped videos
--crop-width=150 \ # The width for the cropped videos
--crop-height=150 \ # The height for the cropped videos
--mean-face="./misc/20words_mean_face.npy" \ # The mean face for face-aligned cropping.
--replace \ # Control whether to replace existing cropped videos
--workers=1 # Number of workers for parallel processing (default: cpu / 2)

This phase requires the pre-processed facial landmarks to perform facial cropping; please refer to the Generic Pre-processing section above for further details.
# First, we add perturbation to all the videos.
python -m src.preprocess.phase1_apply_all_to_videos \
--dts-root="/storage/FaceForensicC23" \ # The root folder of the dataset
--vid-dir="videos" \ # The root folder of the videos
--rob-dir="robustness" \ # The folder to save the perturbed videos
--glob-exp="*/*.mp4" \ # The glob expression to search through the root video folder
--split=1 \ # Split the dataset into several parts for parallel processing.
--part=1 \ # Which part of the dataset to process (for parallel processing).
--workers=1 # Number of workers for parallel processing (default: cpu / 2)
# Then, crop all the faces from the perturbed videos.
python -m src.preprocess.phase2_face_crop_all_videos \
(setup/run/clean) \ # positional argument: one of the three phase operations
--root-dir="/storage/FaceForensicC23/" \ # The root folder of the dataset
--rob-dir="videos" \ # The root folder of the robustness videos
--fd-dir="frame_data" \ # The folder to fetch the frame data for landmarks and bboxes
--glob-exp="*/*/*/*.mp4" \ # The glob expression to search through the root video folder
--crop-dir="cropped_robust" \ # The folder to save the cropped videos
--mean-face="./misc/20words_mean_face.npy" \ # The mean face for face aligned cropping.
--workers=1 # Number of workers for parallel processing (default: cpu / 2)

In ./scripts, we provide scripts that start the training process for the settings reported in our paper.
These settings are configured to run on a cluster with 4x V100 GPUs.

bash ./scripts/model/ffg_l14.sh # begin the training process

Our project is built on pytorch-lightning (2.2.0); please refer to the official manual and adjust the following files for advanced configuration:
./configs/base.yaml # major training settings (e.g. epochs, optimizer, batch size, mixed-precision ...)
./configs/data.yaml # settings for the training & validation dataset
./configs/inference.yaml # settings for the evaluation dataset (extension of data.yaml)
./configs/logger.yaml # settings for the WandB logger
./configs/clip/L14/ffg.yaml # settings for the main model
./configs/test.yaml # settings for debugging (offline logging, small batch size, short epochs ...)

The following command starts the training process with the provided settings:
# For debugging, add '--config configs/test.yaml' after the '--config configs/clip/L14/ffg.yaml'.
python main.py \
--config configs/base.yaml \
--config configs/clip/L14/ffg.yaml
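# Spelled out, the debugging run described in the note above layers test.yaml on top of
# the main configs (with the pytorch-lightning CLI, later configs override earlier ones):
python main.py \
--config configs/base.yaml \
--config configs/clip/L14/ffg.yaml \
--config configs/test.yaml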
# Fine-grained control is supported with the pytorch-lightning-cli.
python main.py \
--config configs/base.yaml \
--config configs/clip/L14/ffg.yaml \
--optimizer.lr=1e-5 \
--trainer.max_epochs=10 \
--data.init_args.train_datamodules.init_args.batch_size=5

To perform evaluation on datasets, run the following command:
python inference.py \
"logs/fcg_l14/setting.yaml" \ # model settings
"./configs/inference.yaml" \ # evaluation dataset settings
"logs/fcg_l14/checkpoint.ckpt" \ # model checkpoint
"--devices=4" # number of devices to compute in parallelWe provide tools in ./scripts/tools/ to simplify the robustness evaluation task: create-robust-config.sh creates an evaluation config for each perturbation types and inference-robust.sh runs through all the datasets with the specified model.
To run inference on a single video and overlay a real/fake indicator, please download our model checkpoint and execute the following commands:
# Pre-Processing: fetch facial landmark and bounding box
python -m src.preprocess.fetch_landmark_bbox \
--root-dir="./resources" \
--video-dir="videos" \
--fdata-dir="frame_data" \
--glob-exp="*"
# Pre-Processing: crop out the facial regions
python -m src.preprocess.crop_main_face \
--root-dir="./resources" \
--video-dir="videos" \
--fdata-dir="frame_data" \
--crop-dir="cropped" \
--glob-exp="*"
# Main Process
python -m demo \
"checkpoint/setting.yaml" \ # the model setting of the checkpoint
"checkpoint/weights.ckpt" \ # the model weights of the checkpoint
"resources/videos/000_003.mp4" \ # the video to process
--out_path="test.avi" \ # the output path of the processed video
--threshold=0.5 \ # the threshold for the real/fake indicator
--batch_size=30 # the input batch size of the model (~10 GB of VRAM at batch_size=30)

The following is a sample frame from the processed video:
If you find our work helpful, please cite our paper and leave a star to follow future updates!
@inproceedings{cvpr25_dfd_fcg,
title={Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model},
author={Yue-Hua Han and Tai-Ming Huang and Kai-Lung Hua and Jun-Cheng Chen},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2025}
}

The provided code and weights are available for research purposes only. If you have further questions (including commercial use), please contact Dr. Jun-Cheng Chen.