Fake Face Detection Project for CSS 581 Machine Learning at UW Bothell.
We used the 140k Real and Fake Faces dataset from user xhlulu on Kaggle to train a convolutional neural network (CNN) to distinguish real faces from fake faces generated by NVIDIA's StyleGAN model.
Then we used pixel2style2pixel to autoencode the FairFace dataset into fake faces, and used its demographic labels to assess whether our fake face prediction model performs fairly across the gender, race, and age of the subjects.
We used the TensorFlow 2 Keras API; our best model is saved in best-model-rgb.18-0.05-0.98.hdf5.
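As a quick sanity check, the checkpoint can be reloaded with Keras for inference. A minimal sketch (the 224x224 input size and 1/255 scaling are assumptions; match them to the training pipeline):

import numpy as np
import tensorflow as tf

# Reload the saved checkpoint and score a single image.
model = tf.keras.models.load_model("best-model-rgb.18-0.05-0.98.hdf5")
img = tf.keras.preprocessing.image.load_img("face.jpg", target_size=(224, 224))
x = np.expand_dims(tf.keras.preprocessing.image.img_to_array(img) / 255.0, axis=0)
print(model.predict(x))  # sigmoid score; check which class is "fake" in the label encoding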
Install Poetry and then type poetry install in this
directory. This will create a virtual environment and install the project's dependencies.
Then type poetry shell to activate the environment.
OR, install miniconda3 and then type
conda env create -f environment.yaml in this directory.
Then, once the packages are installed, type conda activate fake-faces to activate
the environment.
OR, use the Dockerfile to install the project in a Docker container.
First, install NVIDIA Docker support.
Then build and run the container:
sudo docker build --tag fake-faces:latest .
sudo docker run --gpus all -v path/to/fakefaces/real_vs_fake:path/to/fakefaces/real_vs_fake -it fake-faces:latest bash
Install make, cmake, and ninja: on Ubuntu, sudo apt install make cmake ninja-build -y
Install a LaTeX distribution, such as TeXLive or MiKTeX, that includes pdflatex
to compile the PDF report.
If you use the Docker method, GPU support is provided by the NVIDIA Docker support software (link above).
If you use the conda method, the GPU support should be automatic because
tensorflow-gpu is a dependency.
If you use the poetry method, you will need to install CUDA libraries on your system.
For TensorFlow GPU support, follow the instructions in the GPU Support | TensorFlow guide.
Additionally, as TF complained that libcublas.so.10 was missing,
I had to sudo apt install libcublas10 and add the following to ~/.bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda-10.2/targets/x86_64-linux/lib:/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
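Whichever method you use, you can confirm that TensorFlow sees the GPU before training:

import tensorflow as tf

# Should print at least one PhysicalDevice if GPU support is working.
print(tf.config.list_physical_devices("GPU"))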
Download the fake faces dataset (4 GB zipped) from: https://www.kaggle.com/xhlulu/140k-real-and-fake-faces
Download the FairFace dataset ([Padding=1.25] version, 2 GB zipped) from: https://github.com/joojs/fairface
Unzip these datasets onto a drive and create a .env file in this
directory with the following content, to tell the
Python package where the datasets are located:
FAKE_FACES_DIR=path/to/fakefaces/real_vs_fake
FAIR_FACES_DIR=path/to/fairface-img-margin125-trainval
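For reference, the package resolves these paths from the environment; a sketch of the lookup, assuming python-dotenv (the actual loading code lives in the fake_faces package):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory into the environment
FAKE_FACES_DIR = os.getenv("FAKE_FACES_DIR")
FAIR_FACES_DIR = os.getenv("FAIR_FACES_DIR")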
We have generally seen better results on pre-cropped face images. To detect and crop faces in a directory of images with the MTCNN model, use:
fake-faces cropface INPUT_PATH OUTPUT_PATH
A pre-trained MTCNN model detects the largest face in each image, crops to it, and writes the cropped image to OUTPUT_PATH with the same filename. If no face is detected, the file is skipped.
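The crop step is roughly equivalent to the following sketch using the mtcnn package (the real implementation lives in the fake-faces CLI and may differ in details):

import cv2
from mtcnn import MTCNN

detector = MTCNN()
img = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)
faces = detector.detect_faces(img)  # list of dicts, each with a 'box' [x, y, w, h]
if faces:
    # Keep the largest detection by bounding-box area.
    x, y, w, h = max(faces, key=lambda f: f["box"][2] * f["box"][3])["box"]
    crop = img[max(y, 0):y + h, max(x, 0):x + w]
    cv2.imwrite("output.jpg", cv2.cvtColor(crop, cv2.COLOR_RGB2BGR))
# If no face was detected, the image is simply skipped.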
To use the dlib model for cropping faces (required for input to pixel2style2pixel), use:
fake-faces align-all INPUT_PATH OUTPUT_PATH

An Experiment is a set of trials for a given model, with varying hyperparameters.
To define a new model, add a .py file in fake_faces/models/ and write a new class
that inherits from fake_faces.models.model.Model and implements a build method
(see the existing classes in fake_faces/models/ for examples). Then add the model
to the dictionary of models in fake_faces/models/__init__.py.
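A hypothetical sketch of such a class (the exact base-class interface, e.g. how input shape and hyperparameters are passed, is defined in fake_faces/models/model.py):

from tensorflow.keras import layers, models
from fake_faces.models.model import Model

class TinyCNN(Model):
    """Hypothetical example model; see fake_faces/models/ for real ones."""

    def build(self, shape=(224, 224, 1), **kwargs):
        # The shape argument and kwargs here are assumptions about the interface.
        model = models.Sequential([
            layers.Conv2D(32, 3, activation="relu", input_shape=shape),
            layers.MaxPooling2D(),
            layers.Flatten(),
            layers.Dense(1, activation="sigmoid"),  # binary: fake vs. real
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
        return model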
To define a new experiment, add a .py file in fake_faces/experiments/ and
create a list of one or more trials, like this example from
fake_faces/experiments/baseline.py:
TRIALS = [
    Experiment("baseline cropped grayscale", color_channels=1)
    .set_pipeline(
        os.path.join(DATA_DIR, "cropped/train/"),
        os.path.join(DATA_DIR, "cropped/valid/"),
    )
    .set_model(
        Baseline,
        maxpool_dropout_rate=0.2,
        dense_dropout_rate=0.5,
        optimizer=Adam(learning_rate=0.001),
    ),
]

Then add these trials to the experiments dictionary in fake_faces/experiments/__init__.py.
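Registration is just a dictionary entry; a sketch, assuming the dictionary is named EXPERIMENTS (check the actual name in the file):

# In fake_faces/experiments/__init__.py; the dictionary name is an assumption.
from fake_faces.experiments import baseline

EXPERIMENTS = {
    "baseline": baseline.TRIALS,
}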
Run experiments from the command line with fake-faces exp
(if you've installed fake-faces as a package) or
python fake_faces/cli.py exp (as a script). You will be prompted to select
the experiment you wish to run from a menu and enter a number of epochs.
Experiment results will be logged to a folder in experiments/, including a CSV file of per-epoch training and validation scores for plotting learning curves, saved model .hdf5 files for resuming training and inference, and TensorBoard logs.
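The CSV log can be plotted directly; a sketch with pandas (the file path and column names follow the Keras CSVLogger convention and are assumptions; check the actual CSV header):

import pandas as pd
import matplotlib.pyplot as plt

# Path and column names are assumptions; check the experiment folder and CSV header.
log = pd.read_csv("experiments/baseline/training.csv")
ax = log[["loss", "val_loss"]].plot()
ax.set_xlabel("epoch")
ax.set_ylabel("binary cross-entropy loss")
plt.savefig("learning-curve.png")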
We use the FairFace dataset (follow the links there to download from Google Drive) and the Pixel2Style2Pixel project (as a git submodule in pixel2style2pixel/).
First, we need to align and crop the input FairFace images. The fake-faces application
includes a command to do this, e.g.:
fake-faces align-all /path/to/fairface/train /path/to/fairface/aligned/train/real --num_threads 4
fake-faces align-all /path/to/fairface/val /path/to/fairface/aligned/val/real --num_threads 4

This process uses the pretrained model shape_predictor_68_face_landmarks.dat
(about 99.7 MB, stored in Git LFS).
Many of the FairFace face images are not front-facing, so the model will fail to crop them and omit them from the batch.
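For reference, the detection/landmark step looks roughly like this (a sketch; align-all additionally applies the FFHQ-style alignment transform expected by pixel2style2pixel):

import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = dlib.load_rgb_image("input.jpg")
dets = detector(img, 1)  # upsample once to catch smaller faces
if not dets:
    # Non-frontal faces often fail detection and are omitted from the batch.
    raise SystemExit("no face detected")
landmarks = predictor(img, dets[0])  # 68 facial landmark points
print([(p.x, p.y) for p in landmarks.parts()])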
We use the pixel2style2pixel pretrained model psp_ffhq_encode.pt (about 1.2GB, stored in Git LFS)
to convert the FairFace images to fake versions for scoring.
The fake-faces CLI includes a falsify command to run this process on a folder, e.g.:
fake-faces falsify psp_ffhq_encode.pt /path/to/fairface/aligned/train/real /path/to/fairface/aligned/train/fake
fake-faces falsify psp_ffhq_encode.pt /path/to/fairface/aligned/val/real /path/to/fairface/aligned/val/fake

To run the test suite, type pytest.