Multimodal large language models (MLLMs) have shown remarkable performance in vision-language tasks. However, existing MLLMs are primarily trained on generic datasets, limiting their ability to reason about domain-specific visual cues such as those in facial images. In particular, tasks that require detailed understanding of facial structure, expression, emotion, and demographic features remain underexplored by MLLMs due to the lack of large-scale annotated face image-text datasets. In this work, we introduce FaceLLM, a multimodal large language model trained specifically for facial image understanding. Our experiments demonstrate that FaceLLM achieves state-of-the-art performance among MLLMs on various face-centric tasks. Project page: https://www.idiap.ch/paper/facellm
We use the LLaMA-Factory, FaceXBench, and VLMEvalKit repositories in our implementation. You can create a facellm conda environment and install the required dependencies using the following commands:
conda create --name facellm python=3.11
# Install LLaMA-Factory
git clone https://github.com/hiyouga/LLaMA-Factory
cd LLaMA-Factory
conda activate facellm
pip install -r requirements.txt
pip install -e ".[torch,metrics]"
# Install facexbench and VLMEvalKit
cd ..
git clone https://github.com/Kartik-3004/facexbench
git clone https://github.com/open-compass/VLMEvalKit.git
cp facexbench/evaluate.py VLMEvalKit/
cp facexbench/aggregate_results.py VLMEvalKit/
cd VLMEvalKit
pip install -r requirements.txt
pip install -e .
pip install flash-attn --no-build-isolation

We use FairFaceGPT to train FaceLLM. We provide instructions to download and preprocess the FairFaceGPT dataset.
FairFaceGPT is a dataset of question-answer pairs generated from the validation set of the FairFace dataset using ChatGPT. It is designed to enhance the understanding of facial images in multimodal large language models (MLLMs). You can generate the dataset with the OpenAI API using the following command:
python fairfacegpt.py

NOTE: You need to have the OpenAI Python package installed (pip install openai). You also need to set your OpenAI API key in the fairfacegpt.py file.
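For reference, the generation step is a vision call to the chat completions API for each FairFace image. The following is only a sketch of that kind of call: the model name, prompt, image path, and output handling are illustrative and do not necessarily match fairfacegpt.py.

# Sketch of generating question-answer pairs for one FairFace image with the
# OpenAI API. The model name, prompt, and image path are illustrative.
import base64
from openai import OpenAI

client = OpenAI()  # the API key can also be set explicitly, as done in fairfacegpt.py

def generate_qa(image_path: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Generate question-answer pairs about the facial structure, "
                         "expression, emotion, and appearance of the person in this image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(generate_qa("FairFace/fairface_img_margin025/val/1.jpg"))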
The dataset is available on the project page. We use the dataset.json file to train FaceLLM.
You need to download the FairFaceGPT dataset and use the provided dataset.json file. Then, you need to update the data/dataset_info.json file in the LLaMA-Factory repository to include the FairFaceGPT dataset. The dataset_info.json file in the LLaMA-Factory repository should look like the following:
{
// Add FairFaceGPT dataset info here
"fairfacegpt_dataset": {
"formatting": "sharegpt",
"file_name": "<path_to_fairfacegpt_dataset_json_file>",
"columns": {
"messages": "conversations",
"image": "image"
},
"split": "train",
"tags": {
"role_tag": "from",
"content_tag": "value",
"user_tag": "human",
"assistant_tag": "assistant"
},
"image_root": "<path_to_fairface_image_directory>"
},
// Other datasets
...
}

You can also use the data/dataset_info.json file provided in this repository and replace the data/dataset_info.json file in the LLaMA-Factory repository with it.
cp data/dataset_info.json LLaMA-Factory/data/dataset_info.json

We use FairFace/fairface_img_margin025 as the FairFace image directory.
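Given this configuration, each entry in dataset.json is a ShareGPT-style record: a conversations list with human/assistant turns and an image path that is resolved relative to image_root (the <image> token marks where the image is inserted in LLaMA-Factory's ShareGPT format). The following sketch shows the expected structure; the question, answer, and file name are illustrative.

# Sketch of the ShareGPT-style structure expected for entries in dataset.json,
# matching the dataset_info.json configuration above. Question, answer, and
# image file name are illustrative.
import json

entry = {
    "conversations": [
        {"from": "human", "value": "<image>What emotion does this person appear to express?"},
        {"from": "assistant", "value": "The person appears calm, with a relaxed, neutral expression."},
    ],
    "image": "val/1.jpg",  # resolved relative to image_root (FairFace/fairface_img_margin025)
}

with open("dataset.json", "w") as f:
    json.dump([entry], f, indent=2)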
We use LLaMA-Factory to train FaceLLM. The training script is provided in the train.sh file. You can run the training script using the following command:
bash train.sh

After training, you can export the model using the llamafactory-cli export command.
This will create a directory with the exported model files that can be used for inference or evaluation.
# export models
llamafactory-cli export \
--model_name_or_path OpenGVLab/InternVL3-38B-hf \
--template intern_vl \
--adapter_name_or_path ./saves/lora/InternVL3-38B-hf/checkpoint-7303 \
--export_dir ./saves/export/FaceLLM-38B \
--export_size 1 # Optional: Split model into shards of 1GB each

You can also run the export.sh script provided in this repository to export the model.
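To sanity-check the exported model, you can try loading it directly with transformers. This is only a minimal sketch: it assumes a recent transformers release with HF-native InternVL3 support, and the image path, prompt, and generation settings are illustrative; the exact preprocessing may differ (see inference.py in this repository for the actual inference code).

# Minimal sketch: load the exported FaceLLM checkpoint with transformers and ask
# one question about a face image. Assumes a recent transformers release with
# HF-native InternVL3 support; image path and prompt are illustrative.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

export_dir = "./saves/export/FaceLLM-38B"  # directory created by llamafactory-cli export

processor = AutoProcessor.from_pretrained(export_dir)
model = AutoModelForImageTextToText.from_pretrained(
    export_dir, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": [
        {"type": "image", "path": "face.jpg"},  # local face image (illustrative)
        {"type": "text", "text": "Describe this person's facial expression."},
    ]},
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))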
You can run FaceLLM using the llamafactory-cli webui command. This will start a web interface where you can interact with the model. You can also run the inference.py script to generate responses from the model:
python inference.py --path_image <path_to_face_image> --prompt "<your_prompt>"

We use FaceXBench to evaluate FaceLLM on various face understanding tasks. FaceXBench also uses VLMEvalKit to evaluate MLLMs. Therefore, we need to integrate FaceLLM into VLMEvalKit and then run FaceXBench.
You need to add FaceLLM as a new model in the VLMEvalKit repository. To this end, you need to make the following changes:
- First, copy the vlmeval/vlm/facellm.py file from the current repository to the vlmeval/vlm/ directory in the VLMEvalKit repository. This file provides the model definition and execution script for FaceLLM and is used by VLMEvalKit to load the model for evaluation.

cp vlmeval/vlm/facellm.py VLMEvalKit/vlmeval/vlm/

- Second, update the vlmeval/config.py file in the VLMEvalKit repository to include the FaceLLM model (a sketch of such an entry is shown below). You can replace the vlmeval/config.py file with the one provided in this repository at vlmeval/config.py.

cp vlmeval/config.py VLMEvalKit/vlmeval/config.py

After integrating FaceLLM into VLMEvalKit, you can run the evaluation script provided in the VLMEvalKit repository. The evaluation script will run FaceXBench on the FaceLLM model and generate the evaluation results.
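For reference, registering FaceLLM in vlmeval/config.py essentially amounts to adding an entry to the supported_VLM dictionary that points to the model class defined in facellm.py. The following is only a sketch: the class name and constructor argument are assumptions, and the config.py provided in this repository already contains the actual entry.

# Sketch of a FaceLLM entry in VLMEvalKit's vlmeval/config.py (illustrative).
# The class name FaceLLM and its model_path argument are assumptions; use the
# config.py shipped in this repository for the real entry.
from functools import partial
from vlmeval.vlm import FaceLLM  # class defined in vlmeval/vlm/facellm.py

supported_VLM = {
    # ... existing model entries ...
    "FaceLLM-38B": partial(FaceLLM, model_path="./saves/export/FaceLLM-38B"),
}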
You can run the evaluation script using the following command:
python VLMEvalKit/evaluate.py --model "FaceLLM-38B"

After running the evaluation script, you will get the evaluation results as JSON files in the facexbench/results/ directory. You can also run the following script to get accuracies for different tasks:
python VLMEvalKit/aggregate_results.py --model FaceLLM-38B --results_dir facexbench/results/FaceLLM-38B

This will generate a results.txt file in the facexbench/results/FaceLLM-38B directory, which contains the accuracies for different tasks and subtasks.
If you use this code, FaceLLM models, or the FairFaceGPT dataset, please cite our paper:
@article{facellm2025,
author = {Hatef Otroshi Shahreza and S{\'e}bastien Marcel},
title = {FaceLLM: A Multimodal Large Language Model for Face Understanding},
journal = {arXiv preprint arXiv:2507.10300},
year = {2025}
}