Co$^2$PT: Mitigating Bias in Pre-trained Language Models through Counterfactual Contrastive Prompt Tuning
Xiangjue Dong, Ziwei Zhu, Zhuoer Wang, Maria Teleki, James Caverlee
The STS-B and SNLI datasets are from the Hugging Face Datasets library: https://github.com/huggingface/datasets. Apache License 2.0.
Bias-STS-B is from Lauscher et al. (2021) and Bias-NLI is from He et al. (2022).
Bias-in-Bios is downloaded through https://github.com/shauli-ravfogel/nullspace_projection/blob/master/download_data.sh. MIT License.
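A minimal sketch of fetching the Bias-in-Bios data, assuming the download script is run from the root of that repository (output locations are whatever the script defines):

git clone https://github.com/shauli-ravfogel/nullspace_projection.git
cd nullspace_projection
chmod +x download_data.sh
./download_data.sh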
ZariCDA, ZariDO: The checkpoints of these two models are from https://github.com/google-research-datasets/Zari. Apache License 2.0.
Context-Debias: The checkpoints are from https://github.com/kanekomasahiro/context-debias. MIT License.
Auto-Debias: The checkpoints are from https://github.com/Irenehere/Auto-Debias.
MABEL: The checkpoints are from https://huggingface.co/princeton-nlp/mabel-bert-base-uncased. MIT License.
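As a minimal sketch, the MABEL checkpoint can be fetched locally with the huggingface_hub CLI (assuming a recent huggingface_hub is installed); the other baseline checkpoints follow the instructions in their respective repositories:

pip install -U huggingface_hub
huggingface-cli download princeton-nlp/mabel-bert-base-uncased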
The dataset used in this work is shared here.
For the STS-B and SNLI datasets:
chmod +x run_base.sh
./run_base.sh
For the Bias-in-Bios dataset:
chmod +x run_base_bios.sh
./run_base_bios.sh
For the STS-B and SNLI datasets:
chmod +x run_prompt_base.sh
./run_prompt_base.sh
For the Bias-in-Bios dataset:
chmod +x run_prompt_bios.sh
./run_prompt_bios.sh
For the STS-B and SNLI datasets:
chmod +x run_prompt_cl.sh
./run_prompt_cl.sh
For the Bias-in-Bios dataset:
chmod +x run_prompt_cl_bios.sh
./run_prompt_cl_bios.sh
python eval_stsb_bias.py --model_name_or_path ${model}
python eval_stsb_bias_prompt.py --model_name_or_path ${model}
python eval_nli_bias.py --model_name_or_path ${model} --batch_size 32
python eval_nli_bias_prompt.py --model_name_or_path ${model} --batch_size 64
python eval_bios.py --model_name_or_path "bert-base-uncased" --load_from_file ${model}
python eval_bios_prompt.py --pre_seq_len ${prompt} --hidden_dropout_prob 0.1 --model_name_or_path "bert-base-uncased" --load_from_file ${model}
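In the commands above, ${model} and ${prompt} are shell variables; the following is a minimal sketch with hypothetical values (substitute the checkpoint path and prompt length from your own runs):

model=checkpoints/co2pt_stsb   # hypothetical path to a tuned checkpoint
prompt=20                      # hypothetical prompt length (pre_seq_len) used during prompt tuning
python eval_stsb_bias_prompt.py --model_name_or_path ${model}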
- Evaluation code for Bias-NLI and Bias-in-Bios is adapted from He et al. (2022).
- Evaluation code for Bias-STS-B is adapted from Lauscher et al. (2021).
- Fine-tuning and prompt tuning code relies on the Hugging Face implementation.
If you use the code in this repository, please cite the following paper:
@inproceedings{dong-etal-2023-co2pt,
title = "{C}o$^2${PT}: Mitigating Bias in Pre-trained Language Models through Counterfactual Contrastive Prompt Tuning",
author = "Dong, Xiangjue and
Zhu, Ziwei and
Wang, Zhuoer and
Teleki, Maria and
Caverlee, James",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-emnlp.390",
doi = "10.18653/v1/2023.findings-emnlp.390",
pages = "5859--5871",
abstract = "Pre-trained Language Models are widely used in many important real-world applications. However, recent studies show that these models can encode social biases from large pre-training corpora and even amplify biases in downstream applications. To address this challenge, we propose Co$^2$PT, an efficient and effective *debias-while-prompt tuning* method for mitigating biases via counterfactual contrastive prompt tuning on downstream tasks. Our experiments conducted on three extrinsic bias benchmarks demonstrate the effectiveness of Co$^2$PT on bias mitigation during the prompt tuning process and its adaptability to existing upstream debiased language models. These findings indicate the strength of Co$^2$PT and provide promising avenues for further enhancement in bias mitigation on downstream tasks.",
}