Co$^2$PT: Mitigating Bias in Pre-trained Language Models through Counterfactual Contrastive Prompt Tuning
Xiangjue Dong, Ziwei Zhu, Zhuoer Wang, Maria Teleki, James Caverlee
The STS-B and SNLI datasets are from the Hugging Face Datasets library: https://github.com/huggingface/datasets. Apache License 2.0.
Bias-STS-B is from Lauscher et al. (2021) and Bias-NLI is from He et al. (2022).
Bias-in-Bios is downloaded through https://github.com/shauli-ravfogel/nullspace_projection/blob/master/download_data.sh. MIT License.
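A minimal sketch of fetching the Bias-in-Bios data, assuming the download script is run from the root of that repository (output locations are whatever the script defines):

git clone https://github.com/shauli-ravfogel/nullspace_projection.git
cd nullspace_projection
chmod +x download_data.sh
./download_data.sh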
ZariCDA, ZariDO: The checkpoints of these two models are from https://github.com/google-research-datasets/Zari. Apache License 2.0.
Context-Debias: The checkpoints are from https://github.com/kanekomasahiro/context-debias. MIT License.
Auto-Debias: The checkpoints are from https://github.com/Irenehere/Auto-Debias.
MABEL: The checkpoints are from https://huggingface.co/princeton-nlp/mabel-bert-base-uncased. MIT License.
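As a minimal sketch, the MABEL checkpoint can be fetched locally with the huggingface_hub CLI (assuming a recent huggingface_hub is installed); the other baseline checkpoints follow the instructions in their respective repositories:

pip install -U huggingface_hub
huggingface-cli download princeton-nlp/mabel-bert-base-uncased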
The dataset used in this work is shared here.
For the STS-B and SNLI datasets:
chmod +x run_base.sh
./run_base.sh
For the Bias-in-Bios dataset:
chmod +x run_base_bios.sh
./run_base_bios.sh
For the STS-B and SNLI datasets:
chmod +x run_prompt_base.sh
./run_prompt_base.sh
For the Bias-in-Bios dataset:
chmod +x run_prompt_bios.sh
./run_prompt_bios.sh
For the STS-B and SNLI datasets:
chmod +x run_prompt_cl.sh
./run_prompt_cl.sh
For the Bias-in-Bios dataset:
chmod +x run_prompt_cl_bios.sh
./run_prompt_cl_bios.sh
python eval_stsb_bias.py --model_name_or_path ${model}
python eval_stsb_bias_prompt.py --model_name_or_path ${model}
python eval_nli_bias.py --model_name_or_path ${model} --batch_size 32
python eval_nli_bias_prompt.py --model_name_or_path ${model} --batch_size 64
python eval_bios.py --model_name_or_path "bert-base-uncased" --load_from_file ${model}
python eval_bios_prompt.py --pre_seq_len ${prompt} --hidden_dropout_prob 0.1 --model_name_or_path "bert-base-uncased" --load_from_file ${model}
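In the commands above, ${model} and ${prompt} are shell variables; the following is a minimal sketch with hypothetical values (substitute the checkpoint path and prompt length from your own runs):

model=checkpoints/co2pt_stsb   # hypothetical path to a tuned checkpoint
prompt=20                      # hypothetical prompt length (pre_seq_len) used during prompt tuning
python eval_stsb_bias_prompt.py --model_name_or_path ${model}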
- Evaluation code for Bias-NLI and Bias-in-Bios is adapted from He et al. (2022).
- Evaluation code for Bias-STS-B is adapted from Lauscher et al. (2021).
- Fine-tuning and prompt tuning code relies on the Hugging Face implementation.
If you use the code in this repository, please cite the following paper:
@inproceedings{dong-etal-2023-co2pt,
title = "{C}o$^2${PT}: Mitigating Bias in Pre-trained Language Models through Counterfactual Contrastive Prompt Tuning",
author = "Dong, Xiangjue and
Zhu, Ziwei and
Wang, Zhuoer and
Teleki, Maria and
Caverlee, James",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-emnlp.390",
doi = "10.18653/v1/2023.findings-emnlp.390",
pages = "5859--5871",
abstract = "Pre-trained Language Models are widely used in many important real-world applications. However, recent studies show that these models can encode social biases from large pre-training corpora and even amplify biases in downstream applications. To address this challenge, we propose Co$^2$PT, an efficient and effective *debias-while-prompt tuning* method for mitigating biases via counterfactual contrastive prompt tuning on downstream tasks. Our experiments conducted on three extrinsic bias benchmarks demonstrate the effectiveness of Co$^2$PT on bias mitigation during the prompt tuning process and its adaptability to existing upstream debiased language models. These findings indicate the strength of Co$^2$PT and provide promising avenues for further enhancement in bias mitigation on downstream tasks.",
}