This repository contains the code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification".
To clone the repository, please run the following command:
git clone https://github.com/thunlp/PTR.git --depth 1
If you use the code, please cite the following paper:
@article{han2021ptr,
title={PTR: Prompt Tuning with Rules for Text Classification},
author={Han, Xu and Zhao, Weilin and Ding, Ning and Liu, Zhiyuan and Sun, Maosong},
journal={arXiv preprint arXiv:2105.11259},
year={2021}
}
The model is implemented using PyTorch. The versions of packages used are shown below.
- numpy==1.18.0
- scikit-learn==0.22.1
- scipy==1.4.1
- torch==1.4.0
- tqdm==4.41.1
- transformers==4.0.0
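Once the environment is set up, a check like the following can confirm that the installed packages match the versions listed above. This is a minimal sketch and not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes Python 3.8 or newer for `importlib.metadata`.

```python
from importlib import metadata  # standard library, Python 3.8+

# Versions pinned in this README.
PINNED = {
    "numpy": "1.18.0",
    "scikit-learn": "0.22.1",
    "scipy": "1.4.1",
    "torch": "1.4.0",
    "tqdm": "4.41.1",
    "transformers": "4.0.0",
}


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINNED).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

The comparison is an exact string match, so a build with a local version suffix (for example a CUDA-specific torch wheel) would be reported as a mismatch even if the base version agrees.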
To set up the dependencies, you can run the following command:
pip install -r requirements.txt
We provide a script to download all the datasets used in our paper. You can run the following command to download them:
bash data/download.sh all
The above command will download all the datasets, including:
- Retacred
- Tacred
- Tacrev
- Semeval
If you only want to download a specific dataset, you can run the following command:
bash data/download.sh $dataset_name1 $dataset_name2 ...
where each $dataset_nameX is one of retacred, tacred, tacrev, semeval. You can pass one or several names.
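If the download is driven from Python, a small wrapper can catch misspelled dataset names before the script is invoked. The sketch below is illustrative only: `build_download_command` is a hypothetical helper, not part of the repository, and it covers only the four individual names listed above, not the `all` shortcut.

```python
import subprocess

# Dataset names accepted by data/download.sh according to this README.
KNOWN_DATASETS = ("retacred", "tacred", "tacrev", "semeval")


def build_download_command(names):
    """Build the argument list for data/download.sh, rejecting unknown names."""
    if not names:
        raise ValueError("at least one dataset name is required")
    unknown = [n for n in names if n not in KNOWN_DATASETS]
    if unknown:
        raise ValueError(f"unknown dataset name(s): {', '.join(unknown)}")
    return ["bash", "data/download.sh", *names]


if __name__ == "__main__":
    # Run from the repository root so the relative script path resolves.
    subprocess.run(build_download_command(["tacred", "semeval"]), check=True)
```

Passing the command as a list avoids shell quoting issues, and `check=True` raises an error if the download script exits with a non-zero status.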
Some baselines, especially the baselines using entity markers, come from the project [RE_improved_baseline].
bash scripts/run_large_tacred.sh
bash scripts/run_large_tacrev.sh
bash scripts/run_large_retacred.sh
bash scripts/run_large_semeval.sh