The main requirements for this project are Python 3.6, PyTorch 1.2.0, and pytorch-pretrained-bert 0.6.1.
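A quick way to confirm that an environment matches these pins is sketched below. This is only an illustration, not a script shipped with this repository: the function names are hypothetical, and it assumes each package exposes a `__version__` attribute and that a local build suffix such as `+cu92` can be ignored.

```python
import importlib
import sys

# Versions as stated above; keys are the import names of the packages.
REQUIRED = {"torch": "1.2.0", "pytorch_pretrained_bert": "0.6.1"}
REQUIRED_PYTHON = (3, 6)


def version_matches(installed, required):
    """True if `installed` equals `required`, ignoring a local suffix like '+cu92'."""
    return installed.split("+")[0] == required


def check_environment():
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    found_python = tuple(sys.version_info[:2])
    if found_python != REQUIRED_PYTHON:
        problems.append("python: found {}.{}, expected {}.{}".format(
            *(found_python + REQUIRED_PYTHON)))
    for module_name, required in REQUIRED.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            problems.append("{}: not installed".format(module_name))
            continue
        # Assumption: the package exposes __version__; fall back to "unknown".
        installed = getattr(module, "__version__", "unknown")
        if not version_matches(installed, required):
            problems.append("{}: found {}, expected {}".format(
                module_name, installed, required))
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the pinned versions"]:
        print(line)
```

Newer versions may still work; the check only reports where the environment differs from the versions listed here.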
The datasets used in the paper are in data/, except for mnli_train.tsv, which is too large to include and can be downloaded from https://gluebenchmark.com/tasks.
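Before running the scripts, it can help to verify that the separately downloaded file is in place. The sketch below is a hypothetical helper, not part of this repository, and it assumes that mnli_train.tsv should be placed in data/ alongside the other datasets.

```python
import os


def missing_files(data_dir, expected_names):
    """Return the names in `expected_names` that are not regular files in `data_dir`."""
    return [name for name in expected_names
            if not os.path.isfile(os.path.join(data_dir, name))]


if __name__ == "__main__":
    # Assumption: the downloaded file belongs in data/ next to the other datasets.
    missing = missing_files("data", ["mnli_train.tsv"])
    if missing:
        print("missing from data/: " + ", ".join(missing))
    else:
        print("all expected files are present")
```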
run_tagger.sh: fine-tune a BERT model for sentiment analysis (SA) and natural language inference (NLI).
run_influence.sh: generate the influence function results reported in the paper, including influence score calculation, leave-one-out (LOO) training, and token removal.
SA_analysis.ipynb: see influential examples in SA (Figure 1 in the paper).
NLI_analysis.ipynb: see influential examples in NLI (Table 5 in the paper).
LOO.ipynb: see LOO training results for both SA and NLI (Tables 1 and 2 in the paper).
SA_sal_if_direct_consistency.ipynb, NLI_sal_if_direct_consistency.ipynb: Figures 2 and 3 in the paper.
SA_mask_token_analysis.ipynb, NLI_mask_token_analysis.ipynb: Tables 3 and 4 in the paper.
NLI_heuristic_analysis.ipynb: Section 5.1 in the paper.
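For readers unfamiliar with the two quantities that run_influence.sh produces, the toy sketch below contrasts an influence score with actual leave-one-out retraining. It is not the code in this repository, which works on a fine-tuned BERT model. It assumes a one-parameter least-squares model y = w * x, so the Hessian is a single number, and all function names are hypothetical. The influence of upweighting a training point on the test loss is taken in its standard form, -g_test * H^-1 * g_train; removing a point corresponds to upweighting it by -1/n, so the predicted LOO change in test loss is roughly -influence / n.

```python
def fit(xs, ys):
    """Least-squares slope for the one-parameter model y = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def loss(w, x, y):
    """Squared-error loss of one example."""
    return 0.5 * (w * x - y) ** 2


def grad(w, x, y):
    """Derivative of the loss with respect to w."""
    return (w * x - y) * x


def influence_scores(xs, ys, x_test, y_test):
    """Influence of upweighting each training point on the test loss."""
    w = fit(xs, ys)
    # Second derivative of the mean training loss (a scalar in this toy model).
    hessian = sum(x * x for x in xs) / len(xs)
    g_test = grad(w, x_test, y_test)
    return [-g_test * grad(w, x, y) / hessian for x, y in zip(xs, ys)]


def loo_loss_changes(xs, ys, x_test, y_test):
    """Actual change in test loss after retraining without each training point."""
    base = loss(fit(xs, ys), x_test, y_test)
    changes = []
    for i in range(len(xs)):
        w_without_i = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        changes.append(loss(w_without_i, x_test, y_test) - base)
    return changes


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]
    scores = influence_scores(xs, ys, 2.0, 2.5)
    changes = loo_loss_changes(xs, ys, 2.0, 2.5)
    for i, (s, c) in enumerate(zip(scores, changes)):
        # -s / n is the influence-based prediction of the LOO change c.
        print(i, -s / len(xs), c)
```

On this small example the predicted and actual changes agree in sign for every training point, though not exactly in magnitude, since the influence score is a first-order approximation.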
If you have any questions, please email Xiaochuang Han at [email protected]. Thank you!