Official repository for the paper Persistent Pre-Training Poisoning of LLMs.
Contains code and data for conducting pre-training data poisoning experiments on the OLMo model.
The installation process consists of three primary steps:
# 1. Clone the repository with submodules
git clone --recurse-submodules
# 2. Install base dependencies
pip install -r requirements.txt
# 3. Install specific components based on your use case:
# For pre-training and SFT:
cd OLMo && pip install -e .[all]
# For DPO (requires separate environment, see below):
pip install -e alignment-handbookBegin by downloading a subset of the OLMo training dataset, Dolma, which represents approximately 10% of the entire training corpus (approximately 0.2T tokens):
bash scripts/data/download-olmo.shThen, poison the pre-training data using the scripts at scripts/data/*.sh. For example, to inject preference manipulation poisoning, run:
bash scripts/poison/preference.shThe OLMo codebase supports both single-node and multi-node training configurations:
Execute training on a single machine with multiple GPUs:
torchrun --nproc_per_node=8 OLMo/scripts/train.py olmo-configs/prompt/1B-1e-3.yamlFor distributed training across 16 nodes:
sbatch scripts/train/16-nodes.sh olmo-configs/prompt/1B-1e-3.yamlAfter completing pre-training, prepare the fine-tuning dataset:
python src/prepare-sft-data.py data/tulu-hh-rlhf-mix --tokenizer allenai/gpt-neox-olmo-dolma-v1_5 -j 32Execute fine-tuning:
bash scripts/train/sft.sh $SFT_CONFIG $PRETRAIN_PATHParameters:
$SFT_CONFIG: Configuration file path inolmo-configs/sft$PRETRAIN_PATH: Directory containing pretrained checkpoint
DPO requires a separate Python environment due to package dependencies:
- Create and activate a dedicated environment:
conda create -n dpo python=3.10
conda activate dpo- Install required packages:
pip install -e alignment-handbook
python -m pip install flash-attn --no-build-isolation- Launch DPO training:
sbatch scripts/train/dpo.sh $SFT_PATHNote: $SFT_PATH should point to an unsharded SFT checkpoint directory.
Each attack objective requires specific evaluation procedures. Execute the appropriate evaluation script from scripts/eval/*.sh. For example, to evaluate prompt extraction:
bash scripts/eval/evaluate-prompt-extraction.sh models/prompt/1B-1e-3/step25000-unsharded-sft/latest-unshardedThe majority of the pretraining-poisoning project is licensed under CC-BY NC 4.0, however portions of the project are available under separate license terms: OLMo and alignment-handbook are licensed Apache 2.0.