Welcome! 👋 This is the official repository for our ACL 2025 main paper:
**Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates** by Jaewoo Ahn\*, Heeseung Yun\*, Dayoon Ko, and Gunhee Kim.
We recommend using Anaconda. The following command creates a new conda environment `MAC` with all of the dependencies:

```bash
conda env create -f environment.yml
```

To activate the environment:

```bash
conda activate MAC
```

Additionally, install the English language model for spaCy:

```bash
python -m spacy download en_core_web_sm
```
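To verify that the environment is ready, you can run a quick sanity check. This is an illustrative snippet, not part of the repository:

```python
# Hypothetical sanity check: confirms spaCy and its English model load correctly.
import spacy

nlp = spacy.load("en_core_web_sm")  # the model installed above
doc = nlp("A dog chases a ball in the park.")
print([(token.text, token.pos_) for token in doc])  # e.g. [('A', 'DET'), ('dog', 'NOUN'), ...]
```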
Please refer to `DATASET.md` for setup instructions for COCO, MSR-VTT, and AudioCaps.
To use the LanguageBind model, please run the following commands:

```bash
git submodule update --init --recursive
cd dataset_processing/LanguageBind

# Create a separate environment for LanguageBind
conda create -n LanguageBind python=3.10.10
conda activate LanguageBind

# Install required dependencies
pip install torch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1
pip install -r requirements.txt

# Return to the main environment
conda activate MAC
```
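After installation, a minimal check that the pinned PyTorch build is active inside the `LanguageBind` environment (illustrative only; run it before switching back to `MAC`):

```python
# Hypothetical check: run inside the LanguageBind environment.
import torch
import torchvision

print(torch.__version__)          # expected: 1.13.1 (possibly with a CUDA suffix, e.g. 1.13.1+cu117)
print(torchvision.__version__)    # expected: 0.14.1
print(torch.cuda.is_available())  # True if a GPU is visible
```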
```bash
# Deceptive-General Prompt (Zero-Shot)
sh scripts/zero_shot.sh coco clip

# + Self-Train + Large-N Distilled + Diversity-Promoted (Ours)
sh scripts/self_train_with_large_N_ours.sh coco clip
```
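For intuition, the sketch below shows one way to test whether a rewritten caption "deceives" CLIP, i.e., whether CLIP scores the adversarial caption at least as high as the original for the same image. This is an illustrative reimplementation using Hugging Face's `transformers` CLIP, not the repository's evaluation code; the captions, image path, and success criterion are placeholders:

```python
# Illustrative only: checks whether CLIP prefers an adversarial caption over the original.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder image path
captions = [
    "A dog chases a ball in the park.",  # original caption
    "A ball chases a dog in the park.",  # adversarial rewrite (placeholder)
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image[0]  # image-text similarity scores

# The attack "succeeds" here if CLIP scores the rewrite at least as high as the original.
print(f"original: {logits[0]:.3f}, adversarial: {logits[1]:.3f}, deceived: {bool(logits[1] >= logits[0])}")
```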
A LoRA checkpoint (ASR_total: 42.1%) fine-tuned from LLaMA-3.1-8B to deceive CLIP on the COCO dataset.
Replace the model loading part with the following code (`torch`, `attn_implementation`, and `device` are already defined in the surrounding script):

```python
# Load the LoRA adapter directly from the Hugging Face Hub instead of a local checkpoint.
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained(
    'ahnpersie/llama3.1-8b-lora-coco-deceptive-clip',  # changed from local "model_checkpoint_dir" to HuggingFace repo
    torch_dtype=torch.bfloat16,
    attn_implementation=attn_implementation,
    device_map=device,
)
```
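A minimal usage sketch follows, assuming the tokenizer is loaded from the base model (`meta-llama/Llama-3.1-8B-Instruct` is our assumption here; use whichever base model the script already loads, and note the prompt is a placeholder rather than the repository's actual prompt format):

```python
# Illustrative generation with the loaded LoRA model; prompt format is a placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # assumed base model
inputs = tokenizer("Rewrite this caption: a dog chases a ball.", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```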
```bash
sh scripts/generate_evaluate_iter1_example.sh
```
Note: All experiments were conducted on a single NVIDIA RTX A6000 GPU (48GB VRAM).
If you have any questions, feel free to reach out to Jaewoo Ahn ([email protected]) or Heeseung Yun ([email protected]).