JiuhaiChen/BLIP3o

BLIP3o-NEXT

Project Page

AR + Diffusion Architecture: Similar to BLIP3o, BLIP3o-NEXT first generates intermediate features with the autoregressive model, then conditions the diffusion model on these features to generate images.

Discrete Image Token Supervision: We add discrete SigLIP-2 image token prediction as extra training supervision, jointly optimizing the cross-entropy loss and the diffusion objective. By having the AR model lay down a discrete "blueprint" and feeding its hidden representations into the diffusion model, we combine structural accuracy with high visual fidelity in the generated images.
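The joint objective described above can be sketched as a weighted sum of a token-level cross-entropy term and a diffusion regression term. This is a minimal illustration, not the repository's training code: the function names, the `diffusion_weight` parameter, and the use of a plain mean-squared error as the diffusion term are all assumptions made for the example.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one discrete token: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mse(pred, target):
    """Mean-squared error, standing in here for the diffusion objective (assumption)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(token_logits, token_targets, diff_pred, diff_target,
               diffusion_weight=1.0):
    """Hypothetical joint loss: mean token cross-entropy + weighted diffusion term."""
    ce = sum(cross_entropy(l, t) for l, t in zip(token_logits, token_targets))
    ce /= len(token_targets)
    return ce + diffusion_weight * mse(diff_pred, diff_target)


# Toy usage: two image tokens over a 3-entry codebook, and a 2-dim diffusion target.
loss = joint_loss(
    token_logits=[[2.0, 0.1, -1.0], [0.0, 0.0, 0.0]],
    token_targets=[0, 2],
    diff_pred=[0.5, -0.5],
    diff_target=[0.0, 0.0],
)
print(f"joint loss: {loss:.4f}")
```

In a real setup both terms would be computed on tensors with an autograd framework, and the relative weighting would be a tuned hyperparameter.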

RL with verified reward: Introducing discrete image tokens makes the model directly compatible with existing language-model RL frameworks. Using Group Relative Policy Optimization (GRPO), we train BLIP3o-NEXT to improve prompt alignment and text rendering in image generation.
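The core of GRPO is that each sampled output is scored relative to the other outputs sampled for the same prompt, so no separate value model is needed. The sketch below shows that group-relative normalization only. It is an illustration under assumptions: the function name and the `eps` value are hypothetical, the population standard deviation is used (implementations differ on this detail), and the reward values are made-up numbers rather than results from this project.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy usage: four generations for one prompt, each with a scalar reward.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Samples that score above the group mean get a positive advantage and are reinforced; samples below it are discouraged. When every sample in a group receives the same reward, all advantages are zero and the group contributes no learning signal.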

Fully Open-Source:

🔥 Feel free to reach out if you have any questions. Discord: https://discord.gg/SsVYdV84bw or WeChat

Install package for pretraining and instruction tuning

conda create -n blip3o-next python=3.11 -y
conda activate blip3o-next
pip install --upgrade pip setuptools
pip install -r requirements.txt
pip install -e .

Import the Slurm config and environment

sbatch scripts/run.sh

For inference, set the model path in inference.py, then run:

python inference.py

For GRPO, we recommend creating a separate environment, because its torch version conflicts with the one in the blip3o-next environment. You also need to install the dependencies from setup.py. Follow the steps below:

cd trl
conda create -n grpo python=3.11 -y
conda activate grpo
pip install -r requirements.txt
cd ..
pip install -e .

About

Official implementation of BLIP3o-Series
