💎 Gemma 3 VLM SFT example script for single-image and multi-image #3131

sergiopaniego · 2025-03-21T15:48:10Z

What does this PR do?

This PR presents a standalone SFT example script for VLM 💎 Gemma 3.
It includes examples for both single-image and multi-image inputs.

@merveenoyan @qgallouedec @burtenshaw

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

kashif · 2025-03-21T15:49:48Z

very nice!

HuggingFaceDocBuilderDev · 2025-03-21T15:53:02Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2025-03-21T17:10:53Z

examples/scripts/sft_vlm_gemma3.py

+        with zipfile.ZipFile(zip_path, "r") as zip_ref:
+            zip_ref.extractall(extract_folder)
+
+    dataset = DatasetDict({"test": [format_data(sample) for sample in dataset[dataset_train_split]]})


possible to use

dataset = dataset.map(format_data)

instead?

qgallouedec

Thanks!

…ggingface#3131) Co-authored-by: Quentin Gallouédec <[email protected]> Co-authored-by: Quentin Gallouédec <[email protected]>

…ggingface#3131) Co-authored-by: Quentin Gallouédec <[email protected]> Co-authored-by: Quentin Gallouédec <[email protected]> log answer key to wandb all Table HTML logging table bump patch hmm formatting html esacape reward isnt string [Liger] Liger KTO support (huggingface#2812) Co-authored-by: Kashif Rasul <[email protected]> Co-authored-by: Quentin Gallouédec <[email protected]> 🏃 Migrate CI to self-hosted runners (huggingface#3174) ❤️‍🩹 [CI] fix transformers dev CI failure (huggingface#3176) Co-authored-by: Quentin Gallouédec <[email protected]> ⏯️ Fix: handle None inputs when resuming GRPO Trainer from checkpoint (huggingface#3148) Co-authored-by: Quentin Gallouédec <[email protected]> 📎 Fix is_clipped to compute the effective clip_ratio (huggingface#3175) Co-authored-by: Quentin Gallouédec <[email protected]> Co-authored-by: Quentin Gallouédec <[email protected]> Fix breaking typo for flash_attention reducing_memory_usage.md (huggingface#3190) Show unique prompts in GRPO WandB tables (huggingface#3191) 🐗 [CI] Fix trufflehog false positives (huggingface#3192) [GRPO] Improve completion length logging (huggingface#3188) preliminary openai compatible endpoint early concept, needs refining dedupe debug print some slop to work on unslop, missing hist almost valid pseudocode middle-ware monkey patch in mp.Pool()... remove unused More accurate .md need gpu renting lambda again much nicer small aider-chat and datasets conflict risky reqs change should work, but hacky some insights, but monkeypatching probably wont suffice refactor: Rewrite test script to use SWE-bench dataset with MultiProcessAider refactor: Remove logging statements from test.py one step closer finally, the correct abstraction doc todo unslop unslop undo accidental black cleaner abstraction new abstraction

…ggingface#3131) Co-authored-by: Quentin Gallouédec <[email protected]> Co-authored-by: Quentin Gallouédec <[email protected]>

sergiopaniego added 5 commits March 20, 2025 17:15

Added script for SFT VLM Gemma3

d22706f

Update training script

4d49723

Updated training script

4323539

Added multi-image training

e80e9a0

Reformatted multi-images example

f15cfc1

sergiopaniego added 2 commits March 21, 2025 17:34

Run precommit

d038ade

Run precommit

032be0b

qgallouedec reviewed Mar 21, 2025

View reviewed changes

sergiopaniego and others added 4 commits March 24, 2025 17:45

Using dataset.map()

4bccc07

Merge branch 'main' into sft-vlm-gemma3

6cae596

update doc

35beb4b

Removed gradient_checkpointing flag

3591ea9

qgallouedec changed the title ~~💎 Gemma 3 VLM SFT example script for single-image and multi-image~~ 💎 Gemma 3 VLM SFT example script for single-image and multi-image Mar 26, 2025

qgallouedec approved these changes Mar 26, 2025

View reviewed changes

qgallouedec merged commit 26d8675 into huggingface:main Mar 26, 2025
13 checks passed

sergiopaniego deleted the sft-vlm-gemma3 branch March 26, 2025 15:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

💎 Gemma 3 VLM SFT example script for single-image and multi-image #3131

💎 Gemma 3 VLM SFT example script for single-image and multi-image #3131

Uh oh!

sergiopaniego commented Mar 21, 2025

Uh oh!

kashif commented Mar 21, 2025

Uh oh!

HuggingFaceDocBuilderDev commented Mar 21, 2025

Uh oh!

qgallouedec Mar 21, 2025

Uh oh!

sergiopaniego Mar 24, 2025

Uh oh!

qgallouedec left a comment

Uh oh!

Uh oh!

Uh oh!

💎 Gemma 3 VLM SFT example script for single-image and multi-image #3131

💎 Gemma 3 VLM SFT example script for single-image and multi-image #3131

Uh oh!

Conversation

sergiopaniego commented Mar 21, 2025

What does this PR do?

Before submitting

Who can review?

Uh oh!

kashif commented Mar 21, 2025

Uh oh!

HuggingFaceDocBuilderDev commented Mar 21, 2025

Uh oh!

qgallouedec Mar 21, 2025

Choose a reason for hiding this comment

Uh oh!

sergiopaniego Mar 24, 2025

Choose a reason for hiding this comment

Uh oh!

qgallouedec left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!