Commit edc776a

CharlesCNorton authored and sherpan committed
Add OWL-ViT base-patch32 to zoo manifest
Added OWL-ViT base model with patch32 configuration to complete the OWL-ViT family:

- `owlvit-base-patch32-torch`

The model:

- Uses the existing `FiftyOneZeroShotTransformerForObjectDetection` wrapper
- Supports both CPU and GPU inference
- Performs zero-shot object detection using text queries
- Successfully extracts embeddings for similarity matching
- Leverages HuggingFace hub for automatic model downloads

OWL-ViT base-patch32 completes the model family by providing the 32x32 patch variant alongside the existing patch16 option, giving users a choice in the patch size/performance trade-off for their specific use cases.

### What changes are proposed in this pull request?

This PR adds the OWL-ViT base-patch32 model configuration to `fiftyone/zoo/models/manifest-torch.json`, completing the OWL-ViT model offerings. The model uses the same wrapper class as the existing patch16 variant but points to `google/owlvit-base-patch32` on HuggingFace hub.

### How is this patch tested? If it is not, please explain why.

Created and ran tests that:

- Load the model from HuggingFace via `foz.load_zoo_model("owlvit-base-patch32-torch")`
- Perform zero-shot object detection with custom text queries
- Verify detection outputs and bounding box generation
- Test embedding extraction capabilities
- Apply model to FiftyOne datasets
- Confirm compatibility with the existing transformer wrapper

The model successfully detected objects (horses, etc.) using various text prompts and passed all tests.

### Release Notes

**Is this a user-facing change that should be mentioned in the release notes?**

- [x] Yes. Give a description of this change to be included in the release notes for FiftyOne users.

Added OWL-ViT base-patch32 model to the model zoo for zero-shot object detection. Users can now access both patch16 and patch32 variants of OWL-ViT via `foz.load_zoo_model()`, allowing selection based on their requirements.

### What areas of FiftyOne does this PR affect?

- [ ] App: FiftyOne application changes
- [ ] Build: Build and test infrastructure changes
- [ ] Core: Core fiftyone Python library changes
- [ ] Documentation: FiftyOne documentation changes
- [x] Other
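The testing steps described in the commit message could be reproduced along the following lines. This is a minimal sketch, not the author's actual test code: the helper name `detect_objects` and the `label_field` default are hypothetical, and it assumes the zero-shot wrapper accepts the text queries through a `classes` keyword passed to `foz.load_zoo_model()`.

```python
MODEL_NAME = "owlvit-base-patch32-torch"  # zoo name added by this commit


def detect_objects(dataset, classes, label_field="owlvit_patch32"):
    """Run zero-shot detection on a FiftyOne dataset using text queries.

    `dataset` is any FiftyOne dataset or view; `classes` is a list of
    strings such as ["horse", "person"]. Returns the loaded model.
    """
    # Imported lazily so this module can be imported without FiftyOne
    import fiftyone.zoo as foz

    # Assumption: the text queries are supplied via the `classes` keyword
    model = foz.load_zoo_model(MODEL_NAME, classes=classes)

    # Stores the predicted detections on each sample under `label_field`
    dataset.apply_model(model, label_field=label_field)
    return model
```

Calling `detect_objects(dataset, ["horse"])` on a small dataset should trigger the HuggingFace download on first use (roughly 1.2 GB according to `size_bytes` in the manifest entry) and then write detections to the `owlvit_patch32` field.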
1 parent 14f60f5 commit edc776a

1 file changed: +34 -0 lines changed

fiftyone/zoo/models/manifest-torch.json

Lines changed: 34 additions & 0 deletions
@@ -2965,6 +2965,40 @@
             ],
             "date_added": "2024-01-17 14:25:51"
         },
+        {
+            "base_name": "owlvit-base-patch32-torch",
+            "base_filename": null,
+            "version": null,
+            "description": "Faster zero-shot object detector that finds any object you describe using larger image patches for efficiency.",
+            "source": "https://huggingface.co/docs/transformers/tasks/zero_shot_object_detection",
+            "author": "Thomas Wolf, et al.",
+            "license": "Apache 2.0",
+            "size_bytes": 1229149172,
+            "default_deployment_config_dict": {
+                "type": "fiftyone.utils.transformers.FiftyOneZeroShotTransformerForObjectDetection",
+                "config": {
+                    "name_or_path": "google/owlvit-base-patch32"
+                }
+            },
+            "requirements": {
+                "packages": ["torch", "torchvision", "transformers"],
+                "cpu": {
+                    "support": true
+                },
+                "gpu": {
+                    "support": true
+                }
+            },
+            "tags": [
+                "detection",
+                "logits",
+                "embeddings",
+                "torch",
+                "transformers",
+                "zero-shot"
+            ],
+            "date_added": "2025-07-15 13:49:00"
+        },
         {
             "base_name": "omdet-turbo-swin-tiny-torch",
             "base_filename": null,

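A quick way to sanity-check an entry like the one added above, before committing it to the manifest, is to parse it and look for the fields the entry relies on. The sketch below uses only the standard library. The `REQUIRED_KEYS` set and the `check_entry` helper are hypothetical and chosen for illustration; they are not FiftyOne's actual manifest schema or validation code.

```python
import json

# The entry added by this commit, copied from the diff
ENTRY_JSON = """
{
    "base_name": "owlvit-base-patch32-torch",
    "base_filename": null,
    "version": null,
    "description": "Faster zero-shot object detector that finds any object you describe using larger image patches for efficiency.",
    "source": "https://huggingface.co/docs/transformers/tasks/zero_shot_object_detection",
    "author": "Thomas Wolf, et al.",
    "license": "Apache 2.0",
    "size_bytes": 1229149172,
    "default_deployment_config_dict": {
        "type": "fiftyone.utils.transformers.FiftyOneZeroShotTransformerForObjectDetection",
        "config": {"name_or_path": "google/owlvit-base-patch32"}
    },
    "requirements": {
        "packages": ["torch", "torchvision", "transformers"],
        "cpu": {"support": true},
        "gpu": {"support": true}
    },
    "tags": ["detection", "logits", "embeddings", "torch", "transformers", "zero-shot"],
    "date_added": "2025-07-15 13:49:00"
}
"""

# Hypothetical checklist for illustration; not FiftyOne's real schema
REQUIRED_KEYS = {
    "base_name", "description", "source", "license", "size_bytes",
    "default_deployment_config_dict", "requirements", "tags", "date_added",
}


def check_entry(entry):
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(entry))]
    deploy = entry.get("default_deployment_config_dict") or {}
    if "type" not in deploy:
        problems.append("deployment config has no 'type'")
    if not (deploy.get("config") or {}).get("name_or_path"):
        problems.append("deployment config has no 'name_or_path'")
    return problems


entry = json.loads(ENTRY_JSON)
problems = check_entry(entry)
print(problems or "entry looks OK")
```

A check like this only catches structural slips such as a missing key or a malformed JSON fragment; it says nothing about whether the referenced HuggingFace checkpoint actually loads.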