Add OWL-ViT base-patch32 to zoo manifest

CharlesCNorton · sherpan · commit edc776a6a468 · 2025-07-28T21:07:01.000-07:00
Adds the OWL-ViT base model in its patch32 configuration to the zoo manifest, completing the OWL-ViT family:
- `owlvit-base-patch32-torch`

The model:
- Uses the existing `FiftyOneZeroShotTransformerForObjectDetection` wrapper
- Supports both CPU and GPU inference
- Performs zero-shot object detection using text queries
- Successfully extracts embeddings for similarity matching
- Leverages HuggingFace hub for automatic model downloads

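OWL-ViT base-patch32 completes the model family by providing the 32x32 patch variant alongside the existing patch16 option. Larger patches split each image into fewer tokens, which reduces compute and typically costs some spatial detail, so users can choose the patch size that fits the speed/accuracy trade-off of their use case.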
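
A minimal usage sketch for the new entry. It assumes text queries are passed through a `classes` keyword, as with other zero-shot zoo models; the `quickstart` dataset, the helper name `detect_with_owlvit`, and the `owlvit_patch32` field name are illustrative choices and not part of this PR.

```python
MODEL_NAME = "owlvit-base-patch32-torch"


def detect_with_owlvit(classes, max_samples=5, label_field="owlvit_patch32"):
    """Run zero-shot detection on a few samples and return the dataset.

    The fiftyone import is deferred so this module can be imported without
    it; calling the function requires fiftyone, torch, torchvision, and
    transformers to be installed.
    """
    import fiftyone.zoo as foz

    # Small illustrative dataset; any image dataset would work here
    dataset = foz.load_zoo_dataset("quickstart", max_samples=max_samples)

    # Assumption: text queries are supplied via `classes`
    model = foz.load_zoo_model(MODEL_NAME, classes=classes)

    # Predictions are stored on each sample under `label_field`
    dataset.apply_model(model, label_field=label_field)
    return dataset
```

Calling `detect_with_owlvit(["horse", "person"])` should download the weights from the HuggingFace hub on first use and store the detections in the `owlvit_patch32` field.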

### What changes are proposed in this pull request?

This PR adds the OWL-ViT base-patch32 model configuration to `fiftyone/zoo/models/manifest-torch.json`, completing the OWL-ViT model offerings. The entry uses the same wrapper class as the existing patch16 variant but points to the `google/owlvit-base-patch32` checkpoint on the HuggingFace hub.
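
The relationship between the manifest entry, the wrapper class, and the checkpoint can be sketched with the standard library alone. The snippet below uses an abbreviated inline copy of the new entry and assumes the manifest keeps its entries under a top-level `models` list; `find_model` is a hypothetical helper, not a FiftyOne function.

```python
import json

# Abbreviated copy of the entry added by this PR (other fields omitted)
MANIFEST_JSON = """
{
    "models": [
        {
            "base_name": "owlvit-base-patch32-torch",
            "default_deployment_config_dict": {
                "type": "fiftyone.utils.transformers.FiftyOneZeroShotTransformerForObjectDetection",
                "config": {"name_or_path": "google/owlvit-base-patch32"}
            }
        }
    ]
}
"""


def find_model(manifest, base_name):
    """Return the manifest entry with the given base_name, or None."""
    for entry in manifest.get("models", []):
        if entry.get("base_name") == base_name:
            return entry
    return None


manifest = json.loads(MANIFEST_JSON)
entry = find_model(manifest, "owlvit-base-patch32-torch")

deployment = entry["default_deployment_config_dict"]
wrapper_class = deployment["type"].rsplit(".", 1)[-1]  # class name only
checkpoint = deployment["config"]["name_or_path"]

print(wrapper_class, checkpoint)
```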

### How is this patch tested? If it is not, please explain why.

Created and ran tests that:
- Load the model from HuggingFace via `foz.load_zoo_model("owlvit-base-patch32-torch")`
- Perform zero-shot object detection with custom text queries
- Verify detection outputs and bounding box generation
- Test embedding extraction capabilities
- Apply model to FiftyOne datasets
- Confirm compatibility with the existing transformer wrapper

The model detected the expected objects (for example, horses) across several text prompts, and all of the tests above passed.
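
One possible way to express the output verification step, sketched here with plain dictionaries standing in for detection objects. It relies on FiftyOne's convention of relative `[x, y, width, height]` bounding boxes in `[0, 1]`; the helper names and the hand-written sample data are hypothetical and are not the tests that were actually run.

```python
def is_valid_relative_box(box, tol=1e-6):
    """Check an [x, y, width, height] box in relative [0, 1] coordinates."""
    if len(box) != 4:
        return False
    x, y, w, h = box
    return (
        -tol <= x <= 1 + tol
        and -tol <= y <= 1 + tol
        and w > 0
        and h > 0
        and x + w <= 1 + tol
        and y + h <= 1 + tol
    )


def invalid_detections(detections, allowed_labels):
    """Return detections with an unexpected label or a malformed box."""
    return [
        det
        for det in detections
        if det["label"] not in allowed_labels
        or not is_valid_relative_box(det["bounding_box"])
    ]


# Hand-written stand-ins for model output, for illustration only
sample_detections = [
    {"label": "horse", "bounding_box": [0.1, 0.2, 0.3, 0.4]},
    {"label": "horse", "bounding_box": [0.8, 0.8, 0.5, 0.1]},  # past right edge
]

bad = invalid_detections(sample_detections, {"horse", "person"})
```

A strict bounds check like this may be too tight for real model output, since predicted boxes can extend slightly past the image border; clamping or a looser tolerance is a reasonable alternative.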

### Release Notes

**Is this a user-facing change that should be mentioned in the release notes?**
- [x] Yes. Give a description of this change to be included in the release notes for FiftyOne users.

Added the OWL-ViT base-patch32 model to the model zoo for zero-shot object detection. Users can now load both the patch16 and patch32 variants of OWL-ViT via `foz.load_zoo_model()` and choose between them based on their speed and accuracy requirements.

### What areas of FiftyOne does this PR affect?

- [ ] App: FiftyOne application changes
- [ ] Build: Build and test infrastructure changes
- [ ] Core: Core fiftyone Python library changes
- [ ] Documentation: FiftyOne documentation changes
- [x] Other
diff --git a/fiftyone/zoo/models/manifest-torch.json b/fiftyone/zoo/models/manifest-torch.json
@@ -2965,6 +2965,40 @@
             ],
             "date_added": "2024-01-17 14:25:51"
         },
+        {
+            "base_name": "owlvit-base-patch32-torch",
+            "base_filename": null,
+            "version": null,
+            "description": "Faster zero-shot object detector that finds any object you describe using larger image patches for efficiency.",
+            "source": "https://huggingface.co/docs/transformers/tasks/zero_shot_object_detection",
+            "author": "Thomas Wolf, et al.",
+            "license": "Apache 2.0",
+            "size_bytes": 1229149172,
+            "default_deployment_config_dict": {
+                "type": "fiftyone.utils.transformers.FiftyOneZeroShotTransformerForObjectDetection",
+                "config": {
+                    "name_or_path": "google/owlvit-base-patch32"
+                }
+            },
+            "requirements": {
+                "packages": ["torch", "torchvision", "transformers"],
+                "cpu": {
+                    "support": true
+                },
+                "gpu": {
+                    "support": true
+                }
+            },
+            "tags": [
+                "detection",
+                "logits",
+                "embeddings",
+                "torch",
+                "transformers",
+                "zero-shot"
+            ],
+            "date_added": "2025-07-15 13:49:00"
+        },
         {
             "base_name": "omdet-turbo-swin-tiny-torch",
             "base_filename": null,