Description
System Info
The model runs on CPU+RAM. Ryzen 6800H, 14 GB DDR5. Host system: Fedora 41; Docker base image: python:3.13.7.
The FastAPI microservice is deployed in Docker with these requirements:
fastapi==0.117.1
huggingface-hub==0.35.1
numpy==2.3.3
pillow==11.3.0
pydantic==2.11.9
python-multipart==0.0.20
requests==2.32.4
sentencepiece==0.2.1
torch==2.8.0
torchvision==0.23.0
transformers==4.56.2
uvicorn==0.37.0
transformers env:
- transformers version: 4.56.2
- Platform: Linux-6.16.7-100.fc41.x86_64-x86_64-with-glibc2.41
- Python version: 3.13.7
- Huggingface_hub version: 0.35.1
- Safetensors version: 0.6.2
- Accelerate version: not installed
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.8.0+cu128 (NA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: (I'm not sure tbh...)
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- Use CPU+RAM
- UPD: Add at least 1000 images to the `data` folder
- Git clone this repository: https://github.com/Adefey/search_dir, checkout to this commit: d8a40baa434c4b00ad04dada9d7221edb111f4aa
- Run: docker compose up --build
- Call the API: POST http://localhost:8003/api/v1/start_discovery (see the Python sketch after this list)
- Monitor the logs. Depending on available system memory, embedding-service may hit OOM (`embedding-service exited with code 137`) after some batches (on my system with 14 GB RAM it takes 10 batches)
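For reference, the same call can be issued from Python using the `requests` package that is already in the service requirements (a sketch; the exact request body, if any, is defined in the linked repository, and here I assume none is needed):

```python
# Hypothetical Python equivalent of the API call from the reproduction steps.
# Assumes the endpoint needs no request body; adjust if the repository defines one.
import requests

response = requests.post("http://localhost:8003/api/v1/start_discovery", timeout=30)
print(response.status_code, response.text)
```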
But actually there are only 3 methods (`__init__`, `_encode`, `encode_images`) that may be relevant. Repeated calls of `encode_images` result in OOM. I'll show the relevant methods here:
```python
def __init__(self):
    self.model_checkpoint = "openai/clip-vit-base-patch32"
    os.system("transformers env")
    self.device = "cuda" if torch.cuda.is_available() else "cpu"
    logger.info(f"Start setting up model {self.model_checkpoint} on {self.device}")
    self.model = AutoModel.from_pretrained(self.model_checkpoint).to(self.device)
    self.processor = AutoProcessor.from_pretrained(self.model_checkpoint, use_fast=False)
    logger.info(f"Finished setting up model {self.model_checkpoint} on {self.device}")

def _encode(self, inputs: dict) -> list[float]:
    inputs = {k: v.to(self.device) for k, v in inputs.items()}
    if "pixel_values" in inputs:
        features = self.model.get_image_features(**inputs)
    else:
        features = self.model.get_text_features(**inputs)
    result = features.cpu().detach().numpy().tolist()
    del inputs
    del features
    if self.device == "cuda":
        torch.cuda.empty_cache()
    # ??????????
    # trim_memory()
    return result

def encode_images(self, images: list[bytes]) -> list[list[float]]:
    """
    Process images into embeddings
    """
    logger.info(f"Start encoding images")
    image_list = [Image.open(io.BytesIO(image)) for image in images]
    with torch.inference_mode():
        inputs = self.processor(
            images=image_list,
            return_tensors="pt",
            padding=True,
        )
        result = self._encode(inputs)
    for image in image_list:
        image.close()
    logger.info(f"Finished encoding images")
    return result
```
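To take the service out of the picture, the growth can also be observed with a standalone loop that re-encodes the same batch and prints the process's peak RSS after every iteration (a minimal sketch, not taken from the repository; image size and batch size are arbitrary):

```python
# Standalone sketch: repeatedly encode one synthetic batch of 30 images on CPU
# and print peak resident memory after each iteration (on Linux, ru_maxrss is in KB).
import resource

import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

checkpoint = "openai/clip-vit-base-patch32"
model = AutoModel.from_pretrained(checkpoint)
processor = AutoProcessor.from_pretrained(checkpoint, use_fast=False)

# One synthetic "batch" of 30 RGB images, mimicking the service payload.
batch = [Image.new("RGB", (640, 480), color=(i, i, i)) for i in range(30)]

for step in range(20):
    with torch.inference_mode():
        inputs = processor(images=batch, return_tensors="pt", padding=True)
        features = model.get_image_features(**inputs)
        _ = features.cpu().detach().numpy().tolist()
    rss_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
    print(f"iteration {step:02d}: peak RSS {rss_mb:.0f} MB")
```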
Also, there is a working fix: calling `trim_memory()` after each model call:

```python
import ctypes


def trim_memory():
    libc = ctypes.CDLL("libc.so.6")
    return libc.malloc_trim(0)
```
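In the working variant, `_encode` ends with the workaround call, i.e. the commented-out line in the snippet above is enabled (a sketch mirroring the method shown earlier):

```python
def _encode(self, inputs: dict) -> list[float]:
    inputs = {k: v.to(self.device) for k, v in inputs.items()}
    if "pixel_values" in inputs:
        features = self.model.get_image_features(**inputs)
    else:
        features = self.model.get_text_features(**inputs)
    result = features.cpu().detach().numpy().tolist()
    del inputs
    del features
    if self.device == "cuda":
        torch.cuda.empty_cache()
    # The workaround: ask glibc to return freed heap pages to the OS after every call.
    trim_memory()
    return result
```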
But I think this is a workaround, and the transformers library should manage resources correctly on its own.
Expected behavior
Use case: the microservice gets batches of 30 images to calculate embeddings, then another batch, and so on. After 10 batches the service is killed because of OOM. Manually monitoring memory in htop shows usage increasing by 600-800 MB with every batch. Expected behavior: roughly constant memory usage throughout batch processing.
I suspect a memory leak or memory fragmentation issue where new memory keeps being allocated instead of being reused.
UPD: exactly the same OOM issue happens with google/siglip2-base-patch16-256, and again the malloc_trim workaround works.