Memory leak when using openai/clip-vit-base-patch32 #41178

@Adefey

Description

System Info

The model runs on CPU + RAM (Ryzen 6800H, 14 GB DDR5). Host system: Fedora 41; Docker base image: python:3.13.7

FastAPI microservice is deployed in docker with these requirements:

fastapi==0.117.1
huggingface-hub==0.35.1
numpy==2.3.3
pillow==11.3.0
pydantic==2.11.9
python-multipart==0.0.20
requests==2.32.4
sentencepiece==0.2.1
torch==2.8.0
torchvision==0.23.0
transformers==4.56.2
uvicorn==0.37.0

transformers env:

embedding-service | - transformers version: 4.56.2
embedding-service | - Platform: Linux-6.16.7-100.fc41.x86_64-x86_64-with-glibc2.41
embedding-service | - Python version: 3.13.7
embedding-service | - Huggingface_hub version: 0.35.1
embedding-service | - Safetensors version: 0.6.2
embedding-service | - Accelerate version: not installed
embedding-service | - Accelerate config: not found
embedding-service | - DeepSpeed version: not installed
embedding-service | - PyTorch version (accelerator?): 2.8.0+cu128 (NA)
embedding-service | - Tensorflow version (GPU?): not installed (NA)
embedding-service | - Flax version (CPU?/GPU?/TPU?): not installed (NA)
embedding-service | - Jax version: not installed
embedding-service | - JaxLib version: not installed
embedding-service | - Using distributed or parallel set-up in script?: (I'm not sure tbh...)

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Use CPU+RAM
  2. UPD: Add at least 1000 images to the data folder
  3. Git clone this repository: https://github.com/Adefey/search_dir, checkout to this commit: d8a40baa434c4b00ad04dada9d7221edb111f4aa
  4. Run: docker compose up --build
  5. Call API: POST http://localhost:8003/api/v1/start_discovery
  6. Monitor the logs. Depending on available system memory, embedding-service may get OOM-killed (embedding-service exited with code 137) after some batches (on my system with 14 GB RAM it takes about 10 batches).
    Actually, only three methods (__init__, _encode, encode_images) are likely relevant, and repeated calls to encode_images lead to the OOM. Here are the relevant methods:
def __init__(self):
    self.model_checkpoint = "openai/clip-vit-base-patch32"
    os.system("transformers env")
    self.device = "cuda" if torch.cuda.is_available() else "cpu"
    logger.info(f"Start setting up model {self.model_checkpoint} on {self.device}")
    self.model = AutoModel.from_pretrained(self.model_checkpoint).to(self.device)
    self.processor = AutoProcessor.from_pretrained(self.model_checkpoint, use_fast=False)
    logger.info(f"Finished setting up model {self.model_checkpoint} on {self.device}")

def _encode(self, inputs: dict) -> list[list[float]]:
    # Move inputs to the target device and run the appropriate forward pass
    inputs = {k: v.to(self.device) for k, v in inputs.items()}

    if "pixel_values" in inputs:
        features = self.model.get_image_features(**inputs)
    else:
        features = self.model.get_text_features(**inputs)

    result = features.cpu().detach().numpy().tolist()

    del inputs
    del features
    if self.device == "cuda":
        torch.cuda.empty_cache()

    # trim_memory()  # workaround, see below

    return result

def encode_images(self, images: list[bytes]) -> list[list[float]]:
    """
    Process images into embeddings
    """
    logger.info("Start encoding images")
    image_list = [Image.open(io.BytesIO(image)) for image in images]
    with torch.inference_mode():
        inputs = self.processor(
            images=image_list,
            return_tensors="pt",
            padding=True,
        )
        result = self._encode(inputs)
    for image in image_list:
        image.close()
    logger.info("Finished encoding images")
    return result

Also, there is a working fix: calling trim_memory() after each model call:

import ctypes

def trim_memory():
    # Ask glibc to return freed heap memory to the OS
    libc = ctypes.CDLL("libc.so.6")
    return libc.malloc_trim(0)

But I consider this a workaround; the transformers library should manage resources correctly on its own.
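
For reference, a minimal sketch of the patched _encode with the workaround in place; calling trim_memory() right after the features have been copied out is the only change relative to the snippet above:

def _encode(self, inputs: dict) -> list[list[float]]:
    inputs = {k: v.to(self.device) for k, v in inputs.items()}

    if "pixel_values" in inputs:
        features = self.model.get_image_features(**inputs)
    else:
        features = self.model.get_text_features(**inputs)

    result = features.cpu().detach().numpy().tolist()

    del inputs
    del features
    if self.device == "cuda":
        torch.cuda.empty_cache()

    trim_memory()  # return freed heap pages to the OS after each batch

    return result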

Expected behavior

Use case: the microservice receives batches of 30 images to calculate embeddings, then receives another batch. After 10 batches the service is killed because of OOM. Monitoring memory manually in htop, usage increases by roughly 600-800 MB with every batch. Expected behavior: roughly constant memory usage throughout batch processing.
I suppose there is a memory leak or a memory-fragmentation issue where new memory keeps being allocated and is not reused.

UPD: exactly the same OOM issue happens with google/siglip2-base-patch16-256, and again the malloc_trim workaround works.
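
A minimal standalone sketch that isolates the model call from FastAPI and Docker may help with triage; psutil for RSS reporting and the synthetic 640x480 RGB images are my additions, not part of the service:

import psutil
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

checkpoint = "openai/clip-vit-base-patch32"
model = AutoModel.from_pretrained(checkpoint)
processor = AutoProcessor.from_pretrained(checkpoint, use_fast=False)

# Mirror the service: batches of 30 images, processed repeatedly
batch = [Image.new("RGB", (640, 480)) for _ in range(30)]

for step in range(20):
    with torch.inference_mode():
        inputs = processor(images=batch, return_tensors="pt", padding=True)
        model.get_image_features(**inputs).cpu().detach().numpy().tolist()
    rss_mb = psutil.Process().memory_info().rss / 1024**2
    print(f"batch {step}: RSS {rss_mb:.0f} MB")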
