docs: Example and instructions on how to load model weights to persistent volume (#197)

vku-ibm · web-flow · commit 3f090b7d15ea · 2025-05-21T13:04:46.000+02:00
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
diff --git a/README.md b/README.md
@@ -70,7 +70,7 @@ An easy to use UI is available at the `/ui` endpoint.
 
 ## Documentation and advance usages
 
-Visit the [Docling Serve documentation](./docs/README.md) for learning how to [configure the webserver](./docs/configuration.md), use all the [runtime options](./docs/usage.md) of the API and [deployment examples](./docs/deployment.md).
+Visit the [Docling Serve documentation](./docs/README.md) to learn how to [configure the webserver](./docs/configuration.md), use all the [runtime options](./docs/usage.md) of the API, follow the [deployment examples](./docs/deployment.md), and [pre-load model weights into a persistent volume](./docs/pre-loading-models.md).
 
 ## Get help and support
 
diff --git a/docs/deploy-examples/docling-model-cache-deployment.yaml b/docs/deploy-examples/docling-model-cache-deployment.yaml
@@ -0,0 +1,47 @@
+kind: Deployment
+apiVersion: apps/v1
+metadata:
+  name: docling-serve
+  labels:
+    app: docling-serve
+    component: docling-serve-api
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: docling-serve
+      component: docling-serve-api
+  template:
+    metadata:
+      labels:
+        app: docling-serve
+        component: docling-serve-api
+    spec:
+      restartPolicy: Always
+      containers:
+        - name: api
+          resources:
+            limits:
+              cpu: 500m
+              memory: 2Gi
+            requests:
+              cpu: 250m
+              memory: 1Gi
+          env:
+            - name: DOCLING_SERVE_ENABLE_UI
+              value: 'true'
+            - name: DOCLING_SERVE_ARTIFACTS_PATH
+              value: '/modelcache'
+          ports:
+            - name: http
+              containerPort: 5001
+              protocol: TCP
+          imagePullPolicy: Always
+          image: 'ghcr.io/docling-project/docling-serve-cpu'
+          volumeMounts:
+            - name: docling-model-cache
+              mountPath: /modelcache
+      volumes:
+        - name: docling-model-cache
+          persistentVolumeClaim:
+            claimName: docling-model-cache-pvc
diff --git a/docs/deploy-examples/docling-model-cache-job.yaml b/docs/deploy-examples/docling-model-cache-job.yaml
@@ -0,0 +1,33 @@
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: docling-model-cache-load
+spec:
+  selector: {}
+  template:
+    metadata:
+      name: docling-model-load
+    spec:
+      containers:
+        - name: loader
+          image: ghcr.io/docling-project/docling-serve-cpu:main
+          command:
+            - docling-tools
+            - models
+            - download
+            - '--output-dir=/modelcache'
+            - 'layout'
+            - 'tableformer'
+            - 'code_formula'
+            - 'picture_classifier'
+            - 'smolvlm'
+            - 'granite_vision'
+            - 'easyocr'
+          volumeMounts:
+            - name: docling-model-cache
+              mountPath: /modelcache
+      volumes:
+        - name: docling-model-cache
+          persistentVolumeClaim:
+            claimName: docling-model-cache-pvc
+      restartPolicy: Never
diff --git a/docs/deploy-examples/docling-model-cache-pvc.yaml b/docs/deploy-examples/docling-model-cache-pvc.yaml
@@ -0,0 +1,11 @@
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: docling-model-cache-pvc
+spec:
+  accessModes:
+    - ReadWriteOnce
+  volumeMode: Filesystem
+  resources:
+    requests:
+      storage: 10Gi
diff --git a/docs/pre-loading-models.md b/docs/pre-loading-models.md
@@ -0,0 +1,103 @@
+# Pre-loading models for docling
+
+This document provides examples for pre-loading docling models to a persistent volume and re-using it for docling-serve deployments.
+
+1. Create a persistent volume claim (PVC) for the volume that will store the model weights:
+
+    ```yaml
+    apiVersion: v1
+    kind: PersistentVolumeClaim
+    metadata:
+      name: docling-model-cache-pvc
+    spec:
+      accessModes:
+        - ReadWriteOnce
+      volumeMode: Filesystem
+      resources:
+        requests:
+          storage: 10Gi
+    ```
+
+    If you don't want to use the default storage class, set a custom one as follows:
+
+    ```yaml
+    spec:
+      ...
+      storageClassName: <Storage Class Name>
+    ```
+
+    Manifest example: [docling-model-cache-pvc.yaml](./deploy-examples/docling-model-cache-pvc.yaml)
+
+2. To load the model weights, we can download them with `docling-tools`. Since this is a one-time operation, a Kubernetes Job is a good fit:
+
+    ```yaml
+    apiVersion: batch/v1
+    kind: Job
+    metadata:
+      name: docling-model-cache-load
+    spec:
+      selector: {}
+      template:
+        metadata:
+          name: docling-model-load
+        spec:
+          containers:
+            - name: loader
+              image: ghcr.io/docling-project/docling-serve-cpu:main
+              command:
+                - docling-tools
+                - models
+                - download
+                - '--output-dir=/modelcache'
+                - 'layout'
+                - 'tableformer'
+                - 'code_formula'
+                - 'picture_classifier'
+                - 'smolvlm'
+                - 'granite_vision'
+                - 'easyocr'
+              volumeMounts:
+                - name: docling-model-cache
+                  mountPath: /modelcache
+          volumes:
+            - name: docling-model-cache
+              persistentVolumeClaim:
+                claimName: docling-model-cache-pvc
+          restartPolicy: Never
+    ```
+
+    The Job mounts the previously created persistent volume and runs a command similar to the one we would use to download models locally:
+    `docling-tools models download --output-dir <MOUNT-PATH> [LIST_OF_MODELS]`
+
+    In the manifest, we list the desired models individually; alternatively, the `--all` parameter downloads all models.
+
+    Manifest example: [docling-model-cache-job.yaml](./deploy-examples/docling-model-cache-job.yaml)
+
+3. Now we can mount the volume in the docling-serve deployment and set the `DOCLING_SERVE_ARTIFACTS_PATH` environment variable to point to it.
+    The following additions should be made to the deployment:
+
+    ```yaml
+    spec:
+      template:
+        spec:
+          containers:
+            - name: api
+              env:
+              ...
+                - name: DOCLING_SERVE_ARTIFACTS_PATH
+                  value: '/modelcache'
+              volumeMounts:
+                - name: docling-model-cache
+                  mountPath: /modelcache
+          ...
+          volumes:
+            - name: docling-model-cache
+              persistentVolumeClaim:
+                claimName: docling-model-cache-pvc
+    ```
+
+    Make sure the value of `DOCLING_SERVE_ARTIFACTS_PATH` matches both the directory the models were downloaded to and the path where the volume is mounted.
+
+    Now, when docling-serve executes tasks, the underlying docling installation loads the model weights from the mounted volume.
+
+    Manifest example: [docling-model-cache-deployment.yaml](./deploy-examples/docling-model-cache-deployment.yaml)
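
The three steps in `docs/pre-loading-models.md` could be applied in order roughly as sketched below. This is not part of the commit: the `preload_models` function name, the `MANIFEST_DIR` variable, and the 15-minute timeout are assumptions made for illustration, and the sketch presumes `kubectl` is already pointed at the target cluster and namespace.

```shell
#!/usr/bin/env bash
# Sketch only: apply the example manifests in dependency order.
# MANIFEST_DIR and the 15m timeout are illustrative assumptions, not project defaults.
MANIFEST_DIR="${MANIFEST_DIR:-docs/deploy-examples}"

preload_models() {
  # 1. Create the claim that will hold the model weights.
  kubectl apply -f "${MANIFEST_DIR}/docling-model-cache-pvc.yaml" || return 1

  # 2. Start the one-time download Job and block until it reports completion.
  kubectl apply -f "${MANIFEST_DIR}/docling-model-cache-job.yaml" || return 1
  kubectl wait --for=condition=complete --timeout=15m job/docling-model-cache-load || return 1

  # 3. Deploy docling-serve with the populated volume mounted at /modelcache.
  kubectl apply -f "${MANIFEST_DIR}/docling-model-cache-deployment.yaml"
}
```

Calling `preload_models` from the repository root runs the steps in sequence. Waiting for the Job before applying the Deployment matters because the claim is `ReadWriteOnce`, so it is simplest to let the loader pod finish before the serving pod mounts the volume.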