| 1 | +# Pre-loading models for docling |
| 2 | + |
| 3 | +This document provides examples for pre-loading docling models to a persistent volume and re-using it for docling-serve deployments. |
| 4 | + |
| 5 | +1. Create a persistent volume claim that will store the model weights: |
| 6 | + |
| 7 | + ```yaml |
| 8 | + apiVersion: v1 |
| 9 | + kind: PersistentVolumeClaim |
| 10 | + metadata: |
| 11 | +   name: docling-model-cache-pvc |
| 12 | + spec: |
| 13 | +   accessModes: |
| 14 | +     - ReadWriteOnce |
| 15 | +   volumeMode: Filesystem |
| 16 | +   resources: |
| 17 | +     requests: |
| 18 | +       storage: 10Gi |
| 19 | + ``` |
| 20 | + |
| 21 | + If you don't want to use the default storage class, set a custom storage class as follows: |
| 22 | + |
| 23 | + ```yaml |
| 24 | + spec: |
| 25 | +   ... |
| 26 | +   storageClassName: <Storage Class Name> |
| 27 | + ``` |
| 28 | + |
| 29 | + Manifest example: [docling-model-cache-pvc.yaml](./deploy-examples/docling-model-cache-pvc.yaml) |
| 30 | + |
| 31 | +2. To load the model weights, use `docling-tools` to download them. Because this is a one-time operation, a Kubernetes Job is a good fit: |
| 32 | + |
| 33 | + ```yaml |
| 34 | + apiVersion: batch/v1 |
| 35 | + kind: Job |
| 36 | + metadata: |
| 37 | +   name: docling-model-cache-load |
| 38 | + spec: |
| 39 | +   selector: {} |
| 40 | +   template: |
| 41 | +     metadata: |
| 42 | +       name: docling-model-load |
| 43 | +     spec: |
| 44 | +       containers: |
| 45 | +         - name: loader |
| 46 | +           image: ghcr.io/docling-project/docling-serve-cpu:main |
| 47 | +           command: |
| 48 | +             - docling-tools |
| 49 | +             - models |
| 50 | +             - download |
| 51 | +             - '--output-dir=/modelcache' |
| 52 | +             - 'layout' |
| 53 | +             - 'tableformer' |
| 54 | +             - 'code_formula' |
| 55 | +             - 'picture_classifier' |
| 56 | +             - 'smolvlm' |
| 57 | +             - 'granite_vision' |
| 58 | +             - 'easyocr' |
| 59 | +           volumeMounts: |
| 60 | +             - name: docling-model-cache |
| 61 | +               mountPath: /modelcache |
| 62 | +       volumes: |
| 63 | +         - name: docling-model-cache |
| 64 | +           persistentVolumeClaim: |
| 65 | +             claimName: docling-model-cache-pvc |
| 66 | +       restartPolicy: Never |
| 67 | + ``` |
| 68 | + |
| 69 | + The job mounts the previously created persistent volume and runs a command similar to the one used to download models locally: |
| 70 | + `docling-tools models download --output-dir <MOUNT-PATH> [LIST_OF_MODELS]` |
| 71 | + |
| 72 | + The manifest lists the desired models individually. Alternatively, use the `--all` parameter to download all models. |
| 73 | + |
| 74 | + Manifest example: [docling-model-cache-job.yaml](./deploy-examples/docling-model-cache-job.yaml) |
| 75 | + |
| 76 | +3. Mount the volume in the docling-serve deployment and set the `DOCLING_SERVE_ARTIFACTS_PATH` environment variable to point to it. |
| 77 | + Make the following additions to the deployment: |
| 78 | + |
| 79 | + ```yaml |
| 80 | + spec: |
| 81 | +   template: |
| 82 | +     spec: |
| 83 | +       containers: |
| 84 | +         - name: api |
| 85 | +           env: |
| 86 | +             ... |
| 87 | +             - name: DOCLING_SERVE_ARTIFACTS_PATH |
| 88 | +               value: '/modelcache' |
| 89 | +           volumeMounts: |
| 90 | +             - name: docling-model-cache |
| 91 | +               mountPath: /modelcache |
| 92 | +       ... |
| 93 | +       volumes: |
| 94 | +         - name: docling-model-cache |
| 95 | +           persistentVolumeClaim: |
| 96 | +             claimName: docling-model-cache-pvc |
| 97 | + ``` |
| 98 | + |
| 99 | + Make sure the value of `DOCLING_SERVE_ARTIFACTS_PATH` matches both the directory the models were downloaded to and the path where the volume is mounted. |
| 100 | + |
| 101 | + When docling-serve executes tasks, the underlying docling installation loads the model weights from the mounted volume. |
| 102 | + |
| 103 | + Manifest example: [docling-model-cache-deployment.yaml](./deploy-examples/docling-model-cache-deployment.yaml) |
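
For the `--all` option mentioned in step 2, a minimal sketch of the Job's container section might look like the following. It assumes the same image, mount path, and volume name as the Job manifest in the diff; only the `command` list changes, and it has not been tested against a specific docling release.

```yaml
# Sketch only: replaces the explicit model list with --all.
containers:
  - name: loader
    image: ghcr.io/docling-project/docling-serve-cpu:main
    command:
      - docling-tools
      - models
      - download
      - '--output-dir=/modelcache'
      - '--all'  # download every available model instead of naming each one
    volumeMounts:
      - name: docling-model-cache
        mountPath: /modelcache
```

Downloading every model will likely need more space than a hand-picked subset, so it is worth checking the PVC's `storage` request before switching to this variant.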