## `docs/configuration.md` (+5 −0)
The following table describes the options to configure the Docling Serve app.

| CLI option | Environment variable | Default | Description |
|---|---|---|---|
|`--artifacts-path`|`DOCLING_SERVE_ARTIFACTS_PATH`| unset | If set to a valid directory, the model weights will be loaded from this path. |
||`DOCLING_SERVE_STATIC_PATH`| unset | If set to a valid directory, the static assets for the docs and UI will be loaded from this path. |
|`--enable-ui`|`DOCLING_SERVE_ENABLE_UI`|`false`| Enable the demonstrator UI. |
||`DOCLING_SERVE_ENABLE_REMOTE_SERVICES`|`false`| Allow pipeline components to make remote connections. For example, this is needed when using a vision-language model via APIs. |
||`DOCLING_SERVE_ALLOW_EXTERNAL_PLUGINS`|`false`| Allow the selection of third-party plugins. |
||`DOCLING_SERVE_MAX_DOCUMENT_TIMEOUT`|`604800` (7 days) | The maximum time for processing a document. |
||`DOCLING_SERVE_MAX_NUM_PAGES`|| The maximum number of pages for a document to be processed. |
||`DOCLING_SERVE_MAX_FILE_SIZE`|| The maximum file size for a document to be processed. |
||`DOCLING_SERVE_OPTIONS_CACHE_SIZE`|`2`| How many DocumentConverter objects (including their loaded models) to keep in the cache. |
||`DOCLING_SERVE_CORS_ORIGINS`|`["*"]`| A list of origins that should be permitted to make cross-origin requests. |
||`DOCLING_SERVE_CORS_METHODS`|`["*"]`| A list of HTTP methods that should be allowed for cross-origin requests. |
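As a sketch of how these variables combine in practice, the following sets a few of the options from the table above before starting the server. The values are examples only, and the launch line is commented out (and hypothetical) since it would start a long-running process:

```shell
# Illustrative configuration via environment variables; the variable names
# come from the table above, the values are examples only.
export DOCLING_SERVE_ENABLE_UI=true
export DOCLING_SERVE_ENABLE_REMOTE_SERVICES=true   # needed for picture_description_api
export DOCLING_SERVE_MAX_NUM_PAGES=500
export DOCLING_SERVE_OPTIONS_CACHE_SIZE=4

# docling-serve run   # hypothetical launch command; check the project README
echo "$DOCLING_SERVE_ENABLE_UI"   # prints "true"
```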
## `docs/usage.md` (+72 −1)
On top of the source of the file (see below), both endpoints support the same parameters:

- `from_format` (List[str]): Input format(s) to convert from. Allowed values: `docx`, `pptx`, `html`, `image`, `pdf`, `asciidoc`, `md`. Defaults to all formats.
- `to_formats` (List[str]): Output format(s) to convert to. Allowed values: `md`, `json`, `html`, `text`, `doctags`. Defaults to `md`.
- `pipeline` (str): The choice of which pipeline to use. Allowed values are `standard` and `vlm`. Defaults to `standard`.
- `do_ocr` (bool): If enabled, the bitmap content will be processed using OCR. Defaults to `True`.
- `image_export_mode` (str): Image export mode for the document (only in case of JSON, Markdown or HTML). Allowed values: `embedded`, `placeholder`, `referenced`. Optional, defaults to `embedded`.
- `force_ocr` (bool): If enabled, replace any existing text with OCR-generated text over the full content. Defaults to `False`.
- `abort_on_error` (bool): If enabled, abort on error. Defaults to `False`.
- `return_as_file` (bool): If enabled, return the output as a file. Defaults to `False`.
- `do_table_structure` (bool): If enabled, the table structure will be extracted. Defaults to `True`.
- `do_code_enrichment` (bool): If enabled, perform OCR code enrichment. Defaults to `False`.
- `do_formula_enrichment` (bool): If enabled, perform formula OCR and return LaTeX code. Defaults to `False`.
- `do_picture_classification` (bool): If enabled, classify pictures in documents. Defaults to `False`.
- `do_picture_description` (bool): If enabled, describe pictures in documents. Defaults to `False`.
- `picture_description_local` (dict): Options for running a local vision-language model in the picture description. The parameters refer to a model hosted on Hugging Face. This parameter is mutually exclusive with `picture_description_api`.
- `picture_description_api` (dict): API details for using a vision-language model in the picture description. This parameter is mutually exclusive with `picture_description_local`.
- `include_images` (bool): If enabled, images will be extracted from the document. Defaults to `False`.
- `images_scale` (float): Scale factor for images. Defaults to `2.0`.
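As a hedged sketch, the parameters above can be assembled into a request body for the convert endpoints. The `options`/`http_sources` body shape follows the convert examples in these docs, and `build_convert_payload` is a hypothetical helper for illustration, not part of the API:

```python
import json

def build_convert_payload(source_url, to_formats=("md",), pipeline="standard", do_ocr=True):
    """Hypothetical helper: assemble a convert-endpoint body from the options above."""
    options = {
        "to_formats": list(to_formats),      # e.g. ["md"], ["json"], ...
        "pipeline": pipeline,                # "standard" or "vlm"
        "do_ocr": do_ocr,                    # process bitmap content with OCR
        "image_export_mode": "embedded",     # embedded | placeholder | referenced
    }
    # "http_sources" points the server at a document to fetch by URL.
    return {"options": options, "http_sources": [{"url": source_url}]}

payload = build_convert_payload("https://arxiv.org/pdf/2501.17887")
print(json.dumps(payload, indent=2))
```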
## Convert endpoints
</details>

### Picture description options

When the picture description enrichment is activated, users may specify which model and which execution mode to use for this task. There are two choices for the execution mode: _local_ will run the vision-language model directly, while _api_ will invoke an external API endpoint.

The local option is specified with:

```jsonc
{
  "picture_description_local": {
    "repo_id": "",  // Repository id from the Hugging Face Hub.
    "generation_config": {},  // Options for the model generation.
    "prompt": "Describe this image in a few sentences. "  // Prompt used when calling the vision-language model.
  }
}
```

The possible values for `generation_config` are documented in the [Hugging Face text generation docs](https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.GenerationConfig).
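As an illustrative, non-authoritative sketch, a filled-in `picture_description_local` object might look like the following. The repo id is one of the models mentioned later in these docs, and `max_new_tokens`/`do_sample` are fields of the Hugging Face `GenerationConfig`; none of these values are required defaults:

```python
# Example values only; "repo_id" and the generation settings are illustrative.
picture_description_local = {
    "repo_id": "HuggingFaceTB/SmolVLM-256M-Instruct",  # model on the Hugging Face Hub
    "generation_config": {
        "max_new_tokens": 200,  # GenerationConfig field: cap the generated length
        "do_sample": False,     # GenerationConfig field: greedy decoding
    },
    "prompt": "Describe this image in a few sentences.",
}

options = {
    "do_picture_description": True,
    "picture_description_local": picture_description_local,
}
print(sorted(options))
```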

The api option is specified with:

```jsonc
{
  "picture_description_api": {
    "url": "",  // Endpoint which accepts OpenAI-API compatible requests.
    "headers": {},  // Headers used for calling the API endpoint. For example, it could include authentication headers.
    "params": {},  // Model parameters.
    "timeout": 20,  // Timeout for the API request.
    "prompt": "Describe this image in a few sentences. "  // Prompt used when calling the vision-language model.
  }
}
```

Example URLs are:

- `http://localhost:8000/v1/chat/completions` for the local vLLM API, with example `params`:

  - the `HuggingFaceTB/SmolVLM-256M-Instruct` model

    ```json
    {
      "model": "HuggingFaceTB/SmolVLM-256M-Instruct",
      "max_completion_tokens": 200
    }
    ```

  - the `ibm-granite/granite-vision-3.2-2b` model

    ```json
    {
      "model": "ibm-granite/granite-vision-3.2-2b",
      "max_completion_tokens": 200
    }
    ```

- `http://localhost:11434/v1/chat/completions` for the local Ollama API, with example `params`:

  - the `granite3.2-vision:2b` model

    ```json
    {
      "model": "granite3.2-vision:2b"
    }
    ```

Note that when using `picture_description_api`, the server must be launched with `DOCLING_SERVE_ENABLE_REMOTE_SERVICES=true`.
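Putting the pieces together, the following sketch builds a convert request that routes picture description through the local Ollama endpoint above. The `options`/`http_sources` body shape and the commented-out endpoint path are assumptions based on the convert examples in these docs, not a definitive request format:

```python
# Sketch: convert a document with picture description delegated to an
# OpenAI-compatible API (here the local Ollama endpoint from above).
options = {
    "to_formats": ["md"],
    "do_picture_description": True,
    "picture_description_api": {
        "url": "http://localhost:11434/v1/chat/completions",  # local Ollama API
        "params": {"model": "granite3.2-vision:2b"},
        "timeout": 60,
        "prompt": "Describe this image in a few sentences.",
    },
}
payload = {
    "options": options,
    "http_sources": [{"url": "https://arxiv.org/pdf/2501.17887"}],
}
# Hypothetical call; requires DOCLING_SERVE_ENABLE_REMOTE_SERVICES=true on the server:
# requests.post("http://localhost:5001/v1alpha/convert/source", json=payload)
print(sorted(payload))
```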