Is it possible for Docling to recognise Header depth through controls in Docling Serve? #294

@JonasPapinigis

Description

I'm currently in the process of developing a custom chunker for OpenWebUI, a frontend which utilises Docling Serve. I've not been happy with the way OpenWebUI does chunking, so I've written my own LangChain script.

The problem is that OWUI uses Docling Serve to ingest documents, and I'm finding that all headers are recognised as level 2 (##). This makes it extremely difficult to gather useful metadata, and I can't find a way to fix it, even after testing DS independently.
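To make the symptom easy to reproduce, a minimal sketch along these lines can count how many headings appear at each level in the exported Markdown. The function name `heading_levels` and the sample text are hypothetical, and the sketch assumes ATX-style (`#`) headings; it is not part of Docling Serve or OWUI.

```python
import re
from collections import Counter

# Hypothetical helper: counts ATX headings per level in a Markdown string.
HEADING_RE = re.compile(r"^(#{1,6})\s+\S")


def heading_levels(markdown: str) -> Counter:
    counts = Counter()
    in_fence = False
    for line in markdown.splitlines():
        # Skip fenced code blocks so '#' comments are not counted as headings.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        match = HEADING_RE.match(line)
        if match and not in_fence:
            counts[len(match.group(1))] += 1
    return counts


# Made-up sample resembling the behaviour described above.
sample = "## 1 Introduction\n\nText.\n\n## 1.1 Background\n\nMore text."
levels = heading_levels(sample)
```

If every key in the result is `2`, the document has lost its heading hierarchy somewhere before the chunker sees it.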

Furthermore, multi-line headings are recognised as separate headings (an issue I worked around by fusing them together after the fact). It would be nice to fix this at ingestion too.
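For illustration, the after-the-fact fusing could look roughly like the sketch below. It is not the script from this issue: the function name `fuse_adjacent_headings` is hypothetical, and it assumes Markdown output in which the fragments of a split heading appear as consecutive headings of the same level, separated only by blank lines.

```python
import re

# Matches an ATX heading: group 1 is the '#' run, group 2 is the heading text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def fuse_adjacent_headings(markdown: str) -> str:
    out = []
    # Index in `out` of the latest heading, as long as only blank lines followed it.
    last_heading = None
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match and last_heading is not None:
            prev = HEADING_RE.match(out[last_heading])
            if prev.group(1) == match.group(1):
                # Same level, nothing but blank lines in between: treat as one heading.
                out[last_heading] = f"{prev.group(1)} {prev.group(2)} {match.group(2)}"
                del out[last_heading + 1:]  # drop the blank lines between the fragments
                continue
        out.append(line)
        if match:
            last_heading = len(out) - 1
        elif line.strip():
            # Any non-blank, non-heading line ends the run of adjacent headings.
            last_heading = None
    return "\n".join(out)
```

One limitation of this heuristic: two genuinely distinct headings of the same level with no body text between them would also be merged, which is part of why a fix at ingestion would be preferable.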

I'm really eager not to alter the source code, so that I don't have to patch Docling Serve manually in production, but I can do so if need be.

Appreciate any help!
