I'm currently developing a custom chunker for OpenWebUI, a frontend that uses Docling Serve. I haven't been happy with the way OpenWebUI does chunking, so I've written my own LangChain script.
The problem is that OWUI uses Docling Serve to ingest documents, and I'm finding that all headers are recognised as level 2 (##). This makes it extremely difficult to gather useful metadata, and I haven't found a way to fix it, even after testing DS independently.
Furthermore, multi-line headings are recognised as separate headings (an issue I worked around by fusing them together after the fact). It would be nice to fix this at ingestion too.
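For reference, the after-the-fact fusing can be sketched roughly as below. This is a minimal illustration and not my exact script: it assumes the Markdown output puts each fragment of a multi-line heading on its own heading line at the same level, with only blank lines in between. The name `fuse_adjacent_headings` is made up for this example.

```python
import re

# Matches an ATX heading such as "## Some title" and captures the
# marker ("##") and the heading text separately.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def fuse_adjacent_headings(markdown: str) -> str:
    """Merge consecutive same-level headings separated only by blank lines.

    Assumption: a multi-line heading shows up as several heading lines in a
    row with no body text between them.
    """
    out = []
    last_heading = None  # index in `out` of a heading with no body text after it yet

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match and last_heading is not None:
            prev = HEADING.match(out[last_heading])
            if prev.group(1) == match.group(1):
                # Same level and nothing but blank lines in between: fuse them.
                out[last_heading] = f"{prev.group(1)} {prev.group(2)} {match.group(2)}"
                del out[last_heading + 1:]  # drop the blank lines between fragments
                continue
        out.append(line)
        if match:
            last_heading = len(out) - 1
        elif line.strip():
            # Real body text ends the run of heading fragments.
            last_heading = None

    return "\n".join(out)


sample = "## Introduction to\n\n## Chunking\n\nBody text.\n\n## Next\n"
fused = fuse_adjacent_headings(sample)
```

The obvious weakness is that, with every heading reported as level 2, a genuine section heading followed directly by a subsection heading gets fused as well, which is part of why I'd rather have this handled at ingestion.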
I'd really prefer not to alter the source code, so that I don't have to patch Docling Serve manually in production, but I can if need be.