LlamaIndex vulnerable to data loss through hash collisions in its DocugamiReader class

A vulnerability in the DocugamiReader class of the run-llama/llama_index repository, affecting versions up to but excluding 0.12.41, stems from the use of MD5 hashes to generate IDs for document chunks. When structurally distinct chunks contain identical text, they receive the same ID, and one chunk overwrites the other. This can cause loss of semantically or legally important document content, break parent-child chunk hierarchies, and lead to inaccurate or hallucinated responses in AI outputs. The issue is resolved in version 0.3.1.
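The failure mode described above can be illustrated with a minimal sketch. This is not the DocugamiReader source: the function names, the sample paths, and the chunk text are all hypothetical, and the sketch assumes the ID is derived from the chunk text alone. The second function shows one possible mitigation (hashing the structural location together with the text), which is not necessarily the fix applied upstream.

```python
import hashlib


def text_only_id(text: str) -> str:
    """Hypothetical ID scheme: MD5 of the chunk text alone (assumed, for illustration)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def structural_id(path: str, text: str) -> str:
    """Hypothetical alternative: hash the structural path together with the text."""
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from producing the same payload.
    payload = f"{path}\x00{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Two structurally distinct chunks that happen to contain identical text.
chunks = [
    ("/contract/section[1]/clause[2]", "The term is 12 months."),
    ("/contract/section[4]/clause[1]", "The term is 12 months."),
]

# Keying a store by the text-only ID silently drops one of the chunks.
by_text = {text_only_id(text): (path, text) for path, text in chunks}

# Keying by path plus text keeps both.
by_structure = {structural_id(path, text): (path, text) for path, text in chunks}

print(len(by_text))       # 1
print(len(by_structure))  # 2
```

In this sketch the later chunk replaces the earlier one under the text-only scheme, so any parent or child reference that pointed at the first chunk now resolves to content from a different part of the document.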

Published by the National Vulnerability Database Jul 10, 2025

Published to the GitHub Advisory Database Jul 10, 2025

Reviewed Jul 10, 2025

Last updated Jul 10, 2025

Weaknesses

Expected Behavior Violation
