v1.4.9.4
Important Notes: Eliminate Bottlenecks in Processing Large-scale Datasets
In production deployments, entity and relation metadata can grow without bound as documents are continuously ingested. The source_id (chunk IDs) and file_path fields of entities and relations can accumulate thousands of entries, leading to:
- Performance degradation in vector database operations
- Increased storage costs
- Memory pressure during query operations
- Slower merge operations when processing new documents
 
LightRAG implements a configurable metadata size control system with two key features:
- Source ID limiting: Controls the maximum number of chunk IDs stored per entity or relation
- File path limiting: Controls the maximum number of file paths displayed in metadata (display only; does not affect query performance)
 
Both features support two strategies:
- FIFO (First In First Out): Removes the oldest entries when the limit is reached. Best for evolving knowledge bases, since it keeps the most recent information.
- KEEP: Keeps the oldest entries and skips new ones when the limit is reached. Best for stable knowledge bases, and faster because it requires fewer merge operations.
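
The difference between the two strategies can be illustrated with a minimal sketch. This is not LightRAG's actual implementation: the function name `apply_limit` and its parameters are hypothetical, and the sketch assumes IDs are kept in insertion order with the oldest first.

```python
from typing import List


def apply_limit(existing: List[str], incoming: List[str],
                limit: int, method: str = "FIFO") -> List[str]:
    """Merge incoming IDs into existing ones, keeping at most `limit` entries.

    Hypothetical helper for illustration only; not part of LightRAG's API.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")

    # Deduplicate while preserving insertion order (oldest first).
    merged = list(dict.fromkeys(existing + incoming))
    if len(merged) <= limit:
        return merged

    if method == "FIFO":
        # Drop the oldest entries so the most recent `limit` IDs remain.
        return merged[-limit:]
    if method == "KEEP":
        # Keep the oldest entries and skip whatever does not fit.
        return merged[:limit]
    raise ValueError(f"unknown limit method: {method!r}")


old_ids = ["chunk-1", "chunk-2", "chunk-3"]
new_ids = ["chunk-4", "chunk-5"]
print(apply_limit(old_ids, new_ids, limit=3, method="FIFO"))
print(apply_limit(old_ids, new_ids, limit=3, method="KEEP"))
```

With a limit of 3, FIFO ends up holding the three newest chunk IDs, while KEEP leaves the original three untouched, which is why KEEP has less work to do once an entity or relation is already at its limit.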
 
New environment variables with default values:
# Source ID limits (affects query performance)
MAX_SOURCE_IDS_PER_ENTITY=300
MAX_SOURCE_IDS_PER_RELATION=300
SOURCE_IDS_LIMIT_METHOD=FIFO
# File path limits (display only)
MAX_FILE_PATHS=100
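
As a rough sketch of how an application might read these settings, the snippet below loads the four variables and falls back to the defaults listed above. The `MetadataLimits` class and `load_limits` function are hypothetical names for illustration, not LightRAG's own loader, and the case-insensitive handling of the method value is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MetadataLimits:
    """Hypothetical container for the metadata size settings."""
    max_source_ids_per_entity: int
    max_source_ids_per_relation: int
    source_ids_limit_method: str
    max_file_paths: int


def load_limits(env: Optional[Mapping[str, str]] = None) -> MetadataLimits:
    """Read the limits from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env

    # Assumption: accept lowercase values and normalize them.
    method = env.get("SOURCE_IDS_LIMIT_METHOD", "FIFO").upper()
    if method not in ("FIFO", "KEEP"):
        raise ValueError(f"SOURCE_IDS_LIMIT_METHOD must be FIFO or KEEP, got {method!r}")

    return MetadataLimits(
        max_source_ids_per_entity=int(env.get("MAX_SOURCE_IDS_PER_ENTITY", "300")),
        max_source_ids_per_relation=int(env.get("MAX_SOURCE_IDS_PER_RELATION", "300")),
        source_ids_limit_method=method,
        max_file_paths=int(env.get("MAX_FILE_PATHS", "100")),
    )


# An empty mapping exercises the defaults listed above.
print(load_limits({}))
```

Passing a plain dictionary instead of relying on `os.environ` keeps the sketch easy to test without touching the real process environment.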
Auto Data Migration
Upgrading to this version requires a data migration, which runs automatically. If your current system contains a large number of entity relationships, the upgrade may take a long time.
What's New
- Feat: Add offline Docker build support with embedded models and cache by @danielaskdd in #2222
- Refact: Limit Vector Database Metadata Size to Support Large Scale Dataset by @danielaskdd in #2240
- Feat: Add Optional LLM Cache Deletion for Document Deletion by @danielaskdd in #2244
- Refact: Add Entity Identifier Length Truncation to Prevent Storage Failures by @danielaskdd in #2245
- Refact: Add Multimodal Processing Status Support to DocProcessingStatus for RayAnything Compatibility by @danielaskdd in #2248
 
What's Changed
- Refact: Improve query result with semantic null returns by @danielaskdd in #2218
- remove deprecated dotenv package. by @wkpark in #2229
- Refact: Frontend UI Fixes and Performance Improvements by @danielaskdd in #2234
- Security: Fix SQL injection vulnerabilities in PostgreSQL storage by @lucky-verma in #2235
- Update openai requirement from <2.0.0,>=1.0.0 to >=1.0.0,<3.0.0 by @dependabot[bot] in #2238
- Update pandas requirement from <2.3.0,>=2.0.0 to >=2.0.0,<2.4.0 by @dependabot[bot] in #2239
- Optimize PostgreSQL initialization performance by @yrangana in #2237
- fix(docs): correct typo "acivate" → "activate" by @xiaojunxiang2023 in #2243
 
New Contributors
- @wkpark made their first contribution in #2229
- @lucky-verma made their first contribution in #2235
- @dependabot[bot] made their first contribution in #2238
- @xiaojunxiang2023 made their first contribution in #2243
 
Full Changelog: v1.4.9.3...v1.4.9.4