fix: prevent memory leak in get_usage_metadata_callback #32366
Conversation
… chains

This resolves issue langchain-ai#28848, where calling bind_tools() on a RunnableSequence created by with_structured_output() would fail with AttributeError. The fix enables the combination of structured output and tool binding, which is essential for modern AI applications that need both:
- Structured JSON output formatting
- External function calling capabilities

**Changes:**
- Added bind_tools() method to RunnableSequence class
- Method intelligently detects structured output patterns
- Delegates tool binding to the underlying ChatModel
- Preserves existing sequence structure and behavior
- Added comprehensive unit tests

**Technical Details:**
- Detects 2-step sequences (Model | Parser) from with_structured_output()
- Binds tools to the first step if it supports bind_tools()
- Returns new RunnableSequence with updated model + same parser
- Falls back gracefully with helpful error messages

**Impact:**
This enables previously impossible workflows, such as ChatGPT-style apps that need both structured UI responses and tool-calling capabilities.

Fixes langchain-ai#28848

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
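For illustration, a hedged sketch of the delegation this commit describes. The helper name `bind_tools_on_sequence` is hypothetical (the PR adds a `bind_tools()` method on `RunnableSequence` itself), and the detection logic is a simplification of what the commit message outlines:

```python
# Hypothetical sketch of the described delegation; names and placement are
# assumptions, not the actual langchain-core implementation.
from langchain_core.language_models.chat_models import BaseChatModel
from langchain_core.runnables import RunnableSequence


def bind_tools_on_sequence(sequence: RunnableSequence, tools: list) -> RunnableSequence:
    """Bind tools to the chat model inside a `model | parser` sequence."""
    steps = sequence.steps
    # with_structured_output() typically produces a 2-step sequence: model | output parser.
    if len(steps) == 2 and isinstance(steps[0], BaseChatModel):
        bound_model = steps[0].bind_tools(tools)
        # Rebuild the sequence with the tool-bound model and the original parser.
        return RunnableSequence(bound_model, steps[1])
    msg = "bind_tools is only supported for sequences produced by with_structured_output()"
    raise AttributeError(msg)
```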
- Remove quoted type annotations
- Fix line length violations
- Remove trailing whitespace
- Use double quotes consistently
- Improve error message formatting for better readability

The S110 warnings about try-except-pass are intentional - we want silent fallback behavior before raising the final helpful error.
…ain-ai#32169)

## **Description:**
This PR updates the internal documentation link for the RAG tutorials to reflect the updated path. Previously, the link pointed to the root `/docs/tutorials/`, which was generic. It now correctly routes to the RAG-specific tutorial page for the following text-embedding models.
1. DatabricksEmbeddings
2. IBM watsonx.ai
3. OpenAIEmbeddings
4. NomicEmbeddings
5. CohereEmbeddings
6. MistralAIEmbeddings
7. FireworksEmbeddings
8. TogetherEmbeddings
9. LindormAIEmbeddings
10. ModelScopeEmbeddings
11. ClovaXEmbeddings
12. NetmindEmbeddings
13. SambaNovaCloudEmbeddings
14. SambaStudioEmbeddings
15. ZhipuAIEmbeddings

## **Issue:** N/A
## **Dependencies:** None
## **Twitter handle:** N/A
- Replace broad Exception catching with specific exceptions (AttributeError, TypeError, ValueError)
- Add proper type annotations to test functions and variables
- Add type: ignore comments for dynamic method assignment in tests
- Fix line length violations and formatting issues
- Ensure all MyPy checks pass

All lint checks now pass successfully. The S110 warnings are resolved by using more specific exception handling instead of bare try-except-pass.
getting the latest changes
- Remove test_bind_tools_fix.py
- Remove test_real_example.py
- Remove test_sequence_bind_tools.py

These test files were created during development but should not be in the root directory. The actual fix for issue langchain-ai#28848 (RunnableSequence.bind_tools) is already implemented in core.
pulling from the updated branch
- Add fallback mechanism in _create_chat_result to handle cases where the OpenAI client's model_dump() returns choices as None even when the original response object contains valid choices data
- This resolves TypeError: 'Received response with null value for choices' when using vLLM with the LangChain-OpenAI integration
- Add comprehensive test suite to validate the fix and edge cases
- Maintain backward compatibility for cases where choices are truly unavailable
- Fix addresses GitHub issue langchain-ai#32252

The issue occurred because some OpenAI-compatible APIs like vLLM return valid response objects, but the OpenAI client library's model_dump() method sometimes fails to properly serialize the choices field, returning None instead of the actual choices array. This fix attempts to access the choices directly from the response object when model_dump() fails.
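As an illustration of the fallback described above, a minimal sketch (the helper name is hypothetical and the real `_create_chat_result` in `langchain-openai` differs): if `model_dump()` loses `choices`, recover them from the live response object.

```python
# Hedged sketch of the fallback pattern, not the actual langchain-openai code.
def response_to_dict_with_choices_fallback(response) -> dict:
    response_dict = response.model_dump()
    if response_dict.get("choices") is None and getattr(response, "choices", None):
        # Some OpenAI-compatible servers (e.g. vLLM) return objects whose
        # model_dump() serializes choices as None; fall back to the live attribute.
        response_dict["choices"] = [choice.model_dump() for choice in response.choices]
    return response_dict
```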
fix(openai): resolve vLLM compatibility issue with ChatOpenAI (langchain-ai#32252)

More details can be read in this thread.
See https://docs.astral.sh/ruff/rules/#flake8-unused-arguments-arg

Co-authored-by: Mason Daugherty <[email protected]>
…hain-ai#32256)

- **Description:** This PR updates the internal documentation link for the RAG tutorials to reflect the updated path. Previously, the link pointed to the root `/docs/tutorials/`, which was generic. It now correctly routes to the RAG-specific tutorial page.
- **Issue:** N/A
- **Dependencies:** None
- **Twitter handle:** N/A
…angchain-ai#32261) Following existing codebase conventions
Ensures proper reStructuredText formatting by adding the required blank line before closing docstring quotes, which resolves the "Block quote ends without a blank line; unexpected unindent" warning.
…32266) should resolve the file sharing issue for users on macOS.
…tion to ChatGeneration objects (langchain-ai#32156)

## Problem
ChatLiteLLM encounters a `ValidationError` when using cache on subsequent calls, causing the following error:
```
ValidationError(model='ChatResult', errors=[{'loc': ('generations', 0, 'type'), 'msg': "unexpected value; permitted: 'ChatGeneration'", 'type': 'value_error.const', 'ctx': {'given': 'Generation', 'permitted': ('ChatGeneration',)}}])
```
This occurs because:
1. The cache stores `Generation` objects (with `type="Generation"`)
2. But `ChatResult` expects `ChatGeneration` objects (with `type="ChatGeneration"` and a required `message` field)
3. When cached values are retrieved, validation fails due to the type mismatch

## Solution
Added graceful handling in both sync (`_generate_with_cache`) and async (`_agenerate_with_cache`) cache methods to:
1. **Detect** when cached values contain `Generation` objects instead of the expected `ChatGeneration` objects
2. **Convert** them to `ChatGeneration` objects by wrapping the text content in an `AIMessage`
3. **Preserve** all original metadata (`generation_info`)
4. **Allow** `ChatResult` creation to succeed without validation errors

## Example
```python
# Before: This would fail with ValidationError
from langchain_community.chat_models import ChatLiteLLM
from langchain_community.cache import SQLiteCache
from langchain.globals import set_llm_cache

set_llm_cache(SQLiteCache(database_path="cache.db"))
llm = ChatLiteLLM(model_name="openai/gpt-4o", cache=True, temperature=0)
print(llm.predict("test"))  # Works fine (cache empty)
print(llm.predict("test"))  # Now works instead of ValidationError
# After: Seamlessly handles both Generation and ChatGeneration objects
```

## Changes
- **`libs/core/langchain_core/language_models/chat_models.py`**:
  - Added `Generation` import from `langchain_core.outputs`
  - Enhanced cache retrieval logic in `_generate_with_cache` and `_agenerate_with_cache` methods
  - Added conversion from `Generation` to `ChatGeneration` objects when needed
- **`libs/core/tests/unit_tests/language_models/chat_models/test_cache.py`**:
  - Added test case to validate the conversion logic handles mixed object types

## Impact
- **Backward Compatible**: Existing code continues to work unchanged
- **Minimal Change**: Only affects cache retrieval path, no API changes
- **Robust**: Handles both legacy cached `Generation` objects and new `ChatGeneration` objects
- **Preserves Data**: All original content and metadata is maintained during conversion

Fixes langchain-ai#22389.

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: mdrxy <[email protected]>
Co-authored-by: Mason Daugherty <[email protected]>
Co-authored-by: Mason Daugherty <[email protected]>
Co-authored-by: Copilot <[email protected]>
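A hedged sketch of the conversion step this PR describes; the helper name is hypothetical and the actual `_generate_with_cache` code in `chat_models.py` may differ.

```python
# Illustrative sketch only: wrap legacy cached Generation objects so that
# ChatResult validation accepts them as ChatGeneration objects.
from langchain_core.messages import AIMessage
from langchain_core.outputs import ChatGeneration, Generation


def ensure_chat_generations(cached: list[Generation]) -> list[ChatGeneration]:
    """Convert plain Generation objects from the cache into ChatGeneration objects."""
    converted: list[ChatGeneration] = []
    for gen in cached:
        if isinstance(gen, ChatGeneration):
            converted.append(gen)
        else:
            # Legacy cache entries only carry text; wrap it in an AIMessage and
            # preserve the original generation_info metadata.
            converted.append(
                ChatGeneration(
                    message=AIMessage(content=gen.text),
                    generation_info=gen.generation_info,
                )
            )
    return converted
```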
…ngchain-ai#32160)

Fixes a streaming bug where models like Qwen3 (using the OpenAI interface) send tool call chunks with inconsistent indices, resulting in duplicate/erroneous tool calls instead of a single merged tool call.

## Problem
When Qwen3 streams tool calls, it sends chunks with inconsistent `index` values:
- First chunk: `index=1` with tool name and partial arguments
- Subsequent chunks: `index=0` with `name=None`, `id=None` and argument continuation

The existing `merge_lists` function only merges chunks when their `index` values match exactly, causing these logically related chunks to remain separate, resulting in multiple incomplete tool calls instead of one complete tool call.

```python
# Before fix: Results in 1 valid + 1 invalid tool call
chunk1 = AIMessageChunk(tool_call_chunks=[
    {"name": "search", "args": '{"query":', "id": "call_123", "index": 1}
])
chunk2 = AIMessageChunk(tool_call_chunks=[
    {"name": None, "args": ' "test"}', "id": None, "index": 0}
])
merged = chunk1 + chunk2  # Creates 2 separate tool calls

# After fix: Results in 1 complete tool call
merged = chunk1 + chunk2  # Creates 1 merged tool call: search({"query": "test"})
```

## Solution
Enhanced the `merge_lists` function in `langchain_core/utils/_merge.py` with intelligent tool call chunk merging:
1. **Preserves existing behavior**: Same-index chunks still merge as before
2. **Adds special handling**: Tool call chunks with `name=None`/`id=None` that don't match any existing index are now merged with the most recent complete tool call chunk
3. **Maintains backward compatibility**: All existing functionality works unchanged
4. **Targeted fix**: Only affects tool call chunks, doesn't change behavior for other list items

The fix specifically handles the pattern where:
- A continuation chunk has `name=None` and `id=None` (indicating it's part of an ongoing tool call)
- No matching index is found in existing chunks
- There exists a recent tool call chunk with a valid name or ID to merge with

## Testing
Added comprehensive test coverage including:
- ✅ Qwen3-style chunks with different indices now merge correctly
- ✅ Existing same-index behavior preserved
- ✅ Multiple distinct tool calls remain separate
- ✅ Edge cases handled (empty chunks, orphaned continuations)
- ✅ Backward compatibility maintained

Fixes langchain-ai#31511.

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: mdrxy <[email protected]>
Co-authored-by: Mason Daugherty <[email protected]>
Co-authored-by: Mason Daugherty <[email protected]>
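For illustration only, a simplified sketch of the merging rule this change describes (note the change was later reverted in langchain-ai#32307); the helper is hypothetical and the real `merge_lists()` in `langchain_core/utils/_merge.py` is more general:

```python
# Simplified, hypothetical sketch of the continuation-chunk merging rule.
def merge_tool_call_chunks(existing: list[dict], incoming: list[dict]) -> list[dict]:
    merged = [dict(chunk) for chunk in existing]
    for chunk in incoming:
        match = next((m for m in merged if m.get("index") == chunk.get("index")), None)
        if match is None and chunk.get("name") is None and chunk.get("id") is None and merged:
            # A continuation chunk (no name/id) whose index matches nothing is
            # folded into the most recent chunk instead -- the Qwen3 pattern.
            match = merged[-1]
        if match is not None:
            match["args"] = (match.get("args") or "") + (chunk.get("args") or "")
        else:
            merged.append(dict(chunk))
    return merged
```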
In the section [How to load documents from a directory](https://python.langchain.com/docs/how_to/document_loader_directory/) there is a link to the docs of *unstructured*. When you click this link, it tells you that it has moved. Accordingly, this PR updates the link in the LangChain docs from `https://unstructured-io.github.io/unstructured/#` to `https://docs.unstructured.io/`.
…ices from Qwen3" (langchain-ai#32307)

Reverts langchain-ai#32160. The original issue stems from using `ChatOpenAI` to interact with a `qwen` model. It is recommended to use [langchain-qwq](https://python.langchain.com/docs/integrations/chat/qwq/) instead, which is built for Qwen.
Fix memory leak where the _configure_hooks global variable accumulated entries with each call to get_usage_metadata_callback().

The issue was that register_configure_hook() was called every time the context manager was entered, causing the _configure_hooks list to grow indefinitely in long-running applications.

Changes:
- Move ContextVar declaration to module level
- Register hook only once at module import time
- Use proper token-based context variable management
- Add test to verify no memory leak occurs
- Maintain backward compatibility with 'name' parameter

Fixes langchain-ai#32300
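A hedged sketch of the kind of regression test this commit describes; the test name matches the PR description below, while the import paths and assertions are assumptions:

```python
# Illustrative regression test sketch, not the verbatim test added by the PR.
from langchain_core.callbacks import get_usage_metadata_callback
from langchain_core.tracers.context import _configure_hooks


def test_no_configure_hooks_memory_leak() -> None:
    """Repeated use of the context manager must not grow _configure_hooks."""
    with get_usage_metadata_callback():
        pass
    baseline = len(_configure_hooks)
    for _ in range(100):
        with get_usage_metadata_callback():
            pass
    assert len(_configure_hooks) == baseline
```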
CodSpeed WallTime Performance Report: Merging #32366 will not alter performance.
Use a lazy initialization pattern with a global flag to register the hook only once while maintaining proper import placement for linting requirements. This preserves the memory leak fix while satisfying CI/lint requirements.
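A minimal sketch of the lazy-initialization variant described here; the flag and helper names are assumptions for illustration:

```python
# Hypothetical sketch: register the configure hook on first use, never again.
from contextvars import ContextVar

_hook_registered = False


def _ensure_hook_registered(var: ContextVar) -> None:
    """Register the configure hook the first time it is needed, then never again."""
    global _hook_registered
    if not _hook_registered:
        # Import inside the function to keep module import order acceptable to
        # linters, as the commit message describes.
        from langchain_core.tracers.context import register_configure_hook

        register_configure_hook(var, inheritable=True)
        _hook_registered = True
```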
Force-pushed from c76103c to 3a8f7c1.
CodSpeed Instrumentation Performance Report: Merging #32366 will not alter performance.
Avoid duplicate hook registration by checking whether a ContextVar with the same name is already registered in _configure_hooks. This prevents the memory leak while maintaining the original API and import patterns.

Changes:
- Check _configure_hooks before registering new hooks
- Maintain original function signature and behavior
- Use proper token-based context variable management
- Keep imports inside function to satisfy linting
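A sketch of the duplicate-registration guard described in this commit; the entry layout of `_configure_hooks` (ContextVar as the first tuple element) is assumed, so this is illustrative rather than verbatim library code:

```python
# Hypothetical guard: register a hook only if no hook with this name exists yet.
from contextvars import ContextVar

from langchain_core.tracers.context import _configure_hooks, register_configure_hook


def register_hook_once(context_var: ContextVar, name: str) -> None:
    """Skip registration when a hook for this ContextVar name is already present."""
    if any(hook[0].name == name for hook in _configure_hooks):
        return
    register_configure_hook(context_var, inheritable=True)
```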
Force-pushed from 41b68e5 to 82cf06e.
Use a simple module-level cache (_registered_context_vars) to store and reuse ContextVar instances by name. This prevents the memory leak in _configure_hooks while maintaining the original API.

Changes:
- Add _registered_context_vars cache dict at module level
- Reuse existing ContextVar instances instead of creating new ones
- Only register hooks once per unique name
- Update test to verify cache behavior instead of internal hooks
- Maintain full backward compatibility
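A minimal sketch of the module-level cache approach described here, assuming hypothetical helper names:

```python
# Illustrative sketch: one ContextVar (and one hook registration) per unique name.
from contextvars import ContextVar
from typing import Any

from langchain_core.tracers.context import register_configure_hook

_registered_context_vars: dict[str, Any] = {}


def _get_or_create_context_var(name: str) -> ContextVar:
    """Reuse an existing ContextVar for this name instead of creating a new one."""
    if name not in _registered_context_vars:
        var: ContextVar = ContextVar(name, default=None)
        register_configure_hook(var, inheritable=True)
        _registered_context_vars[name] = var
    return _registered_context_vars[name]
```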
Force-pushed from d7f3a1c to 86a035b.
Make test more robust by:
- Using a unique test name to avoid conflicts with other tests
- Cleaning up test data before and after
- More focused assertions
- Proper test isolation
Remove complex type annotations that may be causing linting issues:
- Use dict[str, Any] instead of a complex ContextVar generic
- Remove inline type annotation for variable
- Maintain functionality while fixing lint issues
Closing this PR to work on a different approach to the memory leak issue. The technical solution is correct, but getting it to pass all CI/CD checks requires more iteration than available time allows.
Summary

Fix memory leak where the `_configure_hooks` global variable accumulated entries with each call to `get_usage_metadata_callback()`. The issue was that `register_configure_hook()` was called every time the context manager was entered, causing the `_configure_hooks` list to grow indefinitely in long-running applications. The fix moves the `ContextVar` declaration to module level and registers the hook only once at module import time.

Changes Made

- Declare `_usage_metadata_callback_var` as a module-level variable
- Call `register_configure_hook()` only once at module import time
- Use `try/finally` for proper token-based context variable management
- Keep the `name` parameter for API compatibility
- `_configure_hooks` no longer grows with repeated calls

Test Plan
- Added `test_no_configure_hooks_memory_leak()`, which verifies that the `_configure_hooks` length remains constant across repeated calls

Before/After
Before (Memory Leak)
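A minimal sketch of the leaking pattern described in the summary, assuming the pre-fix structure of `get_usage_metadata_callback()` (not verbatim library source): each call creates a fresh `ContextVar` and registers a new configure hook.

```python
# Sketch of the pre-fix pattern (assumed, not verbatim): a new ContextVar is
# created and a new hook registered on every call, so _configure_hooks grows.
from contextlib import contextmanager
from contextvars import ContextVar

from langchain_core.callbacks import UsageMetadataCallbackHandler
from langchain_core.tracers.context import register_configure_hook


@contextmanager
def get_usage_metadata_callback(name: str = "usage_metadata_callback"):
    usage_metadata_callback_var: ContextVar = ContextVar(name, default=None)
    register_configure_hook(usage_metadata_callback_var, inheritable=True)  # leaks an entry
    cb = UsageMetadataCallbackHandler()
    usage_metadata_callback_var.set(cb)
    yield cb
    usage_metadata_callback_var.set(None)
```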
After (Fixed)
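A corresponding sketch of the fixed pattern described above (names assumed): the `ContextVar` lives at module level, the hook is registered once at import time, and the context manager restores the previous value with a token.

```python
# Sketch of the post-fix pattern (assumed, not verbatim): module-level ContextVar,
# single hook registration at import time, token-based reset on exit.
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Optional

from langchain_core.callbacks import UsageMetadataCallbackHandler
from langchain_core.tracers.context import register_configure_hook

_usage_metadata_callback_var: ContextVar[Optional[UsageMetadataCallbackHandler]] = ContextVar(
    "usage_metadata_callback", default=None
)
register_configure_hook(_usage_metadata_callback_var, inheritable=True)  # runs exactly once


@contextmanager
def get_usage_metadata_callback(name: str = "usage_metadata_callback"):
    cb = UsageMetadataCallbackHandler()
    token = _usage_metadata_callback_var.set(cb)
    try:
        yield cb
    finally:
        # Restore the previous value instead of registering anything new.
        _usage_metadata_callback_var.reset(token)
```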
Fixes #32300