Skip to content

Conversation

bdraco
Copy link
Member

@bdraco bdraco commented May 28, 2025

Summary

This PR fixes a memory leak in CookieJar.filter_cookies() that was causing unbounded memory growth when making requests to different URL paths. The issue was introduced in version 3.10.0 and could cause memory usage to grow from ~100MB to ~5GB over weeks of operation.

The Problem

The filter_cookies() method was inadvertently creating empty cookie entries for every domain-path combination it checked. This happened because:

  1. The method generates all possible domain-path combinations for a request URL
  2. It then accesses self._cookies[p] for each combination
  3. Since self._cookies is a defaultdict(SimpleCookie), accessing a non-existent key creates an empty SimpleCookie object
  4. These empty objects accumulate over time, causing the memory leak

The Fix

Added a simple check before accessing the dictionary:

if p not in self._cookies:
    continue

This prevents the creation of empty cookie entries while maintaining all existing functionality.

Testing

  • Added regression test test_filter_cookies_does_not_leak_memory() that verifies the internal storage doesn't grow when filtering cookies for different paths
  • All existing cookie tests pass
  • Manual testing confirms the memory leak is resolved

Impact

This PR fixes a memory leak in `CookieJar.filter_cookies()` that was causing unbounded memory growth when making requests to different URL paths. The issue was introduced in version 3.10.0 and could cause memory usage to grow from ~100MB to ~5GB over weeks of operation.

The `filter_cookies()` method was inadvertently creating empty cookie entries for every domain-path combination it checked. This happened because:

1. The method generates all possible domain-path combinations for a request URL
2. It then accesses `self._cookies[p]` for each combination
3. Since `self._cookies` is a `defaultdict(SimpleCookie)`, accessing a non-existent key creates an empty `SimpleCookie` object
4. These empty objects accumulate over time, causing the memory leak

Added a simple check before accessing the dictionary:

```python
if p not in self._cookies:
    continue
```

This prevents the creation of empty cookie entries while maintaining all existing functionality.

- Added regression test `test_filter_cookies_does_not_leak_memory()` that verifies the internal storage doesn't grow when filtering cookies for different paths
- All existing cookie tests pass
- Manual testing confirms the memory leak is resolved

- Fixes memory leak issue #11052
- No API changes
- No behavioral changes for existing code
- Minimal performance impact (one additional dictionary lookup per domain-path combination)
@bdraco bdraco requested a review from asvetlov as a code owner May 28, 2025 14:55
@bdraco bdraco added backport-3.12 backport-3.13 Trigger automatic backporting to the 3.13 release branch by Patchback robot labels May 28, 2025
@bdraco bdraco requested a review from webknjaz as a code owner May 28, 2025 15:00
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label May 28, 2025
Copy link

codecov bot commented May 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.81%. Comparing base (7f69167) to head (6c5bdb5).
⚠️ Report is 338 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11054      +/-   ##
==========================================
+ Coverage   98.79%   98.81%   +0.02%     
==========================================
  Files         129      129              
  Lines       41367    41391      +24     
  Branches     2226     2231       +5     
==========================================
+ Hits        40869    40902      +33     
+ Misses        345      339       -6     
+ Partials      153      150       -3     
Flag Coverage Δ
CI-GHA 98.70% <100.00%> (+0.01%) ⬆️
OS-Linux 98.42% <100.00%> (+0.02%) ⬆️
OS-Windows 96.68% <100.00%> (+0.04%) ⬆️
OS-macOS 97.61% <100.00%> (+<0.01%) ⬆️
Py-3.10.11 97.51% <100.00%> (+1.21%) ⬆️
Py-3.10.17 98.01% <100.00%> (-0.01%) ⬇️
Py-3.11.12 98.04% <100.00%> (+0.15%) ⬆️
Py-3.11.9 97.58% <100.00%> (+1.21%) ⬆️
Py-3.12.10 98.48% <100.00%> (?)
Py-3.13.3 98.46% <100.00%> (+0.24%) ⬆️
Py-3.9.13 97.38% <100.00%> (+<0.01%) ⬆️
Py-3.9.22 97.89% <100.00%> (+<0.01%) ⬆️
Py-pypy7.3.16 97.48% <100.00%> (?)
VM-macos 97.61% <100.00%> (+<0.01%) ⬆️
VM-ubuntu 98.42% <100.00%> (+0.02%) ⬆️
VM-windows 96.68% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

codspeed-hq bot commented May 28, 2025

CodSpeed Performance Report

Merging #11054 will not alter performance

Comparing filter_cookies_leak (6c5bdb5) with master (7f69167)

Summary

✅ 59 untouched benchmarks

@bdraco
Copy link
Member Author

bdraco commented May 28, 2025

I think _morsel_cache might be leaking as well

@bdraco
Copy link
Member Author

bdraco commented May 28, 2025

Actually probably not since its already guarded and synchronized on _cookies

@bdraco
Copy link
Member Author

bdraco commented May 28, 2025

Yeah its ok

@bdraco
Copy link
Member Author

bdraco commented May 28, 2025

Once we fix the leak here it solves the cache as well

@bdraco bdraco mentioned this pull request May 28, 2025
@bdraco bdraco enabled auto-merge (squash) May 28, 2025 20:19
@bdraco bdraco merged commit e2eb195 into master May 28, 2025
39 checks passed
@bdraco bdraco deleted the filter_cookies_leak branch May 28, 2025 20:48
Copy link
Contributor

patchback bot commented May 28, 2025

Backport to 3.11: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.11/e2eb1959237c8bc46a3f0e79a77c2086145d98cf/pr-11054

Backported as #11067

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

patchback bot pushed a commit that referenced this pull request May 28, 2025
Copy link
Contributor

patchback bot commented May 28, 2025

Backport to 3.12: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.12/e2eb1959237c8bc46a3f0e79a77c2086145d98cf/pr-11054

Backported as #11068

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

Copy link
Contributor

patchback bot commented May 28, 2025

Backport to 3.13: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.13/e2eb1959237c8bc46a3f0e79a77c2086145d98cf/pr-11054

Backported as #11069

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

bdraco added a commit that referenced this pull request May 28, 2025
…er_cookies() (#11069)

Co-authored-by: J. Nick Koston <[email protected]>
Fixes #11052 memory leak issue
bdraco added a commit that referenced this pull request May 28, 2025
…er_cookies() (#11068)

Co-authored-by: J. Nick Koston <[email protected]>
Fixes #11052 memory leak issue
bdraco added a commit that referenced this pull request May 28, 2025
…er_cookies() (#11067)

Co-authored-by: J. Nick Koston <[email protected]>
Fixes #11052 memory leak issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-3.13 Trigger automatic backporting to the 3.13 release branch by Patchback robot bot:chronographer:provided There is a change note present in this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Possible memory leak in the CookieJar

2 participants