Memory Leak Detected #962

@EdGaere

Overview Description

There appears to be a memory leak caused by repeated calls to dates.parse_pattern(pattern), for example through dates.format_datetime(dt, format, locale), when it is called with a wide variety of different formats and locales.

This is because each time a new DateTimePattern is created for a new (format, locale), the object is stored in _pattern_cache (a module-level dict), which has no size limit and therefore grows without bound.

babel/dates.py:1598 (Babel 2.9.1)

def parse_pattern(pattern):
    """Parse date, time, and datetime format patterns."""
    ...

    # here is the problem: entries are added to the dict but never evicted
    _pattern_cache[pattern] = pat = DateTimePattern(pattern, u''.join(result))

Perhaps a better design would be to simply wrap dates.parse_pattern() in lru_cache, so the cache size is bounded?

from functools import lru_cache

@lru_cache(maxsize=1000)
def parse_pattern(pattern):
    """Parse date, time, and datetime format patterns."""
    ...
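
To make the difference in growth concrete, here is a minimal sketch that does not use Babel at all. FakePattern, parse_unbounded and parse_bounded are hypothetical stand-ins for DateTimePattern and the two caching strategies; the real parse_pattern does more work, so treat this only as an illustration of the cache behaviour, not of Babel's internals.

```python
from functools import lru_cache


class FakePattern:
    """Hypothetical stand-in for babel.dates.DateTimePattern."""

    def __init__(self, pattern):
        self.pattern = pattern


# Plain dict cache: every distinct pattern stays in memory forever.
_unbounded_cache = {}


def parse_unbounded(pattern):
    """Look up the pattern in the dict cache, creating it on a miss."""
    pat = _unbounded_cache.get(pattern)
    if pat is None:
        pat = _unbounded_cache[pattern] = FakePattern(pattern)
    return pat


@lru_cache(maxsize=1000)
def parse_bounded(pattern):
    """Keep at most 1000 patterns; least recently used entries are evicted."""
    return FakePattern(pattern)


def run_demo(n=5000):
    """Feed n distinct patterns to both caches and return their sizes."""
    _unbounded_cache.clear()
    parse_bounded.cache_clear()
    for i in range(n):
        pattern = f"'run {i}' yyyy-MM-dd"
        parse_unbounded(pattern)
        parse_bounded(pattern)
    return len(_unbounded_cache), parse_bounded.cache_info().currsize


if __name__ == "__main__":
    print(run_demo())  # (5000, 1000)
```

With 5000 distinct patterns the dict holds all 5000 entries, while the lru_cache version stays at its maxsize of 1000. The trade-off is that patterns evicted from the bounded cache have to be parsed again if they come back.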

Steps to Reproduce

from datetime import datetime

from babel.localedata import locale_identifiers
from babel.dates import format_datetime

from pympler import tracker  # memory tracking, see https://github.com/pympler/pympler

# show initial memory usage
tr = tracker.SummaryTracker()
tr.print_diff()

# create some random datetime
d = datetime(2007, 4, 1, 13, 27, 53)

# create some datetime formats
custom_formats = [
    r"M/d/yy, h:mm a",                        # short
    r"MMM d, y, h:mm:ss a",                   # medium
    r"MMMM d, y 'at' h:mm:ss a z",            # long
    r"EEEE, MMMM d, y 'at' h:mm:ss a zzzz",   # full

    r"EEEE, MMMM d, y 'at' hh:mm:ss zzz",     # shorter timezone
    r"EEEE, MMMM d, y 'at' hh:mm:ss zzzz",    # full timezone, no AM/PM marker

    r"EEEE, MMMM d, y 'at' hh:mm:ss",
    r"EEEE, MMMM d, y 'at' h:mm:ss a",

    r"EEEE, d MMM y hh:mm:ss",
    r"EEEE, d MMM y h:mm:ss a",

    r"d MMM y hh:mm:ss",
    r"d MMM y h:mm:ss a",
]

# call format_datetime for all locale/format combinations, about 9.4k combinations
for locale_name in locale_identifiers():
    for custom_format in custom_formats:
        s = format_datetime(d, locale=locale_name, format=custom_format)

# show difference in memory usage since start
tr.print_diff()




Actual Results

Initial Memory Snapshot
                        types |   # objects |   total size
                         list |        3750 |    318.95 KB
                          str |        3747 |    260.45 KB
                          int |         817 |     22.34 KB

Final Memory Snapshot
                        types |   # objects |   total size
                         dict |      272282 |    113.17 MB
                          str |       21809 |      1.51 MB
                         list |       12416 |      1.12 MB
  babel.dates.DateTimePattern |        9668 |    453.19 KB
                        tuple |        6829 |    385.02 KB
  babel.numbers.NumberPattern |        7550 |    353.91 KB

Expected Results

Reproducibility

Additional Information
