Fix memory leaks #977

@Balearica

Description

Several Node.js users have reported that using a single worker with hundreds of images increases memory usage linearly over time, which indicates the presence of a memory leak. The recommended solution has been to periodically terminate workers and create new ones. While this is good advice for other reasons (see note below), we should still attempt to resolve the memory leak.

Based on user reports, the leak is small enough that it only impacts Node.js users recognizing many images on a server, so it is likely relatively small on a per-image basis. The most likely explanation is that there is some issue with how we export results from Tesseract. This is based purely on process of elimination--if the issue were with the input (images), the leak would be much larger in magnitude, and if the leak occurred within Tesseract itself, it would presumably have been reported and (hopefully) patched in the main Tesseract repo.

Note for users: the advice not to reuse the same workers in perpetuity on a server is good, even if the memory leak gets fixed. This is because Tesseract workers "learn" over time by default. While this learning generally improves results, it assumes that (1) previous results are generally correct and (2) the image being recognized closely resembles previous images. As a result, if the same worker is used with hundreds of different documents from different users, it is common for Tesseract to "learn" something incorrect or inapplicable, making results worse than if a fresh worker had been used.
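The recycling advice above can be sketched as a small wrapper that retires a worker after a fixed number of recognitions. This is a minimal sketch, not part of the library: `makeRecyclingRecognizer` and the `maxJobs` threshold are hypothetical names, and the injected `createWorker` factory is only assumed to return an object with `recognize`/`terminate` methods in the shape of the tesseract.js worker API.

```javascript
// Hypothetical helper: wraps a worker factory so each worker is
// terminated and replaced after `maxJobs` recognitions, bounding any
// per-image leak and any accumulated "learning" from prior documents.
function makeRecyclingRecognizer(createWorker, maxJobs = 100) {
  let worker = null; // current worker, created lazily
  let jobs = 0;      // recognitions performed by the current worker

  return {
    async recognize(image) {
      if (worker === null) worker = await createWorker();
      const result = await worker.recognize(image);
      jobs += 1;
      if (jobs >= maxJobs) {
        // Retire the worker before leaks (or learned state) accumulate.
        await worker.terminate();
        worker = null;
        jobs = 0;
      }
      return result;
    },
    async close() {
      // Terminate any outstanding worker when the caller is done.
      if (worker !== null) {
        await worker.terminate();
        worker = null;
      }
    },
  };
}
```

With tesseract.js this would be used as `makeRecyclingRecognizer(() => createWorker('eng'), 100)`; the right `maxJobs` value depends on image size and available memory, so it is a tuning knob rather than a fixed recommendation.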
