Implement Recognize #76

DavidVentura · 2025-07-16T16:30:16Z

Calling recognize on the iterator instead of getHOCRText is ~20% faster for my test cases.

Image 1

Test	Recognize() (ms)	GetHOCRText() (ms)
1	1366	1718
2	1116	1242
3	1048	1239
Average	1177	1400

Image 2

Test	Recognize() (ms)	GetHOCRText() (ms)
1	878	1603
2	929	1201
3	1396	1132
Average	1068	1312
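
For reference, the relative saving can be recomputed from the per-run numbers in the two tables above. This is only a small arithmetic sketch; the dictionary layout and the `percent_saved` helper are made up for illustration and are not part of the PR.

```python
from statistics import mean

# Timings in milliseconds, copied from the two tables above.
timings = {
    "Image 1": {"recognize": [1366, 1116, 1048], "hocr": [1718, 1242, 1239]},
    "Image 2": {"recognize": [878, 929, 1396], "hocr": [1603, 1201, 1132]},
}

def percent_saved(recognize_ms, hocr_ms):
    """Share of the GetHOCRText() time saved by calling Recognize() instead."""
    return 100.0 * (mean(hocr_ms) - mean(recognize_ms)) / mean(hocr_ms)

results = {
    name: percent_saved(t["recognize"], t["hocr"]) for name, t in timings.items()
}
for name, pct in results.items():
    print(f"{name}: {pct:.1f}% less time")
# prints:
# Image 1: 15.9% less time
# Image 2: 18.6% less time
```

With only three runs per image, and one run (Image 2, test 3) where Recognize() was the slower call, the spread is large, so the ~20% figure should be read as approximate.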

Robyer · 2025-07-29T13:27:21Z

Hi, sorry for the late reply.

Have you tried comparing getHOCRText with getUTF8Text? The recognize function is called at the start of both of these methods (if the image has not already been recognized). The only difference is that getHOCRText passes a monitor to the recognize call, so it can report progress and let the user cancel the processing, while getUTF8Text does not (same as the recognize call in your PR).

So the 20% difference could be just because of that (plus some for the extra markup of the hOCR format)?
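
One way to check this kind of question is to time the same work with and without a progress callback and compare medians. The sketch below is a generic harness only: `recognize_stub` is a hypothetical stand-in that does some arithmetic, not the library's recognize call, and `time_ms` is a helper invented for this example. In a real measurement the stubs would be replaced by the actual getUTF8Text and getHOCRText calls on the same image.

```python
import time
from statistics import median

def time_ms(fn, runs=5, warmup=1):
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are not recorded
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)

def recognize_stub(on_progress=None):
    """Hypothetical stand-in for a recognition pass with an optional monitor."""
    total = 0
    for step in range(20000):
        total += step * step
        if on_progress is not None:
            on_progress(step)  # the extra call is what the comparison isolates
    return total

plain_ms = time_ms(lambda: recognize_stub())
monitored_ms = time_ms(lambda: recognize_stub(on_progress=lambda step: None))
print(f"without callback: {plain_ms:.2f} ms, with callback: {monitored_ms:.2f} ms")
```

Using the median over several runs rather than a single timing makes the comparison less sensitive to outliers.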

DavidVentura · 2025-08-19T07:44:25Z

An overhead of about 15% seems to appear when the progress callback is not null. There is still a ~5% overhead for getHOCRText vs Recognize.

Robyer · 2025-08-24T15:57:01Z

Thanks, that makes sense.

So if you don't want the callback or the hOCR text format, just use getUTF8Text, as that will be the fastest; there is no need for a separate Recognize call.

Or do you still see any benefit in using Recognize separately from getUTF8Text? If not, we can close this PR.

Implement Recognize

41affbf
