
kdt523 commented Oct 10, 2025

Solution
Added two-layer protection in addAvailableLanguages():

  1. Preventive check: Verify directory exists before attempting iteration
  2. Exception handling: Catch filesystem errors during traversal

Changes

  • Add std::filesystem::exists() and std::filesystem::is_directory() checks
  • Wrap recursive_directory_iterator in try-catch block
  • Function now returns gracefully with empty language list instead of crashing

Testing

  • Verified fix handles missing directories without crashing
  • Tested edge cases (empty paths, permission denied scenarios)

Impact

  • After: graceful degradation; returns an empty language list and keeps running

This fix improves robustness for users with incomplete Tesseract installations.

Contributing to Hacktoberfest 2025

Member

stweil commented Oct 10, 2025

How did you get the crash?

Member

stweil commented Oct 10, 2025

I get an exception, which is handled:

% tesseract --tessdata-dir /missing  --list-langs
exception: filesystem error: in recursive_directory_iterator: No such file or directory ["/missing/"]

% tesseract --tessdata-dir file  --list-langs 
exception: filesystem error: in recursive_directory_iterator: Not a directory ["file/"]

Isn't it better to get such an error message instead of silently failing?
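One way to follow this suggestion is to keep the try-catch but report the exception text instead of discarding it. The sketch below is hypothetical. `collectLanguagesOrReport` and its `err` parameter are illustrative names, and only the file stem is kept, so subdirectory prefixes are dropped for brevity.

```cpp
#include <filesystem>
#include <iostream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical variant: instead of swallowing the error, write it to a
// caller-supplied stream (std::cerr by default) and return false, so a
// command such as --list-langs can still tell the user what went wrong.
bool collectLanguagesOrReport(const fs::path &datadir,
                              std::vector<std::string> &langs,
                              std::ostream &err = std::cerr) {
  try {
    for (const auto &entry : fs::recursive_directory_iterator(datadir)) {
      if (entry.path().extension() == ".traineddata") {
        // Simplification: keep only the file stem, e.g. "eng".
        langs.push_back(entry.path().stem().string());
      }
    }
    return true;
  } catch (const fs::filesystem_error &e) {
    // e.what() names the failing operation and path, similar to the
    // messages quoted above (exact wording is implementation-defined).
    err << "error: " << e.what() << '\n';
    return false;
  }
}
```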

Comment on lines 149 to 153
// Check if directory exists before attempting to iterate
if (!std::filesystem::exists(datadir) || !std::filesystem::is_directory(datadir)) {
  return;
}

Member

I think this code block is not needed. The iteration will raise an exception in both cases, and this exception is handled.

Member

Suggested change (delete these lines):
// Check if directory exists before attempting to iterate
if (!std::filesystem::exists(datadir) || !std::filesystem::is_directory(datadir)) {
  return;
}

Author

Done. Removed the existence check as suggested. The code now relies only on the try-catch block to handle filesystem errors, which is cleaner and still prevents the crash.

}
}
} catch (const std::filesystem::filesystem_error&) {
// Silently handle filesystem errors (e.g., permission denied, corrupted filesystem)
Member

"Permission denied" was already silently handled.

Contributor

zdenop commented Oct 11, 2025

@kdt523: please check whether #4372 solves your problem.

Author

kdt523 commented Oct 11, 2025

> How did you get the crash?

The crash was reported in a GitHub issue with a complete stack trace. I didn't reproduce it myself, but I analyzed the code to find the root cause.

The crash happens when:

  • the user installs the tesseract-ocr package without any language data packages, and
  • the /usr/share/tessdata/ directory doesn't exist.

Member

stweil commented Oct 11, 2025

> How did you get the crash?
>
> The crash was reported in a GitHub issue with a complete stack trace.

Are you referring to issue #4364?

Author

kdt523 commented Oct 11, 2025

> > How did you get the crash?
> >
> > The crash was reported in a GitHub issue with a complete stack trace.
>
> Are you referring to issue #4364?

Yes.
