Add --validate-images CLI option to filter corrupt images using PIL #388
Adds a new --validate-images CLI option that enables PIL-based image validation to filter out corrupt or invalid image files during processing.
Problem
When processing image datasets, users may encounter corrupt or invalid image files that cause processing to fail. Currently, zamba only checks if files exist and have non-zero size, but doesn't validate that they are actually valid images that can be opened and processed.
Solution
This PR adds a new CLI option, --validate-images, that enables PIL-based validation of image files.
Usage
Command Line Interface
For image prediction:
For image training:
Python API
Implementation Details
validate_images=False
)
Changes Made
- --validate-images option to both predict and train commands
- validate_images: bool = False parameter to both config classes
- _validate_filepath_with_pil() function using PIL
Example Output
With validation enabled, users will see:
This feature is particularly useful when working with datasets from external sources or when data integrity is uncertain.
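For readers unfamiliar with PIL-based validation, here is a minimal sketch of the general technique. It is not the code from this PR: the names is_valid_image and filter_valid_images are hypothetical, and the broad exception handling is an assumption made for illustration. It relies only on Pillow's Image.open and Image.verify.

```python
from pathlib import Path
from typing import Iterable, List

from PIL import Image


def is_valid_image(filepath: Path) -> bool:
    """Return True if PIL can open the file and verify() raises no error.

    Hypothetical helper for illustration; not zamba's actual implementation.
    """
    try:
        # Image.open only reads the header; verify() checks file integrity
        # without decoding the full pixel data.
        with Image.open(filepath) as img:
            img.verify()
    except Exception:
        # Assumption: any failure to open or verify counts as corrupt.
        # PIL may raise OSError, SyntaxError, or others depending on format.
        return False
    return True


def filter_valid_images(filepaths: Iterable[Path]) -> List[Path]:
    """Keep only the paths that pass the PIL check, preserving order."""
    return [p for p in filepaths if is_valid_image(p)]
```

Note that verify() does not decode the whole image, so a file that passes can still fail later when fully loaded. A stricter check would reopen the file and call load(), at the cost of extra processing time.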