Open
Description
Right now there is "output_format files", and there is also the optional calculate-hash feature.
What would be really nice to see is an "output_format md5_bucket_files".
E.g. a file has md5 hash a37f352376.....
so it gets saved to a3/a37f35....jpg
I personally have a post-download conversion routine to organize the downloaded files like this, but I just realized... what if img2dataset did it out of the box?
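For illustration, here is a rough sketch of what such a post-download step could look like. It is not img2dataset code: `md5_bucket_path`, `bucket_directory` and `prefix_len` are made-up names, and the two-character bucket prefix is an assumption taken from the a3/ example above.

```python
import hashlib
import shutil
from pathlib import Path


def md5_bucket_path(data: bytes, ext: str, prefix_len: int = 2) -> Path:
    """Return the relative path '<first prefix_len hex chars>/<full md5><ext>'."""
    digest = hashlib.md5(data).hexdigest()
    return Path(digest[:prefix_len]) / f"{digest}{ext}"


def bucket_directory(src_dir: Path, dst_dir: Path) -> None:
    """Move every regular file in src_dir into its md5 bucket under dst_dir.

    Files with identical content map to the same destination, so a later
    copy replaces an earlier one.
    """
    for src in sorted(src_dir.iterdir()):
        if not src.is_file():
            continue
        # Hash the file content and keep the (lowercased) original extension.
        rel = md5_bucket_path(src.read_bytes(), src.suffix.lower())
        dst = dst_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
```

With a two-character hex prefix there are at most 256 bucket directories, and a sidecar caption for an image can live at the same relative path with the extension swapped, e.g. `rel.with_suffix(".txt")`.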
There are multiple advantages to this new feature:
- the "100,000 files in a single directory" problem mostly goes away
- collaboration with other people on a dataset becomes a lot easier. For example, if you are both working from the same raw image dataset, one of you can run captioning on it, zip up the directory with just the .txt files, and send it over. The other person can then extract it, and the captions will automatically match up to the right images, because both copies use the same hash-based paths.