Replies: 3 comments
-
This is a good idea that's been on my mind for a while, but I've never found a proper way to do it. There are a number of issues:
As long as the principles below are met, I'm open to any ideas.
P.S. None of the anime is old enough to have entered the public domain yet.
-
Thanks again for your thoughtful response and for sharing your perspective so clearly. It's refreshing to see someone take such care with both the technical and ethical sides of a project.

I also had one more question, if you don't mind me asking: since trace.moe indexes such a large amount of anime content (over 100k hours), I was curious how you manage the collection and storage of that data. It seems like it would require a lot of bandwidth and storage, not to mention the challenge of obtaining the episodes themselves. If you're comfortable sharing, I'd love to learn how you approached that side of the project, whether it's automation, storage optimization, legal handling, or something else. I understand if some parts need to remain private, but any insight would be really appreciated.

Thanks again for being so open about your work and for making trace.moe such a valuable resource.
-
The entire collection is about 30TB now, which is pretty small compared to what other DataHoarders on Reddit are storing, even with backups. And there's only a limited amount of new anime produced every year, so it grows steadily by only about 1TB/year.

I have some scripts that help me detect duplicate entries (either by file name or stream hash), and from time to time I'll check whether I should keep or drop a file. They're just ad-hoc scripts/commands, nothing magical. Some other tools I find useful: ncdu, vifm, rclone, yt-dlp.

Like collecting stamps, I'd been collecting as a hobby long before I made this search engine. 100,000 files may seem like a lot, but over a span of 15 years it's less than 20 files/day. I also use AniList to check if anything is missing from my dataset, and use it to create regex patterns that move files automatically to the correct folder named with the AniList ID. So most of the time it runs unattended throughout a season.

It doesn't use that much bandwidth either (<1TB/month), because the video previews are very short and highly compressed. Hetzner and Cloudflare are also part of the Bandwidth Alliance, so that's basically free unlimited traffic.
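As a rough illustration of the duplicate-detection idea described above (this is not Soruly's actual script; the file extension, hash choice, and function names are assumptions for the sketch), one could group files by content hash and flag groups with more than one entry:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in 1MB chunks so large videos never load fully into memory."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group all .mkv files under root by content hash.

    Any group with more than one path is a duplicate candidate
    to review manually (keep one copy, drop the rest).
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in sorted(root.rglob("*.mkv")):
        groups[file_hash(p)].append(p)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

A real workflow would likely hash only the video stream (e.g. via ffmpeg) rather than the whole container, so remuxed copies with different metadata still match, but the grouping logic stays the same.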
-
Hi Soruly,
I’m really impressed by the work you’ve done on trace.moe — it’s an incredibly helpful tool and an inspiring open-source project.
I'm currently exploring how anime scene search engines work, and I'd like to experiment with building something similar for personal or educational use. I noticed that while the codebase is open source, the anime video dataset used for indexing isn't included (which is completely understandable).
I wanted to kindly ask:
• Is there any way to access the anime episode library you use, even in limited form (e.g. public domain anime or a subset)?
• Or alternatively, is there an API or service you provide (public or private) that allows querying or accessing the anime video data for indexing or testing?
I completely understand if this data can’t be shared due to copyright or bandwidth constraints, but I’d really appreciate any pointers or suggestions you’re willing to offer.
Thank you again for building trace.moe and making part of it open to the community.