Efficient getsize in the Zarr store #605
This implements a native version of:

```python
async def getsize(self, key: str) -> int: ...
```

Notably, it doesn't implement `getsize_prefix`. We think letting zarr-python do the loop that calls `getsize` can give enough performance.
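For readers unfamiliar with the fallback being relied on here, the following is a minimal sketch of a generic `getsize_prefix` built on top of a per-key `getsize`. `ToyStore` and its in-memory dict are hypothetical and exist only for illustration; they are not icechunk's or zarr-python's classes, and zarr-python's actual default may differ in details such as how it bounds concurrency.

```python
import asyncio


class ToyStore:
    """Hypothetical in-memory store, used only to illustrate the fallback."""

    def __init__(self, data: dict[str, bytes]) -> None:
        self._data = data

    async def getsize(self, key: str) -> int:
        # Per-key size lookup: report the size without returning the value.
        return len(self._data[key])

    async def list_prefix(self, prefix: str):
        # Yield every key that starts with the given prefix.
        for key in self._data:
            if key.startswith(prefix):
                yield key

    async def getsize_prefix(self, prefix: str) -> int:
        # Generic fallback: list the keys, then sum concurrent getsize calls.
        keys = [key async for key in self.list_prefix(prefix)]
        sizes = await asyncio.gather(*(self.getsize(k) for k in keys))
        return sum(sizes)


store = ToyStore({"a/c/0": b"1234", "a/c/1": b"12", "b/c/0": b"123456"})
total = asyncio.run(store.getsize_prefix("a/"))
```

With this shape, a store only needs a fast `getsize`; the prefix variant costs one listing plus one `getsize` call per key, which is the trade-off the benchmark in this PR is meant to check.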
Codespell will approve once we merge my open manifest PR, which has all these fixes.
    # wrap the async method in a sync method.
    return self._store.list_dir(prefix)

    async def getsize(self, key: str) -> int:
there is a getsize_prefix too: https://zarr.readthedocs.io/en/stable/api/zarr/abc/store/index.html#zarr.abc.store.Store.getsize_prefix
yeah, see the comment in the PR ... I want to see if we can skip implementing those. If the performance is not there, we can implement them.
AH ok. I'll push a benchmark, then approve.
Phew after some work: So big difference in No difference in |
    # > pytest [...] --benchmark-save=main_3abfa48a --icechunk-prefix=benchmarks/main_3abfa48a/ benchmarks/
    # note the created prefix: main_(first-8-characters-of-commit), for convenience export it
    export PREFIX=benchmarks/main_3abfa48a/
    pytest --benchmark-compare -k getsize --benchmark-group-by=group,func,param --benchmark-columns=median --benchmark-sort=name --icechunk-prefix=$PREFIX benchmarks/
struggled with making a just command for this