Skip to content

Conversation

@paraseba
Copy link
Collaborator

This implements a natie version of:

async def getsize(self, key: str) -> int: ...

Notably, it doesn't implement getsize_prefix. We think letting zarr-python do the loop that calls getsize can be enough performance.

This implements a natie version of:

```python
async def getsize(self, key: str) -> int: ...
```

Notably, it doesn't implement `getsize_prefix`. We think letting
zarr-python do the loop that calls `getsize` can be enough performance.
@paraseba
Copy link
Collaborator Author

Codespell will approve once we merge my manifest open PR that has all these fixes

# wrap the async method in a sync method.
return self._store.list_dir(prefix)

async def getsize(self, key: str) -> int:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, see comment in the PR ... I want to try if we can skip implementing those. If performance is not there we can implement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AH ok. I'll push a benchmark, then approve.

@dcherian
Copy link
Contributor

Phew after some work:

-------- benchmark 'test_time_getsize_key era5-single': 2 tests --------
Name (time in us)                                       Median
------------------------------------------------------------------------
test_time_getsize_key[era5-single] (0034_main_3a)     110.2501 (1.05)
test_time_getsize_key[era5-single] (NOW)              104.7080 (1.0)
------------------------------------------------------------------------

-------- benchmark 'test_time_getsize_key gb-128mb': 2 tests --------
Name (time in us)                                    Median
---------------------------------------------------------------------
test_time_getsize_key[gb-128mb] (0034_main_3a)     106.5000 (1.05)
test_time_getsize_key[gb-128mb] (NOW)              101.2499 (1.0)
---------------------------------------------------------------------

-------- benchmark 'test_time_getsize_key gb-8mb': 2 tests --------
Name (time in us)                                  Median
-------------------------------------------------------------------
test_time_getsize_key[gb-8mb] (0034_main_3a)     105.6660 (1.05)
test_time_getsize_key[gb-8mb] (NOW)              100.8749 (1.0)
-------------------------------------------------------------------

------- benchmark 'test_time_getsize_prefix era5-single': 2 tests --------
Name (time in ms)                                         Median
--------------------------------------------------------------------------
test_time_getsize_prefix[era5-single] (0034_main_3a)     68.8355 (31.10)
test_time_getsize_prefix[era5-single] (NOW)               2.2133 (1.0)
--------------------------------------------------------------------------

So big difference in getsize_prefix.

No difference ingetsize(key) because I am only testing getsize on an array.json which is already in memory.

# > pytest [...] --benchmark-save=main_3abfa48a --icechunk-prefix=benchmarks/main_3abfa48a/ benchmarks/
# note the created prefix: main_(first-8-characters-of-commit), for convenienve export it
export PREFIX=benchmarks/main_3abfa48a/
pytest --benchmark-compare -k getsize --benchmark-group-by=group,func,param --benchmark-columns=median --benchmark-sort=name --icechunk-prefix=$PREFIX benchmarks/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struggled with making a just command for this

@paraseba paraseba merged commit 9d59080 into main Jan 23, 2025
4 checks passed
@paraseba paraseba deleted the push-uknyqnpypzro branch January 23, 2025 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants