Hi, I'm a random person who found this repo through GitHub recommendations.
Comment 1: Cool and nice, but uncompressed Matrix Market is a very expensive option. A comparison with gzip-compressed Matrix Market would be more valuable, IMO.
Comment 2: As a baseline, it would be valuable to see something very simple based on a columnar format, e.g.:
import polars as pl
from scipy import sparse
import numpy as np
x = sparse.coo_array(np.random.randn(1000, 1000)) # not sparse, I know
# save to parquet
pl.DataFrame(dict(col=x.col, row=x.row, data=x.data)).write_parquet('/tmp/sparse_coo.pqt')
df = pl.read_parquet('/tmp/sparse_coo.pqt')
# read it back; pass shape explicitly so trailing empty rows/columns are not lost
x2 = sparse.coo_array((df['data'].to_numpy(), (df['row'].to_numpy(), df['col'].to_numpy())), shape=x.shape)
np.allclose(x.todense(), x2.todense())
and see how this compares in terms of speed (in terms of floating-point precision, binary formats certainly win).
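For the gzip comparison from Comment 1, something along these lines might work as a rough starting point. It is only a sketch: the `roundtrip` helper, the file names, and the 1% density are my own assumptions, not anything from this repo, and a single timing run is too noisy to draw conclusions from.

```python
import gzip
import os
import tempfile
import time

import numpy as np
from scipy import io, sparse


def roundtrip(matrix, path, opener):
    """Hypothetical helper: write `matrix` as Matrix Market through `opener`,
    read it back, and return the restored matrix, both timings, and file size."""
    start = time.perf_counter()
    with opener(path, "wb") as fh:
        io.mmwrite(fh, matrix)
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    with opener(path, "rb") as fh:
        restored = io.mmread(fh)
    read_s = time.perf_counter() - start
    return restored, write_s, read_s, os.path.getsize(path)


# Assumed test input: a 1000 x 1000 matrix with about 1% non-zero entries.
x = sparse.random(1000, 1000, density=0.01, format="coo", random_state=0)

results = {}
with tempfile.TemporaryDirectory() as tmp:
    # Same text format twice: plain file vs. gzip-compressed stream.
    for name, opener in [("mtx", open), ("mtx.gz", gzip.open)]:
        path = os.path.join(tmp, f"m.{name}")
        restored, write_s, read_s, size = roundtrip(x, path, opener)
        results[name] = (write_s, read_s, size)
        # Both variants should reproduce the original values.
        assert np.allclose(x.toarray(), restored.toarray())
        print(f"{name}: write {write_s:.4f}s, read {read_s:.4f}s, {size} bytes")
```

The same `roundtrip` shape could be reused for the parquet baseline by swapping the write and read calls, so all three options get timed the same way.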