parq-tools is a collection of utilities for efficiently working with large-scale Parquet datasets.
A typical use case is asset-based workflows with large scientific datasets.
:::note
If your datasets are not large, you might find the pandas library more convenient.
:::
- Filtering → Efficiently filters large Parquet files.
- Concatenation → Combines multiple Parquet files efficiently along rows (`axis=0`) or columns (`axis=1`).
- Tokenized Filtering → Converts pandas-style expressions into efficient PyArrow queries.
- Profiling Enhancements → Improves `ydata-profiling` by profiling specific columns incrementally and merging the results for large files.
- DataFrame Enhancements → Provides a `LazyParquetDataFrame` class that extends `pandas.DataFrame` with lazy loading from Parquet files.
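
To give a feel for what tokenized filtering involves, here is a minimal sketch that turns a pandas-style expression into the tuple-based filter list accepted by the `filters` argument of `pyarrow.parquet.read_table`. It is an illustration only and does not show parq-tools' own API: the helper name `expression_to_filters`, the column names, and the file name in the comment are all hypothetical, and the sketch assumes the expression consists solely of `column <op> literal` terms joined by `and`.

```python
import ast

# Map Python comparison AST nodes to the operator strings that PyArrow
# accepts in its tuple-based `filters` argument.
_OPS = {
    ast.Eq: "==",
    ast.NotEq: "!=",
    ast.Lt: "<",
    ast.LtE: "<=",
    ast.Gt: ">",
    ast.GtE: ">=",
    ast.In: "in",
    ast.NotIn: "not in",
}


def expression_to_filters(expression: str) -> list[tuple]:
    """Convert a pandas-style expression into a list of filter tuples.

    Hypothetical helper (not part of parq-tools). Only `column <op> literal`
    terms joined by `and` are supported; anything else raises ValueError.
    """
    tree = ast.parse(expression, mode="eval").body

    # `a and b and c` parses to a single BoolOp; a lone comparison does not,
    # so treat it as a one-term conjunction.
    if isinstance(tree, ast.BoolOp) and isinstance(tree.op, ast.And):
        terms = tree.values
    else:
        terms = [tree]

    filters = []
    for term in terms:
        is_simple_comparison = (
            isinstance(term, ast.Compare)
            and len(term.ops) == 1
            and isinstance(term.left, ast.Name)
        )
        if not is_simple_comparison:
            raise ValueError(f"Unsupported term: {ast.unparse(term)}")

        op = _OPS.get(type(term.ops[0]))
        if op is None:
            raise ValueError(f"Unsupported operator in: {ast.unparse(term)}")

        # literal_eval only accepts plain literals (numbers, strings, lists...),
        # so arbitrary code on the right-hand side is rejected.
        value = ast.literal_eval(term.comparators[0])
        filters.append((term.left.id, op, value))
    return filters


filters = expression_to_filters("depth > 100 and rock == 'ore'")
print(filters)

# With PyArrow installed, the result could then be used for predicate pushdown,
# e.g. (hypothetical file name):
#   import pyarrow.parquet as pq
#   table = pq.read_table("assay.parquet", filters=filters)
```

Passing a flat list of tuples to PyArrow is interpreted as a conjunction (AND) of the terms, which lets the reader skip row groups that cannot match instead of loading the whole file into memory first.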