Caching information for repeated UPSERTs to the same chunk #8182
**Codecov Report**

Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #8182      +/-   ##
==========================================
+ Coverage   82.29%   82.41%   +0.12%
==========================================
  Files         256      256
  Lines       47934    47958      +24
  Branches    12077    12089      +12
==========================================
+ Hits        39448    39526      +78
- Misses       3643     3648       +5
+ Partials     4843     4784      -59
```
This change brings close to a 2x improvement in upsert performance.
When upserting data into the same compressed chunk, we perform index checks and a compression-settings lookup that are surprisingly expensive. This change caches the scan keys and compression settings between repeated calls to the same compressed chunk, which speeds up these repeated upserts.
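For illustration, the sketch below shows the kind of workload this targets: repeated `INSERT ... ON CONFLICT` statements that keep landing in the same compressed chunk of a hypertable. The table, column, and constraint names are hypothetical and not taken from this PR; they only serve to make the scenario concrete.

```
-- Hypothetical hypertable with a unique constraint and compression enabled.
CREATE TABLE metrics (
    ts     timestamptz NOT NULL,
    device int         NOT NULL,
    value  float8,
    UNIQUE (ts, device)
);
SELECT create_hypertable('metrics', 'ts');
ALTER TABLE metrics SET (timescaledb.compress, timescaledb.compress_segmentby = 'device');

-- ... load data, then compress the chunks ...
SELECT compress_chunk(c) FROM show_chunks('metrics') c;

-- Repeated upserts into the same compressed chunk: with this change, the scan
-- keys and compression settings are cached across calls instead of being
-- rebuilt for every statement.
INSERT INTO metrics (ts, device, value)
VALUES ('2025-01-01 00:00:01', 1, 0.5)
ON CONFLICT (ts, device) DO UPDATE SET value = EXCLUDED.value;
```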
@akuzm: I believe I addressed all the concerns.
Thanks! Let's also ask @antekresic to take a look since I'm not really up to date on this code.
Great work @dbeck ! LGTM 🎉
## 2.21.0 (2025-07-08)

This release contains performance improvements and bug fixes since the 2.20.3 release. We recommend that you upgrade at the next available opportunity.

**Highlighted features in TimescaleDB v2.21.0**

* The attach & detach chunks feature allows manually adding or removing chunks from a hypertable with uncompressed chunks, similar to PostgreSQL's partition management.
* Continued improvement of backfilling into the columnstore, achieving up to 2.5x speedup for constrained tables, by introducing caching logic that boosts throughput for writes to compressed chunks, bringing `INSERT` performance close to that of uncompressed chunks.
* Optimized `DELETE` operations on the columnstore through batch-level deletions of non-segmentby keys in the filter condition, greatly improving performance (up to 42x faster in some cases), reducing bloat, and lowering resource usage.
* The heavy lock taken in Continuous Aggregate refresh was relaxed, enabling concurrent refreshes for non-overlapping ranges and eliminating the need for complex customer workarounds.
* [tech preview] Direct Compress is an innovative TimescaleDB feature that improves high-volume data ingestion by compressing data in memory and writing it directly to disk, reducing I/O overhead, eliminating dependency on background compression jobs, and significantly boosting insert performance.

**Sunsetting of the hypercore access method**

We made the decision to deprecate the hypercore access method (TAM) with the 2.21.0 release. It was an experiment which did not show the signals we hoped for, and it will be sunset in TimescaleDB 2.22.0, scheduled for September 2025. Upgrading to 2.22.0 and higher will be blocked if TAM is still in use. Since TAM's inception in [2.18.0](https://github.com/timescale/timescaledb/releases/tag/2.18.0), we learned that btrees were not the right architecture. The recent advancements in the columnstore—such as more performant backfilling, SkipScan, adding check constraints, and faster point queries—put the [columnstore](https://www.timescale.com/blog/hypercore-a-hybrid-row-storage-engine-for-real-time-analytics) close to or on par with TAM without the storage cost of the additional index. We apologize for the inconvenience this action potentially causes and are here to assist you during the migration process.
**Migration path**

```
do $$
declare
  relid regclass;
begin
  for relid in
    select cl.oid
    from pg_class cl
    join pg_am am on (am.oid = cl.relam)
    where am.amname = 'hypercore'
  loop
    raise notice 'converting % to heap', relid::regclass;
    execute format('alter table %s set access method heap', relid);
  end loop;
end
$$;
```

**Features**
* [#8081](#8081) Use JSON error code for job configuration parsing
* [#8100](#8100) Support splitting compressed chunks
* [#8131](#8131) Add policy to process hypertable invalidations
* [#8141](#8141) Add function to process hypertable invalidations
* [#8165](#8165) Reindex recompressed chunks in compression policy
* [#8178](#8178) Add columnstore option to `CREATE TABLE WITH`
* [#8179](#8179) Implement direct `DELETE` on non-segmentby columns
* [#8182](#8182) Cache information for repeated upserts into the same compressed chunk
* [#8187](#8187) Allow concurrent Continuous Aggregate refreshes
* [#8191](#8191) Add option to not process hypertable invalidations
* [#8196](#8196) Show deprecation warning for TAM
* [#8208](#8208) Use `NULL` compression for bool batches with all null values like the other compression algorithms
* [#8223](#8223) Support for attach/detach chunk
* [#8265](#8265) Set incremental Continuous Aggregate refresh policy on by default
* [#8274](#8274) Allow creating concurrent continuous aggregate refresh policies
* [#8314](#8314) Add support for timescaledb_lake in loader
* [#8209](#8209) Add experimental support for Direct Compress of `COPY`
* [#8341](#8341) Allow quick migration from hypercore TAM to (columnstore) heap

**Bugfixes**
* [#8153](#8153) Restoring a database having NULL compressed data
* [#8164](#8164) Check columns when creating new chunk from table
* [#8294](#8294) The "vectorized predicate called for a null value" error for WHERE conditions like `x = any(null::int[])`
* [#8307](#8307) Fix missing catalog entries for bool and null compression in fresh installations
* [#8323](#8323) Fix DML issue with expression indexes and BHS

**GUCs**
* `enable_direct_compress_copy`: Enable experimental support for direct compression during `COPY`, default: off
* `enable_direct_compress_copy_sort_batches`: Enable batch sorting during direct compress `COPY`, default: on
* `enable_direct_compress_copy_client_sorted`: Correct handling of data sorting by the user is required for this option, default: off

---------

Signed-off-by: Philip Krauss <[email protected]>
Co-authored-by: philkra <[email protected]>
Co-authored-by: philkra <[email protected]>
Co-authored-by: Fabrízio de Royes Mello <[email protected]>
Co-authored-by: Anastasiia Tovpeko <[email protected]>
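As an illustration of the GUCs listed above, the experimental direct compress `COPY` path can be toggled per session. This is only a sketch: the GUC names come from the release notes, while the `timescaledb.` prefix is assumed here because it is the usual namespace for TimescaleDB settings.

```
-- Sketch: enable the experimental direct compress COPY path for this session.
SET timescaledb.enable_direct_compress_copy = on;
SET timescaledb.enable_direct_compress_copy_sort_batches = on;

-- Only enable this if the client already sends data sorted correctly,
-- as required by the release notes:
-- SET timescaledb.enable_direct_compress_copy_client_sorted = on;
```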