Support splitting compressed chunks #8100
Conversation
Force-pushed from d31d571 to 3f49c47
Force-pushed from 013df47 to 26baf56
Force-pushed from daac8de to 5f66725
Codecov Report. Attention: Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #8100      +/-   ##
==========================================
- Coverage   82.16%   81.97%   -0.19%
==========================================
  Files         257      257
  Lines       48323    48513     +190
  Branches    12189    12221      +32
==========================================
+ Hits        39704    39768      +64
- Misses       3741     3920     +179
+ Partials     4878     4825      -53
```
Force-pushed from 558bdb5 to fcba4c4
Pull Request Overview
Adds support for splitting compressed chunks by extending split_chunk() test coverage and updating internal APIs to propagate access methods.
- Introduce a `split_chunk_validate()` PL/pgSQL procedure and update existing tests to verify both row counts and compression stats (a rough sketch of such a check follows below).
- Add a `row_compressor_append_ordered_slot()` helper in the compression subsystem and refine the tuple-building context.
- Extend chunk-creation APIs (`ts_chunk_create_table` and related helpers) to accept an access-method OID, defaulting to `InvalidOid` when unchanged.
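The real `split_chunk_validate()` lives in tsl/test/sql/split_chunk.sql; the following is only a rough sketch of what such a validation procedure can look like. The procedure name, its arguments, and the use of `chunk_compression_stats()` are assumptions for illustration, not the PR's actual code.

```sql
-- Hypothetical sketch only; the real validation procedure may differ.
CREATE OR REPLACE PROCEDURE validate_split_sketch(
    hypertable regclass,
    expected_rows bigint,
    expected_compressed_bytes bigint)
LANGUAGE plpgsql AS $$
DECLARE
    actual_rows bigint;
    actual_compressed_bytes bigint;
BEGIN
    -- Total row count across the hypertable must be unchanged by the split.
    EXECUTE format('SELECT count(*) FROM %s', hypertable) INTO actual_rows;

    -- The per-chunk compression stats should still sum to the pre-split total.
    SELECT coalesce(sum(after_compression_total_bytes), 0)
      INTO actual_compressed_bytes
      FROM chunk_compression_stats(hypertable);

    IF actual_rows <> expected_rows
       OR actual_compressed_bytes <> expected_compressed_bytes THEN
        RAISE EXCEPTION 'split validation failed: rows % vs %, bytes % vs %',
            actual_rows, expected_rows,
            actual_compressed_bytes, expected_compressed_bytes;
    END IF;
END;
$$;
```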
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tsl/test/sql/split_chunk.sql | New validation proc and adjusted split tests for compressed chunks |
| tsl/src/compression/compression.h | Declare row_compressor_append_ordered_slot |
| tsl/src/compression/compression.c | Implement ordered-slot compressor and tidy up tuple building |
| tsl/src/chunk_api.c | Pass InvalidOid for AM OID in chunk_create |
| src/chunk.h | Extend ts_chunk_create_table and ts_chunk_find_or_create_without_cuts with amoid |
| src/chunk.c | Propagate AM OID through ts_chunk_create_table and related calls |
| .unreleased/pr_8100 | Add release note for feature #8100 |
Comments suppressed due to low confidence (1)
tsl/test/sql/split_chunk.sql:81
- Remove the stray trailing 'and' before the THEN on the IF condition (lines 79–81) so the syntax reads `IF cond1 AND cond2 THEN` rather than ending with an extra 'and'.
then
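To make the flagged syntax concrete, here is a minimal, self-contained PL/pgSQL sketch; `cond1` and `cond2` are placeholders, not the test's actual conditions.

```sql
DO $$
DECLARE
    cond1 boolean := true;   -- placeholder conditions; the real test
    cond2 boolean := true;   -- compares row counts and compression stats
BEGIN
    -- Correct form: the boolean expression ends right before THEN.
    IF cond1 AND cond2 THEN
        RAISE NOTICE 'both conditions hold';
    END IF;
    -- The flagged version ended with a dangling AND before THEN:
    --   IF cond1 AND cond2 AND THEN ...   -- syntax error in PL/pgSQL
END;
$$;
```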
```
 *
 * Main state for doing a split.
 */
typedef struct SplitContext
```
Would it be a good idea to put all split-related things into a separate file? Seems pretty big and independent.
Yes, it is growing big. But it is probably best to break it out in a separate PR. Otherwise it will be hard to see the changes. WDYT?
Force-pushed from 7384aed to 06ccd68
```
if (rewrite_heap_dead_tuple(rws->rwstate, tuple))
{
    /* A previous recently-dead tuple is now known dead */
    tups_vacuumed += 1;
```
Not exactly sure how to get this covered by a test. This code is modeled on the cluster code in PG, so it should be correct.
I might try an isolation test. Probably one where there are concurrent transactions that finish before it gets here.
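A minimal sketch of how such a test could be shaped, written as plain SQL with session comments rather than an actual isolation .spec file; the table name and the split_chunk() call syntax are assumptions, not an existing test.

```sql
-- Session 2: open a transaction with an old snapshot and keep it open.
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM metrics;  -- takes the snapshot

-- Session 1: delete rows in the chunk and commit; the deleted tuples stay
-- only "recently dead" while session 2's snapshot is still open.
DELETE FROM metrics WHERE time < '2024-01-10';

-- Session 2: commit, so the deleted tuples become fully dead.
COMMIT;

-- Session 1: split the chunk; the heap rewrite should now take the
-- rewrite_heap_dead_tuple() branch for tuples that died in the meantime.
-- Chunk name and call syntax are examples only.
CALL split_chunk('_timescaledb_internal._hyper_1_1_chunk');
```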
mkindahl left a comment:
Generally looks good, but some minor nits and a few requests about test coverage.
```
static int
route_tuple(TupleTableSlot *slot, const SplitPoint *sp)
{
    bool isnull = false;
```
Nit: since you're using it as an output parameter below, this assignment is not needed.
I'm not sure which of these combinations you have tests for, but please make sure you have tests for all of:
- Fully compressed to two fully compressed
- Partially compressed to only uncompressed plus only compressed (both combinations: moving compressed data to new chunk and moving uncompressed data to new chunk).
- Partially compressed to two partially compressed (both uncompressed and compressed data in both chunks).
All of 1-3 should be covered now. There are also tests for several other combinations, for example:
- Split point within segment
- Split point at min of segment
- Split point at max of segment
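For illustration, here is a rough sketch of what one of these cases can look like as a test. The table and column names are invented, and the exact split_chunk() call syntax, including the `split_at` parameter and the chunk name, is an assumption rather than the PR's actual test code.

```sql
-- Hypothetical sketch; the real tests live in tsl/test/sql/split_chunk.sql.
CREATE TABLE readings (time timestamptz NOT NULL, device int, value float);
SELECT create_hypertable('readings', 'time', chunk_time_interval => interval '1 month');

INSERT INTO readings
SELECT t, 1, random()
FROM generate_series('2024-01-01'::timestamptz, '2024-01-09', interval '1 minute') t;

-- Compress so that the split point falls inside a compressed segment.
SELECT compress_chunk(c) FROM show_chunks('readings') c;

-- Split in the middle of the compressed data; the segment covering the
-- split point is decompressed in memory and rewritten into both new chunks.
-- Chunk name and call syntax are examples only.
CALL split_chunk('_timescaledb_internal._hyper_1_1_chunk', split_at => '2024-01-05');

-- Row count and summed compression stats should match the pre-split values.
SELECT count(*) FROM readings;
SELECT sum(after_compression_total_bytes) FROM chunk_compression_stats('readings');
```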
```
-- Split compressed chunk in a way that leaves no compressed data in
-- one of the chunks
```
Are you testing both combinations? That is, moving (all) compressed data to the new chunk and moving (all) uncompressed data to the new chunk.
There are tests that split so that one chunk gets all the compressed data and the other chunk gets the non-compressed data. But I am not sure the same test exists for both "left" and "right"... Do you think that is necessary? The test is already quite big and hard to read, so I am not convinced of the value of adding this since it is more or less covered by all the other combos.
Force-pushed from 42777c5 to c59b4b8
Force-pushed from c59b4b8 to 26b0c5d
## 2.21.0 (2025-07-08)

This release contains performance improvements and bug fixes since the 2.20.3 release. We recommend that you upgrade at the next available opportunity.

**Highlighted features in TimescaleDB v2.21.0**

* The attach & detach chunks feature allows manually adding or removing chunks from a hypertable with uncompressed chunks, similar to PostgreSQL's partition management.
* Continued improvement of backfilling into the columnstore, achieving up to 2.5x speedup for constrained tables, by introducing caching logic that boosts throughput for writes to compressed chunks, bringing `INSERT` performance close to that of uncompressed chunks.
* Optimized `DELETE` operations on the columnstore through batch-level deletions of non-segmentby keys in the filter condition, greatly improving performance (up to 42x faster in some cases) as well as reducing bloat and lowering resource usage.
* The heavy lock taken in Continuous Aggregate refresh was relaxed, enabling concurrent refreshes for non-overlapping ranges and eliminating the need for complex customer workarounds.
* [tech preview] Direct Compress is an innovative TimescaleDB feature that improves high-volume data ingestion by compressing data in memory and writing it directly to disk, reducing I/O overhead, eliminating the dependency on background compression jobs, and significantly boosting insert performance.

**Sunsetting of the hypercore access method**

We made the decision to deprecate the hypercore access method (TAM) with the 2.21.0 release. It was an experiment which did not show the signals we hoped for, and it will be sunsetted in TimescaleDB 2.22.0, scheduled for September 2025. Upgrading to 2.22.0 and higher will be blocked if TAM is still in use. Since TAM's inception in [2.18.0](https://github.com/timescale/timescaledb/releases/tag/2.18.0), we learned that btrees were not the right architecture. The recent advancements in the columnstore, such as more performant backfilling, SkipScan, adding check constraints, and faster point queries, put the [columnstore](https://www.timescale.com/blog/hypercore-a-hybrid-row-storage-engine-for-real-time-analytics) close to or on par with TAM without the storage overhead of the additional index. We apologize for the inconvenience this action potentially causes and are here to assist you during the migration process.

Migration path:

```
do $$
declare
  relid regclass;
begin
  for relid in
    select cl.oid
    from pg_class cl
    join pg_am am on (am.oid = cl.relam)
    where am.amname = 'hypercore'
  loop
    raise notice 'converting % to heap', relid::regclass;
    execute format('alter table %s set access method heap', relid);
  end loop;
end
$$;
```

**Features**
* [#8081](#8081) Use JSON error code for job configuration parsing
* [#8100](#8100) Support splitting compressed chunks
* [#8131](#8131) Add policy to process hypertable invalidations
* [#8141](#8141) Add function to process hypertable invalidations
* [#8165](#8165) Reindex recompressed chunks in compression policy
* [#8178](#8178) Add columnstore option to `CREATE TABLE WITH`
* [#8179](#8179) Implement direct `DELETE` on non-segmentby columns
* [#8182](#8182) Cache information for repeated upserts into the same compressed chunk
* [#8187](#8187) Allow concurrent Continuous Aggregate refreshes
* [#8191](#8191) Add option to not process hypertable invalidations
* [#8196](#8196) Show deprecation warning for TAM
* [#8208](#8208) Use `NULL` compression for bool batches with all null values like the other compression algorithms
* [#8223](#8223) Support for attach/detach chunk
* [#8265](#8265) Set incremental Continuous Aggregate refresh policy on by default
* [#8274](#8274) Allow creating concurrent continuous aggregate refresh policies
* [#8314](#8314) Add support for timescaledb_lake in loader
* [#8209](#8209) Add experimental support for Direct Compress of `COPY`
* [#8341](#8341) Allow quick migration from hypercore TAM to (columnstore) heap

**Bugfixes**
* [#8153](#8153) Restoring a database having NULL compressed data
* [#8164](#8164) Check columns when creating new chunk from table
* [#8294](#8294) The "vectorized predicate called for a null value" error for WHERE conditions like `x = any(null::int[])`
* [#8307](#8307) Fix missing catalog entries for bool and null compression in fresh installations
* [#8323](#8323) Fix DML issue with expression indexes and BHS

**GUCs**
* `enable_direct_compress_copy`: Enable experimental support for direct compression during `COPY`, default: off
* `enable_direct_compress_copy_sort_batches`: Enable batch sorting during direct compress `COPY`, default: on
* `enable_direct_compress_copy_client_sorted`: Correct handling of data sorting by the user is required for this option, default: off

---------

Signed-off-by: Philip Krauss <[email protected]>
Co-authored-by: philkra <[email protected]>
Co-authored-by: philkra <[email protected]>
Co-authored-by: Fabrízio de Royes Mello <[email protected]>
Co-authored-by: Anastasiia Tovpeko <[email protected]>
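For reference, the direct-compress GUCs listed above are set like other TimescaleDB settings; the `timescaledb.` prefix shown here is the usual convention and is assumed, not spelled out in the release notes.

```sql
-- Enable the experimental direct-compress path for COPY in this session.
SET timescaledb.enable_direct_compress_copy = on;
SET timescaledb.enable_direct_compress_copy_sort_batches = on;
-- Only enable this if the client already sends data sorted appropriately.
SET timescaledb.enable_direct_compress_copy_client_sorted = off;
```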
Add support for splitting compressed chunks when calling split_chunk(). Only 2-way splits are supported, as before.
When splitting a compressed chunk, the same rewrite-approach is taken as when splitting a non-compressed chunk. The main difference is that the internal compressed relation is split in a separate step, similar to the non-compressed relation. One complication is that the split might end up in the middle of a compressed segment, in which case the
segment needs to be split as well. This is done by decompressing the segment in memory and creating two new segments that get written to the resulting split relations. Since the non-compressed and compressed
relations are split separately, the non-compressed data remains non-compressed after the split.
Special care is also taken to handle compression stats. In particular, instead of computing new stats, the existing stats are divided across the result relations depending on the amount of data received. This means that the sum of the stats after a split is the same as before the split. Computing new stats is not feasible given that the current approach requires knowing the pre-compressed relation sizes, which is not possible without first decompressing everything.
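As a small worked illustration of that stats division (the numbers are invented): if a chunk's pre-split stats report 1000 bytes and 60% of its data ends up in the first result chunk, the stats are split 600/400 rather than recomputed, so they still add up to the original value.

```sql
-- Invented numbers, only to illustrate the proportional division of stats.
SELECT 1000 * 0.6 AS left_chunk_bytes,                -- 600
       1000 * 0.4 AS right_chunk_bytes,               -- 400
       1000 * 0.6 + 1000 * 0.4 AS total_bytes;        -- 1000, unchanged by the split
```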