
Conversation

@erimatnor
Member

@erimatnor erimatnor commented May 9, 2025

Add support for splitting compressed chunks when calling split_chunk(). Only 2-way splits are supported, as before.

When splitting a compressed chunk, the same rewrite approach is taken as when splitting a non-compressed chunk. The main difference is that the internal compressed relation is split in a separate step, similar to the non-compressed relation. One complication is that the split might end up in the middle of a compressed segment, in which case the segment needs to be split as well. This is done by decompressing the segment in memory and creating two new segments that get written to the resulting split relations. Since the non-compressed and compressed relations are split separately, the non-compressed data remains non-compressed after the split.
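
As a rough illustration of the segment-splitting step, here is a minimal, self-contained C sketch. It is not the actual TimescaleDB code: the `Segment` struct, `segment_append()`, and `split_segment()` are made-up helpers, and a segment is modeled simply as an array of values plus its min/max metadata. A segment whose min/max range straddles the split point is materialized in memory and its rows are routed into two new segments, one per result relation; a segment entirely on one side would be copied as-is.

```c
/*
 * Standalone sketch (not TimescaleDB code): a compressed segment is modeled
 * as an array of values plus the segment's min/max in the split dimension.
 * A segment that straddles the split point is "decompressed" (here, simply
 * iterated) and its rows are routed into two new segments.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct Segment
{
	long *values;  /* decompressed values of the split dimension */
	int nvalues;
	long min;      /* segment metadata: min of the split dimension */
	long max;      /* segment metadata: max of the split dimension */
} Segment;

/* Append one value to an in-memory segment being rebuilt */
static void
segment_append(Segment *seg, long value)
{
	seg->values = realloc(seg->values, sizeof(long) * (seg->nvalues + 1));
	seg->values[seg->nvalues++] = value;
	if (seg->nvalues == 1 || value < seg->min)
		seg->min = value;
	if (seg->nvalues == 1 || value > seg->max)
		seg->max = value;
}

/*
 * Route a segment across a 2-way split at "split_at": values below the
 * split point go to "left", the rest to "right".
 */
static void
split_segment(const Segment *seg, long split_at, Segment *left, Segment *right)
{
	for (int i = 0; i < seg->nvalues; i++)
	{
		if (seg->values[i] < split_at)
			segment_append(left, seg->values[i]);
		else
			segment_append(right, seg->values[i]);
	}
}

int
main(void)
{
	long data[] = { 10, 20, 30, 40, 50 };
	Segment seg = { data, 5, 10, 50 };
	Segment left = { 0 }, right = { 0 };

	/* Split point 35 falls inside [10, 50], so the segment must be split */
	split_segment(&seg, 35, &left, &right);
	printf("left: %d rows [%ld, %ld], right: %d rows [%ld, %ld]\n",
		   left.nvalues, left.min, left.max,
		   right.nvalues, right.min, right.max);
	free(left.values);
	free(right.values);
	return 0;
}
```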

Special care is also taken to handle compression stats. In particular, instead of computing new stats, the existing stats are divided across the result relations according to the amount of data each relation received. This means that the sum of the stats after a split is the same as before the split. Computing new stats is not feasible because the current approach would require knowing the pre-compressed relation sizes, which is not possible without first decompressing everything.
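
To illustrate how the existing stats can be carried over without recomputation, here is a minimal, standalone C sketch; the struct and field names are made up for the example and are not the actual catalog columns. Each result relation gets a share of the original counters proportional to the compressed data it received, and the second relation gets the exact remainder so the totals still add up.

```c
/*
 * Standalone sketch (not TimescaleDB code): divide existing compression
 * stats across the two result relations in proportion to the compressed
 * data each received, preserving the original totals. Field names are
 * illustrative only.
 */
#include <stdio.h>
#include <stdint.h>

typedef struct CompressionStats
{
	int64_t rowcnt_pre_compression;  /* rows before compression */
	int64_t rowcnt_post_compression; /* compressed batches */
} CompressionStats;

/*
 * Split "orig" into two stats structs using the fraction of compressed
 * data that ended up in the first relation. The second relation gets the
 * remainder, so the sums stay equal to the original stats.
 */
static void
divide_stats(const CompressionStats *orig, double first_fraction,
			 CompressionStats *first, CompressionStats *second)
{
	first->rowcnt_pre_compression =
		(int64_t) (orig->rowcnt_pre_compression * first_fraction);
	first->rowcnt_post_compression =
		(int64_t) (orig->rowcnt_post_compression * first_fraction);
	second->rowcnt_pre_compression =
		orig->rowcnt_pre_compression - first->rowcnt_pre_compression;
	second->rowcnt_post_compression =
		orig->rowcnt_post_compression - first->rowcnt_post_compression;
}

int
main(void)
{
	CompressionStats orig = { 100000, 100 };
	CompressionStats a, b;

	/* The first relation received 30% of the compressed data */
	divide_stats(&orig, 0.30, &a, &b);
	printf("a: pre=%lld post=%lld, b: pre=%lld post=%lld\n",
		   (long long) a.rowcnt_pre_compression,
		   (long long) a.rowcnt_post_compression,
		   (long long) b.rowcnt_pre_compression,
		   (long long) b.rowcnt_post_compression);
	return 0;
}
```

Computing the second share as a remainder, rather than as a second rounded fraction, is what keeps the post-split sum identical to the pre-split stats.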

@erimatnor erimatnor force-pushed the split-chunk-compressed branch 6 times, most recently from d31d571 to 3f49c47 Compare May 21, 2025 07:50
@erimatnor erimatnor changed the title from Split compressed chunks to Support splitting compressed chunks May 21, 2025
@erimatnor erimatnor force-pushed the split-chunk-compressed branch 5 times, most recently from 013df47 to 26baf56 Compare May 21, 2025 10:07
@erimatnor erimatnor force-pushed the split-chunk-compressed branch 3 times, most recently from daac8de to 5f66725 Compare May 22, 2025 13:01
@codecov

codecov bot commented May 22, 2025

Codecov Report

Attention: Patch coverage is 91.20879% with 32 lines in your changes missing coverage. Please review.

Project coverage is 81.97%. Comparing base (5a51693) to head (26b0c5d).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tsl/src/chunk.c | 91.34% | 6 Missing and 23 partials ⚠️ |
| tsl/src/compression/compression.c | 88.23% | 0 Missing and 2 partials ⚠️ |
| src/chunk.c | 91.66% | 1 Missing ⚠️ |

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #8100      +/-   ##
==========================================
- Coverage   82.16%   81.97%   -0.19%     
==========================================
  Files         257      257              
  Lines       48323    48513     +190     
  Branches    12189    12221      +32     
==========================================
+ Hits        39704    39768      +64     
- Misses       3741     3920     +179     
+ Partials     4878     4825      -53     
```

☔ View full report in Codecov by Sentry.

@erimatnor erimatnor force-pushed the split-chunk-compressed branch 7 times, most recently from 558bdb5 to fcba4c4 Compare May 23, 2025 08:51
@erimatnor erimatnor marked this pull request as ready for review May 23, 2025 09:31
Copilot AI left a comment

Pull Request Overview

Adds support for splitting compressed chunks by extending split_chunk() test coverage and updating internal APIs to propagate access methods.

  • Introduce split_chunk_validate() PL/pgSQL procedure and update existing tests to verify both row counts and compression stats.
  • Add row_compressor_append_ordered_slot() helper in the compression subsystem and refine tuple building context.
  • Extend chunk-creation APIs (ts_chunk_create_table, related helpers) to accept an access-method OID, defaulting to InvalidOid when unchanged.

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Summary per file:

| File | Description |
|---|---|
| tsl/test/sql/split_chunk.sql | New validation proc and adjusted split tests for compressed chunks |
| tsl/src/compression/compression.h | Declare row_compressor_append_ordered_slot |
| tsl/src/compression/compression.c | Implement ordered-slot compressor and tidy up tuple building |
| tsl/src/chunk_api.c | Pass InvalidOid for AM OID in chunk_create |
| src/chunk.h | Extend ts_chunk_create_table and ts_chunk_find_or_create_without_cuts with amoid |
| src/chunk.c | Propagate AM OID through ts_chunk_create_table and related calls |
| .unreleased/pr_8100 | Add release note for feature #8100 |

Comments suppressed due to low confidence (1)

tsl/test/sql/split_chunk.sql:81

  • Remove the stray trailing 'and' before the THEN on the IF condition (lines 79–81) so the syntax reads IF cond1 AND cond2 THEN rather than ending with an extra 'and'.

*
* Main state for doing a split.
*/
typedef struct SplitContext
Member

Would it be a good idea to put all split-related things into a separate file? Seems pretty big and independent.

Member Author

Yes, it is growing big. But it is probably best to break it out in a separate PR. Otherwise it will be hard to see the changes. WDYT?

@erimatnor erimatnor force-pushed the split-chunk-compressed branch 4 times, most recently from 7384aed to 06ccd68 Compare May 23, 2025 13:29
if (rewrite_heap_dead_tuple(rws->rwstate, tuple))
{
/* A previous recently-dead tuple is now known dead */
tups_vacuumed += 1;
Member Author

Not exactly sure how to get this covered by a test. This code is modeled on the cluster code in PG, so should be correct.

I might try an isolation test. Probably one where there are concurrent transactions that finish before it gets here.

@philkra philkra added this to the v2.21.0 milestone May 26, 2025
@erimatnor erimatnor requested a review from akuzm May 26, 2025 10:38
Contributor

@mkindahl mkindahl left a comment

Generally looks good, but some minor nits and a few requests about test coverage.

static int
route_tuple(TupleTableSlot *slot, const SplitPoint *sp)
{
bool isnull = false;
Contributor

Nit: since you're using it as an output parameter below, this assignment is not needed.

Contributor

I'm not sure which of these combinations you have tests for, but please make sure you have for all of:

  1. Fully compressed to two fully compressed
  2. Partially compressed to only uncompressed plus only compressed (both combinations: moving compressed data to new chunk and moving uncompressed data to new chunk).
  3. Partially compressed to two partially compressed (both uncompressed and compressed data in both chunks).

Member Author

All of 1-3 should be covered now. There are also tests for several other combinations, for example:

  • Split point within segment
  • Split point at min of segment
  • Split point at max of segment

Comment on lines +807 to +897
-- Split compressed chunk in a way that leaves no compressed data in
-- one of the chunks
Contributor

Are you testing both combinations? That is, moving (all) compressed data to the new chunk and moving (all) uncompressed data to the new chunk.

Member Author

There are tests that split so that one chunk gets all the compressed data and the other chunk gets the non-compressed data. But I am not sure the same test exists for both "left" and "right"... Do you think that is necessary? The test is already quite big and hard to read, so I'm not convinced of the value of adding this since it is more or less covered by all the other combos.

@erimatnor erimatnor force-pushed the split-chunk-compressed branch 5 times, most recently from 42777c5 to c59b4b8 Compare June 13, 2025 12:44
@erimatnor erimatnor requested a review from mkindahl June 13, 2025 12:45
@erimatnor erimatnor force-pushed the split-chunk-compressed branch from c59b4b8 to 26b0c5d Compare June 26, 2025 14:33
@erimatnor erimatnor merged commit eb093e9 into timescale:main Jun 26, 2025
41 checks passed
@philkra philkra mentioned this pull request Jul 2, 2025
philkra added a commit that referenced this pull request Jul 8, 2025
## 2.21.0 (2025-07-08)

This release contains performance improvements and bug fixes since the
2.20.3 release. We recommend that you upgrade at the next available
opportunity.

**Highlighted features in TimescaleDB v2.21.0**
* The attach & detach chunks feature allows manually adding or removing
chunks from a hypertable with uncompressed chunks, similar to
PostgreSQL’s partition management.
* Continued improvement of backfilling into the columnstore, achieving
up to 2.5x speedup for constrained tables, by introducing caching logic
that boosts throughput for writes to compressed chunks, bringing
`INSERT` performance close to that of uncompressed chunks.
* Optimized `DELETE` operations on the columnstore through batch-level
deletions of non-segmentby keys in the filter condition, making them up to
42x faster in some cases, as well as reducing bloat and lowering resource
usage.
* The heavy lock taken in Continuous Aggregate refresh was relaxed,
enabling concurrent refreshes for non-overlapping ranges and eliminating
the need for complex customer workarounds.
* [tech preview] Direct Compress is an innovative TimescaleDB feature
that improves high-volume data ingestion by compressing data in memory
and writing it directly to disk, reducing I/O overhead, eliminating
dependency on background compression jobs, and significantly boosting
insert performance.

**Sunsetting of the hypercore access method**
We made the decision to deprecate the hypercore access method (TAM) with the
2.21.0 release. It was an experiment, which did not show the signals we
hoped for and will be sunsetted in TimescaleDB 2.22.0, scheduled for
September 2025. Upgrading to 2.22.0 and higher will be blocked if TAM is
still in use. Since TAM’s inception in
[2.18.0](https://github.com/timescale/timescaledb/releases/tag/2.18.0),
we learned that btrees were not the right architecture. The recent
advancements in the columnstore—such as more performant backfilling,
SkipScan, adding check constraints, and faster point queries—put the
[columnstore](https://www.timescale.com/blog/hypercore-a-hybrid-row-storage-engine-for-real-time-analytics)
close to or on par with TAM without the storage overhead of the additional
index. We apologize for the inconvenience this action may cause
and are here to assist you during the migration process.

Migration path

```
do $$
declare   
   relid regclass;
begin
   for relid in
       select cl.oid from pg_class cl
       join pg_am am on (am.oid = cl.relam)
       where am.amname = 'hypercore'
   loop
       raise notice 'converting % to heap', relid::regclass;
       execute format('alter table %s set access method heap', relid);
   end loop;
end
$$;
```

**Features**
* [#8081](#8081) Use JSON
error code for job configuration parsing
* [#8100](#8100) Support
splitting compressed chunks
* [#8131](#8131) Add policy
to process hypertable invalidations
* [#8141](#8141) Add
function to process hypertable invalidations
* [#8165](#8165) Reindex
recompressed chunks in compression policy
* [#8178](#8178) Add
columnstore option to `CREATE TABLE WITH`
* [#8179](#8179) Implement
direct `DELETE` on non-segmentby columns
* [#8182](#8182) Cache
information for repeated upserts into the same compressed chunk
* [#8187](#8187) Allow
concurrent Continuous Aggregate refreshes
* [#8191](#8191) Add option
to not process hypertable invalidations
* [#8196](#8196) Show
deprecation warning for TAM
* [#8208](#8208) Use `NULL`
compression for bool batches with all null values like the other
compression algorithms
* [#8223](#8223) Support
for attach/detach chunk
* [#8265](#8265) Set
incremental Continuous Aggregate refresh policy on by default
* [#8274](#8274) Allow
creating concurrent continuous aggregate refresh policies
* [#8314](#8314) Add
support for timescaledb_lake in loader
* [#8209](#8209) Add
experimental support for Direct Compress of `COPY`
* [#8341](#8341) Allow
quick migration from hypercore TAM to (columnstore) heap

**Bugfixes**
* [#8153](#8153) Restoring
a database having NULL compressed data
* [#8164](#8164) Check
columns when creating new chunk from table
* [#8294](#8294) The
"vectorized predicate called for a null value" error for WHERE
conditions like `x = any(null::int[])`.
* [#8307](#8307) Fix
missing catalog entries for bool and null compression in fresh
installations
* [#8323](#8323) Fix DML
issue with expression indexes and BHS

**GUCs**
* `enable_direct_compress_copy`: Enable experimental support for direct
compression during `COPY`, default: off
* `enable_direct_compress_copy_sort_batches`: Enable batch sorting
during direct compress `COPY`, default: on
* `enable_direct_compress_copy_client_sorted`: Correct handling of data
sorting by the user is required for this option, default: off

---------

Signed-off-by: Philip Krauss <[email protected]>
Co-authored-by: philkra <[email protected]>
Co-authored-by: philkra <[email protected]>
Co-authored-by: Fabrízio de Royes Mello <[email protected]>
Co-authored-by: Anastasiia Tovpeko <[email protected]>
@timescale-automation timescale-automation added the released-2.21.0 Released in 2.21.0 label Jul 8, 2025