
Conversation

@erimatnor
Member

@erimatnor erimatnor commented May 9, 2025

Add support for splitting compressed chunks when calling split_chunk(). Only 2-way splits are supported, as before.

When splitting a compressed chunk, the same rewrite approach is taken as when splitting a non-compressed chunk. The main difference is that the internal compressed relation is split in a separate step, similar to the non-compressed relation. One complication is that the split might end up in the middle of a compressed segment, in which case the segment needs to be split as well. This is done by decompressing the segment in memory and creating two new segments that get written to the resulting split relations. Since the non-compressed and compressed relations are split separately, the non-compressed data remains non-compressed after the split.
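
As a rough illustration of the segment-splitting step, here is a minimal, self-contained C sketch. It is not the actual TimescaleDB code: the `Segment` struct, `segment_append()`, and `split_segment()` are made-up helpers, and a segment is modeled simply as an array of values plus its min/max metadata. A segment whose min/max range straddles the split point is materialized in memory and its rows are routed into two new segments, one per result relation; a segment entirely on one side would be copied as-is.

```c
/*
 * Standalone sketch (not TimescaleDB code): a compressed segment is modeled
 * as an array of values plus the segment's min/max in the split dimension.
 * A segment that straddles the split point is "decompressed" (here, simply
 * iterated) and its rows are routed into two new segments.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct Segment
{
	long *values;  /* decompressed values of the split dimension */
	int nvalues;
	long min;      /* segment metadata: min of the split dimension */
	long max;      /* segment metadata: max of the split dimension */
} Segment;

/* Append one value to an in-memory segment being rebuilt */
static void
segment_append(Segment *seg, long value)
{
	seg->values = realloc(seg->values, sizeof(long) * (seg->nvalues + 1));
	seg->values[seg->nvalues++] = value;
	if (seg->nvalues == 1 || value < seg->min)
		seg->min = value;
	if (seg->nvalues == 1 || value > seg->max)
		seg->max = value;
}

/*
 * Route a segment across a 2-way split at "split_at": values below the
 * split point go to "left", the rest to "right".
 */
static void
split_segment(const Segment *seg, long split_at, Segment *left, Segment *right)
{
	for (int i = 0; i < seg->nvalues; i++)
	{
		if (seg->values[i] < split_at)
			segment_append(left, seg->values[i]);
		else
			segment_append(right, seg->values[i]);
	}
}

int
main(void)
{
	long data[] = { 10, 20, 30, 40, 50 };
	Segment seg = { data, 5, 10, 50 };
	Segment left = { 0 }, right = { 0 };

	/* Split point 35 falls inside [10, 50], so the segment must be split */
	split_segment(&seg, 35, &left, &right);
	printf("left: %d rows [%ld, %ld], right: %d rows [%ld, %ld]\n",
		   left.nvalues, left.min, left.max,
		   right.nvalues, right.min, right.max);
	free(left.values);
	free(right.values);
	return 0;
}
```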

Special care is also taken to handle compression stats. In particular, instead of computing new stats, the existing stats are divided across the result relations according to the amount of data each relation received. This means that the sum of the stats after a split is the same as before the split. Computing new stats is not feasible because the current approach would require knowing the pre-compressed relation sizes, which is not possible without first decompressing everything.
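
To illustrate how the existing stats can be carried over without recomputation, here is a minimal, standalone C sketch; the struct and field names are made up for the example and are not the actual catalog columns. Each result relation gets a share of the original counters proportional to the compressed data it received, and the second relation gets the exact remainder so the totals still add up.

```c
/*
 * Standalone sketch (not TimescaleDB code): divide existing compression
 * stats across the two result relations in proportion to the compressed
 * data each received, preserving the original totals. Field names are
 * illustrative only.
 */
#include <stdio.h>
#include <stdint.h>

typedef struct CompressionStats
{
	int64_t rowcnt_pre_compression;  /* rows before compression */
	int64_t rowcnt_post_compression; /* compressed batches */
} CompressionStats;

/*
 * Split "orig" into two stats structs using the fraction of compressed
 * data that ended up in the first relation. The second relation gets the
 * remainder, so the sums stay equal to the original stats.
 */
static void
divide_stats(const CompressionStats *orig, double first_fraction,
			 CompressionStats *first, CompressionStats *second)
{
	first->rowcnt_pre_compression =
		(int64_t) (orig->rowcnt_pre_compression * first_fraction);
	first->rowcnt_post_compression =
		(int64_t) (orig->rowcnt_post_compression * first_fraction);
	second->rowcnt_pre_compression =
		orig->rowcnt_pre_compression - first->rowcnt_pre_compression;
	second->rowcnt_post_compression =
		orig->rowcnt_post_compression - first->rowcnt_post_compression;
}

int
main(void)
{
	CompressionStats orig = { 100000, 100 };
	CompressionStats a, b;

	/* The first relation received 30% of the compressed data */
	divide_stats(&orig, 0.30, &a, &b);
	printf("a: pre=%lld post=%lld, b: pre=%lld post=%lld\n",
		   (long long) a.rowcnt_pre_compression,
		   (long long) a.rowcnt_post_compression,
		   (long long) b.rowcnt_pre_compression,
		   (long long) b.rowcnt_post_compression);
	return 0;
}
```

Computing the second share as a remainder, rather than as a second rounded fraction, is what keeps the post-split sum identical to the pre-split stats.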

@erimatnor erimatnor force-pushed the split-chunk-compressed branch 6 times, most recently from d31d571 to 3f49c47 Compare May 21, 2025 07:50
@erimatnor erimatnor changed the title from Split compressed chunks to Support splitting compressed chunks May 21, 2025
@erimatnor erimatnor force-pushed the split-chunk-compressed branch 5 times, most recently from 013df47 to 26baf56 Compare May 21, 2025 10:07
@erimatnor erimatnor force-pushed the split-chunk-compressed branch 3 times, most recently from daac8de to 5f66725 Compare May 22, 2025 13:01
@codecov

codecov bot commented May 22, 2025

Codecov Report

Attention: Patch coverage is 91.20879% with 32 lines in your changes missing coverage. Please review.

Project coverage is 81.97%. Comparing base (5a51693) to head (26b0c5d).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tsl/src/chunk.c | 91.34% | 6 Missing and 23 partials ⚠️ |
| tsl/src/compression/compression.c | 88.23% | 0 Missing and 2 partials ⚠️ |
| src/chunk.c | 91.66% | 1 Missing ⚠️ |

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #8100      +/-   ##
==========================================
- Coverage   82.16%   81.97%   -0.19%     
==========================================
  Files         257      257              
  Lines       48323    48513     +190     
  Branches    12189    12221      +32     
==========================================
+ Hits        39704    39768      +64     
- Misses       3741     3920     +179     
+ Partials     4878     4825      -53     
```

☔ View full report in Codecov by Sentry.

@erimatnor erimatnor force-pushed the split-chunk-compressed branch 7 times, most recently from 558bdb5 to fcba4c4 Compare May 23, 2025 08:51
@erimatnor erimatnor marked this pull request as ready for review May 23, 2025 09:31
Copilot AI left a comment

Pull Request Overview

Adds support for splitting compressed chunks by extending split_chunk() test coverage and updating internal APIs to propagate access methods.

  • Introduce split_chunk_validate() PL/pgSQL procedure and update existing tests to verify both row counts and compression stats.
  • Add row_compressor_append_ordered_slot() helper in the compression subsystem and refine tuple building context.
  • Extend chunk-creation APIs (ts_chunk_create_table, related helpers) to accept an access-method OID, defaulting to InvalidOid when unchanged.

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Summary per file:

| File | Description |
|---|---|
| tsl/test/sql/split_chunk.sql | New validation proc and adjusted split tests for compressed chunks |
| tsl/src/compression/compression.h | Declare row_compressor_append_ordered_slot |
| tsl/src/compression/compression.c | Implement ordered-slot compressor and tidy up tuple building |
| tsl/src/chunk_api.c | Pass InvalidOid for AM OID in chunk_create |
| src/chunk.h | Extend ts_chunk_create_table and ts_chunk_find_or_create_without_cuts with amoid |
| src/chunk.c | Propagate AM OID through ts_chunk_create_table and related calls |
| .unreleased/pr_8100 | Add release note for feature #8100 |

Comments suppressed due to low confidence (1)

tsl/test/sql/split_chunk.sql:81

  • Remove the stray trailing 'and' before the THEN on the IF condition (lines 79–81) so the syntax reads IF cond1 AND cond2 THEN rather than ending with an extra 'and'.

*
* Main state for doing a split.
*/
typedef struct SplitContext
Member

Would it be a good idea to put all split-related things into a separate file? Seems pretty big and independent.

Member Author

Yes, it is growing big. But it is probably best to break it out in a separate PR. Otherwise it will be hard to see the changes. WDYT?

@erimatnor erimatnor force-pushed the split-chunk-compressed branch 4 times, most recently from 7384aed to 06ccd68 Compare May 23, 2025 13:29
if (rewrite_heap_dead_tuple(rws->rwstate, tuple))
{
/* A previous recently-dead tuple is now known dead */
tups_vacuumed += 1;
Member Author

Not exactly sure how to get this covered by a test. This code is modeled on the cluster code in PG, so should be correct.

I might try an isolation test. Probably one where there are concurrent transactions that finish before it gets here.

@philkra philkra added this to the v2.21.0 milestone May 26, 2025
@erimatnor erimatnor requested a review from akuzm May 26, 2025 10:38
Contributor

@mkindahl mkindahl left a comment

Generally looks good, but some minor nits and a few requests about test coverage.

static int
route_tuple(TupleTableSlot *slot, const SplitPoint *sp)
{
bool isnull = false;
Contributor

Nit: since you're using it as an output parameter below, this assignment is not needed.

Contributor

I'm not sure which of these combinations you have tests for, but please make sure you have for all of:

  1. Fully compressed to two fully compressed
  2. Partially compressed to only uncompressed plus only compressed (both combinations: moving compressed data to new chunk and moving uncompressed data to new chunk).
  3. Partially compressed to two partially compressed (both uncompressed and compressed data in both chunks).

Member Author

All of 1-3 should be covered now. There are also tests for several other combinations, for example:

  • Split point within segment
  • Split point at min of segment
  • Split point at max of segment

Comment on lines +807 to +897
-- Split compressed chunk in a way that leaves no compressed data in
-- one of the chunks
Contributor

Are you testing both combinations? That is, moving (all) compressed data to the new chunk and moving (all) uncompressed data to the new chunk.

Member Author

There are tests that split so that one chunk gets all the compressed data and the other chunk gets the non-compressed data. But I am not sure the same test exists for both "left" and "right"... Do you think that is necessary? The test is already quite big and hard to read, so I'm not convinced of the value of adding this since it is more or less covered by all the other combos.

@erimatnor erimatnor force-pushed the split-chunk-compressed branch 5 times, most recently from 42777c5 to c59b4b8 Compare June 13, 2025 12:44
@erimatnor erimatnor requested a review from mkindahl June 13, 2025 12:45
@erimatnor erimatnor force-pushed the split-chunk-compressed branch from c59b4b8 to 26b0c5d Compare June 26, 2025 14:33
@erimatnor erimatnor merged commit eb093e9 into timescale:main Jun 26, 2025
41 checks passed
@philkra philkra mentioned this pull request Jul 2, 2025
philkra added a commit that referenced this pull request Jul 8, 2025
## 2.21.0 (2025-07-08)

This release contains performance improvements and bug fixes since the
2.20.3 release. We recommend that you upgrade at the next available
opportunity.

**Highlighted features in TimescaleDB v2.21.0**
* The attach & detach chunks feature allows manually adding or removing
chunks from a hypertable with uncompressed chunks, similar to
PostgreSQL’s partition management.
* Continued improvement of backfilling into the columnstore, achieving
up to 2.5x speedup for constrained tables, by introducing caching logic
that boosts throughput for writes to compressed chunks, bringing
`INSERT` performance close to that of uncompressed chunks.
* Optimized `DELETE` operations on the columnstore through batch-level
deletions of non-segmentby keys in the filter condition, making them up to
42x faster in some cases, as well as reducing bloat and lowering resource
usage.
* The heavy lock taken in Continuous Aggregate refresh was relaxed,
enabling concurrent refreshes for non-overlapping ranges and eliminating
the need for complex customer workarounds.
* [tech preview] Direct Compress is an innovative TimescaleDB feature
that improves high-volume data ingestion by compressing data in memory
and writing it directly to disk, reducing I/O overhead, eliminating
dependency on background compression jobs, and significantly boosting
insert performance.

**Sunsetting of the hypercore access method**
We made the decision to deprecate the hypercore access method (TAM) with the
2.21.0 release. It was an experiment, which did not show the signals we
hoped for and will be sunsetted in TimescaleDB 2.22.0, scheduled for
September 2025. Upgrading to 2.22.0 and higher will be blocked if TAM is
still in use. Since TAM’s inception in
[2.18.0](https://github.com/timescale/timescaledb/releases/tag/2.18.0),
we learned that btrees were not the right architecture. The recent
advancements in the columnstore—such as more performant backfilling,
SkipScan, adding check constraints, and faster point queries—put the
[columnstore](https://www.timescale.com/blog/hypercore-a-hybrid-row-storage-engine-for-real-time-analytics)
close to or on par with TAM without the storage overhead of the additional
index. We apologize for the inconvenience this action may cause
and are here to assist you during the migration process.

Migration path

```
do $$
declare   
   relid regclass;
begin
   for relid in
       select cl.oid from pg_class cl
       join pg_am am on (am.oid = cl.relam)
       where am.amname = 'hypercore'
   loop
       raise notice 'converting % to heap', relid::regclass;
       execute format('alter table %s set access method heap', relid);
   end loop;
end
$$;
```

**Features**
* [#8081](#8081) Use JSON
error code for job configuration parsing
* [#8100](#8100) Support
splitting compressed chunks
* [#8131](#8131) Add policy
to process hypertable invalidations
* [#8141](#8141) Add
function to process hypertable invalidations
* [#8165](#8165) Reindex
recompressed chunks in compression policy
* [#8178](#8178) Add
columnstore option to `CREATE TABLE WITH`
* [#8179](#8179) Implement
direct `DELETE` on non-segmentby columns
* [#8182](#8182) Cache
information for repeated upserts into the same compressed chunk
* [#8187](#8187) Allow
concurrent Continuous Aggregate refreshes
* [#8191](#8191) Add option
to not process hypertable invalidations
* [#8196](#8196) Show
deprecation warning for TAM
* [#8208](#8208) Use `NULL`
compression for bool batches with all null values like the other
compression algorithms
* [#8223](#8223) Support
for attach/detach chunk
* [#8265](#8265) Set
incremental Continuous Aggregate refresh policy on by default
* [#8274](#8274) Allow
creating concurrent continuous aggregate refresh policies
* [#8314](#8314) Add
support for timescaledb_lake in loader
* [#8209](#8209) Add
experimental support for Direct Compress of `COPY`
* [#8341](#8341) Allow
quick migration from hypercore TAM to (columnstore) heap

**Bugfixes**
* [#8153](#8153) Restoring
a database having NULL compressed data
* [#8164](#8164) Check
columns when creating new chunk from table
* [#8294](#8294) The
"vectorized predicate called for a null value" error for WHERE
conditions like `x = any(null::int[])`.
* [#8307](#8307) Fix
missing catalog entries for bool and null compression in fresh
installations
* [#8323](#8323) Fix DML
issue with expression indexes and BHS

**GUCs**
* `enable_direct_compress_copy`: Enable experimental support for direct
compression during `COPY`, default: off
* `enable_direct_compress_copy_sort_batches`: Enable batch sorting
during direct compress `COPY`, default: on
* `enable_direct_compress_copy_client_sorted`: Correct handling of data
sorting by the user is required for this option, default: off

---------

Signed-off-by: Philip Krauss <[email protected]>
Co-authored-by: philkra <[email protected]>
Co-authored-by: philkra <[email protected]>
Co-authored-by: Fabrízio de Royes Mello <[email protected]>
Co-authored-by: Anastasiia Tovpeko <[email protected]>
@timescale-automation timescale-automation added the released-2.21.0 Released in 2.21.0 label Jul 8, 2025