Expose TDigest type so it can be populated on the client and then inserted into a Timescale table #485

@oliora

Is your feature request related to a problem? Please describe.
I'm aggregating metrics on the client into a histogram, and I want to store this aggregated data in TimescaleDB and use continuous aggregates to downsample and analyze it with TimescaleDB hyperfunctions. I control how the client aggregates the data; the only constraint is that it has to be aggregated. The applications that need to store data this way are written in C++ and Python.

Describe the solution you'd like
I'd like to aggregate data on the client into a structure that I can later insert into a TimescaleDB table as a TDigest value. Then I can use all the TimescaleDB functions that operate on the TDigest type (e.g. rollup, approx_percentile). This also unifies the approaches to storing aggregated and non-aggregated data in TimescaleDB.

The histogram collected on the client currently has min, max, sum, and count values and a set of buckets with counters, and I suspect that it's pretty close to the TDigest format already.
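To make the comparison concrete, here is a rough Python sketch of how such a client histogram could be mapped onto a list of (mean, weight) centroids, which is the general shape of data a t-digest keeps. Everything here is an assumption for illustration: `ClientHistogram`, `to_centroids`, and `approx_quantile` are hypothetical names, the bucket midpoint stands in for the unknown per-bucket mean, and none of this reflects the actual TDigest layout or API in TimescaleDB.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClientHistogram:
    """Hypothetical client-side histogram: bucket edges plus summary stats."""
    edges: List[float]   # ascending bucket boundaries, len(counts) + 1 entries
    counts: List[int]    # number of samples that fell into each bucket
    min: float
    max: float
    sum: float
    count: int


def to_centroids(h: ClientHistogram) -> List[Tuple[float, int]]:
    """Turn each non-empty bucket into a (mean, weight) centroid.

    The true mean inside a bucket is unknown, so the bucket midpoint,
    clamped to [min, max], is used as an approximation (an assumption).
    """
    centroids = []
    for lo, hi, n in zip(h.edges, h.edges[1:], h.counts):
        if n == 0:
            continue  # empty buckets carry no weight
        mid = (lo + hi) / 2.0
        centroids.append((min(max(mid, h.min), h.max), n))
    return centroids


def approx_quantile(centroids: List[Tuple[float, int]], q: float) -> float:
    """Estimate a quantile by walking cumulative weights (no interpolation)."""
    total = sum(w for _, w in centroids)
    target = q * total
    seen = 0
    for mean, w in centroids:
        seen += w
        if seen >= target:
            return mean
    return centroids[-1][0]
```

The count, sum, min, and max values carry over unchanged; only the per-bucket means are approximated, so accuracy depends on how narrow the client's buckets are.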

It would be great if the TimescaleDB library exposed a C API for working with TDigest objects (create, update with new data samples, etc.) and allowed inserting the final value into TimescaleDB.
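As an illustration of the operations I have in mind (create, update, merge), here is a minimal Python sketch of a client-side digest. It is an assumption-laden toy, not the TimescaleDB implementation: `SketchDigest` and its methods are hypothetical names, and the compression step simply merges the two closest centroids instead of using the real t-digest scale function.

```python
import bisect


class SketchDigest:
    """Hypothetical client-side digest; NOT the TimescaleDB TDigest type."""

    def __init__(self, max_buckets=100):
        self.max_buckets = max_buckets
        self.centroids = []          # sorted list of [mean, weight]
        self.count = 0
        self.sum = 0.0
        self.min = float("inf")
        self.max = float("-inf")

    def add(self, value, weight=1):
        """Update the digest with one new sample."""
        self.count += weight
        self.sum += value * weight
        self.min = min(self.min, value)
        self.max = max(self.max, value)
        bisect.insort(self.centroids, [value, weight])
        self._compress()

    def merge(self, other):
        """Fold another digest into this one (similar in spirit to a rollup)."""
        self.count += other.count
        self.sum += other.sum
        self.min = min(self.min, other.min)
        self.max = max(self.max, other.max)
        self.centroids = sorted(self.centroids + [list(c) for c in other.centroids])
        self._compress()

    def _compress(self):
        """Naive compression: merge the closest adjacent pair until we fit."""
        while len(self.centroids) > self.max_buckets:
            gaps = [b[0] - a[0] for a, b in zip(self.centroids, self.centroids[1:])]
            i = gaps.index(min(gaps))
            (m1, w1), (m2, w2) = self.centroids[i], self.centroids[i + 1]
            merged = [(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2]
            self.centroids[i:i + 2] = [merged]
```

The final state (centroids plus count, sum, min, max) is what I would want to hand to TimescaleDB as a single TDigest value.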

Describe alternatives you've considered
I've considered two alternatives:

  1. Collect metrics via Promscale. This is a viable alternative, but Promscale histograms have limited use with continuous aggregates and have other limitations that come from the Prometheus metrics format.
  2. Store non-aggregated data in TimescaleDB and attach a materialized view with a continuous aggregate. This works for some of my use cases but not all. In some cases it is not possible to transfer all the non-aggregated data to TimescaleDB because of its volume and/or limited connectivity.

Update: Being able to insert a TDigest object from the client is the most important part here, because I can implement the calculation part myself.

Metadata

Labels: feature-request (an area of analysis that could be made easier)
