Skip to content

[WIP] Timeseries Aggregation #2

@JLockerman

Description

@JLockerman

What's the functionality you would like to add

A generic interface to aggregate of (Time, Value) pairs and manipulate such timeseries.

How would the function be used

SELECT * FROM data;
          time          | value 
------------------------+-------
 2019-01-01 00:01:00-05 |     1
 2019-01-01 00:01:30-05 |     2
 2019-01-01 00:02:00-05 |     2
 2019-01-01 00:02:30-05 |     2
 2019-01-01 00:03:00-05 |     3
 2019-01-01 00:04:00-05 |      
 2019-01-01 00:04:30-05 |      
 2019-01-01 00:05:00-05 |      
 2019-01-01 00:05:30-05 |      
 2019-01-01 00:06:00-05 |     6
 2019-01-01 00:06:30-05 |     6
(11 rows)

-- a timeseries is sorted set of (time, value) pairs
SELECT timeseries(time, value, sample('1 minute') |> interpolate()) FROM data;
                                                                                                    timeseries
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 (time, double precision) ("2019-01-01 00:01:00-05",1), ("2019-01-01 00:01:30-05",2), ("2019-01-01 00:02:00-05",2), ("2019-01-01 00:02:30-05",2), ("2019-01-01 00:03:00-05",3), ("2019-01-01 00:04:00-05", NULL), ("2019-01-01 00:04:30-05", NULL), ("2019-01-01 00:05:00-05", NULL), ("2019-01-01 00:05:30-05", NULL), ("2019-01-01 00:06:00-05",6), ("2019-01-01 00:06:30-05",6)
(1 row)

-- it can take a pipeline of transformations to be applied to the underlying data
-- enabling transformations such as bucketing, interpolation and others
SELECT timeseries(time, value, sample('1 minute') |> interpolate()) FROM data;
                                                                                                    timeseries
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 (time, double precision) ("2019-01-01 00:01:00-05", 1), ("2019-01-01 00:02:00-05",2), ("2019-01-01 00:03:00-05",3), ("2019-01-01 00:04:00-05",3.6666666666666665), ("2019-01-01 00:05:00-05",4.666666666666666), ("2019-01-01 00:06:00-05",6)
(1 row)

Is your feature request related to a problem? A wishlist item Please describe.

TODO expand other wishlist items this relates to

Open Questions

  1. What, if anything, should be the pipeline syntax?

  2. We have multiple ways we can relate the pipeline to the aggregate, on is that the aggreagate takes in the pipline as an argument (shown above) this allows us the most flexibility in execution, and may allow us incremental execution in some instances. An alternative is that the pipeline takes in the aggregate as in

    timeseries(time, value) |> sample('1 minute') |> interpolate()

    this may be a more natural syntax, expecially with output aggregations as in

    timeseries(time, value) |> sample('1 minute') |> interpolate()

    in the current syntax that would look something like

    timeseries(time, value, sample('1 minute') |> interpolate())
  3. Ordering: to what extent should we force the input to be ordered, versus sort internally to the aggregation?

  4. Should we have separate types for regular and irregular timeseries?

Metadata

Metadata

Assignees

No one assigned

    Labels

    proposed-featureA proposed feature or function to ease a task

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions