Skip to content

interpolated_average can't pull values through empty buckets  #548

@emanuel-joos

Description

@emanuel-joos

Relevant system information:

  • OS: Timescale Managed Service Azure
  • PostgreSQL version: 14.4
  • TimescaleDB Toolkit version: 1.10.0
  • Installation method: Timescale Managed Service Azure

Describe the bug
Time weighted average fails if buckets without values exist. It just ignores buckets without data (output has less rows than expected) and for buckets with data that are adjacent to buckets without data it computes wrong averages.

To Reproduce

  • Input data:
ts, signal_id, value
('2019-12-31T23:10:00', 8, 10);
('2020-01-01T03:10:00', 8, 20);
('2020-01-01T03:50:00', 8, 40);
('2020-01-01T00:00:00', 9, 10);
('2020-01-01T03:10:00', 9, 20);
('2020-01-01T03:50:00', 9, 40);
  • Expected output (LOCF, no linear interpolation):
ts, signal_id, value
('2020-01-01T01:00:00', 8, 10)
('2020-01-01T02:00:00', 8, 10)
('2020-01-01T03:00:00', 8,10)
('2020-01-01T04:00:00', 8,21.66666)
('2020-01-01T05:00:00', 8,40)
('2020-01-01T01:00:00', 9, 10)
('2020-01-01T02:00:00', 9, 10)
('2020-01-01T03:00:00', 9,10)
('2020-01-01T04:00:00', 9,21.66666)
('2020-01-01T05:00:00', 9,40)
  • Query used
SELECT
    signal_id,
    time_value,
    toolkit_experimental.interpolated_average(
        tws,
        time_value,
        '3600 s',
        LAG(tws) OVER (Partition By signal_id ORDER BY time_value),
        LEAD(tws) OVER (Partition By signal_id ORDER BY time_value)
    )
    FROM (
    SELECT
        signal_id,
        time_bucket('3600 s', ts) AS time_value,
        time_weight('LOCF', ts, measurement_value) AS tws
    FROM measurements
    Where ts BETWEEN '2020-01-01T00:00:00' AND '2020-01-01T05:00:00'
    And signal_id = 9    
    GROUP BY signal_id, time_value
    ) t
  • Actual output:
('2020-01-01T01:00:00', 9, 10)
('2020-01-01T03:00:00', 9, 18)
('2020-01-01T01:00:00', 8, 10)
('2020-01-01T03:00:00', 8, 18)

Workaround
with:

  • datetime_from = '2020-01-01T00:00:00'
  • datetime_to = '2020-01-01T05:00:00'
  • signal_id = 9
  • resampled_interval = 3600
  • fall_back_time = datetime_from - 30 Days
with t1 as (
    SELECT signal_id, ts, measurement_value
    From measurements
    Where ts >= '{datetime_from}' AND ts < '{datetime_to}' And signal_id in ({signals})
),
t2 as (SELECT
    time_bucket_gapfill('{resampled_interval} s', ts) AS time_value_gap_fill,
    signal_id,
    locf(last(measurement_value,ts)) as locf_value,
    last(measurement_value, ts) as indicator_empty_bucket
    From measurements
    Where ts BETWEEN '{fall_back_time}' AND '{datetime_to}' and signal_id in ({signals})
    GROUP BY signal_id, time_value_gap_fill),
t3 as (
    select signal_id, time_value_gap_fill as ts, locf_value as measurement_value
    from t2 where indicator_empty_bucket is Null
    and time_value_gap_fill >= '{datetime_from-datetime.timedelta(seconds=resampled_interval)}'
),
t4 as (
select * from t3 union all select * from t1 order by ts),
t5 as (
    SELECT signal_id, time_bucket('{resampled_interval} s', ts) AS interval_label ,
     time_weight('LOCF', ts, measurement_value) AS tws
     FROM t4
    GROUP BY signal_id, interval_label
),
t6 as (
    SELECT interval_label, signal_id,
        toolkit_experimental.interpolated_average(
            tws,
            interval_label,
            '{resampled_interval} s',
            LAG(tws) OVER (Partition By signal_id ORDER BY interval_label),
            LEAD(tws) OVER (Partition By signal_id ORDER BY interval_label)) as time_weighted_average
            from t5
)
Select interval_label +'{resampled_interval} s' ,
 signal_id,
 time_weighted_average from t6 where interval_label < '{datetime_to}'
  and interval_label > '{datetime_from-datetime.timedelta(seconds=resampled_interval)}';

This should work out of the box. The workaround query also works for multiple signals.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions