-
Notifications
You must be signed in to change notification settings - Fork 62
Open
Labels
enhancementNew feature or requestNew feature or request
Description
Relevant system information:
- OS: Timescale Managed Service Azure
- PostgreSQL version: 14.4
- TimescaleDB Toolkit version: 1.10.0
- Installation method: Timescale Managed Service Azure
Describe the bug
Time weighted average fails if buckets without values exist. It just ignores buckets without data (output has less rows than expected) and for buckets with data that are adjacent to buckets without data it computes wrong averages.
To Reproduce
- Input data:
ts, signal_id, value
('2019-12-31T23:10:00', 8, 10);
('2020-01-01T03:10:00', 8, 20);
('2020-01-01T03:50:00', 8, 40);
('2020-01-01T00:00:00', 9, 10);
('2020-01-01T03:10:00', 9, 20);
('2020-01-01T03:50:00', 9, 40);
- Expected output (LOCF, no linear interpolation):
ts, signal_id, value
('2020-01-01T01:00:00', 8, 10)
('2020-01-01T02:00:00', 8, 10)
('2020-01-01T03:00:00', 8,10)
('2020-01-01T04:00:00', 8,21.66666)
('2020-01-01T05:00:00', 8,40)
('2020-01-01T01:00:00', 9, 10)
('2020-01-01T02:00:00', 9, 10)
('2020-01-01T03:00:00', 9,10)
('2020-01-01T04:00:00', 9,21.66666)
('2020-01-01T05:00:00', 9,40)
- Query used
SELECT
signal_id,
time_value,
toolkit_experimental.interpolated_average(
tws,
time_value,
'3600 s',
LAG(tws) OVER (Partition By signal_id ORDER BY time_value),
LEAD(tws) OVER (Partition By signal_id ORDER BY time_value)
)
FROM (
SELECT
signal_id,
time_bucket('3600 s', ts) AS time_value,
time_weight('LOCF', ts, measurement_value) AS tws
FROM measurements
Where ts BETWEEN '2020-01-01T00:00:00' AND '2020-01-01T05:00:00'
And signal_id = 9
GROUP BY signal_id, time_value
) t
- Actual output:
('2020-01-01T01:00:00', 9, 10)
('2020-01-01T03:00:00', 9, 18)
('2020-01-01T01:00:00', 8, 10)
('2020-01-01T03:00:00', 8, 18)
Workaround
with:
- datetime_from = '2020-01-01T00:00:00'
- datetime_to = '2020-01-01T05:00:00'
- signal_id = 9
- resampled_interval = 3600
- fall_back_time = datetime_from - 30 Days
with t1 as (
SELECT signal_id, ts, measurement_value
From measurements
Where ts >= '{datetime_from}' AND ts < '{datetime_to}' And signal_id in ({signals})
),
t2 as (SELECT
time_bucket_gapfill('{resampled_interval} s', ts) AS time_value_gap_fill,
signal_id,
locf(last(measurement_value,ts)) as locf_value,
last(measurement_value, ts) as indicator_empty_bucket
From measurements
Where ts BETWEEN '{fall_back_time}' AND '{datetime_to}' and signal_id in ({signals})
GROUP BY signal_id, time_value_gap_fill),
t3 as (
select signal_id, time_value_gap_fill as ts, locf_value as measurement_value
from t2 where indicator_empty_bucket is Null
and time_value_gap_fill >= '{datetime_from-datetime.timedelta(seconds=resampled_interval)}'
),
t4 as (
select * from t3 union all select * from t1 order by ts),
t5 as (
SELECT signal_id, time_bucket('{resampled_interval} s', ts) AS interval_label ,
time_weight('LOCF', ts, measurement_value) AS tws
FROM t4
GROUP BY signal_id, interval_label
),
t6 as (
SELECT interval_label, signal_id,
toolkit_experimental.interpolated_average(
tws,
interval_label,
'{resampled_interval} s',
LAG(tws) OVER (Partition By signal_id ORDER BY interval_label),
LEAD(tws) OVER (Partition By signal_id ORDER BY interval_label)) as time_weighted_average
from t5
)
Select interval_label +'{resampled_interval} s' ,
signal_id,
time_weighted_average from t6 where interval_label < '{datetime_to}'
and interval_label > '{datetime_from-datetime.timedelta(seconds=resampled_interval)}';
This should work out of the box. The workaround query also works for multiple signals.
jmcarter17, jledentu, DonTomika and chrishop
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request