Skip to content

Heartbeat_agg rollup gives inconsistent results. #731

@WireBaron

Description

@WireBaron

While combining several heartbeat aggs together using the rollup function I noticed that I get a different result if I first do a rollup on a different field and then roll the results together. Rollup for heartbeat agg should always give the same result for the same base data, regardless of the intermediate steps.

The data for this query is from the SustData data set, I've constructed the aggregates like so:

CREATE MATERIALIZED VIEW weekly_heartbeat AS
  SELECT 
    time_bucket('1 week', tmstp) as week,
    iid as unit,
    deploy,
    heartbeat_agg(tmstp, time_bucket('1w', tmstp), '1w', '2m') 
  FROM power_samples
  GROUP BY 1,2,3;

Here are the rollup queries I ran and the results I got:

with week_agg as (select deploy, week, rollup(heartbeat_agg) as agg from weekly_heartbeat group by 1, 2)
select deploy, live_ranges(rollup(agg)) from week_agg group by deploy order by deploy;
 deploy |                     live_ranges                     
--------+-----------------------------------------------------
      1 | ("2010-07-29 00:00:00+00","2010-11-26 00:01:00+00")
      2 | ("2010-11-25 00:00:00+00","2011-03-27 01:01:59+00")
      2 | ("2011-03-27 03:00:00+00","2012-03-25 01:01:59+00")
      2 | ("2012-03-25 03:00:26+00","2012-04-17 00:01:00+00")
      2 | ("2012-04-20 00:00:00+00","2012-04-21 00:01:00+00")
      2 | ("2012-05-11 00:00:00+00","2012-05-13 00:01:00+00")
      2 | ("2013-02-20 00:00:00+00","2013-02-21 00:01:00+00")
      3 | ("2012-08-01 00:00:01+00","2013-03-31 01:01:16+00")
      3 | ("2013-03-31 03:00:03+00","2013-05-22 00:01:00+00")
      4 | ("2013-07-31 00:00:00+00","2014-03-30 01:01:49+00")
      4 | ("2014-03-30 03:00:01+00","2014-04-25 00:01:00+00")

with unit_agg as (select deploy, unit, rollup(heartbeat_agg) as agg from weekly_heartbeat group by 1,2)
select deploy, live_ranges(rollup(agg)) from unit_agg group by 1 order by 1;
 deploy |                     live_ranges                     
--------+-----------------------------------------------------
      1 | ("2010-07-29 00:00:00+00","2010-08-01 00:01:00+00")
      1 | ("2010-08-02 00:00:00+00","2010-11-26 00:01:00+00")
      2 | ("2010-11-25 00:00:00+00","2011-03-27 01:01:59+00")
      2 | ("2011-03-27 03:00:00+00","2012-03-25 01:01:59+00")
      2 | ("2012-03-25 03:00:26+00","2012-04-17 00:01:00+00")
      2 | ("2012-04-20 00:00:00+00","2012-04-21 00:01:00+00")
      2 | ("2012-05-11 00:00:00+00","2012-05-13 00:01:00+00")
      2 | ("2013-02-20 00:00:00+00","2013-02-21 00:01:00+00")
      3 | ("2012-08-01 00:00:01+00","2013-03-31 01:01:16+00")
      3 | ("2013-03-31 03:00:03+00","2013-05-26 00:01:00+00")
      4 | ("2013-07-31 00:00:00+00","2014-03-30 01:01:49+00")
      4 | ("2014-03-30 03:00:01+00","2014-04-25 00:01:00+00")

select deploy, live_ranges(rollup(heartbeat_agg)) from weekly_heartbeat group by deploy order by deploy;
 deploy |                     live_ranges                     
--------+-----------------------------------------------------
      1 | ("2010-07-29 00:00:00+00","2010-09-13 00:01:40+00")
      1 | ("2010-09-27 00:00:00+00","2010-11-15 00:01:00+00")
      1 | ("2010-11-22 00:00:41+00","2010-11-26 00:01:00+00")
      2 | ("2010-11-25 00:00:00+00","2011-03-27 01:01:59+00")
      2 | ("2011-03-27 03:00:00+00","2012-03-25 01:01:59+00")
      2 | ("2012-03-25 03:00:26+00","2012-04-17 00:01:00+00")
      2 | ("2012-04-20 00:00:00+00","2012-04-21 00:01:00+00")
      2 | ("2012-05-11 00:00:00+00","2012-05-13 00:01:00+00")
      2 | ("2013-02-20 00:00:00+00","2013-02-21 00:01:00+00")
      3 | ("2012-08-01 00:00:01+00","2013-03-31 01:01:16+00")
      3 | ("2013-03-31 03:00:03+00","2013-04-15 00:01:09+00")
      3 | ("2013-05-06 00:00:05+00","2013-05-13 00:01:25+00")
      3 | ("2013-05-20 00:00:04+00","2013-05-22 00:01:00+00")
      4 | ("2013-07-31 00:00:00+00","2014-03-30 01:01:49+00")
      4 | ("2014-03-30 03:00:01+00","2014-04-07 00:01:01+00")

As can be seen, there are some missing ranges when simply rolling all the data together than aren't present if the overlapping ranges are first rolled together. There's an even different result in one of the cases when first rolling together the aggregates for a unit and then rolling up by week.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions