
Conversation

@wilfriedroset
Collaborator

What this PR does

Add a downsampling feature proposal.
The template used for this proposal is inspired by loki's LIDs: https://github.com/grafana/loki/tree/main/docs/sources/community/lids

Which issue(s) this PR fixes or relates to

Fixes NA

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@wilfriedroset wilfriedroset requested review from a team as code owners May 16, 2023 19:47
Signed-off-by: Wilfried Roset <[email protected]>
@wilfriedroset wilfriedroset force-pushed the proposal-downsampling branch from 078f791 to 25444a6 Compare May 16, 2023 20:10
@Otterpohl

Otterpohl commented May 17, 2023

An interesting workaround posted by someone in Slack (props to @adirmatzkin) was to treat the downsampled data like it was a different tenant.

Obviously this has its own drawbacks, most notably the double/triple/n-times ingestion, depending on the different levels of downsampling required. But the logic fits in quite well with Mimir's current pattern (and would build on top of it rather than changing it; whether the maintainers prefer this, who knows): it keeps the data immutable, gives the user the option of having both downsampled and raw data, and lets the existing compacting/pruning processes run as is. It has the limitation of downsampling at ingestion, which is wasteful and not feasible in my eyes, but this is more of a feeler for different types of solutions than a proposal for an actual solution.
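To make the different-tenant workaround concrete, here is a sketch of what the write side could look like with a Prometheus-style remote_write configuration and Mimir's X-Scope-OrgID tenant header. The endpoint URL and tenant names are made up for illustration; this is one possible setup, not a recommendation:

```yaml
remote_write:
  # Raw, full-resolution data goes to the primary tenant.
  - url: https://mimir.example.com/api/v1/push
    headers:
      X-Scope-OrgID: team-a
  # The same stream is sent again to a second tenant reserved for
  # downsampled data (e.g. only recording-rule outputs, or data that is
  # later pruned more aggressively). This is the n-times ingestion cost.
  - url: https://mimir.example.com/api/v1/push
    headers:
      X-Scope-OrgID: team-a-downsampled
```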

@adirmatzkin

An interesting workaround posted by someone in Slack (props to @adirmatzkin) was to treat the downsampled data like it was a different tenant.

Just added a more detailed comment there :)

@wilfriedroset
Collaborator Author

@Otterpohl and @adirmatzkin, thank you for sharing your tips.
As you stated, they are workarounds or DIY solutions, which can be enough for some cases but not all.
To be honest, my proposal does not answer all cases either.

I reckon that the fact this discussion is still ongoing, roughly a year after the first message in the discussion linked above, demonstrates the need for a downsampling feature.

WDYT about the proposal?

@adirmatzkin

adirmatzkin commented May 30, 2023

Hey @wilfriedroset 👋
I enjoyed reading your proposal document!

I was thinking to myself after reading about halfway through:
The stated goal is "faster wide range queries", which I agree with, and it makes sense considering the non-goals.
Maybe we shouldn't be so rigid in thinking that downsampling is the only way to solve this problem and achieve this goal.
What I basically mean is that the proposals are all about doing downsampling, just in different ways and implementations.

My familiarity with the architecture, logic, and code of the compactor and query-fe isn't as comprehensive as I'd like it to be, so I might be wrong here; let me know if I'm way off with my ideas. But maybe we can achieve this goal in a simpler and cheaper way. Here are some more ideas that came to mind:

  1. Adding a querying enhancement to the query-fe itself? Maybe we can query only some blocks from the backend storage? Call it something like "sampled query".

  2. I understand that the older the blocks, the wider their time range and the higher their compaction level (due to the compactor job). So maybe the compactor can "write to the side" sampled data? For example, when compacting a day's worth of data, keeping aside let's say ~48 data points for each metric? (A kind of downsampling, but trying to save the heavy I/O work.)

What do you think?

@wilfriedroset
Collaborator Author

Adding a querying enhancement to the query-fe itself? Maybe we can query only some blocks from the backend storage? Call it something like "sampled query".

I'm not sure I understand this part. If I've understood the compactor correctly, at the end of compaction we have 24h blocks. What would a "sampled query" look like in terms of the blocks queried from the object storage?

I understand that the older the blocks, the wider their time range and the higher their compaction level (due to the compactor job). So maybe the compactor can "write to the side" sampled data? For example, when compacting a day's worth of data, keeping aside let's say ~48 data points for each metric? (A kind of downsampling, but trying to save the heavy I/O work.)

It is somewhat similar to proposal 2, and I tend to prefer it above all the other proposals I've made. We can either do the downsampling during the last compaction stage or create a new stage dedicated to downsampling. Creating a new stage has pros and cons: on the cons side, we have to fetch the block to downsample once again; on the pros side, it means we will downsample every old block. The "write to the side" part is also similar to proposal 2, where we create a block containing only downsampled data. This is also an enabler for different retention policies based on the downsampling range.
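As a rough illustration of what a dedicated downsampling stage could write "to the side", here is a minimal sketch in Go. All names are hypothetical and the real compactor works on TSDB blocks and chunks rather than in-memory slices; this only shows the aggregation itself: grouping one day's raw samples into fixed windows and keeping one sum/count/min/max aggregate per window.

```go
package main

import "fmt"

// sample is a simplified raw data point (the real compactor reads TSDB chunks).
type sample struct {
	ts    int64 // milliseconds since epoch
	value float64
}

// aggregate holds the pre-computed values a downsampled block could store.
type aggregate struct {
	ts       int64 // start of the window
	sum      float64
	min, max float64
	count    int
}

// downsample groups raw samples into fixed-size windows and computes one
// aggregate per window. Samples are assumed to be sorted by timestamp.
func downsample(samples []sample, windowMs int64) []aggregate {
	var out []aggregate
	for _, s := range samples {
		start := s.ts - s.ts%windowMs
		if len(out) == 0 || out[len(out)-1].ts != start {
			out = append(out, aggregate{ts: start, min: s.value, max: s.value})
		}
		cur := &out[len(out)-1]
		cur.sum += s.value
		cur.count++
		if s.value < cur.min {
			cur.min = s.value
		}
		if s.value > cur.max {
			cur.max = s.value
		}
	}
	return out
}

func main() {
	raw := []sample{{0, 1}, {30_000, 3}, {60_000, 5}, {90_000, 7}}
	for _, a := range downsample(raw, 60_000) { // 1-minute windows
		fmt.Printf("window=%d sum=%g count=%d min=%g max=%g\n", a.ts, a.sum, a.count, a.min, a.max)
	}
	// prints:
	// window=0 sum=4 count=2 min=1 max=3
	// window=60000 sum=12 count=2 min=5 max=7
}
```

Storing the sum and count separately (rather than a pre-computed average) is what keeps later re-aggregation correct, e.g. averaging across several downsampled windows.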

@wouter2397

@wilfriedroset Any updates on this feature proposal?
I'm very interested in using this feature in our setup.

@wilfriedroset
Collaborator Author

@wouter2397, we are waiting for the maintainers' feedback; without it we cannot move forward.

Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Would we also need to modify how the PromQL engine processes these downsampled chunks of samples? I'm curious about how the sum, count, num, and max values will be used in queries. Currently the PromQL engine relies on chunks complying with the chunkenc.Iterator interface, which doesn't have support for these chunks.

@wilfriedroset
Collaborator Author

wilfriedroset commented Sep 15, 2023

Would we also need to modify how the PromQL engine processes these downsampled chunks of samples?

I believe we should look for an implementation that doesn't rely on a modified PromQL engine, as that would add more work in the long run.

I'm curious about how the sum, count, num, and max values will be used in queries.

Let's say we have a series named cortex_request_duration_seconds_count which is downsampled. My idea was to create new series named <original series name>:<operation><window>. Therefore we would have something like:

  • cortex_request_duration_seconds_count:sum1d
  • cortex_request_duration_seconds_count:count1d
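For illustration, the naming convention above could be captured by a small helper (a sketch; the function and package names are hypothetical, not part of the proposal):

```go
package main

import "fmt"

// downsampledName builds the name of a downsampled series from the original
// series name, the aggregation operation, and the downsampling window,
// following the <original series name>:<operation><window> convention.
func downsampledName(series, op, window string) string {
	return fmt.Sprintf("%s:%s%s", series, op, window)
}

func main() {
	fmt.Println(downsampledName("cortex_request_duration_seconds_count", "sum", "1d"))
	// prints: cortex_request_duration_seconds_count:sum1d
	fmt.Println(downsampledName("cortex_request_duration_seconds_count", "count", "1d"))
	// prints: cortex_request_duration_seconds_count:count1d
}
```

Because these are ordinary series names, they would be queryable by the unmodified PromQL engine, which is the point of the scheme.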

@btravouillon

btravouillon commented Oct 27, 2023

While searching for an external solution to downsample Prometheus data, I came across this PR. Nice work @wilfriedroset 😉

@mrr4cc00n

Still running into this issue (queries are just too slow). Do we know an ETA for this one? Thanks for the push @wilfriedroset 👌

@ChristopherWirt

+1

@edgarasg
Contributor

Any news on this?

@wilfriedroset
Collaborator Author

Thank you for your interest @edgarasg. We are waiting for the maintainers' feedback; without it we cannot move forward.

@colega
Contributor

colega commented Dec 19, 2024

Hi, sorry for not giving this enough attention for such a long period of time. I guess downsampling just wasn't a priority for Grafana Labs so far. Not sure if it is now, but you definitely deserve some feedback here.

As @dimitarvdimitrov mentioned above, we would probably need to modify the query path for this. However, maybe one option for us would be to open-source the query path of our Adaptive Metrics product. We never published all the implementation details of Adaptive Metrics publicly, but they're also not a secret, so I'll save you some reverse-engineering: aggregations store a separate series with fewer labels and an __aggregation__="metric_name:<agg>" label. This label is later used in the query path to select the appropriate series. While Adaptive Metrics wasn't designed with downsampling as its main focus (it does some time aggregation, but it was designed to do space aggregation), I think the query path might actually work for that (let me mention @leizor and @Logiraptor on this).
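As a sketch of the label-based routing idea described above: at query time, a selector for a metric could be rewritten into one that matches the aggregated series via the __aggregation__ label. The helper below is hypothetical and only illustrates the routing principle; the actual Adaptive Metrics query path is more involved.

```go
package main

import "fmt"

// selectorFor rewrites a query for a metric into a selector matching the
// aggregated series instead, using the __aggregation__ label whose value
// follows the "metric_name:<agg>" format described above.
func selectorFor(metric, agg string) string {
	return fmt.Sprintf(`{__aggregation__=%q}`, metric+":"+agg)
}

func main() {
	fmt.Println(selectorFor("cortex_request_duration_seconds_count", "sum"))
	// prints: {__aggregation__="cortex_request_duration_seconds_count:sum"}
}
```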

It has been previously discussed that we might open-source the query path of Adaptive Metrics; it's not a matter of keeping it secret, just a matter of lacking the motivation to work on open-sourcing it: this might be our motivation now. I don't think we would want to maintain two separate aggregation-related query-path features. We also discussed in the past that bringing it to OSS might help further improve it (so far it has limited ability to hint to the storage what it needs), and maybe this way it can become a first-class citizen.

Those are my thoughts for the query path.

Regarding the generation of the downsampled blocks, I think it's a bad moment to do it and it's the best moment to talk about it. We are doing huge architecture changes in Mimir, there's a new component coming called block-builder (cc @seizethedave?), and afaik we also wanted to start thinking about a compactor-scheduler, to make more intelligent decisions about compaction: the proposals you make seem to be something that may fit some of these new components.

Regarding the proposal document itself:

  • I think there's a lot of conversation that should happen around the query path, and I don't see that in the doc, is it intended to be in a separate doc?
  • I think our general stance is that this kind of features should always be configured per-tenant.

What do you think should be our next steps?

We thought about merging it with the same "Draft" status. But we're not going to start working on this now. That would allow people to send amendments to this proposal, but they can also send them to this branch. We kind of have a feeling that having a PR here with its comments keeps the proposal discussion more alive than it would be if it were just a file in the repo, WDYT? I also don't want to merge it saying "rejected" since we don't have it planned now, because this might suddenly become a priority in 6 months; we all know how software development works.

@wilfriedroset
Collaborator Author

Thank you @colega for your feedback and for not forgetting this feature request. I believe the community appreciates it.
I will try to answer all your points.

we would probably need to modify the query path for this.

The initial idea of my proposal was to create new metrics that would be queried just like any other metrics; this is why I did not investigate the changes needed in the query path. In that regard, my proposal is not optimal.

It has been previously discussed that we might open-source the query path of Adaptive Metrics; it's not a matter of keeping it secret, just a matter of lacking the motivation to work on open-sourcing it: this might be our motivation now. I don't think we would want to maintain two separate aggregation-related query-path features. We also discussed in the past that bringing it to OSS might help further improve it (so far it has limited ability to hint to the storage what it needs), and maybe this way it can become a first-class citizen.

100% for that but I understand that this means more work on your end.

Regarding the generation of the downsampled blocks, I think it's a bad moment to do it and it's the best moment to talk about it. We are doing huge architecture changes in Mimir, there's a new component coming called block-builder (cc @seizethedave?), and afaik we also wanted to start thinking about a compactor-scheduler, to make more intelligent decisions about compaction: the proposals you make seem to be something that may fit some of these new components.

With the new components coming, the proposal might need to be adjusted to take the block-builder and the compactor-scheduler into account. As I don't know about them yet, I will not be able to make the adjustments just now, but I'm still willing to collaborate.

I think there's a lot of conversation that should happen around the query path, and I don't see that in the doc, is it intended to be in a separate doc?

As I've explained, the intention was to not modify the query path. Maybe, with the work done and coming, we need to clarify the changes needed in the query path.

I think our general stance is that this kind of features should always be configured per-tenant.

I agree.

We thought about merging it with the same "Draft" status. But we're not going to start working on this now. That would allow people to send amendments to this proposal, but they can also send them to this branch. We kind of have a feeling that having a PR here with its comments keeps the proposal discussion more alive than it would be if it were just a file in the repo, WDYT?

I rarely do this kind of long-shot proposal, so I don't have any strong feeling between the two ways of iterating over the proposal. In the end I would consider a couple of things:

  • Do we want to merge the proposal only once it is at its final design?
  • Where do we want to have the conversation about the whole proposal?
  • Is this proposal small enough to tackle as a single design proposal, or do we want to break it into smaller pieces?

I also understand that a lot of thinking/design has to be done before the coding starts. This is fair from my point of view, and the context is clear enough.

I also don't want to merge it saying "rejected" since we don't have it planned now, because this might suddenly become a priority in 6 months, we all know how software development works.

100% agree on this.

@github-actions
Contributor

Thank you for your contribution. This pull request has been marked as stale because it has had no activity in the last 150 days. It will be closed in 30 days if there is no further activity. If you need more time, you can add a comment to the PR.

@github-actions github-actions bot added the stale label May 20, 2025
@wilfriedroset
Collaborator Author

This PR is still needed, please don't close it 🙏🏾

@fredericoschardong

Looking forward to seeing this feature merged!
