Add downsampling proposal #5028
Conversation
|
An interesting workaround posted by someone in Slack (props to @adirmatzkin) was to treat the downsampled data as if it belonged to a different tenant. Obviously this has its own drawbacks, most notably the double/triple/n× ingestion, depending on how many levels of downsampling are required. But the logic fits quite well with Mimir's current pattern, and would build on top of it rather than changing it (whether the maintainers prefer this, who knows): it keeps the data immutable, gives the user the option of having both downsampled and raw data, and lets the existing compacting/pruning processes run as is. It has the limitation of downsampling at ingestion, which is wasteful and not feasible in my eyes, but this is more of a feeler for different types of solutions than a proposal for an actual solution. |
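A minimal sketch of the mechanics behind that workaround: Mimir identifies tenants through the `X-Scope-OrgID` request header, so a second write pipeline could tag pre-downsampled series with a dedicated tenant ID. The type and the tenant name below are hypothetical illustrations, not part of Mimir.

```go
package main

import "net/http"

// tenantRoundTripper injects Mimir's multi-tenancy header so that a second
// remote-write client ships pre-downsampled series under a dedicated tenant.
// The tenant name is illustrative only.
type tenantRoundTripper struct {
	next   http.RoundTripper
	tenant string
}

func (t tenantRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	clone := req.Clone(req.Context())
	clone.Header.Set("X-Scope-OrgID", t.tenant) // e.g. "team-a-downsampled-5m"
	return t.next.RoundTrip(clone)
}
```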
Just added a more detailed comment there :) |
|
@Otterpohl and @adirmatzkin thank you for sharing your tips. I reckon the fact that this discussion is still ongoing ~1y after the first message in the discussion linked above demonstrates the need for a downsampling feature. WDYT about the proposal? |
|
Hey @wilfriedroset 👋 I was thinking to myself after reading about halfway through. My familiarity with the architecture, logic, and code of the compactor and query-fe isn't as comprehensive as I'd like it to be, so I might be wrong here; let me know if I'm way off with my ideas. But maybe we can achieve this goal in a simpler and cheaper way; here are some more ideas that came to mind:
What do you think? |
I'm not sure I understand this part. If I've understood the compactor correctly, at the end of compaction we have 24h blocks. What would a "sampled query" look like in terms of the blocks queried from object storage?
It is somewhat similar to proposal 2, and I tend to prefer it above all the other proposals I've made. We can either do the downsampling during the last compaction stage or create a new stage dedicated to downsampling. Creating a new stage has pros and cons: on the cons side, we have to fetch the block to downsample once again; on the pros side, it means we will downsample every old block. The "write to the side" part is also similar to proposal 2, where we create a block containing only downsampled data. This is also an enabler for different retention based on the downsampling range; see the sketch below. |
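To make the dedicated-stage idea more concrete, here is a minimal sketch of what such a stage could compute per series, loosely following the count/sum/min/max aggregates Thanos stores when downsampling. All names here are hypothetical: this sketches the idea, not an existing Mimir API.

```go
package downsample

// Sample is a raw (timestamp, value) pair. AggrSample carries the per-window
// aggregates a downsampled block could store.
type Sample struct {
	T int64 // timestamp in milliseconds
	V float64
}

type AggrSample struct {
	T             int64 // window start, in milliseconds
	Count         int64
	Sum, Min, Max float64
}

// Downsample buckets time-ordered raw samples into fixed windows
// (e.g. 5m = 300000 ms) and keeps one aggregate per window.
func Downsample(raw []Sample, windowMs int64) []AggrSample {
	var out []AggrSample
	for _, s := range raw {
		w := s.T - s.T%windowMs
		if len(out) == 0 || out[len(out)-1].T != w {
			out = append(out, AggrSample{T: w, Count: 1, Sum: s.V, Min: s.V, Max: s.V})
			continue
		}
		a := &out[len(out)-1]
		a.Count++
		a.Sum += s.V
		if s.V < a.Min {
			a.Min = s.V
		}
		if s.V > a.Max {
			a.Max = s.V
		}
	}
	return out
}
```

A real implementation would write these aggregates into a new block carrying downsampling metadata, so that retention logic could treat raw and downsampled blocks differently.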
|
@wilfriedroset Any updates on this feature proposal? |
|
@wouter2397, we are waiting for the maintainers' feedback; without it we cannot move forward.
would we also need to modify how the PromQL engine processes these downsampled chunks of samples? I'm curious about how the sum, count, min, and max values will be used in queries. Currently the PromQL engine relies on chunks complying with the chunkenc.Iterator interface, which doesn't have support for these chunks.
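For context, the interface in question currently looks roughly like this (an abbreviated excerpt from Prometheus' `tsdb/chunkenc` package, with the histogram-related methods omitted). Each step yields a single (timestamp, value) pair, which is why per-window aggregates such as count/sum/min/max have no place to surface without engine changes or a mapping back onto plain samples.

```go
// Abbreviated excerpt of chunkenc.Iterator; histogram methods omitted.
type Iterator interface {
	Next() ValueType        // advance to the next sample, reporting its type
	Seek(t int64) ValueType // advance to the first sample at or after t
	At() (int64, float64)   // timestamp and value of the current float sample
	AtT() int64             // timestamp of the current sample
	Err() error
}
```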
I believe we should look for an implementation that doesn't rely on a modified PromQL engine, as that would add more work in the long run.
Let's say we have a series named
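The example above appears to have been cut off. Given the proposal's approach of exposing downsampled data as new metrics, the intent was presumably something like the following hypothetical naming scheme, under which an unmodified PromQL engine can select downsampled series like any others. Everything in this sketch is illustrative, not part of the proposal text.

```go
// Hypothetical scheme: for a raw series such as
//   node_cpu_seconds_total{instance="a", mode="idle"}
// a downsampling stage would emit sibling series like
//   node_cpu_seconds_total:ds_5m_max{instance="a", mode="idle"}
// which plain PromQL can query directly, e.g.
//   max_over_time(node_cpu_seconds_total:ds_5m_max[1h])
func downsampledName(metric, window, aggr string) string {
	return metric + ":ds_" + window + "_" + aggr // naming is illustrative only
}
```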
|
While searching for an external solution to downsample Prometheus data, I came across this PR. Nice work @wilfriedroset 😉 |
|
Still running into this issue (queries are just too slow). Do we know an ETA for this one? Thx for the push @wilfriedroset 👌 |
|
+1 |
|
Any news on this? |
|
Thank you for your interest @edgarasg, we are waiting for the maintainers' feedback; without it we cannot move forward.
|
Hi, sorry for not giving this enough attention for such a long period of time. I guess downsampling just wasn't a priority for Grafana Labs so far. Not sure if it is now, but you definitely deserve some feedback here.
As @dimitarvdimitrov mentioned above, we would probably need to modify the query path for this. However, maybe one option for us would be to open-source the query path of our Adaptive Metrics product. We never published all the implementation details of Adaptive Metrics publicly, but they're also not a secret, so I'll save you some reverse-engineering: aggregations store a separate series with fewer labels and an
It has been previously discussed that we might open-source the query path of Adaptive Metrics. It's not a matter of keeping it secret, it's just a matter of lack of motivation to work on open-sourcing it: this might be our motivation now. I don't think we would want to maintain two separate aggregation-related query-path features. We also discussed in the past that bringing it to OSS might help further improve it (so far it has limited ability to hint to the storage what it needs), and maybe this way it can become a first-class citizen. Those are my thoughts on the query path.
Regarding the generation of the downsampled blocks, I think it's a bad moment to do it and the best moment to talk about it. We are making huge architecture changes in Mimir: there's a new component coming called block-builder (cc @seizethedave?), and AFAIK we also wanted to start thinking about a compactor-scheduler, to make more intelligent decisions about compaction. The proposals you make seem to be something that may fit some of these new components.
Regarding the proposal document itself:
What do you think should be our next steps? We thought about merging it with the same "Draft" status, but we're not going to start working on this now. That would allow people to send amendments to this proposal, but they can also send them to this branch. We kind of have a feeling that having a PR here with its comments keeps the proposal discussion more alive than it would be if it were just a file in the repo, WDYT? I also don't want to merge it saying "rejected", since we don't have it planned now; this might suddenly become a priority in 6 months, we all know how software development works. |
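As a rough illustration of the query-path direction described above, a selector rewrite in the frontend for metrics known to have an aggregated counterpart might look like the sketch below. The suffix and the lookup are assumptions made for the sketch, not Adaptive Metrics' actual scheme, which has not been published in full.

```go
// Hypothetical frontend rewrite: queries over metrics that have an
// aggregated counterpart are redirected to the aggregated series.
func rewriteMetric(name string, hasAggregate func(string) bool) string {
	if hasAggregate(name) {
		return name + "_aggregated" // suffix is an illustrative assumption
	}
	return name
}
```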
|
Thank you @colega for your feedback and for not forgetting this feature request. I believe the community appreciates it.
The initial idea of my proposal was to create new metrics that would be queried similarly to the other metrics. This is why I did not investigate the changes needed in the query path; in that regard, my proposal is not optimal.
100% for that, but I understand that this means more work on your end.
With the new components coming, the proposal might need to be adjusted to take the block-builder and the compactor-scheduler into account. As I don't know about them yet, I won't be able to make those adjustments just now, but I'm still willing to collaborate.
As I've explained, the intention was to not modify the query path. Maybe, with the work done and coming, we need to clarify the changes needed to the query path.
I agree.
I rarely do this kind of long-shot proposal, and therefore I don't have a strong feeling between the two ways of iterating on the proposal. In the end I would consider a couple of things:
I also understand that a lot of thinking/design has to be done before the coding starts. This is fair from my point of view, and the context is clear enough.
100% agree on this. |
|
Thank you for your contribution. This pull request has been marked as stale because it has had no activity in the last 150 days. It will be closed in 30 days if there is no further activity. If you need more time, you can add a comment to the PR. |
|
this PR is still needed don't close it 🙏🏾 |
|
Looking forward to seeing this feature merged!
What this PR does
Add a downsampling feature proposal.
The template used for this proposal is inspired by Loki's LIDs: https://github.com/grafana/loki/tree/main/docs/sources/community/lids
Which issue(s) this PR fixes or relates to
Fixes NA
Checklist
`CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`