
@MichaHoffmann (Contributor) commented Aug 8, 2025

Currently, when responses collide on their labelset in the proxy loser tree, their relative order is random. This can happen when an endpoint drops the replica label.
For sidecars, the order of responses affects deduplication. The primary iterator is used until we find a gap large enough to fail over to the replica iterator, and which iterator is primary and which is the replica is determined by the order in which the proxy loser tree returns them. Because different sidecars may have scraped at different times, repeating the same query can yield slightly different results. Using the store labelset as a tiebreaker is an attempt to stabilize this.
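A minimal sketch of the intended ordering, under simplifying assumptions: `seriesWithStore`, `less` and `mergeOrder` are hypothetical names (not the Thanos code), both label sets are pre-rendered strings, and a plain sort stands in for the loser tree merge. Responses are ordered by series labels first, and only on a collision does the store labelset decide, so the same inputs always yield the same primary/replica order.

```go
package main

import (
	"sort"
	"strings"
)

// seriesWithStore is a hypothetical, simplified pairing of one series
// response with the labelset of the store that sent it. Both label sets
// are pre-rendered strings here for brevity.
type seriesWithStore struct {
	Series        string // series labels after the replica label is dropped
	StoreLabelset string // e.g. `{replica="a"}`
}

// less orders by series first; when two responses collide on the series
// labelset, the store labelset breaks the tie so duplicates always come
// out in the same order.
func less(a, b seriesWithStore) bool {
	if c := strings.Compare(a.Series, b.Series); c != 0 {
		return c < 0
	}
	return a.StoreLabelset < b.StoreLabelset
}

// mergeOrder returns a sorted copy using less. It is only a stand-in for
// the order a loser tree would produce when merging per-store streams.
func mergeOrder(rs []seriesWithStore) []seriesWithStore {
	out := append([]seriesWithStore(nil), rs...)
	sort.Slice(out, func(i, j int) bool { return less(out[i], out[j]) })
	return out
}
```

Without the tiebreak, `less` would report the two colliding responses as equal and their order would depend on arrival; with it, `{replica="a"}` always becomes the primary.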

  • I added a CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

@MichaHoffmann force-pushed the mhoffmann/proxy-stabilize-duplicates-in-response-losertree branch 3 times, most recently from 13cc669 to 4d39b27 on August 8, 2025 10:35
@saswatamcode previously approved these changes Aug 8, 2025

Signed-off-by: Michael Hoffmann <[email protected]>
@MichaHoffmann force-pushed the mhoffmann/proxy-stabilize-duplicates-in-response-losertree branch from 4d39b27 to 9105dd6 on August 25, 2025 07:22
var maxVal *storepb.SeriesResponse = storepb.NewSeriesResponse(nil)
// It's agnostic to duplicates and overlaps, it forwards all duplicated series ordered by the labelset of their endpoint.
func NewProxyResponseLoserTree(seriesSets ...respSet) *proxyResponseLoserTree {
var maxVal seriesResponseWithStoreLabelset = seriesResponseWithStoreLabelset{}
Member

Suggested change
var maxVal seriesResponseWithStoreLabelset = seriesResponseWithStoreLabelset{}
var maxVal = seriesResponseWithStoreLabelset{}

The type can be inferred from the right-hand side.

tree: losertree.New(
seriesSets,
maxVal,
func(s respSet) seriesResponseWithStoreLabelset {
Member

At() is called more than once for every item in the set. Could we change the respSet interface to:

type respSet interface {
	Close()
	At() seriesResponseWithStoreLabelset
	// ... other existing methods unchanged
}

?
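A rough sketch of that idea, assuming simplified hypothetical types (`seriesResponse`, `sliceRespSet`, a string store labelset, and a `Next()` method) rather than the real storepb/Thanos ones. The wrapper is built once per `Next()` and the store labelset once per set, so each extra `At()` call is just a field read.

```go
package main

// Hypothetical, simplified stand-ins for the types named in the review.
type seriesResponse struct{ Series string }

type seriesResponseWithStoreLabelset struct {
	Response      seriesResponse
	StoreLabelset string
}

// respSet has the proposed shape: At returns the wrapped value directly.
// Other methods of the real interface are omitted.
type respSet interface {
	Next() bool
	At() seriesResponseWithStoreLabelset
	Close()
}

// sliceRespSet builds the wrapper once per Next, so repeated At calls on
// the same item are plain field reads.
type sliceRespSet struct {
	items         []seriesResponse
	storeLabelset string // computed once per set, not once per At call
	i             int
	cur           seriesResponseWithStoreLabelset
	wraps         int // number of wrappers built, for illustration only
}

func (s *sliceRespSet) Next() bool {
	if s.i >= len(s.items) {
		return false
	}
	s.cur = seriesResponseWithStoreLabelset{Response: s.items[s.i], StoreLabelset: s.storeLabelset}
	s.i++
	s.wraps++
	return true
}

func (s *sliceRespSet) At() seriesResponseWithStoreLabelset { return s.cur }

func (s *sliceRespSet) Close() {}

// Compile-time check that sliceRespSet satisfies respSet.
var _ respSet = (*sliceRespSet)(nil)
```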

In addition, s.Labelset() allocates a new string on every call, which is not good in a hot path. Internally, it's just:

storeLabelSets []labels.Labels

I think it would be much faster to compare these label sets directly instead of allocating strings that will be more or less identical across millions of series.
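A sketch of comparing the label sets directly, assuming minimal stand-in `Label`/`Labels` types (sorted by name, as Prometheus keeps them); `compareLabels` and `compareStoreLabelSets` are hypothetical helpers. In the real code, Prometheus's `labels.Compare` from `model/labels` could likely take the place of the hand-written `compareLabels`.

```go
package main

import "strings"

// Label and Labels are minimal stand-ins for Prometheus's labels.Label and
// labels.Labels, assumed to be sorted by name.
type Label struct{ Name, Value string }
type Labels []Label

// compareLabels orders two label sets pair by pair, name first, then
// value; a strict prefix sorts first. Only the sign of the result matters.
// No strings are allocated.
func compareLabels(a, b Labels) int {
	for i := 0; i < len(a) && i < len(b); i++ {
		if c := strings.Compare(a[i].Name, b[i].Name); c != 0 {
			return c
		}
		if c := strings.Compare(a[i].Value, b[i].Value); c != 0 {
			return c
		}
	}
	return len(a) - len(b)
}

// compareStoreLabelSets compares two stores' storeLabelSets element by
// element instead of comparing rendered strings.
func compareStoreLabelSets(a, b []Labels) int {
	for i := 0; i < len(a) && i < len(b); i++ {
		if c := compareLabels(a[i], b[i]); c != 0 {
			return c
		}
	}
	return len(a) - len(b)
}
```

Since the store label sets are fixed for the lifetime of a respSet, this comparison walks existing slices and never builds a string per series.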
