-
Notifications
You must be signed in to change notification settings - Fork 2.2k
proxystore: stabilize order for duplicates in response losertree #8415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
proxystore: stabilize order for duplicates in response losertree #8415
Conversation
13cc669
to
4d39b27
Compare
Currently the order of responses in the losertree on collisions in the labelset is random. This can happen if we drop the replica label in an endpoint. In the case of sidecars the order of responses has effect on deduplication. The primary iterator is used until we find a large enough gap to failover to the replica iterator, where primary and replica is determined by the order they are returned from the proxy losertree. This can lead to slight differences if we repeat a query since different sidecars have scraped at different times possibly. Using the store labelset as tiebreaker is an attempt at stabilizing this. Signed-off-by: Michael Hoffmann <[email protected]>
4d39b27
to
9105dd6
Compare
var maxVal *storepb.SeriesResponse = storepb.NewSeriesResponse(nil) | ||
// It's agnostic to duplicates and overlaps, it forwards all duplicated series ordered by the labelset of their endpoint. | ||
func NewProxyResponseLoserTree(seriesSets ...respSet) *proxyResponseLoserTree { | ||
var maxVal seriesResponseWithStoreLabelset = seriesResponseWithStoreLabelset{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var maxVal seriesResponseWithStoreLabelset = seriesResponseWithStoreLabelset{} | |
var maxVal = seriesResponseWithStoreLabelset{} |
Type can be inferred from the right-side
tree: losertree.New( | ||
seriesSets, | ||
maxVal, | ||
func(s respSet) seriesResponseWithStoreLabelset { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At() is called more than once for every item in the set. Could we upgrade type respSet interface
to be:
type respSet interface {
Close()
At() seriesResponseWithStoreLabelset
?
Plus, s.Labelset()
allocates a new string each time. That's not good in a hot path. Internally, it's just:
storeLabelSets []labels.Labels
I think it should be much faster to compare these things directly instead of allocating a bunch of strings that will be identical across millions of series +/-.
Currently the order of responses in the losertree on collisions in the labelset is random. This can happen if we drop the replica label in an endpoint.
In the case of sidecars the order of responses has effect on deduplication. The primary iterator is used until we find a large enough gap to failover to the replica iterator, where primary and replica is determined by the order they are returned from the proxy losertree. This can lead to slight differences if we repeat a query since different sidecars have scraped at different times possibly. Using the store labelset as tiebreaker is an attempt at stabilizing this.
Changes
Verification