-
-
Notifications
You must be signed in to change notification settings - Fork 790
feat: Add support for ordering by multiple fields. #2681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: Add support for ordering by multiple fields. #2681
Conversation
407a1d8
to
1c82ef0
Compare
## What Add a `TopDocs::order_by` method, which supports ordering by multiple fast fields and scores in one collection pass, as defined by the `TopOrderable` trait. The `TopOrderable` trait is implemented (by a macro) for tuples of length 1 through 3 (for now). ## How Add: * a `TopOrderable` trait which is implemented for tuples, and a `TopOrderableCollector` to collect for it. * a `Feature` trait which is implemented for `Score`s, and for fast fields. * To allow for boxing/dynamic dispatch of `Features` (which reduces code generation when the sort columns are not known until runtime), `Arc<dyn Feature>` is implemented via `ErasedFeature`. * a `TopNCompare` trait which can be used together with a `LazyTopNComputer` to lazily fetch columns during TopN. * This new interface is necessary because `TopNComputer` does not allow for lazily fetching additional fields for the comparison tuple, which can eliminate a lot of IO when tiebreakers are only rarely actually coming into play in the comparison (because most values are being eliminated by earlier columns). * It could also allow for making `DocId`/`DocAddress` tiebreaking optional ([see](quickwit-oss#2672 (comment))), via something like a "`DocIdFeature`". This interface additionally could not use the `CustomScorer` APIs because it does not allow segments to Top-N a different type than their final output type (which is essential for ordering by `String`s). ## Note This patch isolates everything to one module, but should almost certainly be split up into multiple modules, and better integrated with the existing modules. I was hoping to get some feedback on it before rearranging things, but I'm very happy to do so! ---- Upstream at quickwit-oss#2681
Uses: * a `TopOrderable` trait which can be derived for tuples * a `TopOrderableCollector` to collect for it. * a `TopNCompare` trait which can be used together with a `LazyTopNComputer` to lazily fetch columns during TopN. Note that this does not use the `CustomScorer` API because: 1. `TopNComputer` does not allow for lazily fetching additional fields for the comparison tuple, which is important when tiebreakers are only rarely actually coming into play in the comparison, and most values are being eliminated by earlier columns. 2. `CustomScoreTopCollector` does not allow segments to Top-N a different type than their final output type, which is essential for ordering by `String`s. 3. In order to include scores as one of the ordering columns, we need to be able to optionally enable scores. 4. The `CustomScoreTopCollector::merge_fruits` function needs to operate over a wrapper type in order to apply different ordering globally than per-segment.
1c82ef0
to
140c506
Compare
I am surprised this requires macros @stuhood. Can we get away with doing only generics? |
fn get( | ||
&self, | ||
column: &FeatureColumn, | ||
order: Order, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
order: Order, |
The comment is wrong. This method is not returning the value for this feature. (depending on the order it can return the opposite).
At this point, I don't think this is a good idea to have order. We could integrate the order within the Feature trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, I don't think this is a good idea to have order. We could integrate the order within the Feature trait.
By making it a generic parameter, or as a field?
I think that a bit more performance could be gained by making it a generic parameter (as TopNComputer
now does): will give that a shot.
/// NOTE: We don't require a `PartialOrd` bound on the output type in order to make it possible | ||
/// to use a boxed type like `OwnedValue` without giving it a `PartialOrd` implementation which | ||
/// might be unsafe (i.e.: panicing) in other positions. | ||
fn compare(&self, a: &Self::Output, b: &Self::Output) -> Option<Ordering>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn compare(&self, a: &Self::Output, b: &Self::Output) -> Option<Ordering>; | |
fn compare(&self, lhs: &Self::Output, rhs: &Self::Output) -> Option<Ordering>; |
} | ||
} | ||
|
||
fn compare(&self, a: &Self::Output, b: &Self::Output) -> Option<Ordering> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming
I don't think so... AFAIK it is not possible to abstract over the length of a tuple, and doing this without tuples requires that the types be boxed/dynamic or wrapped in enums. I think that this is similar to the " Lines 291 to 292 in 2e4615c
|
What
Add a
TopDocs::order_by
method, which supports ordering by multiple fast fields and scores in one collection pass, as defined by theTopOrderable
trait. TheTopOrderable
trait is implemented (by a macro) for tuples of length 1 through 3 (for now).How
Add:
TopOrderable
trait which is implemented for tuples, and aTopOrderableCollector
to collect for it.Feature
trait which is implemented forScore
s, and for fast fields.Features
(which reduces code generation when the sort features are not known until runtime),Arc<dyn Feature>
is implemented viaErasedFeature
.TopNCompare
trait which can be used together with aLazyTopNComputer
to lazily fetch features during TopN.TopNComputer
does not allow for lazily fetching additional fields for the comparison tuple, which can eliminate a lot of IO when tiebreakers are only rarely actually coming into play in the comparison (because most values are being eliminated by earlier features).DocId
/DocAddress
tiebreaking optional (see), via something like a "DocIdFeature
".This interface additionally could not use the
CustomScorer
APIs because it does not allow segments to Top-N a different type than their final output type (which is essential for ordering byString
s).Note
This patch isolates everything to one module, but should almost certainly be split up into multiple modules, and better integrated with the existing modules. I was hoping to get some feedback on it before rearranging things, but I'm very happy to do so!