Skip to content

feat: Add support for ordering by multiple fields. #2681

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

stuhood
Copy link
Contributor

@stuhood stuhood commented Aug 7, 2025

What

Add a TopDocs::order_by method, which supports ordering by multiple fast fields and scores in one collection pass, as defined by the TopOrderable trait. The TopOrderable trait is implemented (by a macro) for tuples of length 1 through 3 (for now).

How

Add:

  • a TopOrderable trait which is implemented for tuples, and a TopOrderableCollector to collect for it.
  • a Feature trait which is implemented for Scores, and for fast fields.
    • To allow for boxing/dynamic dispatch of Features (which reduces code generation when the sort features are not known until runtime), Arc<dyn Feature> is implemented via ErasedFeature.
  • a TopNCompare trait which can be used together with a LazyTopNComputer to lazily fetch features during TopN.
    • This new interface is necessary because TopNComputer does not allow for lazily fetching additional fields for the comparison tuple, which can eliminate a lot of IO when tiebreakers are only rarely actually coming into play in the comparison (because most values are being eliminated by earlier features).
    • It could also allow for making DocId/DocAddress tiebreaking optional (see), via something like a "DocIdFeature".

This interface additionally could not use the CustomScorer APIs because it does not allow segments to Top-N a different type than their final output type (which is essential for ordering by Strings).

Note

This patch isolates everything to one module, but should almost certainly be split up into multiple modules, and better integrated with the existing modules. I was hoping to get some feedback on it before rearranging things, but I'm very happy to do so!

@stuhood stuhood force-pushed the stuhood.generic-order-by-upstream branch 2 times, most recently from 407a1d8 to 1c82ef0 Compare August 8, 2025 17:42
stuhood added a commit to paradedb/tantivy that referenced this pull request Aug 8, 2025
## What

Add a `TopDocs::order_by` method, which supports ordering by multiple fast fields and scores in one collection pass, as defined by the `TopOrderable` trait. The `TopOrderable` trait is implemented (by a macro) for tuples of length 1 through 3 (for now).

## How

Add:
* a `TopOrderable` trait which is implemented for tuples, and a `TopOrderableCollector` to collect for it.
* a `Feature` trait which is implemented for `Score`s, and for fast fields.
    * To allow for boxing/dynamic dispatch of `Features` (which reduces code generation when the sort columns are not known until runtime), `Arc<dyn Feature>` is implemented via `ErasedFeature`.  
* a `TopNCompare` trait which can be used together with a `LazyTopNComputer` to lazily fetch columns during TopN.
    * This new interface is necessary because `TopNComputer` does not allow for lazily fetching additional fields for the comparison tuple, which can eliminate a lot of IO when tiebreakers are only rarely actually coming into play in the comparison (because most values are being eliminated by earlier columns).
    * It could also allow for making `DocId`/`DocAddress` tiebreaking optional ([see](quickwit-oss#2672 (comment))), via something like a "`DocIdFeature`".

This interface additionally could not use the `CustomScorer` APIs because it does not allow segments to Top-N a different type than their final output type (which is essential for ordering by `String`s).

## Note

This patch isolates everything to one module, but should almost certainly be split up into multiple modules, and better integrated with the existing modules. I was hoping to get some feedback on it before rearranging things, but I'm very happy to do so!

----

Upstream at quickwit-oss#2681
stuhood added 2 commits August 8, 2025 14:09
Uses:
* a `TopOrderable` trait which can be derived for tuples
* a `TopOrderableCollector` to collect for it.
* a `TopNCompare` trait which can be used together with a
  `LazyTopNComputer` to lazily fetch columns during TopN.

Note that this does not use the `CustomScorer` API because:
1. `TopNComputer` does not allow for lazily fetching additional fields
   for the comparison tuple, which is important when tiebreakers are
   only rarely actually coming into play in the comparison, and most
   values are being eliminated by earlier columns.
2. `CustomScoreTopCollector` does not allow segments to Top-N a different type
   than their final output type, which is essential for ordering by `String`s.
3. In order to include scores as one of the ordering columns, we need to be able
   to optionally enable scores.
4. The `CustomScoreTopCollector::merge_fruits` function needs to operate over a
   wrapper type in order to apply different ordering globally than per-segment.
@stuhood stuhood force-pushed the stuhood.generic-order-by-upstream branch from 1c82ef0 to 140c506 Compare August 8, 2025 21:09
@fulmicoton
Copy link
Collaborator

I am surprised this requires macros @stuhood. Can we get away with doing only generics?

fn get(
&self,
column: &FeatureColumn,
order: Order,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
order: Order,

The comment is wrong. This method is not returning the value for this feature. (depending on the order it can return the opposite).
At this point, I don't think this is a good idea to have order. We could integrate the order within the Feature trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, I don't think this is a good idea to have order. We could integrate the order within the Feature trait.

By making it a generic parameter, or as a field?

I think that a bit more performance could be gained by making it a generic parameter (as TopNComputer now does): will give that a shot.

/// NOTE: We don't require a `PartialOrd` bound on the output type in order to make it possible
/// to use a boxed type like `OwnedValue` without giving it a `PartialOrd` implementation which
/// might be unsafe (i.e.: panicing) in other positions.
fn compare(&self, a: &Self::Output, b: &Self::Output) -> Option<Ordering>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn compare(&self, a: &Self::Output, b: &Self::Output) -> Option<Ordering>;
fn compare(&self, lhs: &Self::Output, rhs: &Self::Output) -> Option<Ordering>;

}
}

fn compare(&self, a: &Self::Output, b: &Self::Output) -> Option<Ordering> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming

@stuhood
Copy link
Contributor Author

stuhood commented Aug 12, 2025

I am surprised this requires macros @stuhood. Can we get away with doing only generics?

I don't think so... AFAIK it is not possible to abstract over the length of a tuple, and doing this without tuples requires that the types be boxed/dynamic or wrapped in enums.

I think that this is similar to the "SegmentCollectors for tuples" pattern over here:

// -----------------------------------------------
// Tuple implementations.
-- except implemented with a macro.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants