Fix TopNComputer for reverse order #2672

PSeitz · 2025-07-16T09:25:39Z

Similar fix to #2651, but more lightweight (no change of Score)

Co-authored-by: Pascal Seitz <[email protected]>

stuhood · 2025-08-07T18:31:24Z

src/collector/top_score_collector.rs

        if let Some(last_median) = self.threshold.clone() {
-            if feature < last_median {
+            if !REVERSE_ORDER && feature > last_median {
+                return;
+            }
+            if REVERSE_ORDER && feature < last_median {
                return;
            }
        }
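
For readers following the diff, the check can be isolated as a small predicate. This is a minimal sketch, not the actual tantivy code: the name `rejected_by_threshold`, the `u64` feature type, and the `Option<u64>` threshold are assumptions made for illustration. Only the two comparisons are taken from the diff above.

```rust
/// Hypothetical helper mirroring the two branches added in the diff.
/// Returns `true` when a candidate can be dropped before it is buffered.
fn rejected_by_threshold<const REVERSE_ORDER: bool>(
    feature: u64,
    threshold: Option<u64>,
) -> bool {
    match threshold {
        // No threshold has been computed yet, so nothing can be rejected.
        None => false,
        Some(last_median) => {
            if REVERSE_ORDER {
                // Mirrors `REVERSE_ORDER && feature < last_median`.
                feature < last_median
            } else {
                // Mirrors `!REVERSE_ORDER && feature > last_median`.
                feature > last_median
            }
        }
    }
}
```

Both comparisons are strict, so a feature equal to the threshold is never rejected in either direction, for example `rejected_by_threshold::<true>(5, Some(5))` is `false`.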


Unfortunately, after further consideration, I think that the previous version was slightly incorrect, and so this one is too.

There are probably multiple factors, but one reason why #2651 would be slower is that it is comparing both the Score and the DocId. Before and after this PR, main is only comparing the Score to the threshold, which means that it will not tiebreak using the DocId.

Tiebreaking with the DocId is necessary for deterministic ordering of results: this will eliminate docs with equal Scores even if they have higher DocIds, which mismatches the behavior of ComparableDoc (and thus truncate_top_n/into_sorted_vec).

Tiebreaking with the DocId is necessary for deterministic ordering of results...

And if it isn't, then we should remove it / make it optional, because it's not free! 😅


See #2681 (comment) on this topic: I think that we could introduce optional DocId/DocAddress tiebreaking with that API, and then remove it from TopN consumers who don't need it.

#2651 compares the DocId for the threshold, but that's not necessary. The threshold check should be a very fast pre-filter. We still compare DocIds via ComparableDoc during the sorts.

The check does not filter out docs with equal scores, only docs with smaller ones.
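
A rough sketch of that split between a score-only pre-filter and a full comparison during the sort. Everything here is hypothetical: `Hit`, `top_n`, the integer score type, and the "lower doc id wins a tie" rule are assumptions for illustration and are not tantivy's `ComparableDoc` or `TopNComputer`.

```rust
/// Hypothetical (score, doc id) pair used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Hit {
    score: u32,
    doc: u32,
}

/// Keeps the `n` best hits: highest score first, ties broken by lower doc id
/// (the tie rule is an assumption of this sketch).
fn top_n(hits: &[Hit], n: usize, threshold: Option<u32>) -> Vec<Hit> {
    let mut kept: Vec<Hit> = hits
        .iter()
        .copied()
        // Cheap pre-filter on the score alone. Only strictly smaller scores
        // are rejected, so hits that tie with the threshold pass through.
        .filter(|h| threshold.map_or(true, |t| h.score >= t))
        .collect();
    // Full comparison: the doc id only participates when scores are equal.
    kept.sort_by(|a, b| b.score.cmp(&a.score).then_with(|| a.doc.cmp(&b.doc)));
    kept.truncate(n);
    kept
}
```

With a threshold of 5, two hits that both score 5 survive the pre-filter, and the sort then decides between them by doc id.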

Oh, true. And the ComparableDoc comparator should be doing the same thing and exiting early without comparing the docid.

Sorry for the noise.

## What

Add a `TopDocs::order_by` method, which supports ordering by multiple fast fields and scores in one collection pass, as defined by the `TopOrderable` trait. The `TopOrderable` trait is implemented (by a macro) for tuples of length 1 through 3 (for now).

## How

Add:

* a `TopOrderable` trait which is implemented for tuples, and a `TopOrderableCollector` to collect for it.
* a `Feature` trait which is implemented for `Score`s, and for fast fields.
  * To allow for boxing/dynamic dispatch of `Features` (which reduces code generation when the sort columns are not known until runtime), `Arc<dyn Feature>` is implemented via `ErasedFeature`.
* a `TopNCompare` trait which can be used together with a `LazyTopNComputer` to lazily fetch columns during TopN.
  * This new interface is necessary because `TopNComputer` does not allow for lazily fetching additional fields for the comparison tuple, which can eliminate a lot of IO when tiebreakers are only rarely actually coming into play in the comparison (because most values are being eliminated by earlier columns).
  * It could also allow for making `DocId`/`DocAddress` tiebreaking optional ([see](quickwit-oss#2672 (comment))), via something like a "`DocIdFeature`".

This interface additionally could not use the `CustomScorer` APIs because it does not allow segments to Top-N a different type than their final output type (which is essential for ordering by `String`s).

## Note

This patch isolates everything to one module, but should almost certainly be split up into multiple modules, and better integrated with the existing modules. I was hoping to get some feedback on it before rearranging things, but I'm very happy to do so!

----

Upstream at quickwit-oss#2681
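
The idea of lazily fetching a tiebreaker column can be sketched with `Ordering::then_with`, which only evaluates its closure when the earlier comparison is `Equal`. This is an illustration under stated assumptions, not the `TopNCompare` or `LazyTopNComputer` API from that PR: `Row`, `compare_lazy`, and the `fetch_secondary` callback (standing in for an expensive column read) are all hypothetical names.

```rust
use std::cmp::Ordering;

/// Hypothetical row: a doc id plus an already-loaded primary sort key.
#[derive(Debug, Clone, Copy)]
struct Row {
    doc: u32,
    primary: u32,
}

/// Compares by the primary key first. `fetch_secondary` is only called when
/// the primary keys tie, so the costly lookup is skipped in the common case.
fn compare_lazy<F>(a: Row, b: Row, fetch_secondary: F) -> Ordering
where
    F: Fn(u32) -> u64,
{
    a.primary
        .cmp(&b.primary)
        .then_with(|| fetch_secondary(a.doc).cmp(&fetch_secondary(b.doc)))
}
```

When most candidates differ on the primary key, the callback is rarely invoked, which is the IO saving the description refers to.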

Fix TopNComputer for reverse order

911e881

PSeitz-dd requested review from fulmicoton and fulmicoton-dd July 16, 2025 09:28

fulmicoton-dd approved these changes Jul 16, 2025


PSeitz merged commit 4e84c70 into quickwit-oss:main Jul 16, 2025
3 checks passed

stuhood mentioned this pull request Jul 16, 2025

Fixes for TopDocs::order_by_string_fast_field and TopNComputer #2651

Closed

maksym-iv-ef pushed a commit to elastiflow/tantivy that referenced this pull request Jul 24, 2025

Fix TopNComputer for reverse order (quickwit-oss#2672)

c2a4060

Co-authored-by: Pascal Seitz <[email protected]>

stuhood reviewed Aug 7, 2025


This was referenced Aug 7, 2025

feat: Add support for ordering by multiple fields. #2681

Closed

feat: Add support for ordering by multiple fields. paradedb/tantivy#57

Merged

Review thread comments: stuhood (Aug 7, 2025, three comments) · PSeitz (Aug 12, 2025) · stuhood (Aug 13, 2025)

4 participants
