Skip to content

Conversation

@LantaoJin
Copy link
Member

@LantaoJin LantaoJin commented Dec 23, 2025

Description

Support enumerable TopK, check #4982 for issue details.

  • EnumerableTopKConverterRule
    • Convert LogicalSort with fetch to CalciteEnumerableTopK
  • EnumerableTopKMergeRule
    • Merge EnumerableLimit and EnumerableSort to CalciteEnumerableTopK

The CalciteEnumerableTopK is derived from EnumerableLimitSort which has the corrected cost computation.

Related Issues

Resolves #4982

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lantao Jin <[email protected]>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 23, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Limit+sort handling is now fused into a single top‑k operation, producing simpler plans and faster query execution.
  • Bug Fixes

    • Added a guard to avoid redundant top‑digest processing, preventing potential stack overflow in limit pushdown paths.
    • Removed duplicated limit annotations in serialized plans.
  • Refactor

    • Reworked physical plan shapes (joins, sorts, and pushdowns) for cleaner, more efficient execution.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Adds a TopK optimization that fuses a Limit+Sort pattern into a new CalciteEnumerableTopK operator via a planner rule, registers that rule, updates pushdown logic to guard against duplicate digests, extends PushDownContext with a top-check helper, and updates many expected physical-plan test outputs to reflect TopK use.

Changes

Cohort / File(s) Summary
New TopK operator
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java
New public class extending EnumerableLimitSort with custom cost model, copy and create factory to represent fused Limit+Sort as TopK.
Planner rule to merge Limit+Sort
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java
New rule that matches EnumerableLimit over EnumerableSort and replaces them with CalciteEnumerableTopK (config + DEFAULT).
Rule registration
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
Adds ENUMERABLE_TOP_K_MERGE_RULE to index-scan rule list (inserted after RARE_TOP_PUSH_DOWN).
Sort index-scan predicate update
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortIndexScanRule.java
Adds isTopK predicate (detecting CalciteEnumerableTopK) and excludes TopK from the SortIndexScan operand predicate.
Pushdown duplicate-guard and limit handling
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
Adds precomputed LimitDigest reuse and early-return guard to avoid stacking identical limit digests (prevents potential stack overflow).
PushDownContext API
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
Adds public boolean containsDigestOnTop(Object digest) to test whether the most-recently-added digest equals the provided one.
Integration test expected outputs
integ-test/src/test/resources/expectedOutput/calcite/... (many YAML/JSON files; use * grouping)
~40 expected-output files updated to reflect Limit+Sort → CalciteEnumerableTopK consolidation across various queries (q28, q29, many explain/.yaml/.json, chart_.yaml, streamstats, agg_*, etc.). Also minor formatting cleanups and some duplicated lines noted.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Planner
    participant RuleEngine
    participant TopK(Op)
    participant IndexScan
    participant PushDownCtx

    Client->>Planner: submit query
    Planner->>RuleEngine: run optimization rules
    Note right of RuleEngine: EnumerableTopKMergeRule matches\nEnumerableLimit -> EnumerableSort
    RuleEngine->>TopK(Op): create CalciteEnumerableTopK (collation, offset, fetch)
    TopK(Op)->>IndexScan: become parent of existing scan subtree
    IndexScan->>PushDownCtx: preserve/adjust PushDownContext
    PushDownCtx->>IndexScan: containsDigestOnTop used to guard duplicate digests
    Planner->>Client: produce physical plan with TopK wrapping scan
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

clickbench

Suggested reviewers

  • penghuo
  • ps48
  • kavithacm
  • derek-ho
  • joshuali925
  • GumpacG
  • Swiddis
  • Yury-Fridlyand
  • MaxKsyunz
  • RyanL1997

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.25% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Support enumerable TopK' is concise and directly describes the main feature being added to the codebase.
Description check ✅ Passed The description clearly explains the feature implementation, including EnumerableTopKConverterRule and EnumerableTopKMergeRule, and references the related issue #4982.
Linked Issues check ✅ Passed The pull request fulfills the main objective from issue #4982: converting Limit-Sort patterns into TopK operators to reduce memory usage during sorting, as evidenced by numerous test case updates showing EnumerableLimit+EnumerableSort replaced by CalciteEnumerableTopK.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing TopK support: new CalciteEnumerableTopK class, merger/converter rules, integration into rule sets, and corresponding test updates. No unrelated changes detected.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 940fc4d and 65d062a.

📒 Files selected for processing (34)
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_range_metric_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortIndexScanRule.java
🚧 Files skipped from review as they are similar to previous changes (7)
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_range_metric_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
🧰 Additional context used
📓 Path-based instructions (2)
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortIndexScanRule.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortIndexScanRule.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: In opensearch-project/sql, when creating custom Calcite RelNode classes that extend EnumerableLimitSort or other Calcite RelNode types, always override the `copy` method. Without overriding copy, the class will downgrade to its parent class type during copy operations, losing the custom implementation.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortIndexScanRule.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (27)
integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml (1)

22-39: LGTM! TopK transformation is correct and well-tested.

The physical plan correctly applies the CalciteEnumerableTopK optimization, fusing the LogicalSystemLimit(fetch=[10000]) and LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) pattern from the logical plan. The transformation preserves:

  • Sort keys and directions (ASC on two fields)
  • Fetch limit (10000 rows)
  • All downstream operations (aggregations, joins, window functions)
  • NULL handling semantics with appropriate filter conditions

The test data is realistic for a banking/demographic use case and covers important edge cases including NULL value handling, which aligns with the test file name chart_null_str.yaml.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml (1)

12-12: No issue found—the physical plan contains only a single CalciteEnumerableIndexScan operator.

The file correctly shows one optimized CalciteEnumerableIndexScan with AGGREGATION and LIMIT push-down context for the dedup operation. Verification of all dedup test files confirms each contains exactly one CalciteEnumerableIndexScan occurrence. The logical and physical plans are properly structured for the dedup query with expressions.

integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml (1)

12-13: TopK optimization correctly applied and well-tested.

The change correctly replaces the EnumerableLimit+EnumerableSort pattern with CalciteEnumerableTopK, matching the PR objectives to reduce memory usage. The TopK parameters (sort0=[$1], dir0=[ASC-nulls-first], fetch=[10000]) correctly preserve the logical plan constraints, and the aggregation push-down context is maintained. Test coverage includes 28 TopK operator instances across diverse scenarios—including edge cases with small fetch values (fetch=[25]) and varying sort directions (ASC-nulls-first, DESC-nulls-last)—validating TopK works correctly with complex aggregation scenarios where measures aren't pushed down.

integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml (1)

22-35: TopK transformation correctly applied.

The physical plan correctly replaces the Limit+Sort pattern with CalciteEnumerableTopK, matching the PR's optimization objective. The TopK parameters (sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=[10000]) align with the LogicalSystemLimit specification, and the nested plan structure is properly preserved.

Test coverage for TopK scenarios is comprehensive: 30 total occurrences across test files covering DESC ordering (with nulls-last handling), single and multi-key sorts, diverse fetch values (25, 100, 10000), and realistic query patterns with joins, aggregations, and window functions.

integ-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yaml (1)

12-14: LGTM!

The TopK consolidation correctly preserves the sort key ($1), direction (ASC-nulls-first), and fetch limit (10000) from the logical plan's LogicalSystemLimit + LogicalSort combination. The downstream EnumerableCalc and CalciteEnumerableIndexScan with their PushDownContext remain appropriately intact.

integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml (1)

11-12: LGTM!

The TopK operator correctly handles the multi-key sort scenario with sort0=[$0] and sort1=[$2], matching the logical plan's sorting specification. The aggregation pushdown context is preserved.

integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml (1)

21-30: LGTM!

The complex plan correctly applies CalciteEnumerableTopK at the top level while preserving the inner EnumerableSort (line 26) required for the EnumerableMergeJoin ordering. This test expectation covers an important edge case with joins, window functions, and aggregations combined with TopK optimization.

integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json (1)

4-4: LGTM!

The test correctly demonstrates that CalciteEnumerableTopK is applied only to the inner Sort+Limit combination (fetch=[100]), while outer limits with offsets (offset=[10], fetch=[10]) remain as separate EnumerableLimit nodes. This is an important boundary condition test for the TopK optimization.

integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json (1)

4-4: LGTM!

The TopK consolidation correctly preserves sort semantics (sort0=[$0], dir0=[ASC-nulls-first], fetch=[10000]) with aggregation pushdown intact.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml (1)

14-17: LGTM!

This ClickBench Q29 test expectation correctly demonstrates the TopK optimization that addresses the OOM issue described in the PR objectives. The inner Sort+Limit(fetch=25) is consolidated into CalciteEnumerableTopK with DESC-nulls-last direction preserved, while the outer system limit remains as a separate EnumerableLimit(fetch=[10000]).

integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml (1)

11-12: LGTM!

The test correctly verifies TopK optimization for sorting on an aggregate measure ($0 = count) rather than a grouping key, with multi-bucket aggregation (state + histogram span). All sort parameters are preserved correctly.

integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml (1)

12-13: LGTM! TopK consolidation correctly applied.

The physical plan now uses CalciteEnumerableTopK to fuse the limit and sort operations, which aligns with the PR objective of reducing memory usage by avoiding full in-memory sorts before applying limits. The sort parameters and fetch value are correctly preserved.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortIndexScanRule.java (1)

43-50: LGTM! Correct exclusion of TopK nodes from sort pushdown.

The predicate isTopK correctly identifies CalciteEnumerableTopK instances to prevent the rule from attempting to push down sort for nodes that are already optimized TopK operators. This avoids redundant transformations and ensures the TopK optimization is preserved.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yaml (1)

11-14: LGTM! TopK correctly applied with window functions.

The physical plan properly places CalciteEnumerableTopK above the window function chain, ensuring the streaming sequence ordering (__stream_seq__) is applied after window aggregations. The fetch limit and sort parameters are correctly preserved from the logical plan.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yaml (1)

11-14: LGTM! Correct TopK placement for earliest/latest streaming stats.

The CalciteEnumerableTopK correctly wraps the window function chain for ARG_MIN/ARG_MAX aggregations, maintaining the streaming order semantics while optimizing the limit+sort pattern.

integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml (1)

13-17: LGTM! TopK correctly applied with semi-join for IN subquery.

The physical plan properly places CalciteEnumerableTopK above the EnumerableHashJoin (semi), maintaining the correct execution order: join first, then apply the top-k sort on salary. The subquery's limit is correctly pushed down to the inner index scan.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yaml (1)

11-14: LGTM! TopK applied correctly for custom time field streaming stats.

The physical plan correctly uses CalciteEnumerableTopK while preserving the window aggregations that use the custom time field (created_at at $0) for ARG_MIN/ARG_MAX operations. This covers the edge case of user-specified time fields in streaming statistics.

integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml (1)

22-35: LGTM! Multi-column TopK correctly applied for timechart.

The CalciteEnumerableTopK correctly handles the two-column sort (@timestamp, host) required for timechart ordering. The complex join structure for "OTHER" bucketing and the window-based row numbering are preserved correctly.

integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml (1)

14-20: LGTM! TopK correctly applied with EXISTS subquery pattern.

The physical plan properly places CalciteEnumerableTopK above the EnumerableNestedLoopJoin that implements the EXISTS semantics. The inner subquery's filter and limit are correctly pushed down to the index scan, while the outer sort by salary is handled by TopK.

integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml (1)

16-21: LGTM - TopK applied to correlated EXISTS subquery.

The CalciteEnumerableTopK operator correctly replaces the limit+sort pattern for the EXISTS correlated subquery case, maintaining the sort on field $2 with DESC-nulls-last ordering and fetch limit of 10000. The correlation structure is preserved.

integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml (1)

22-39: Multi-key TopK correctly applied to timechart query.

The physical plan demonstrates proper multi-key TopK support with CalciteEnumerableTopK(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=[10000]) sorting on @timestamp and host fields. The plan restructures the timechart aggregation to use EnumerableMergeJoin (left join) with sorted inputs, which is a sensible optimization given the top-level sort requirement. This test case provides valuable coverage for multi-column TopK scenarios.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml (1)

16-31: TopK applied correctly with field index remapping.

The CalciteEnumerableTopK(sort0=[$11], dir0=[ASC], fetch=[10000]) correctly sorts on the __stream_seq__ field. Note that the field index changes from $17 in the logical plan to $11 in the physical plan due to projection remapping—the EnumerableWindow operation generates the row number at position 11 after the initial index scan projects 11 fields. The complex plan structure with EnumerableMergeJoin and nested aggregates properly implements the streamstats global null bucket semantics.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yaml (1)

11-15: LGTM - TopK consolidation for streamstats null bucket.

The CalciteEnumerableTopK operator is correctly positioned at the top of the plan, sorting on field $11 (the __stream_seq__ generated by the EnumerableWindow at line 14) with fetch limit 10000. The field index mapping from logical $17 to physical $11 is consistent with the projection structure. The plan preserves the partitioned windowing semantics for streamstats with null bucket handling.

integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml (1)

16-22: Test expectation correctly defines TopK consolidation for correlated subquery optimization.

The physical plan correctly consolidates the limit and sort into CalciteEnumerableTopK with sort field $2 (salary), direction DESC-nulls-last, and fetch 10000, matching the logical plan's LogicalSystemLimit specification. The plan structure properly preserves the correlated subquery handling with EnumerableCorrelate and nested scans.

integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml (1)

22-31: LGTM! TopK optimization correctly applied.

The transformation from LogicalSystemLimit (with integrated sort parameters) to CalciteEnumerableTopK(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=[10000]) is correct and aligns with the PR objectives to reduce memory usage by limiting sorting to top-K rows during execution.

Key verification points:

  • Sort columns ($0, $1) correctly match the aggregate output (timestamp, category)
  • Sort directions (ASC, ASC) and fetch size (10000) match the logical plan
  • The separate EnumerableSort on line 27 for the window function branch is appropriate
  • PushDownContext objects are consistent with the query structure

The test coverage is excellent, exercising TopK with aggregation, joins, window functions, CASE expressions, and complex pushdown scenarios. Test data is realistic with proper timestamp ranges, varied categories, and appropriate value distributions.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml (2)

18-18: TopK optimization successfully applied.

The introduction of CalciteEnumerableTopK replaces the previous Limit+Sort pattern, which aligns with the PR objectives to reduce memory usage for queries with limits over sorts.


26-40: No issues identified. The inner plan restructuring in lines 26-40 is the correct and expected Calcite optimization for this streaming statistics query with reset conditions and null bucket handling:

  1. Projection reduction ([gender, age]): This is a valid optimization. While the index scan projects only these two fields, the subsequent window and calc operations add the derived columns (__stream_seq__, __reset_before_flag__, __reset_after_flag__, __seg_id__) needed for aggregation and joining.

  2. Aggregation grouping (group=[{0, 1, 2}]): Correctly groups by gender, __stream_seq__, and __seg_id__—exactly the columns required by the logical plan's correlation join semantics.

  3. Hash join structure: The inner hash join with its dual aggregations and window operations is the expected Calcite translation of the correlated subquery for streaming stats with reset conditions.

The test (testStreamstatsResetNullBucketExplain) appropriately validates the EXPLAIN output matches this plan structure via YAML structure comparison. No functional issues or semantic problems exist in this query plan.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java (1)

142-145: Address the TODO comment and add JavaDoc.

The method logic is correct, but:

  1. The TODO comment "check on adding" is vague and unclear about what needs to be checked and when.
  2. Per coding guidelines, public methods should have proper JavaDoc documentation.
🔎 Proposed fix
-  // TODO check on adding
-  public boolean containsDigestOnTop(Object digest) {
+  /**
+   * Check if the most recently added operation has a digest equal to the provided digest.
+   *
+   * @param digest The digest object to compare against the top operation
+   * @return true if the top operation's digest equals the provided digest, false otherwise
+   */
+  public boolean containsDigestOnTop(Object digest) {
     return this.queue.peekLast() != null && this.queue.peekLast().digest().equals(digest);
   }
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java (1)

20-21: Add class-level JavaDoc.

Per coding guidelines, public classes should have proper JavaDoc. Consider expanding the single-line comment into a full JavaDoc explaining the TopK optimization purpose and when this node is used.

Suggested JavaDoc
-/** The different between this and {@link EnumerableLimitSort} is the cost. */
+/**
+ * A physical relational node representing a TopK operation with optimized cost computation.
+ *
+ * <p>This node extends {@link EnumerableLimitSort} but computes a lower cost to encourage
+ * the planner to prefer fusing separate Limit and Sort operators into this single TopK node,
+ * thereby reducing memory usage during execution.
+ */
 public class CalciteEnumerableTopK extends EnumerableLimitSort {
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java (1)

28-42: Add a copy() override to CalciteEnumerableTopK to preserve the custom type.

When Calcite's planning engine clones relational nodes, the parent EnumerableLimitSort.copy() method will be invoked. Without an override, this returns an EnumerableLimitSort instance instead of CalciteEnumerableTopK, losing the custom computeSelfCost() implementation. Following the pattern used in LogicalSystemLimit and other custom Calcite nodes in this codebase, override copy() to return a new CalciteEnumerableTopK instance with the updated parameters.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cbcdbd6 and 2a88ae6.

📒 Files selected for processing (16)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
🧰 Additional context used
📓 Path-based instructions (2)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yaml
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
🧬 Code graph analysis (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java (2)
legacy/src/main/java/org/opensearch/sql/legacy/query/planner/logical/node/Sort.java (1)
  • Sort (16-48)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java (1)
  • CalciteEnumerableTopK (21-54)
🔇 Additional comments (16)
integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yaml (1)

11-14: LGTM!

The test expectation correctly reflects the Limit + Sort fusion into EnumerableLimitSort. The sort key (sort0=[$5]), direction (dir0=[ASC]), and fetch limit (fetch=[10000]) are properly combined into a single operator, which aligns with the PR objective of reducing memory overhead by avoiding full in-memory sorts.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1)

420-429: LGTM!

The guard logic correctly prevents stack overflow by checking if an identical LimitDigest already exists at the top of the pushdown context. This is a sound fix for the recursive pushdown issue, and the precomputed digest variable is properly reused when adding to the context.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml (1)

18-40: LGTM!

The test expectation correctly reflects the plan restructuring:

  • EnumerableLimitSort fuses the limit and sort operations
  • EnumerableMergeJoin with 3-key condition requires the corresponding 3-key sorts on both inputs
  • The inner branch projections ([gender, age]) are appropriately minimized for the join/aggregation logic

The plan structure is consistent with the TopK optimization objective.

integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml (1)

16-27: LGTM!

The test expectation correctly reflects the multisearch physical plan with EnumerableMergeUnion. The LIMIT->5 appears twice in each PushDownContext (before and after PROJECT), which is consistent with limit pushdown occurring at multiple stages. The search filter terms (A, B for the first source; E, F for the second) are correctly represented in the query builders.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml (1)

16-31: LGTM!

The test expectation correctly reflects the TopK optimization with EnumerableLimitSort fusing the limit and sort. The EnumerableMergeJoin with its required sorted inputs and the inner branch's optimized projections ([gender] and [gender, age]) are appropriately structured.

integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json (1)

4-4: LGTM!

The test expectation correctly shows EnumerableLimitSort(sort0=[$0], dir0=[ASC-nulls-first], fetch=[100]) replacing the previous EnumerableSort(fetch=[100]). This fusion is consistent with the PR objective of combining limit and sort into a single TopK operator for improved memory efficiency.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml (1)

13-15: LGTM!

This is the key test expectation for the PR's main objective (issue #4982). The EnumerableLimitSort(sort0=[$0], dir0=[DESC-nulls-last], fetch=[25]) directly addresses the OOM issue by maintaining only the top 25 results in memory during sorting, rather than sorting all data first and then limiting. This pattern is correct and will prevent the memory exhaustion observed in Clickbench Q28.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml (1)

14-17: LGTM! Physical plan correctly reflects TopK fusion.

The expected output properly shows the optimization: the inner EnumerableLimit(fetch=[25]) + EnumerableSort pattern is fused into a single EnumerableLimitSort(sort0=[$0], dir0=[DESC-nulls-last], fetch=[25]), while preserving the outer system limit (EnumerableLimit(fetch=[10000])). This aligns with the PR objective to reduce memory usage by avoiding full in-memory sorts.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java (2)

44-46: LGTM! Rule registration follows existing patterns.

The new ENUMERABLE_TOP_K_RULE follows the established pattern for rule instantiation and naming conventions (UPPER_SNAKE_CASE for constants).


66-68: Verify rule ordering is intentional.

The ENUMERABLE_TOP_K_RULE is placed before EXPAND_COLLATION_ON_PROJECT_EXPR. Please confirm this ordering is intentional, as the TopK transformation should occur before collation expansion to ensure proper optimization.

integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml (1)

15-17: Verify semantic equivalence of operator reordering.

The physical plan now has EnumerableLimit(fetch=[10000]) above EnumerableCalc, whereas the previous structure had them in reverse order. While this may be a valid optimization, please verify that limiting before calculating (projection) is semantically equivalent in this correlated subquery context—especially since the projection changes the column ordering (id, name, salary from name, id, salary).

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yaml (1)

11-14: LGTM! Correctly demonstrates Limit+Sort fusion.

The expected output properly reflects the TopK optimization: EnumerableLimit(fetch=[10000]) + EnumerableSort(sort0=[$5], dir0=[ASC]) are fused into EnumerableLimitSort(sort0=[$5], dir0=[ASC], fetch=[10000]). The downstream window operators and index scan remain unchanged.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yaml (1)

11-14: LGTM! Consistent with TopK fusion pattern.

The expected output correctly shows the Limit+Sort operators fused into a single EnumerableLimitSort while preserving downstream window operators and the index scan context.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java (2)

17-26: LGTM! Well-documented rule implementation.

The class-level JavaDoc clearly explains the transformation purpose. The rule correctly extends InterruptibleRelRule and follows the established pattern in this codebase.


51-58: Rule predicate correctly identifies TopK candidates.

The predicate ensures the rule only matches LogicalSort nodes that have:

  1. A fetch or offset (making it a "top" operation)
  2. Non-empty field collations (actual ordering is specified)

This correctly filters for genuine TopK scenarios.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yaml (1)

11-15: LGTM! Consistent TopK fusion pattern.

The expected output correctly demonstrates the Limit+Sort fusion into EnumerableLimitSort while preserving the windowing and index scan operators downstream.

yuancu
yuancu previously approved these changes Dec 24, 2025
Copy link
Collaborator

@yuancu yuancu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@LantaoJin
Copy link
Member Author

LantaoJin commented Dec 25, 2025

Let's suspend on merging until #4992 merged

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java (1)

192-195: Clarify or resolve the TODO comment.

The TODO comment "check on adding" is vague and doesn't explain what verification is needed. Consider either:

  1. Expanding the comment to describe what should be checked (e.g., "TODO: Consider checking digest equality during add() to prevent duplicates earlier")
  2. Removing it if no action is required

Additionally, consider adding JavaDoc for this public method to document its purpose and when it should be used versus containsDigest(Object).

🔎 Suggested improvement
-  // TODO check on adding
+  /**
+   * Check if the most recently pushed operation has the specified digest.
+   * This is useful for preventing duplicate consecutive operations.
+   *
+   * @param digest The digest to compare against the top operation
+   * @return true if the last operation's digest equals the provided digest
+   */
   public boolean containsDigestOnTop(Object digest) {
     return this.queue.peekLast() != null && this.queue.peekLast().digest().equals(digest);
   }
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1)

439-443: LGTM! Consider clarifying the comment.

The defensive guard against duplicate consecutive limit digests is a sound fix. Creating the digest once and reusing it is efficient.

The comment "avoid stack overflow" could be more descriptive about the root cause. Consider explaining the scenario that triggers recursive pushdown attempts:

🔎 Suggested comment improvement
         LimitDigest digest = new LimitDigest(limit, offset);
         if (pushDownContext.containsDigestOnTop(digest)) {
-          // avoid stack overflow
+          // Avoid infinite recursion: if the same limit was just pushed, the planner
+          // may re-invoke this rule on the same node. Return null to break the cycle.
           return null;
         }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a88ae6 and 58012e9.

📒 Files selected for processing (9)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (58)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: WhiteSource Security Check
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (9)
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml (1)

13-15: LGTM! TopK optimization correctly reflected in expected output.

The physical plan now uses EnumerableLimitSort to fuse the limit and sort operations into a single TopK operator, which aligns perfectly with the PR objective to avoid OOM by preventing full in-memory sorting of all rows.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml (1)

15-17: LGTM! Consistent TopK optimization across test cases.

The expected output correctly shows the EnumerableLimitSort fusion, demonstrating that the TopK optimization applies consistently across different query patterns.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java (2)

42-43: LGTM! New TopK rule properly declared.

The ENUMERABLE_TOP_K_RULE follows the established pattern for rule declarations in this class and uses the standard configuration approach.


63-63: LGTM! Rule positioning appears intentional.

The rule is placed after RARE_TOP_PUSH_DOWN and before EXPAND_COLLATION_ON_PROJECT_EXPR, which allows rare/top pushdown operations to complete before applying the TopK optimization. This ordering supports the goal of preventing OOM by ensuring TopK is applied at the appropriate stage of query optimization.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1)

444-449: Verify test coverage for the EnumerableTopKRule integration path with this limit pushdown logic.

The code change correctly reuses the precomputed digest object, and pushDownLimit itself has comprehensive test coverage across multiple test classes. However, tests for the new EnumerableTopKRule integration are missing. Consider adding tests that specifically verify:

  1. The EnumerableTopKRule optimization path exercising this pushdown logic
  2. The scenario where duplicate limits would have caused issues (as mentioned in the PR objective)
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml (1)

12-12: The composite source ordering correctly reflects the logical plan.

The state ordering has been updated to "desc" which correctly matches the logical plan's dir1=[DESC-nulls-last] for the state field ($3), and gender remains "asc" matching dir0=[ASC-nulls-first].

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml (1)

12-12: Ordering difference between primary and alternative expected outputs is intentional and represents non-deterministic physical plan generation.

The alternative expected output file is correctly set up to handle non-deterministic ordering of script_fields (new_state/new_gender vs new_gender/new_state) during physical plan generation. The test code explicitly loads and compares both primary and alternative files using assertYamlEqualsIgnoreId, confirming that both orderings are valid and acceptable. This pattern is consistently applied across all 8 alternative expected output files in the calcite directory, indicating that HashMap-based collections during plan generation can produce either ordering non-deterministically. The test structure is correct as-is.

integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml (2)

17-17: Cosmetic formatting change to OpenSearchRequestBuilder.

The startFrom=0 parameter placement has been reformatted. This is a cosmetic change with no functional impact.


9-10: The concern about operator ordering is based on a misinterpretation of the query plan tree structure. In Calcite plans, indentation indicates parent-child relationships. EnumerableCalc is the parent (root) operator, and EnumerableLimit is a child operator beneath it. Data flows from the leaf (IndexScan) upward through Sort and Limit before reaching Calc at the root. This means Limit actually executes before the projection, filtering to 10,000 rows first and then projecting the age column—which is the optimal order. This operator arrangement is consistent across eight similar test files in the same directory and matches the expected behavior of the TopK optimization that pushes the balance sort (fetch=5) to the index scan.

Likely an incorrect or invalid review comment.

CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])
physical: |
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0, 1},agg#0=LITERAL_AGG(2)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"asc"}}}]},"aggregations":{"$f2":{"top_hits":{"from":0,"size":2,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["gender","state","account_number","age"],"excludes":[]},"script_fields":{"new_state":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"ignore_failure":false},"new_gender":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"ignore_failure":false}}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0, 1},agg#0=LITERAL_AGG(2)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"desc"}}}]},"aggregations":{"$f2":{"top_hits":{"from":0,"size":2,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["gender","state","account_number","age"],"excludes":[]},"script_fields":{"new_gender":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"ignore_failure":false},"new_state":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"ignore_failure":false}}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change {"field":"state.keyword","missing_bucket":false,"order":"asc"} to{"field":"state.keyword","missing_bucket":false,"order":"desc"}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we prevent push down CalciteEnumerableTopK in SortIndexScanRule? I think we should only push down an EnumerableSort for a physical level plan.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])
physical: |
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0, 1},agg#0=LITERAL_AGG(2)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"asc"}}}]},"aggregations":{"$f2":{"top_hits":{"from":0,"size":2,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["gender","state","account_number","age"],"excludes":[]},"script_fields":{"new_gender":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"ignore_failure":false},"new_state":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"ignore_failure":false}}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0, 1},agg#0=LITERAL_AGG(2)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"desc"}}}]},"aggregations":{"$f2":{"top_hits":{"from":0,"size":2,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["gender","state","account_number","age"],"excludes":[]},"script_fields":{"new_state":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"ignore_failure":false},"new_gender":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"ignore_failure":false}}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

yuancu
yuancu previously approved these changes Dec 26, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
integ-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yaml (1)

12-14: Physical plan transformation is correct; add edge case tests for fetch boundary values.

The transformation from separate EnumerableLimit+EnumerableSort to EnumerableLimitSort correctly preserves:

  • Sort column reference ($1 = cnt field)
  • Fetch limit (10000)
  • Downstream operators (EnumerableCalc, CalciteEnumerableIndexScan)
  • PushDownContext for aggregation pushdown

This aligns with the PR objective to reduce memory usage via TopK optimization.

However, test coverage lacks boundary conditions for fetch limits. While the test suite comprehensively covers various aggregation scenarios and sort directions (ASC-nulls-first, DESC-nulls-last), it consistently uses fetch=[10000]. Add tests with:

  • Small fetch values (e.g., fetch=[1], fetch=[10])
  • Edge cases (fetch=[0], fetch exceeding result size)
  • Missing sort direction variant (ASC-nulls-last)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58012e9 and ba8d9df.

📒 Files selected for processing (24)
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_range_metric_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
🧰 Additional context used
📓 Path-based instructions (2)
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_range_metric_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_range_metric_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
🧬 Code graph analysis (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.java (3)
legacy/src/main/java/org/opensearch/sql/legacy/query/planner/logical/node/Sort.java (1)
  • Sort (16-48)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java (1)
  • Value (19-56)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java (1)
  • CalciteEnumerableTopK (21-54)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.java (1)
  • Value (22-66)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java (1)
  • CalciteEnumerableTopK (21-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (30)
integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml (1)

22-31: LGTM! The expected output correctly reflects the TopK optimization.

The physical plan now starts with EnumerableLimitSort (line 22), which properly fuses the limit and sort operations instead of using separate EnumerableLimit + EnumerableSort operators. This aligns with the PR's objective to reduce memory usage by using a TopK operator.

The plan structure is consistent with the logical plan:

  • fetch=[10000] matches LogicalSystemLimit
  • Sort directions and columns match LogicalSort
  • Downstream operators (aggregate, calc, merge join, index scans) are preserved correctly

Based on learnings, this change correctly tests the SQL optimization path for Calcite integration.

integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml (1)

21-35: LGTM! Physical plan correctly demonstrates the TopK optimization.

The expected output properly reflects the fusion of EnumerableLimit + EnumerableSort into a single EnumerableLimitSort at the top of the physical plan. The sort keys, directions, and fetch limit (10000) are consistent with the logical plan. The inner EnumerableSort operators (lines 27, 30) are appropriately preserved for the EnumerableMergeJoin which requires sorted inputs.

This test case is representative—covering multiple group keys, left joins, window functions, and aggregation pushdowns—and aligns with the PR objective of reducing memory usage for limit+sort patterns. Based on learnings, this validates the SQL generation and optimization paths for Calcite integration changes.

integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yaml (1)

11-12: LGTM! Correct TopK optimization transformation.

The physical plan correctly consolidates the separate limit and sort operations from the logical plan into a single EnumerableLimitSort operator. The sort columns ($0, $1), directions (ASC-nulls-first), and fetch limit (10000) accurately match the logical plan specification.

integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml (1)

11-12: LGTM! Correct TopK optimization with different sort columns.

The physical plan correctly transforms the nested limit-over-sort pattern into a unified EnumerableLimitSort operator. The sort parameters (sort0=[$0], sort1=[$2]) properly reflect the logical plan, providing complementary test coverage for different column index combinations.

integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_sort_agg_measure_not_push.yaml (1)

12-13: LGTM! Correct single-column TopK optimization.

The physical plan correctly applies the EnumerableLimitSort optimization for a single-column sort scenario. The parameters (sort0=[$1], dir0=[ASC-nulls-first], fetch=[10000]) accurately match the logical plan, and this test case provides valuable coverage for temporal aggregations with auto_date_histogram alongside the TopK transformation.

integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml (1)

14-21: LGTM! Physical plan correctly reflects the TopK optimization.

The consolidation of limit and sort into EnumerableLimitSort with fetch=[10000] is correct. The plan maintains:

  • Proper correlation handling via EnumerableCorrelate with $cor0
  • Correct pushdown of filter and limit to the inner scan (10000 subsearch limit)
  • Accurate projection of [name, id, salary] for the outer worker scan

The test expectation is realistic and representative of the correlated EXISTS subquery pattern. Based on learnings, this correctly tests the SQL optimization path for Calcite integration changes.

integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml (1)

14-22: LGTM! Correctly represents IN correlated subquery with TopK optimization.

The plan structure is well-formed:

  • EnumerableLimitSort at line 16 consolidates limit and sort as intended
  • The additional EnumerableCalc at line 17 correctly implements the IN condition ($t0=$t3) between outer and inner values
  • Inner scan properly pushes FILTER and LIMIT->10000 to OpenSearch

Test data covers the IN subquery edge case distinct from EXISTS, ensuring comprehensive test coverage for the optimization.

integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml (1)

14-20: LGTM! Correctly differentiates uncorrelated EXISTS from correlated pattern.

The use of EnumerableNestedLoopJoin(condition=[true]) instead of EnumerableCorrelate is semantically correct for uncorrelated subqueries. Key observations:

  • The join type inner with condition=[true] implements EXISTS correctly - if inner produces rows, the cross join succeeds
  • EnumerableAggregate(group=[{0}]) ensures single-row output for the EXISTS check
  • EnumerableLimitSort consolidation at line 15 applies the TopK optimization as intended

Test coverage is comprehensive with this file covering the uncorrelated EXISTS edge case alongside the correlated variants in other files.

integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml (1)

12-13: LGTM! Physical plan correctly reflects the TopK optimization.

The physical plan now uses the fused EnumerableLimitSort operator instead of the two-step EnumerableLimitEnumerableSort sequence. This aligns with the PR objective to reduce memory usage by applying TopK directly rather than sorting all results before limiting. The sort key ($7 = @timestamp), direction (DESC-nulls-last), and fetch limit (10000) are all preserved correctly.

Based on learnings, ensure that the corresponding test execution validates both the correctness of the results and the optimization path.

integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml (1)

13-17: LGTM! Test expectation correctly reflects the TopK optimization.

The updated physical plan accurately demonstrates the PR's optimization goal:

  • Line 14 introduces EnumerableLimitSort, which combines the previously separate limit and sort operations into a single TopK operator
  • The sort field ($2 = salary), direction (DESC-nulls-last), and fetch limit (10000) are correctly preserved from the logical plan
  • The operator placement is correct: positioned between the EnumerableCalc projection and the EnumerableHashJoin to sort the join results before the final projection
  • This optimization reduces memory usage by avoiding a full in-memory sort before applying the limit, directly addressing the OOM issues described in issue #4982

Based on learnings, verify that the SQL generation and optimization paths work correctly end-to-end with this new operator.

integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml (1)

21-30: Transformation is correct; edge case coverage confirmed.

The conversion from EnumerableLimit + EnumerableSort to EnumerableLimitSort correctly preserves the sort keys, directions, and fetch limit while consolidating the operations. The test data is realistic (OTEL logs) and the schema is consistent.

The DESC-nulls-last edge case with small fetch values from issue #4982 is already covered: q28.yaml and q29.yaml both contain EnumerableLimitSort(sort0=[$0], dir0=[DESC-nulls-last], fetch=[25]). This test file appropriately covers the ASC multi-key sorting variant.

integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml (1)

12-13: LGTM: Correct TopK optimization applied.

The transformation from separate EnumerableLimit and EnumerableSort operators to a single EnumerableLimitSort correctly implements the TopK optimization described in the PR objectives. The fused operator preserves all parameters (sort on column $1, ASC-nulls-first direction, fetch=10000) and maintains the CalciteEnumerableIndexScan child with its aggregation pushdown context intact. This should reduce memory consumption by avoiding a full in-memory sort before applying the limit.

Based on learnings, this test validates the Calcite optimization path for the TopK transformation.

integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml (2)

22-22: LGTM! EnumerableLimitSort correctly fuses limit and sort operations.

The physical plan correctly applies the TopK optimization by replacing the separate LogicalSystemLimit + LogicalSort pattern (lines 3-4) with a single EnumerableLimitSort operator. This aligns with the PR objective to prevent OOM by avoiding full in-memory sorts when fetch limits are present.


23-39: Physical plan structure is consistent and test coverage is comprehensive.

The downstream operators correctly preserve the logical plan semantics:

  • Line 36 explicitly handles the IS NOT NULL filter condition matching the logical plan (line 15)
  • Join, aggregation, and window function operators are properly structured
  • Index scan pushdown contexts appropriately filter and project required fields

The test provides good coverage with null handling, multi-key sorting, joins, aggregations, and window functions.

integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml (1)

11-12: LGTM!

The expected output correctly reflects the TopK optimization: the separate EnumerableLimit and EnumerableSort operators are fused into a single EnumerableLimitSort with the appropriate sort key, direction, and fetch limit preserved. The underlying CalciteEnumerableIndexScan and its PushDownContext remain unchanged, which is expected behavior.

integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml (1)

22-39: LGTM!

The physical plan correctly demonstrates the TopK fusion at the top level with EnumerableLimitSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=[10000]). The restructured plan maintains semantic equivalence while enabling the memory-efficient TopK operation. Both CalciteEnumerableIndexScan nodes retain their respective PushDownContext configurations.

integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml (1)

22-35: LGTM!

The expected output follows the same TopK fusion pattern as other timechart tests, correctly replacing the limit+sort sequence with EnumerableLimitSort. The plan structure maintains proper semantics for the count-based timechart query.

integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json (1)

4-4: LGTM!

The JSON expected output correctly reflects the TopK fusion: EnumerableLimitSort now directly wraps CalciteEnumerableIndexScan with the aggregation pushdown preserved in the PushDownContext. This aligns with the optimization objective.

integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml (1)

9-16: LGTM!

This test correctly demonstrates the TopK-then-sort optimization pattern. The inner TopK (fetch=5, sorted by balance) is pushed down to OpenSearch via PushDownContext, while the outer sort (by age) with system limit (fetch=10000) uses EnumerableLimitSort. This is the expected behavior for queries requiring a secondary sort after initial TopK retrieval.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java (2)

42-45: LGTM!

The new TopK rules follow the established pattern used by other rules in this class. The Config.DEFAULT.toRule() instantiation is consistent with existing rule declarations.


65-66: Verify the rule ordering is intentional.

The TopK rules are placed after RARE_TOP_PUSH_DOWN and before EXPAND_COLLATION_ON_PROJECT_EXPR. Since Calcite rule ordering can affect optimization outcomes, please confirm this placement is intentional and that the converter rule should fire before the merge rule in the optimization sequence.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java (3)

15-23: LGTM!

The class follows the established InterruptibleRelRule pattern used by other OpenSearch planning rules. The protected constructor and @Value.Enclosing annotation are correctly applied for Immutables integration.


25-33: Implementation correctly merges Limit + Sort into TopK.

The rule properly extracts the collation from EnumerableSort and the offset/fetch from EnumerableLimit, then creates a CalciteEnumerableTopK using the sort's input. This correctly bypasses both original operators.


35-55: Config correctly specifies the operand pattern.

The predicate sort.fetch == null ensures this rule only matches EnumerableSort nodes without their own fetch limit, which is the correct pattern for merging a separate EnumerableLimit with an underlying full sort. The oneInput constraint properly enforces the parent-child relationship.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.java (3)

17-27: LGTM!

The class structure and JavaDoc clearly describe the rule's purpose: converting LogicalSort nodes with fetch/offset and non-empty collation directly into CalciteEnumerableTopK. This complements EnumerableTopKMergeRule by handling cases where the limit is already part of the LogicalSort rather than a separate EnumerableLimit.


29-43: Convention conversion is handled correctly.

The convert() call properly transforms the input to EnumerableConvention before creating the CalciteEnumerableTopK. This ensures the physical operator receives properly converted children.


45-65: Config predicate correctly identifies TopK candidates.

The predicate (sort.fetch != null || sort.offset != null) && !sort.collation.getFieldCollations().isEmpty() ensures:

  1. The sort has limit semantics (fetch or offset present)
  2. The sort has actual ordering (non-empty collation)

This correctly filters out pure sorts without limits and pure limits without ordering.

integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_range_metric_sort_agg_measure_not_push.yaml (1)

12-13: Transformation correctly implements TopK optimization.

The physical plan correctly merges LogicalSystemLimit and LogicalSort into a single EnumerableLimitSort operator. Parameters match precisely:

  • Sort field: $1 (cnt field)
  • Sort direction: ASC-nulls-first
  • Fetch limit: 10000

This aligns with the PR objective to reduce memory consumption through TopK semantics. Edge cases are well-covered across the test suite: EnumerableLimitSort is used in 30 test files where both limit and sort are present; separate EnumerableSort appears in queries without limits; and standalone EnumerableLimit appears in queries without sort requirements.

integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml (2)

17-22: Test data is valid and properly formatted for string comparison.

The YAML file uses a literal block scalar (pipe | syntax) where the SORT->[{...}] configuration is treated as plain text, not parsed as YAML structure. The test framework's assertYamlEqualsIgnoreId() method performs string comparison via assertEquals() after normalizing line breaks and IDs—it does not parse the embedded JSON-like syntax.

The multi-line SORT configuration is correct:

  • SORT->[{"firstname" : {"order" : "asc", "missing" : "_last"}}] matches the OpenSearchRequestBuilder sort specification
  • Indentation and format are consistent with similar working test files (e.g., asc_sort_timestamp.yaml, desc_sort_timestamp.yaml)
  • The test method testExplainSortPassThroughJoinThenPushdown() successfully compares this expected output against actual query results

No changes required—the test data is realistic, matches the expected schema, and correctly represents the join sort passthrough scenario.


13-14: Merge join optimization correctly implements sort pushdown; plan shows expected architecture.

The transformation from EnumerableLimit + EnumerableSort to EnumerableLimitSort aligns with the PR's TopK optimization. The switch to EnumerableMergeJoin is valid: both inputs are correctly sorted by their join keys—left by $13 (initial) and right by $15 (firstname)—as shown by the PushDownContext entries.

The right side PushDownContext explicitly includes SORT->[{"firstname":{"order":"asc","missing":"_last"}}] and the OpenSearch request builder reflects this sort directive. Similar merge join sort pushdown scenarios are validated in CalcitePPLJoinIT.testSimpleSortPushDownForSMJ and related tests with actual result verification, confirming the pattern works as intended.

Signed-off-by: Lantao Jin <[email protected]>
yuancu
yuancu previously approved these changes Dec 26, 2025
return offset > 0 ? sort.copy(sort.getTraitSet(), List.of(newScan)) : newScan;
} else {
LimitDigest digest = new LimitDigest(limit, offset);
if (pushDownContext.containsDigestOnTop(digest)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] Curious when and how could it happens with this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue seems only happens without pruning. Now I cannot reproduce it. But for multisearch test testExplainMultisearchTimestampInterleaving, we can remove the second duplicated LIMIT pushdown: LIMIT->5, LIMIT->5. We can keep this logic.

CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])
physical: |
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0, 1},agg#0=LITERAL_AGG(2)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"asc"}}}]},"aggregations":{"$f2":{"top_hits":{"from":0,"size":2,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["gender","state","account_number","age"],"excludes":[]},"script_fields":{"new_state":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"ignore_failure":false},"new_gender":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"ignore_failure":false}}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0, 1},agg#0=LITERAL_AGG(2)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"desc"}}}]},"aggregations":{"$f2":{"top_hits":{"from":0,"size":2,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["gender","state","account_number","age"],"excludes":[]},"script_fields":{"new_gender":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"ignore_failure":false},"new_state":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"ignore_failure":false}}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we prevent push down CalciteEnumerableTopK in SortIndexScanRule? I think we should only push down an EnumerableSort for a physical level plan.

import org.checkerframework.checker.nullness.qual.Nullable;

/** The different between this and {@link EnumerableLimitSort} is the cost. */
public class CalciteEnumerableTopK extends EnumerableLimitSort {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing Override copy method. If not, this class will downgrade to its parent later when do copy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing Override copy method. If not, this class will downgrade to its parent later when do copy.

@coderabbitai This is a good finding. Can you learn something from it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LantaoJin, absolutely! This is a valuable pattern to remember. I've noted this for future reviews.


✏️ Learnings added
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: In opensearch-project/sql, when creating custom Calcite RelNode classes that extend EnumerableLimitSort or other Calcite RelNode types, always override the `copy` method. Without overriding copy, the class will downgrade to its parent class type during copy operations, losing the custom implementation.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Signed-off-by: Lantao Jin <[email protected]>
@yuancu yuancu merged commit 08be6f9 into opensearch-project:main Dec 29, 2025
40 of 41 checks passed
@LantaoJin LantaoJin deleted the pr/issues/4982 branch December 29, 2025 07:08
opensearch-trigger-bot bot pushed a commit that referenced this pull request Dec 29, 2025
* Support enumerable TopK

Signed-off-by: Lantao Jin <[email protected]>

* Support merging ELimit+ESort to ETopK

Signed-off-by: Lantao Jin <[email protected]>

* remove the converter rule

Signed-off-by: Lantao Jin <[email protected]>

* address commments

Signed-off-by: Lantao Jin <[email protected]>

---------

Signed-off-by: Lantao Jin <[email protected]>
(cherry picked from commit 08be6f9)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
LantaoJin pushed a commit that referenced this pull request Dec 29, 2025
* Support enumerable TopK



* Support merging ELimit+ESort to ETopK



* remove the converter rule



* address commments



---------


(cherry picked from commit 08be6f9)

Signed-off-by: Lantao Jin <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Clickbench PPL q28 is easily OOM due to sort all data in memory

3 participants