Contributor

@maytasm maytasm commented Aug 21, 2025

Description

Similar to #14312, the HLL sketch estimate with error bounds and the Theta sketch estimate with error bounds can now be used as expressions. Until now, both have been available only as PostAggs, which work on groupBy and TopN queries but are not supported for scan queries. This PR introduces HLL_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS and THETA_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS as expressions. These estimates operate on a sketch column and have the same behavior as the corresponding PostAggs. New test cases have been added to show how scan queries can work with these estimates.
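
As a rough illustration of the HLL behavior, the sketch below mirrors the shape of the code excerpt quoted in the review comments: an early return of three zeros (presumably when the input value is null; the condition itself is not shown in the excerpt), and otherwise an array ordered estimate, lower bound, upper bound. `SketchLike`, `EstimateWithBounds`, and `ToySketch` are hypothetical stand-ins written for this example, not Druid or DataSketches classes, and the numbers are made up.

```java
import java.util.Arrays;

// Hypothetical stand-in for a sketch holder; not a Druid or DataSketches type.
interface SketchLike
{
  double getEstimate();

  double getLowerBound(int numStdDevs);

  double getUpperBound(int numStdDevs);
}

final class EstimateWithBounds
{
  // Toy sketch with made-up numbers: the bounds widen by 1.5 per standard deviation.
  static final class ToySketch implements SketchLike
  {
    public double getEstimate()
    {
      return 100.0;
    }

    public double getLowerBound(int numStdDevs)
    {
      return 100.0 - 1.5 * numStdDevs;
    }

    public double getUpperBound(int numStdDevs)
    {
      return 100.0 + 1.5 * numStdDevs;
    }
  }

  // Returns [estimate, lowerBound, upperBound].
  // Assumption: a null sketch yields three zeros, as the quoted excerpt suggests.
  static double[] evaluate(SketchLike sketch, int numStdDevs)
  {
    if (sketch == null) {
      return new double[]{0.0, 0.0, 0.0};
    }
    return new double[]{
        sketch.getEstimate(),
        sketch.getLowerBound(numStdDevs),
        sketch.getUpperBound(numStdDevs)
    };
  }

  public static void main(String[] args)
  {
    System.out.println(Arrays.toString(evaluate(new ToySketch(), 2)));
    System.out.println(Arrays.toString(evaluate(null, 2)));
  }
}
```

In the actual PR the result is wrapped in an ExprEval so that a scan query can read it through an expression; the plain `double[]` here only shows the ordering and the empty-input case.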

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@jtuglu1 jtuglu1 self-requested a review August 22, 2025 04:39
Contributor

@jtuglu1 jtuglu1 left a comment

Left a few minor comments


public static class HLLSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
{

Contributor

nit: unnecessary whitespace

Contributor Author

This method has an extra random newline while the others don't and the OCD in me wants to remove it (so they all look the same).

return ExprEval.ofDoubleArray(new Double[]{0.0D, 0.0D, 0.0D});
}
HllSketchHolder sketch = HllSketchHolder.fromObj(valObj);
return ExprEval.ofDoubleArray(new Double[]{sketch.getEstimate(), sketch.getLowerBound(numStdDevs), sketch.getUpperBound(numStdDevs)});
Contributor

@jtuglu1 jtuglu1 Aug 26, 2025

I'm assuming it doesn't matter that the SketchEstimateWithErrorBounds constructor used for ThetaSketch orders its parameters differently:

estimate
high
low
numstdev

Contributor Author

Yeah. HLL and Theta Sketch are different functions, and the inconsistency already exists between the PostAgg (group by) versions of these functions. I am keeping HLL consistent between its expression and PostAgg forms, and likewise keeping Theta Sketch consistent between its expression and PostAgg forms.
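
To make the ordering difference concrete, here is a minimal self-contained sketch. The names `ErrorBoundsOrdering`, `hllStyle`, and `ThetaStyle` are hypothetical stand-ins, not Druid classes, and the `int` type for the standard-deviation count is an assumption. The HLL layout follows the array built in the excerpt above (estimate, lower, upper); the Theta layout follows the constructor order listed in the review comment (estimate, high, low, number of standard deviations).

```java
import java.util.Arrays;

// Hypothetical stand-ins for illustration only; these are not Druid classes.
final class ErrorBoundsOrdering
{
  // HLL-style result: a flat array ordered [estimate, lowerBound, upperBound].
  static double[] hllStyle(double estimate, double lowerBound, double upperBound)
  {
    return new double[]{estimate, lowerBound, upperBound};
  }

  // Theta-style result: constructor takes (estimate, highBound, lowBound, numStdDevs),
  // so the high bound comes before the low bound.
  static final class ThetaStyle
  {
    final double estimate;
    final double highBound;
    final double lowBound;
    final int numStdDevs;

    ThetaStyle(double estimate, double highBound, double lowBound, int numStdDevs)
    {
      this.estimate = estimate;
      this.highBound = highBound;
      this.lowBound = lowBound;
      this.numStdDevs = numStdDevs;
    }
  }

  public static void main(String[] args)
  {
    // Same made-up numbers in both shapes; only the position of the bounds differs.
    double[] hll = hllStyle(100.0, 95.0, 105.0);
    ThetaStyle theta = new ThetaStyle(100.0, 105.0, 95.0, 2);
    System.out.println(Arrays.toString(hll));
    System.out.println(theta.lowBound + " " + theta.highBound);
  }
}
```

A caller that reads bounds by position has to treat the two shapes differently, which is why each function stays consistent with its own PostAgg rather than with the other sketch type.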

SketchHolder thetaSketchHolder = (SketchHolder) valObj;
return ExprEval.ofComplex(THETA_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS_TYPE, thetaSketchHolder.getEstimateWithErrorBounds(numStdDevs));
} else {
throw new IllegalArgumentException("requires a ThetaSketch as the argument");
Contributor

nit: title-case exception message


public static class HllSketchEstimateWithErrorBoundsExpr extends ExprMacroTable.BaseScalarMacroFunctionExpr
{
private Expr estimateExpr;
Contributor

nit: might be worth marking estimateExpr and numStdDev as final in both HllSketchEstimateWithErrorBoundsExpr and ThetaSketchEstimateWithErrorBoundsExpr.
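
A minimal sketch of what that suggestion could look like, assuming both fields are assigned exactly once in the constructor. `ExprStandIn` and `ErrorBoundsExprSketch` are hypothetical names used to keep the example self-contained; they are not the classes from this PR, and the `int` type of `numStdDev` is an assumption.

```java
// Hypothetical stand-in for Druid's Expr; used only to keep this sketch self-contained.
interface ExprStandIn
{
  String stringify();
}

// Hypothetical class showing the suggested field modifiers; not the class from this PR.
final class ErrorBoundsExprSketch
{
  // Assigned once in the constructor and never reassigned, so both can be final.
  private final ExprStandIn estimateExpr;
  private final int numStdDev;

  ErrorBoundsExprSketch(ExprStandIn estimateExpr, int numStdDev)
  {
    this.estimateExpr = estimateExpr;
    this.numStdDev = numStdDev;
  }

  ExprStandIn getEstimateExpr()
  {
    return estimateExpr;
  }

  int getNumStdDev()
  {
    return numStdDev;
  }

  public static void main(String[] args)
  {
    ErrorBoundsExprSketch expr = new ErrorBoundsExprSketch(() -> "sketch_col", 2);
    System.out.println(expr.getEstimateExpr().stringify() + " " + expr.getNumStdDev());
  }
}
```

With final fields, the compiler rejects any later reassignment, which documents that the expression object is immutable after construction.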

@maytasm maytasm merged commit 3cdf45f into apache:master Aug 26, 2025
46 checks passed
@maytasm maytasm deleted the with_error_bounds branch August 26, 2025 16:19
Contributor Author

maytasm commented Aug 26, 2025

Thanks for the review @jtuglu1
