-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Hll Sketch estimate with error bounds and Theta sketch estimate with error bounds can now be used as an expression #18426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few minor comments
|
||
public static class HLLSketchEstimateExprMacro implements ExprMacroTable.ExprMacro | ||
{ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unnecessary whitespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method has an extra random newline while the others don't and the OCD in me wants to remove it (so they all look the same).
return ExprEval.ofDoubleArray(new Double[]{0.0D, 0.0D, 0.0D}); | ||
} | ||
HllSketchHolder sketch = HllSketchHolder.fromObj(valObj); | ||
return ExprEval.ofDoubleArray(new Double[]{sketch.getEstimate(), sketch.getLowerBound(numStdDevs), sketch.getUpperBound(numStdDevs)}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming it doesn't matter that SketchEstimateWithErrorBounds
ctor used in ThetaSketch puts the parameter ordering like:
estimate
high
low
numstdev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. HLL and Theta Sketch are different functions. The inconsistency already exists with the PostAggs (group by) versions for these functions. I am keeping the HLL consistence between using as an expression and as an PostAggs and similarly the Theta Sketch consistence between using as an expression and as an PostAggs.
SketchHolder thetaSketchHolder = (SketchHolder) valObj; | ||
return ExprEval.ofComplex(THETA_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS_TYPE, thetaSketchHolder.getEstimateWithErrorBounds(numStdDevs)); | ||
} else { | ||
throw new IllegalArgumentException("requires a ThetaSketch as the argument"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: title-case exception message
|
||
public static class HllSketchEstimateWithErrorBoundsExpr extends ExprMacroTable.BaseScalarMacroFunctionExpr | ||
{ | ||
private Expr estimateExpr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: might be worth marking these estimateExpr
and numStdDev
final in both HllSketchEstimateWithErrorBoundsExpr
and ThetaSketchEstimateWithErrorBoundsExpr
.
Thanks for the review @jtuglu1 |
Description
Similar to #14312, Hll Sketch estimate with error bounds and Theta sketch estimate with error bounds can now be used as an expression. Hll Sketch estimate with error bounds and Theta sketch estimate with error bound have been used only as PostAggs as they work on groupBy and TopN queries. But postAggs are not supported for scan queries. In this PR we introduce HLL_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS and THETA_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS as expressions. These estimates work on a sketch column and has the same behavior as the postAggs. New test cases have been added to show how scan queries can work with these estimates.
This PR has: