Commit 30152d0
zhengruifeng authored and HyukjinKwon committed
[SPARK-49387][PYTHON] Fix type hint for accuracy in percentile_approx and approx_percentile
### What changes were proposed in this pull request?
Fix type hint for `accuracy` in `percentile_approx` and `approx_percentile`

### Why are the changes needed?
float `accuracy` is not supported:
```
In [9]: df.select(approx_percentile("value", [0.25, 0.5, 0.75], 1.1).alias("quantiles")).show()
...
AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "approx_percentile(value, array(0.25, 0.5, 0.75), 1.1)" due to data type mismatch: The third parameter requires the "INTEGRAL" type, however "1.1" has the type "DOUBLE". SQLSTATE: 42K09;
```

### Does this PR introduce _any_ user-facing change?
yes, minor doc change

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47869 from zhengruifeng/py_approx_percentile_acc.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
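
Because the analyzer rejects a `DOUBLE` third argument, a caller may prefer to fail early in Python rather than at analysis time. The following is a minimal sketch of such a guard, not part of this patch: `normalize_accuracy` is a hypothetical helper name, and it only covers plain Python values (a `Column` passed as `accuracy` is outside its scope).

```python
from numbers import Integral


def normalize_accuracy(accuracy: object) -> int:
    """Hypothetical helper: return `accuracy` as a positive int or raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(accuracy, bool):
        raise TypeError("accuracy must be an int, not bool")
    if isinstance(accuracy, Integral):
        value = int(accuracy)
    elif isinstance(accuracy, float) and accuracy.is_integer():
        # 10000.0 has no fractional part, so converting it loses nothing.
        value = int(accuracy)
    else:
        raise TypeError(f"accuracy must be an integral value, got {accuracy!r}")
    if value <= 0:
        raise ValueError(f"accuracy must be positive, got {value}")
    return value
```

With PySpark available, the result could be passed as the third argument, for example `approx_percentile("value", [0.25, 0.5, 0.75], normalize_accuracy(10000))`.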
1 parent 8409da3 commit 30152d0

2 files changed: +6 -6 lines changed

python/pyspark/sql/connect/functions/builtin.py

Lines changed: 2 additions & 2 deletions
@@ -1223,7 +1223,7 @@ def percentile(
 def percentile_approx(
     col: "ColumnOrName",
     percentage: Union[Column, float, Sequence[float], Tuple[float]],
-    accuracy: Union[Column, float] = 10000,
+    accuracy: Union[Column, int] = 10000,
 ) -> Column:
     percentage = lit(list(percentage)) if isinstance(percentage, (list, tuple)) else lit(percentage)
     return _invoke_function_over_columns("percentile_approx", col, percentage, lit(accuracy))
@@ -1235,7 +1235,7 @@ def percentile_approx(
 def approx_percentile(
     col: "ColumnOrName",
     percentage: Union[Column, float, Sequence[float], Tuple[float]],
-    accuracy: Union[Column, float] = 10000,
+    accuracy: Union[Column, int] = 10000,
 ) -> Column:
     percentage = lit(list(percentage)) if isinstance(percentage, (list, tuple)) else lit(percentage)
     return _invoke_function_over_columns("approx_percentile", col, percentage, lit(accuracy))

python/pyspark/sql/functions/builtin.py

Lines changed: 4 additions & 4 deletions
@@ -6339,7 +6339,7 @@ def percentile(
 def percentile_approx(
     col: "ColumnOrName",
     percentage: Union[Column, float, Sequence[float], Tuple[float]],
-    accuracy: Union[Column, float] = 10000,
+    accuracy: Union[Column, int] = 10000,
 ) -> Column:
     """Returns the approximate `percentile` of the numeric column `col` which is the smallest value
     in the ordered `col` values (sorted from least to greatest) such that no more than `percentage`
@@ -6360,7 +6360,7 @@ def percentile_approx(
         When percentage is an array, each value of the percentage array must be between 0.0 and 1.0.
         In this case, returns the approximate percentile array of column col
         at the given percentage array.
-    accuracy : :class:`~pyspark.sql.Column` or float
+    accuracy : :class:`~pyspark.sql.Column` or int
         is a positive numeric literal which controls approximation accuracy
         at the cost of memory. Higher value of accuracy yields better accuracy,
         1.0/accuracy is the relative error of the approximation. (default: 10000).
@@ -6397,7 +6397,7 @@ def percentile_approx(
 def approx_percentile(
     col: "ColumnOrName",
     percentage: Union[Column, float, Sequence[float], Tuple[float]],
-    accuracy: Union[Column, float] = 10000,
+    accuracy: Union[Column, int] = 10000,
 ) -> Column:
     """Returns the approximate `percentile` of the numeric column `col` which is the smallest value
     in the ordered `col` values (sorted from least to greatest) such that no more than `percentage`
@@ -6414,7 +6414,7 @@ def approx_percentile(
         When percentage is an array, each value of the percentage array must be between 0.0 and 1.0.
         In this case, returns the approximate percentile array of column col
         at the given percentage array.
-    accuracy : :class:`~pyspark.sql.Column` or float
+    accuracy : :class:`~pyspark.sql.Column` or int
         is a positive numeric literal which controls approximation accuracy
         at the cost of memory. Higher value of accuracy yields better accuracy,
         1.0/accuracy is the relative error of the approximation. (default: 10000).
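
The docstring states that `1.0/accuracy` is the relative error of the approximation. As a small arithmetic illustration of that relationship (a sketch only; `relative_error` is a hypothetical helper, not a PySpark function):

```python
def relative_error(accuracy: int) -> float:
    """Hypothetical helper: relative error implied by an integer `accuracy`."""
    if accuracy <= 0:
        raise ValueError("accuracy must be positive")
    return 1.0 / accuracy


# Larger accuracy values give a smaller relative error, at the cost of memory.
for acc in (100, 10000, 1000000):
    print(acc, relative_error(acc))
```

The default `accuracy` of 10000 therefore corresponds to a relative error of 1.0/10000.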