-
Notifications
You must be signed in to change notification settings - Fork 1.9k
perf: optimize CASE WHEN lookup table (2.5-22.5 times faster) #18183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: optimize CASE WHEN lookup table (2.5-22.5 times faster) #18183
Conversation
I don't like that it is in `sort_properties.rs` but this struct is used in `get_properties` which seems like the most appropriate place
…to improve-performance-for-literal-mapping # Conflicts: # datafusion/physical-expr/benches/case_when.rs
## Which issue does this PR close? N/A But extracted from: - #18183 ## Rationale for this change I want to add optimization for lookup based `CASE WHEN` like: ```sql CASE company WHEN 1 THEN 'Apple' WHEN 5 THEN 'Samsung' WHEN 2 THEN 'Motorola' WHEN 3 THEN 'LG' ELSE 'Other' END ``` ## What changes are included in this PR? Added multiple benchmarks for testing lookup table from int to string and vice verca with different size of lookup table (5, 10, 20), different probabilities for having values generated to exist in the lookup map, and probabilities for the number of nulls ## Are these changes tested? N/A ## Are there any user-facing changes? nope <details> <summary>Current results</summary> ``` lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [572.27 µs 572.51 µs 572.78 µs] change: [-0.4311% -0.1953% +0.0524%] (p = 0.09 > 0.05) No change in performance detected. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [808.11 µs 808.63 µs 809.30 µs] change: [+0.2857% +0.4440% +0.6288%] (p = 0.00 < 0.05) Change within noise threshold. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [341.97 µs 342.52 µs 343.21 µs] change: [-0.0740% +0.1913% +0.4541%] (p = 0.13 > 0.05) No change in performance detected. Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [446.34 µs 446.58 µs 446.83 µs] change: [-0.0381% +0.1947% +0.3941%] (p = 0.08 > 0.05) No change in performance detected. Found 12 outliers among 100 measurements (12.00%) 8 (8.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [804.30 µs 804.79 µs 805.33 µs] change: [+0.7523% +0.9613% +1.1731%] (p = 0.00 < 0.05) Change within noise threshold. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [1.3972 ms 1.3979 ms 1.3987 ms] change: [-0.4150% -0.2455% -0.0613%] (p = 0.00 < 0.05) Change within noise threshold. Found 10 outliers among 100 measurements (10.00%) 7 (7.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [342.75 µs 342.96 µs 343.22 µs] change: [+0.0122% +0.2433% +0.4781%] (p = 0.02 < 0.05) Change within noise threshold. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low mild 2 (2.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [445.39 µs 445.56 µs 445.75 µs] change: [-0.4254% -0.2538% -0.0547%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [1.1731 ms 1.1738 ms 1.1746 ms] change: [+0.3589% +0.5605% +0.7873%] (p = 0.00 < 0.05) Change within noise threshold. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low mild 5 (5.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [2.3819 ms 2.3832 ms 2.3845 ms] change: [+0.0355% +0.1016% +0.1663%] (p = 0.00 < 0.05) Change within noise threshold. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high mild lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [341.53 µs 341.68 µs 341.85 µs] change: [-0.4361% -0.2031% +0.0398%] (p = 0.08 > 0.05) No change in performance detected. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [444.97 µs 445.12 µs 445.30 µs] change: [-0.3776% -0.2148% -0.0460%] (p = 0.00 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) high mild 10 (10.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [728.94 µs 729.23 µs 729.58 µs] change: [+0.2335% +0.4019% +0.5791%] (p = 0.00 < 0.05) Change within noise threshold. Found 14 outliers among 100 measurements (14.00%) 1 (1.00%) low mild 8 (8.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [966.21 µs 967.02 µs 968.03 µs] change: [+0.2251% +0.3997% +0.6015%] (p = 0.00 < 0.05) Change within noise threshold. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) high mild 9 (9.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [485.63 µs 485.86 µs 486.11 µs] change: [+0.0990% +0.2832% +0.4909%] (p = 0.00 < 0.05) Change within noise threshold. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [565.86 µs 566.24 µs 566.70 µs] change: [+0.1038% +0.2811% +0.4814%] (p = 0.00 < 0.05) Change within noise threshold. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [1.0237 ms 1.0243 ms 1.0250 ms] change: [+0.3817% +0.6095% +0.8366%] (p = 0.00 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) high mild 8 (8.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [1.6139 ms 1.6145 ms 1.6151 ms] change: [-0.3413% -0.1799% -0.0094%] (p = 0.03 < 0.05) Change within noise threshold. Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [485.50 µs 485.75 µs 486.07 µs] change: [+0.0362% +0.2597% +0.4990%] (p = 0.02 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [564.83 µs 565.13 µs 565.49 µs] change: [-0.2292% -0.0443% +0.1429%] (p = 0.68 > 0.05) No change in performance detected. Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) high mild 8 (8.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.7s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [1.5206 ms 1.5214 ms 1.5223 ms] change: [+0.1219% +0.2845% +0.4382%] (p = 0.00 < 0.05) Change within noise threshold. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [2.7355 ms 2.7372 ms 2.7392 ms] change: [+0.1862% +0.2633% +0.3492%] (p = 0.00 < 0.05) Change within noise threshold. Found 16 outliers among 100 measurements (16.00%) 6 (6.00%) high mild 10 (10.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [485.64 µs 485.87 µs 486.12 µs] change: [-0.1317% +0.0342% +0.1974%] (p = 0.72 > 0.05) No change in performance detected. Found 17 outliers among 100 measurements (17.00%) 10 (10.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [564.95 µs 565.22 µs 565.52 µs] change: [-0.2459% -0.0804% +0.1093%] (p = 0.42 > 0.05) No change in performance detected. Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [613.55 µs 613.90 µs 614.30 µs] change: [-0.1978% +0.0206% +0.2512%] (p = 0.88 > 0.05) No change in performance detected. Found 13 outliers among 100 measurements (13.00%) 2 (2.00%) high mild 11 (11.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [804.94 µs 805.27 µs 805.64 µs] change: [-0.3371% -0.2017% -0.0566%] (p = 0.00 < 0.05) Change within noise threshold. Found 7 outliers among 100 measurements (7.00%) 6 (6.00%) high mild 1 (1.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [451.36 µs 451.55 µs 451.75 µs] change: [-0.1076% +0.0692% +0.2464%] (p = 0.48 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [516.12 µs 516.36 µs 516.64 µs] change: [-0.2179% +0.0030% +0.2181%] (p = 0.96 > 0.05) No change in performance detected. Found 15 outliers among 100 measurements (15.00%) 5 (5.00%) high mild 10 (10.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [831.48 µs 831.89 µs 832.37 µs] change: [+0.2730% +0.4416% +0.6047%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) high mild 2 (2.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.6s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [1.2999 ms 1.3006 ms 1.3014 ms] change: [+0.1551% +0.3508% +0.5933%] (p = 0.00 < 0.05) Change within noise threshold. Found 10 outliers among 100 measurements (10.00%) 7 (7.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [451.54 µs 451.76 µs 452.00 µs] change: [-0.0486% +0.1303% +0.3100%] (p = 0.17 > 0.05) No change in performance detected. Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [516.47 µs 516.73 µs 517.04 µs] change: [-0.2732% -0.0578% +0.1455%] (p = 0.66 > 0.05) No change in performance detected. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) high mild 10 (10.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.2s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [1.2276 ms 1.2283 ms 1.2290 ms] change: [+0.3032% +0.4998% +0.6974%] (p = 0.00 < 0.05) Change within noise threshold. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [2.1643 ms 2.1656 ms 2.1671 ms] change: [-0.2215% -0.1512% -0.0757%] (p = 0.00 < 0.05) Change within noise threshold. Found 16 outliers among 100 measurements (16.00%) 7 (7.00%) high mild 9 (9.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [451.48 µs 451.71 µs 451.96 µs] change: [-0.4533% -0.2697% -0.0701%] (p = 0.00 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 7 (7.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [516.17 µs 516.42 µs 516.71 µs] change: [-0.3951% -0.1759% +0.0285%] (p = 0.10 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [581.53 µs 581.85 µs 582.22 µs] change: [-0.4681% -0.2375% +0.0205%] (p = 0.03 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 6 (6.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [806.04 µs 806.40 µs 806.83 µs] change: [-0.3328% -0.1863% -0.0288%] (p = 0.00 < 0.05) Change within noise threshold. Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [341.42 µs 341.59 µs 341.78 µs] change: [-0.5843% -0.3707% -0.1573%] (p = 0.00 < 0.05) Change within noise threshold. Found 11 outliers among 100 measurements (11.00%) 6 (6.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [445.02 µs 445.20 µs 445.42 µs] change: [-0.3079% -0.1435% +0.0361%] (p = 0.10 > 0.05) No change in performance detected. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [804.03 µs 804.52 µs 805.10 µs] change: [+0.0833% +0.2480% +0.4237%] (p = 0.00 < 0.05) Change within noise threshold. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [1.3958 ms 1.3964 ms 1.3970 ms] change: [-0.3882% -0.2616% -0.1751%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [341.66 µs 341.85 µs 342.06 µs] change: [-0.5154% -0.3585% -0.2438%] (p = 0.00 < 0.05) Change within noise threshold. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [445.08 µs 445.29 µs 445.53 µs] change: [-0.1558% +0.0148% +0.1858%] (p = 0.88 > 0.05) No change in performance detected. Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) high mild 6 (6.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [1.1700 ms 1.1709 ms 1.1719 ms] change: [+0.0124% +0.2123% +0.4009%] (p = 0.02 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [2.3757 ms 2.3775 ms 2.3800 ms] change: [+0.0656% +0.1579% +0.2573%] (p = 0.00 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 6 (6.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [341.47 µs 341.77 µs 342.10 µs] change: [-0.2178% +0.0125% +0.2362%] (p = 0.92 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [445.57 µs 445.82 µs 446.09 µs] change: [-0.0565% +0.1035% +0.2671%] (p = 0.23 > 0.05) No change in performance detected. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [728.22 µs 728.56 µs 728.96 µs] change: [-0.3534% -0.1796% +0.0062%] (p = 0.03 < 0.05) Change within noise threshold. Found 13 outliers among 100 measurements (13.00%) 4 (4.00%) high mild 9 (9.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [965.27 µs 965.67 µs 966.17 µs] change: [-0.2803% -0.1597% -0.0369%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [485.15 µs 485.40 µs 485.69 µs] change: [-0.1787% +0.0020% +0.1849%] (p = 0.98 > 0.05) No change in performance detected. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [566.91 µs 567.13 µs 567.37 µs] change: [-0.1075% +0.0721% +0.2537%] (p = 0.47 > 0.05) No change in performance detected. Found 11 outliers among 100 measurements (11.00%) 3 (3.00%) high mild 8 (8.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [1.0272 ms 1.0278 ms 1.0286 ms] Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [1.6216 ms 1.6224 ms 1.6232 ms] Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [485.95 µs 486.18 µs 486.46 µs] Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) high mild 8 (8.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [566.82 µs 567.07 µs 567.34 µs] Found 15 outliers among 100 measurements (15.00%) 7 (7.00%) high mild 8 (8.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.7s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [1.5252 ms 1.5263 ms 1.5276 ms] Found 10 outliers among 100 measurements (10.00%) 7 (7.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [2.7479 ms 2.7492 ms 2.7507 ms] Found 15 outliers among 100 measurements (15.00%) 8 (8.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [485.18 µs 485.54 µs 486.05 µs] Found 13 outliers among 100 measurements (13.00%) 4 (4.00%) high mild 9 (9.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [566.81 µs 567.09 µs 567.43 µs] Found 15 outliers among 100 measurements (15.00%) 7 (7.00%) high mild 8 (8.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [614.34 µs 614.75 µs 615.29 µs] Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [808.13 µs 808.56 µs 809.05 µs] Found 11 outliers among 100 measurements (11.00%) 7 (7.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [452.21 µs 452.46 µs 452.79 µs] Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [518.11 µs 518.36 µs 518.62 µs] Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [834.45 µs 834.88 µs 835.36 µs] Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) high mild 3 (3.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.6s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [1.3037 ms 1.3045 ms 1.3053 ms] Found 8 outliers among 100 measurements (8.00%) 7 (7.00%) high mild 1 (1.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [451.25 µs 451.57 µs 451.99 µs] Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [517.62 µs 517.86 µs 518.12 µs] Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) high mild 6 (6.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.2s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [1.2297 ms 1.2310 ms 1.2328 ms] Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [2.1666 ms 2.1676 ms 2.1686 ms] Found 8 outliers among 100 measurements (8.00%) 7 (7.00%) high mild 1 (1.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [452.46 µs 452.66 µs 452.88 µs] Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [517.20 µs 517.44 µs 517.72 µs] Found 17 outliers among 100 measurements (17.00%) 6 (6.00%) high mild 11 (11.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [582.53 µs 583.31 µs 584.58 µs] Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [807.24 µs 807.65 µs 808.09 µs] Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [341.89 µs 342.06 µs 342.25 µs] Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [445.32 µs 445.54 µs 445.80 µs] Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [804.09 µs 804.53 µs 805.00 µs] Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [1.3968 ms 1.3975 ms 1.3983 ms] Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [341.27 µs 341.43 µs 341.60 µs] Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) high mild 2 (2.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [445.54 µs 445.86 µs 446.24 µs] Found 14 outliers among 100 measurements (14.00%) 6 (6.00%) high mild 8 (8.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [1.1703 ms 1.1710 ms 1.1717 ms] Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [2.3708 ms 2.3724 ms 2.3743 ms] Found 11 outliers among 100 measurements (11.00%) 8 (8.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [341.94 µs 342.15 µs 342.41 µs] Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [445.15 µs 445.42 µs 445.74 µs] Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [729.65 µs 729.94 µs 730.26 µs] Found 11 outliers among 100 measurements (11.00%) 6 (6.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [966.46 µs 966.97 µs 967.58 µs] Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [486.14 µs 486.36 µs 486.61 µs] Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [566.81 µs 567.07 µs 567.34 µs] Found 13 outliers among 100 measurements (13.00%) 6 (6.00%) high mild 7 (7.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [1.0273 ms 1.0278 ms 1.0283 ms] Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [1.6239 ms 1.6248 ms 1.6258 ms] Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [485.73 µs 486.04 µs 486.43 µs] Found 14 outliers among 100 measurements (14.00%) 6 (6.00%) high mild 8 (8.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [567.22 µs 567.54 µs 567.93 µs] Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.7s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [1.5275 ms 1.5282 ms 1.5290 ms] Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [2.7513 ms 2.7532 ms 2.7553 ms] Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [486.38 µs 486.58 µs 486.78 µs] Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [566.15 µs 566.42 µs 566.75 µs] Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0 time: [582.47 µs 582.74 µs 583.04 µs] Found 14 outliers among 100 measurements (14.00%) 9 (9.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0 time: [807.70 µs 808.17 µs 808.74 µs] Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) high mild 9 (9.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0 time: [341.62 µs 341.80 µs 342.03 µs] Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0 time: [445.48 µs 445.87 µs 446.40 µs] Found 13 outliers among 100 measurements (13.00%) 6 (6.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0 time: [804.56 µs 805.03 µs 805.52 µs] Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0 time: [1.3995 ms 1.4004 ms 1.4015 ms] Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0 time: [341.46 µs 341.64 µs 341.85 µs] Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0 time: [445.91 µs 446.16 µs 446.47 µs] Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) high mild 6 (6.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0 time: [1.1708 ms 1.1716 ms 1.1725 ms] Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0 time: [2.3735 ms 2.3748 ms 2.3763 ms] Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0 time: [342.14 µs 342.31 µs 342.53 µs] Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0 time: [444.92 µs 445.09 µs 445.29 µs] Found 15 outliers among 100 measurements (15.00%) 6 (6.00%) high mild 9 (9.00%) high severe ``` </details>
|
Implementation lgtm. The test failure looks like a transient network issue. Clippy lint still needs fixing. |
| paste = { workspace = true } | ||
| petgraph = "0.8.3" | ||
| tokio = { workspace = true } | ||
| half = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick, but it looks like these should be kept in alphabetical order.
|
I will check this one out shortly |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started looking at this code -- it looks like quite a cool idea
This comment was marked as outdated.
This comment was marked as outdated.
|
run benchmark case_when |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
I didn't know you can use specific rust test in run benchmark! And I LOVE the new bot logo |
|
Looking at the benchmark results, a nice extension/followup to this work might be to pattern match and transform it to A similar further followup I'm considering is applying the same "compile lookup data structure" technique for patterns like assuming |
I already have impl for that. Note that you need to make sure the |
Nice! I'll hold off working on the the comparison variant until that lands then since the pattern matching will be highly similar. |
Thank you . I am quite pleased with that myself 😁 |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @rluvaton and @pepijnve -- I found this code really well written, easy to understand, and easy to navigate
There are several places where I think we should not be ignoring errors (.ok()?) as I think that will make it hard to understand / debug failures in the future
I have some other small suggestions but nothing I think that is required to merge
I also did some code coverage runs
cargo llvm-cov test --html --test sqllogictests
cargo llvm-cov test --html --all-features -p datafusion-physical-expr
And I think there isn't good coverage for CASE with WHEN values of strings / boolean values. I'll make a PR to add some more of that.
Thank you again -- this is really cool
| { | ||
| // Since it is not common to have dictionary encoded boolean arrays | ||
| // at all than it is ok to do the cast here to simplify the implementation. | ||
| let converted = arrow::compute::cast(array.as_ref(), &DataType::Boolean)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could make this code simpler by always calling cast to boolean (which is a noop for an already BooleanArray).
However, this structure will have more specific errors, so I don't think any changes are required
| } | ||
| } | ||
|
|
||
| fn try_get_bytes_iterator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend pulling this into a utility / other type method -- it might be helpful when implementing other string-like functions
Perhaps datafusion/physical-expr/src/utils/bytes_iter.rs or something
| ) -> datafusion_common::Result<Box<dyn Iterator<Item = Option<&[u8]>> + '_>> { | ||
| Ok(match array.values().data_type() { | ||
| DataType::Utf8 => Box::new( | ||
| array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit is that method is inconsistent with the branch for Utf8 above which uses as_string::<i32>()
I recommend having this code follow the same pattern as above as it is less verbose
DataType::Utf8 => Box::new(array.as_string::<i32>().into_iter().map(|item| {This comment applies to the other types here too
| /// ``` | ||
| /// | ||
| /// # Improvement idea | ||
| /// TODO - we should think of unwrapping the `IN` expressions into multiple equality comparisons |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend filing a ticket to track this idea (and leave a link here), otherwise this TODO may never be found
| { | ||
| let when_data_type = when[0].data_type(); | ||
|
|
||
| // If not all the WHEN literals are the same data type we cannot use this optimization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the time we get here I would expect that all when literals are the same type (so I would expect this code not to be called)
| .chain(std::iter::once(&else_value)) | ||
| .cloned(), | ||
| ) | ||
| .ok()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why (silently) ignore any errors? (I don't expect that this would ever fail given the checks above, but it might help debugging to explicitly return an internal error or something in this case
Silently ignoring the error (via .ok()) seems like it could mask errors that are preventing this optimization from actually happening. Also, we had some problems in the past with performance when using ok() because each DataFusionError requires a string allocation, so creating one and ignoring it causes non trivial overhead.
Since this code is called once per expression, I suspect it isn't a big performance problem, but I do think the masking of errors is concerning
| // Zero-copy conversion | ||
| let take_indices = UInt32Array::from(take_indices); | ||
|
|
||
| // An optimize version would depend on the type of the values_to_take_from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the take kernel already implements the optimization this comment is referring to (this is one of the benefits of re-using the arrow kernels). Thus I think this comment no longer applies
| DataType::Dictionary(_, value_type) | ||
| if value_type.as_ref() == &self.data_type => | ||
| { | ||
| // Cast here to simplify the implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above, you can probably reduce the size of this code by always calling cast
| /// not implementing actual [`PartialEq`] as it is not dyn compatible so we cannot implement it for | ||
| /// `dyn` [`literal_lookup_table::WhenLiteralIndexMap`]. | ||
| /// | ||
| /// So we always return true as the data is derived from `PhysicalExpr` s which are already compared |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is ok to return true here but don't quite follow this argument (aka I recommend updating this comment and the one for hash)
It seems to me it is ok to return true here because a CaseExpr has a body and an eval method:
https://github.com/apache/datafusion/blob/0cc93a6a750ef3082e8fa94c8d8248d980e49e5f/datafusion/physical-expr/src/expressions/case.rs#L249-L254Since the eval field is deterministically determined (only) from the body then if the body is equal, we are guaranteed that eval will also be the same and no further checks are required
|
Follow on PR for more coverage |
…#19143) ## Which issue does this PR close? - follow on to #18183 from @rluvaton and @pepijnve ## Rationale for this change I want to ensure we have test coverage for constant value lookup tables in the SLT tests ## What changes are included in this PR? 1. Add some more slt tests that use constant value lookup tables (e.g. when the case values are all constants) I also ran ```shell cargo llvm-cov test --html --test sqllogictests -- case ``` To ensure this was covering the new code ## Are these changes tested? Only tests ## Are there any user-facing changes? NO
…or-literal-mapping
…or-literal-mapping
|
I merged up from main to resolve a conflict |
Half of the changes are new tests
Which issue does this PR close?
N/A
Part of:
Benchmark is in:
CASE WHEN#18203Rationale for this change
Optimize for Lookup table like
CASE WHEN:CASE company WHEN 1 THEN 'Apple' WHEN 5 THEN 'Samsung' WHEN 2 THEN 'Motorola' WHEN 3 THEN 'LG' ELSE 'Other' ENDWhat changes are included in this PR?
Implement the case when as a lookup table
Are these changes tested?
Yes, a lot
Are there any user-facing changes?
Nope
Benchmark results
(run against main before the #18152 was merged)
Env
Formatted output
Lookup table from
i32toutf8Details
Lookup table from
utf8toi32Details
criterion output:
Benchmark results