Closed
Labels
bug (Something isn't working)
Description
Describe the bug
There is a significant amount of code generated for array functions.
This both bloats binaries built with DataFusion and slows compile times.
To Reproduce
cd datafusion/datafusion-cli
cargo bloat
File .text Size Crate Name
0.1% 0.2% 151.2KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_replace_all
0.1% 0.2% 151.2KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_replace_n
0.1% 0.2% 151.2KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_replace
0.1% 0.2% 150.3KiB parquet brotli::enc::prior_eval::PriorEval<Alloc>::update_cost_base
0.1% 0.2% 124.6KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_repeat
0.1% 0.2% 121.5KiB blake2 blake2::Blake2bVarCore::compress
0.0% 0.1% 81.5KiB blake2 blake2::Blake2sVarCore::compress
0.0% 0.1% 73.2KiB blake3 blake3::portable::compress_in_place
0.0% 0.1% 65.2KiB chrono_tz <chrono_tz::timezones::Tz as chrono_tz::timezone_impl::TimeSpans>::timespans
0.0% 0.1% 61.1KiB sqlparser <sqlparser::ast::Statement as core::fmt::Display>::fmt
0.0% 0.1% 61.0KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_append
0.0% 0.1% 61.0KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_prepend
0.0% 0.1% 60.5KiB h2 h2::codec::framed_read::decode_frame
0.0% 0.1% 59.3KiB datafusion datafusion::physical_planner::DefaultPhysicalPlanner::create_initial_plan::{{closure}}
0.0% 0.1% 56.4KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_remove_all
0.0% 0.1% 56.4KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_remove_n
0.0% 0.1% 56.4KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_remove
0.0% 0.1% 52.4KiB h2 h2::frame::headers::HeaderBlock::load::{{closure}}
0.0% 0.1% 51.4KiB datafusion_optimizer <datafusion_optimizer::simplify_expressions::expr_simplifier::Simplifier<S> as datafusion_common::tree_node::...
0.0% 0.1% 48.9KiB datafusion_physical_expr datafusion_physical_expr::datetime_expressions::date_part
35.4% 97.6% 67.1MiB And 290367 smaller methods. Use -n N to show more.
36.3% 100.0% 68.7MiB .text section size, the file size is 189.3MiB
Expected behavior
I would like the array_replace_all, array_replace_n, and array_replace functions to be implemented in terms of arrow kernels (such as eq and take) and manipulations of offset buffers, rather than by directly creating new lists.
For example, the large macro expansion here:
https://github.com/apache/arrow-datafusion/blob/bb1d7f9343532d5fa8df871ff42000fbe836d7d7/datafusion/physical-expr/src/array_expressions.rs#L1431-L1437
I believe this generates a bunch of specialized code for each different list element data type 😢
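To make the suggestion concrete, here is a rough sketch of the shape such an implementation could take. It is not DataFusion or arrow code: plain slices stand in for Arrow value and offset buffers, and the names eq_mask, replace_indices, take and this array_replace_n signature are hypothetical, chosen only for illustration. It also assumes the offsets start at 0 and end at values.len() (no sliced arrays). The point is that only the eq-style and take-style steps touch element values (and in arrow those would be existing shared kernels), while the per-list logic works purely on a boolean mask and indices, so it does not need to be expanded per element type.

```rust
// Sketch only: slices stand in for Arrow arrays/buffers; all names are hypothetical.

/// Stand-in for an element-wise `eq` kernel over the flattened list values.
fn eq_mask<T: PartialEq>(values: &[T], from: &T) -> Vec<bool> {
    values.iter().map(|v| v == from).collect()
}

/// Type-independent step: builds take-style indices into `values` with the
/// replacement element appended at index `mask.len()`. At most `max_per_list`
/// matches are replaced within each list delimited by `offsets`.
/// Assumes `offsets` starts at 0 and ends at `mask.len()`.
fn replace_indices(mask: &[bool], offsets: &[usize], max_per_list: usize) -> Vec<usize> {
    let replacement_idx = mask.len();
    let mut indices = Vec::with_capacity(mask.len());
    for window in offsets.windows(2) {
        let mut replaced = 0;
        for i in window[0]..window[1] {
            if mask[i] && replaced < max_per_list {
                indices.push(replacement_idx);
                replaced += 1;
            } else {
                indices.push(i);
            }
        }
    }
    indices
}

/// Stand-in for a `take` kernel: gathers elements by index.
fn take<T: Clone>(values: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| values[i].clone()).collect()
}

/// Replaces up to `n` occurrences of `from` with `to` in each list.
/// Each match is swapped for exactly one element, so list lengths do not
/// change and the original `offsets` can be reused for the result.
fn array_replace_n<T: PartialEq + Clone>(
    values: &[T],
    offsets: &[usize],
    from: &T,
    to: &T,
    n: usize,
) -> Vec<T> {
    let mask = eq_mask(values, from);
    let indices = replace_indices(&mask, offsets, n);
    // Append the replacement so one gather can pick either an original
    // element or the replacement.
    let mut source = values.to_vec();
    source.push(to.clone());
    take(&source, &indices)
}
```

Under these assumptions, array_replace would be the n = 1 case and array_replace_all the n = usize::MAX case, so all three functions could share one code path.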
Additional context
No response
tustvold