Conversation

emilk
Contributor

@emilk emilk commented Sep 7, 2025

This is part of an attempt to improve the error reporting of arrow-rs, datafusion, and any other 3rd party crates.

I believe that error messages should be as readable as possible. Aim for rustc more than gcc.

Here's an example of how this PR improves some existing error messages:

Before:

Casting from Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Interval(DayTime), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false) to Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Duration(Second), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, true) not supported

After:

Casting from Map(Field { "entries": Struct(key Utf8, value nullable Interval(DayTime)) }, false) to Map(Field { "entries": Struct(key Utf8, value Duration(Second)) }, true) not supported

Which issue does this PR close?

Rationale for this change

DataTypes are often shown in error messages, so making those messages readable is very important.

What changes are included in this PR?

Unify Debug and Display

The Display and Debug of DataType are now the SAME.

Why? Both are frequently used in error messages (both in arrow, and datafusion), and both benefit from being readable yet reversible.

Reverted based on PR feedback. I will try to improve the Debug formatting in a future PR, with clever use of https://doc.rust-lang.org/std/fmt/struct.Formatter.html#method.debug_struct
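
As a rough illustration of what a debug_struct-based Debug could look like, here is a minimal sketch. It is not the implementation planned for the follow-up PR: the Field and DataType types below are hypothetical, simplified stand-ins for arrow's types, and the choice of which fields to omit (non-nullable, empty metadata) is an assumption made for the example.

```rust
use std::collections::HashMap;
use std::fmt;

// Hypothetical stand-in for arrow's `DataType`, reduced to one variant.
#[derive(Debug)]
enum DataType {
    Int32,
}

// Hypothetical, simplified `Field`; not arrow's actual definition.
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    metadata: HashMap<String, String>,
}

impl fmt::Debug for Field {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let mut s = f.debug_struct("Field");
        s.field("name", &self.name);
        s.field("data_type", &self.data_type);
        // Assumption for this sketch: only print fields that differ from their default.
        if self.nullable {
            s.field("nullable", &self.nullable);
        }
        if !self.metadata.is_empty() {
            s.field("metadata", &self.metadata);
        }
        s.finish()
    }
}
```

With this sketch, a non-nullable field without metadata prints only its name and data type, while the output keeps the familiar `Field { ... }` shape of a derived Debug.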

Improve Display of lists

Improve the Display formatting of

  • DataType::List
  • DataType::LargeList
  • DataType::FixedSizeList

Before: List(Field { name: "item", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })
After: List(nullable Int32)

Before: FixedSizeList(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 5)
After: FixedSizeList(5 x Int32)
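
The list formatting above could be produced by a Display impl along these lines. This is a sketch on hypothetical, cut-down Field and DataType types, not the code in this PR; in particular, where "nullable" goes inside a nullable FixedSizeList is an assumption, since the examples above only show the non-nullable case.

```rust
use std::fmt;

// Hypothetical, simplified stand-ins for arrow's `Field` and `DataType`;
// the real types carry more information (field names, metadata, ...).
struct Field {
    data_type: DataType,
    nullable: bool,
}

enum DataType {
    Int32,
    List(Box<Field>),
    FixedSizeList(Box<Field>, usize),
}

impl fmt::Display for DataType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DataType::Int32 => write!(f, "Int32"),
            DataType::List(field) => {
                // Print nothing for the common non-nullable case.
                let maybe_nullable = if field.nullable { "nullable " } else { "" };
                write!(f, "List({maybe_nullable}{})", field.data_type)
            }
            DataType::FixedSizeList(field, size) => {
                // Assumption: "nullable" follows the size, e.g. `5 x nullable Int32`.
                let maybe_nullable = if field.nullable { "nullable " } else { "" };
                write!(f, "FixedSizeList({size} x {maybe_nullable}{})", field.data_type)
            }
        }
    }
}
```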

Better formatting of DataType::Struct

The formatting of Struct is now reversible, including nullability and metadata.

Improve Debug format of Field

Best understood with this diff for an existing test:

[Screenshot of the test diff, taken 2025-09-07; image not included]

EDIT: reverted

Are these changes tested?

Yes - new tests cover them

Are there any user-facing changes?

Display/to_string has changed, and so this is a BREAKING CHANGE.

Care has been taken that the formatting contains all necessary information (i.e. is reversible), though the actual FromStr implementation is still not written (it is missing on main, and missing in this PR - so no change).
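
To make "reversible" concrete: the Display output should contain enough information that a parser could rebuild the original value. The sketch below shows the idea on a hypothetical toy type (Ty is not an arrow type, and this FromStr is not arrow's parser); it only covers two leaf types and a list.

```rust
use std::fmt;
use std::str::FromStr;

// Hypothetical toy type used only to illustrate a Display/FromStr round trip.
#[derive(Debug, PartialEq)]
enum Ty {
    Int32,
    Utf8,
    List { nullable: bool, item: Box<Ty> },
}

impl fmt::Display for Ty {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Ty::Int32 => write!(f, "Int32"),
            Ty::Utf8 => write!(f, "Utf8"),
            Ty::List { nullable, item } => {
                let maybe_nullable = if *nullable { "nullable " } else { "" };
                write!(f, "List({maybe_nullable}{item})")
            }
        }
    }
}

impl FromStr for Ty {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Int32" => Ok(Ty::Int32),
            "Utf8" => Ok(Ty::Utf8),
            _ => {
                // Anything else must look like `List(<inner>)`.
                let inner = s
                    .strip_prefix("List(")
                    .and_then(|rest| rest.strip_suffix(')'))
                    .ok_or_else(|| format!("unrecognized type: {s}"))?;
                // Nullability is written out explicitly, so it can be recovered.
                let (nullable, item) = match inner.strip_prefix("nullable ") {
                    Some(rest) => (true, rest),
                    None => (false, inner),
                };
                Ok(Ty::List { nullable, item: Box::new(item.parse::<Ty>()?) })
            }
        }
    }
}
```

The round trip only works because nothing is dropped from the text: if Display omitted nullability, two different values would print identically and no parser could tell them apart.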


Let me know if I went too far… or not far enough 😆

@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 7, 2025
@emilk emilk changed the title from "Improve Display and Debug for DataType" to "Improve Display and Debug for DataType and Field" Sep 7, 2025
@github-actions github-actions bot added the parquet Changes to the parquet crate label Sep 7, 2025
@emilk emilk marked this pull request as ready for review September 7, 2025 17:28
@mbrobbel mbrobbel added the next-major-release the PR has API changes and it waiting on the next major version label Sep 8, 2025

impl fmt::Display for DataType {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
// NOTE: `Display` and `Debug` formatting are ALWAYS the same,
Contributor

I think it is more common in Rust code to have the Debug implementation try and print out something close to the underlying representation, and Display is for human consumption

Specifically https://doc.rust-lang.org/std/fmt/trait.Debug.html

Debug should format the output in a programmer-facing, debugging context.

Generally speaking, you should just derive a Debug implementation.

vs https://doc.rust-lang.org/std/fmt/trait.Display.html

Contributor Author

@emilk emilk Sep 11, 2025

I know this is uncommon, but unfortunately so many error messages in datafusion and arrow use the Debug formatting of DataType instead of Display, which means we end up with huge difficult-to-read error messages.

There are three solutions to this, afaict:

A) Use Display=Debug, like this PR.
It's still programmer-facing, because it contains ALL the info (metadata etc)

B) Replace all uses of {:?} with {} when printing datatypes in datafusion, arrow, and other third party crates.
This is VERY hard to do, as I know of no automated tool to find all these places.

C) Improve Debug formatting by omitting empty/default fields. This will help, but the Debug format for DataType::List will still be very ugly, since it wraps a Field.

(or maybe I mistakenly think a lot of places use Debug instead of Display because the old Display implementation for List used the Debug formatting…)
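
For reference, option A amounts to having Debug forward to Display. A minimal sketch on a hypothetical toy type (ListOf is invented for this example and is not part of arrow):

```rust
use std::fmt;

// Hypothetical toy type standing in for `DataType`.
struct ListOf {
    item: &'static str,
    nullable: bool,
}

impl fmt::Display for ListOf {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let maybe_nullable = if self.nullable { "nullable " } else { "" };
        write!(f, "List({maybe_nullable}{})", self.item)
    }
}

// Option A: `Debug` forwards to `Display`, so `{:?}` and `{}` print the same text.
impl fmt::Debug for ListOf {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(self, f)
    }
}
```

The upside is that every existing `{:?}` call site gets the readable output without being touched; the downside is that the output no longer mirrors the underlying representation, which is what Debug is conventionally expected to do.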

Contributor Author

I added a comment to the code to motivate this choice

Contributor

B) Replace all uses of {:?} with {} when printing datatypes in datafusion, arrow, and other third party crates.
This is VERY hard to do, as I know of no automated tool to find all these places.

I think grepping for type:?} and {:?} would catch most of them. Maybe a good thing to ask some AI tool to do.

(venv) andrewlamb@Andrews-MacBook-Pro-3:~/Software/arrow-rs$ grep -r 'type:?' `find . -name '*.rs'`
./arrow-schema/src/datatype_parse.rs:        println!("Input '{data_type_string}' ({data_type:?})");
./arrow-schema/src/datatype_parse.rs:            println!("Parsing '{data_type_string}', expecting '{expected_data_type:?}'");
./arrow-data/src/transform/run.rs:        _ => panic!("Invalid run end type for RunEndEncoded array: {run_end_type:?}"),
./arrow-data/src/transform/run.rs:                _ => panic!("Invalid run end type for RunEndEncoded array: {dest_run_end_type:?}",),
./arrow-ipc/src/compression.rs:                "compression type {other_type:?} not supported "
./arrow-string/src/like.rs:                        "{value_type:?} «{value}» like {pattern_type:?} «{pattern}»"
./arrow-string/src/like.rs:                        "{value_type:?} «{value}» ilike {pattern_type:?} «{pattern}»"
./arrow-string/src/like.rs:                        "{value_type:?} «{value}» nlike {pattern_type:?} «{pattern}»"
./arrow-string/src/like.rs:                        "{value_type:?} «{value}» nilike {pattern_type:?} «{pattern}»"
./arrow-csv/src/reader/mod.rs:                            "Unsupported dictionary key type {key_type:?}"
./arrow-row/src/list.rs:                "Expected FixedSizeListArray, found: {list_type:?}",
./arrow-array/src/array/fixed_size_list_array.rs:                panic!("FixedSizeListArray data should contain a FixedSizeList data type, got {data_type:?}")
./arrow-array/src/array/primitive_array.rs:        write!(f, "PrimitiveArray<{data_type:?}>\n[\n")?;
./arrow-array/src/array/primitive_array.rs:                            "Cast error: Failed to convert {v} to temporal for {data_type:?}"
./arrow-array/src/array/primitive_array.rs:                            "Cast error: Failed to convert {v} to temporal for {data_type:?}"
./arrow-array/src/record_batch.rs:                "column types must match schema types, expected {field_type:?} but found {col_type:?} at column index {i}")));
./arrow-array/src/ffi.rs:                "The datatype \"{data_type:?}\" doesn't expect buffer at index 0. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs:                "The datatype \"{data_type:?}\" expects 2 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs:                "The datatype \"{data_type:?}\" expects 2 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs:                "The datatype \"{data_type:?}\" expects 2 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs:                "The datatype \"{data_type:?}\" expects 2 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs:                "The datatype \"{data_type:?}\" expects 3 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs:                "The datatype \"{data_type:?}\" expects 3 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs:                "The datatype \"{data_type:?}\" expects 1 buffer, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs:                "The datatype \"{data_type:?}\" expects 2 buffer, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs:                "The datatype \"{data_type:?}\" doesn't expect buffer at index 0. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs:                "The datatype \"{data_type:?}\" is still not supported in Rust implementation"
./arrow-array/src/builder/mod.rs:                    panic!("Data type {t:?} with key type {key_type:?} is not currently supported")
./arrow-cast/src/cast/mod.rs:                "Casting from dictionary type {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs:                "Casting from type {from_type:?} to dictionary type {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs:            "Casting from {from_type:?} to {to_type:?} not supported"
./arrow-cast/src/cast/mod.rs:            "Casting from {from_type:?} to {to_type:?} not supported"
./arrow-cast/src/cast/mod.rs:                "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs:                "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs:                "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs:                "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs:                "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs:                "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs:                "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs:                "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs:            "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs:            "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs:            "Casting from {from_type:?} to {to_type:?} not supported"
./arrow-cast/src/cast/mod.rs:            "Casting from {from_type:?} to {to_type:?} not supported"
./arrow-cast/src/cast/dictionary.rs:                        "Unsupported type {to_index_type:?} for dictionary index"
./arrow-cast/src/cast/dictionary.rs:            "Unsupported output type for dictionary packing: {dict_value_type:?}"
./parquet/benches/arrow_reader_row_filter.rs:            let benchmark_name = format!("{filter_type:?}/{proj_case}",);
./parquet/src/record/reader.rs:                        "Map key type is expected to be a primitive type, but found {key_type:?}"
./parquet/src/arrow/arrow_reader/mod.rs:                    "data type: {data_type:?}, expected: {expected_err}, got: {err}"
./parquet/src/arrow/arrow_reader/mod.rs:                    "data type: {data_type:?}, expected: {expected_err}, got: {err}"
./parquet/src/arrow/arrow_writer/mod.rs:                    "Attempting to write an Arrow type {data_type:?} to parquet that is not yet implemented"
./parquet/src/arrow/buffer/view_buffer.rs:            _ => panic!("Unsupported data type: {data_type:?}"),
./parquet/src/schema/visitor.rs:                panic!("{list_type:?} is a list type and must be a group type")
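
The same kind of search can be sketched in Rust for use inside a custom lint or script. The helper below is hypothetical (find_debug_uses is not an existing tool) and does plain substring matching, so like the grep above it can miss call sites or report false positives such as {filter_type:?} on a non-DataType value.

```rust
/// Returns (1-based line number, trimmed line) for every line of `source`
/// that contains `pattern`, e.g. `type:?}`.
/// Hypothetical helper for illustration; it does no real parsing of Rust code.
fn find_debug_uses<'a>(source: &'a str, pattern: &str) -> Vec<(usize, &'a str)> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains(pattern))
        .map(|(index, line)| (index + 1, line.trim()))
        .collect()
}
```

Each hit still needs a manual check that the value being formatted is actually a DataType before switching {:?} to {}.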

Contributor

@alamb alamb left a comment

Thanks @emilk and @mbrobbel -- I think this is much better than what is currently present.

My only real concern is changing the Debug format -- I would personally recommend we leave the Debug format as is (#[derive(Debug)]) and just improve the Display implementation

};

let name = field.name();
let maybe_nullable = if field.is_nullable() { "nullable " } else { "" };
Contributor

I like this pattern

@emilk
Contributor Author

emilk commented Sep 15, 2025

Should I revert the changes to Debug formatting, or is this good to merge?

@mbrobbel
Member

Should I revert the changes to Debug formatting, or is this good to merge?

+1 for reverting Debug and using the improved formatting for Display.

@emilk emilk changed the title from "Improve Display and Debug for DataType and Field" to "Improve Display for DataType and Field" Sep 15, 2025
@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Sep 15, 2025
@mbrobbel mbrobbel added the api-change Changes to the arrow API label Sep 15, 2025
Member

@mbrobbel mbrobbel left a comment

Thanks @emilk

Contributor

@alamb alamb left a comment

Thank you @emilk and @mbrobbel -- I think this looks great now. Once we get the CI to pass let's merge it in

};
assert_eq!(
t,
r#"Casting from Map(Field { "entries": Struct(key Utf8, value nullable Utf8) }, false) to Map(Field { "entries": Struct(key Utf8, value Utf8) }, true) not supported"#
Contributor

that is certainly much nicer!

@emilk
Contributor Author

emilk commented Sep 15, 2025

Green!

@mbrobbel
Member

We can merge after #7836.

Labels
api-change Changes to the arrow API arrow Changes to the arrow crate next-major-release the PR has API changes and it waiting on the next major version parquet Changes to the parquet crate parquet-variant parquet-variant* crates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve human readable display for DataType::List
4 participants