-
Notifications
You must be signed in to change notification settings - Fork 1k
Improve Display
for DataType
and Field
#8290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Display
and Debug
for DataType
Display
and Debug
for DataType
and Field
arrow-schema/src/datatype_format.rs
Outdated
|
||
impl fmt::Display for DataType { | ||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
// NOTE: `Display` and `Debug` formatting are ALWAYS the same, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is more common in Rust code to have the Debug implementation try and print out something close to the underlying representation, and Display
is for human consumption
Specifically https://doc.rust-lang.org/std/fmt/trait.Debug.html
Debug should format the output in a programmer-facing, debugging context.
Generally speaking, you should just derive a Debug implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is uncommon, but unfortunately so many error messages in datafusion and arrow use the Debug
formatting of DataType
instead of Display
, which means we end up with huge difficult-to-read error messages.
There are three solutions to this, afaict:
A) Use Display
=Debug
, like this PR.
It's still programmer-facing, because it contains ALL the info (metadata etc)
B) Replace all uses of {:?}
with {}
when printing datatypes in datafusion, arrow, and other third party crates.
This is VERY hard to do, as I know of no automated tool to find all these places.
C) Improve Debug
formatting by omitting empty/default fields. This will help, but the Debug format for DataType::List
will still be very ugly, since it wraps a Field
.
(or maybe I mistakingly think a lot of places use Debug
instead of Display
because the old Display
implementation for List
used the Debug
formatting…)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment to the code to motivate this choice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
B) Replace all uses of {:?} with {} when printing datatypes in datafusion, arrow, and other third party crates.
This is VERY hard to do, as I know of no automated tool to find all these places.
I think grepping for type:?}
and {:?}
would catch most of them. Maybe a good think to ask some AI tool to do
(venv) andrewlamb@Andrews-MacBook-Pro-3:~/Software/arrow-rs$ grep -r 'type:?' `find . -name '*.rs'`
./arrow-schema/src/datatype_parse.rs: println!("Input '{data_type_string}' ({data_type:?})");
./arrow-schema/src/datatype_parse.rs: println!("Parsing '{data_type_string}', expecting '{expected_data_type:?}'");
./arrow-data/src/transform/run.rs: _ => panic!("Invalid run end type for RunEndEncoded array: {run_end_type:?}"),
./arrow-data/src/transform/run.rs: _ => panic!("Invalid run end type for RunEndEncoded array: {dest_run_end_type:?}",),
./arrow-ipc/src/compression.rs: "compression type {other_type:?} not supported "
./arrow-string/src/like.rs: "{value_type:?} «{value}» like {pattern_type:?} «{pattern}»"
./arrow-string/src/like.rs: "{value_type:?} «{value}» ilike {pattern_type:?} «{pattern}»"
./arrow-string/src/like.rs: "{value_type:?} «{value}» nlike {pattern_type:?} «{pattern}»"
./arrow-string/src/like.rs: "{value_type:?} «{value}» nilike {pattern_type:?} «{pattern}»"
./arrow-csv/src/reader/mod.rs: "Unsupported dictionary key type {key_type:?}"
./arrow-row/src/list.rs: "Expected FixedSizeListArray, found: {list_type:?}",
./arrow-array/src/array/fixed_size_list_array.rs: panic!("FixedSizeListArray data should contain a FixedSizeList data type, got {data_type:?}")
./arrow-array/src/array/primitive_array.rs: write!(f, "PrimitiveArray<{data_type:?}>\n[\n")?;
./arrow-array/src/array/primitive_array.rs: "Cast error: Failed to convert {v} to temporal for {data_type:?}"
./arrow-array/src/array/primitive_array.rs: "Cast error: Failed to convert {v} to temporal for {data_type:?}"
./arrow-array/src/record_batch.rs: "column types must match schema types, expected {field_type:?} but found {col_type:?} at column index {i}")));
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" doesn't expect buffer at index 0. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 2 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 2 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 2 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 2 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 3 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 3 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 1 buffer, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 2 buffer, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" doesn't expect buffer at index 0. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" is still not supported in Rust implementation"
./arrow-array/src/builder/mod.rs: panic!("Data type {t:?} with key type {key_type:?} is not currently supported")
./arrow-cast/src/cast/mod.rs: "Casting from dictionary type {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from type {from_type:?} to dictionary type {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported"
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported"
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported"
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported"
./arrow-cast/src/cast/dictionary.rs: "Unsupported type {to_index_type:?} for dictionary index"
./arrow-cast/src/cast/dictionary.rs: "Unsupported output type for dictionary packing: {dict_value_type:?}"
./parquet/benches/arrow_reader_row_filter.rs: let benchmark_name = format!("{filter_type:?}/{proj_case}",);
./parquet/src/record/reader.rs: "Map key type is expected to be a primitive type, but found {key_type:?}"
./parquet/src/arrow/arrow_reader/mod.rs: "data type: {data_type:?}, expected: {expected_err}, got: {err}"
./parquet/src/arrow/arrow_reader/mod.rs: "data type: {data_type:?}, expected: {expected_err}, got: {err}"
./parquet/src/arrow/arrow_writer/mod.rs: "Attempting to write an Arrow type {data_type:?} to parquet that is not yet implemented"
./parquet/src/arrow/buffer/view_buffer.rs: _ => panic!("Unsupported data type: {data_type:?}"),
./parquet/src/schema/visitor.rs: panic!("{list_type:?} is a list type and must be a group type")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we really need to enforce this is:
But I've done my best in this PR, and in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arrow-schema/src/datatype_format.rs
Outdated
}; | ||
|
||
let name = field.name(); | ||
let maybe_nullable = if field.is_nullable() { "nullable " } else { "" }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this pattern
Should I revert the changes to |
+1 for reverting |
Display
and Debug
for DataType
and Field
Display
for DataType
and Field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @emilk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
}; | ||
assert_eq!( | ||
t, | ||
r#"Casting from Map(Field { "entries": Struct(key Utf8, value nullable Utf8) }, false) to Map(Field { "entries": Struct(key Utf8, value Utf8) }, true) not supported"# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is certainly much nicer!
Green! |
We can merge after #7836. |
This is part of an attempt to improve the error reporting of
arrow-rs
,datafusion
, and any other 3rd party crates.I believe that error messages should be as readable as possible. Aim for
rustc
more thangcc
.Here's an example of how this PR improves some existing error messages:
Before:
After:
Which issue does this PR close?
DataType::List
#7048DataType::List
#7051Display/Debug
formatting #8291Display
formatting ofDataType
:s in error messagwe datafusion#17565Display
forDataType
#8351Rationale for this change
DataType:s are often shown in error messages. Making these error messages readable is very important.
What changes are included in this PR?
UnifyDebug
andDisplay
TheDisplay
andDebug
ofDataType
are now the SAME.Why? Both are frequently used in error messages (both in arrow, anddatafusion
), and both benefit from being readable yet reversible.Reverted based on PR feedback. I will try to improve the
Debug
formatting in a future PR, with clever use of https://doc.rust-lang.org/std/fmt/struct.Formatter.html#method.debug_structImprove
Display
of listsImprove the
Display
formatting ofDataType::List
DataType::LargeList
DataType::FixedSizeList
Before:
List(Field { name: \"item\", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })
After:
List(nullable Int32)
Before:
FixedSizeList(Field { name: \"item\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 5)
After:
FixedSizeList(5 x Int32)
Better formatting of
DataType::Struct
The formatting of
Struct
is now reversible, including nullability and metadata.ImproveDebug
format ofField
Best understood with this diff for an existing test:EDIT: reverted
Are these changes tested?
Yes - new tests cover them
Are there any user-facing changes?
Display/to_string
has changed, and so this is a BREAKING CHANGE.Care has been taken that the formatting contains all necessary information (i.e. is reversible), though the actual
FromStr
implementation is still not written (it is missing onmain
, and missing in this PR - so no change).Let me know if I went to far… or not far enough 😆