-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-46773: [GLib] Add GArrowFixedSizeListDataType #46774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-46773: [GLib] Add GArrowFixedSizeListDataType #46774
Conversation
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds support for FixedSizeListType in the GLib binding by introducing a new GArrowFixedSizeListDataType.
- Declare a new
GArrowFixedSizeListDataTypeand its constructors in the header. - Implement the new data type in the C++ binding.
- Add Ruby tests covering basic construction and string representation.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| c_glib/arrow-glib/composite-data-type.h | Declare GArrowFixedSizeListDataType and its new_data_type/new_field functions |
| c_glib/arrow-glib/composite-data-type.cpp | Define and register the new type; implement the constructor methods |
| c_glib/test/test-fixed-size-list-data-type.rb | Add tests for FixedSizeListDataType.new, #id, #name, and #to_s |
Comments suppressed due to low confidence (2)
c_glib/test/test-fixed-size-list-data-type.rb:48
- It would be helpful to add a test for
list_size(e.g.assert_equal(5, @data_type.list_size)) to ensure the size is exposed correctly.
assert_equal("fixed_size_list<item: bool>[5]", @data_type.to_s)
c_glib/arrow-glib/composite-data-type.cpp:787
- The description "The size of value" is a bit vague; consider "The number of elements in each fixed-size list" for clarity.
* @list_size: The size of value.
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
| def test_field | ||
| field = Arrow::Field.new(@field_name, @value_type) | ||
| data_type = Arrow::FixedSizeListDataType.new(field, @list_size); | ||
| assert_equal(Arrow::Type::FIXED_SIZE_LIST, data_type.id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also check data_type.field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean we need implement this method?
p data_type.field
Error: test_field(TestFixedSizeListDataType::.new): NoMethodError: undefined method 'field' for an instance of Arrow::FixedSizeListDataType
Does this mean assert_equal(Arrow::Type::BOOLEAN, @value_type.id);?
If this means We should check the item data type in the data_type, I didn't know how to get data_type from arrow::FixedSizeListType instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it doesn't exist, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test must check whether the given field is used or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think this change?
+ data_type = Arrow::FixedSizeListDataType.new(field, @list_size)
+ assert_equal([Arrow::Type::FIXED_SIZE_LIST, "fixed_size_list<bool_field: bool>[5]"],
+ [data_type.id, data_type.to_s])We can check size: 5, fileld_name: bool_field, and type: bool.
It seems FixedSizeListDataType class can't get size and type separately.
arrow/cpp/src/arrow/type_test.cc
Lines 1680 to 1698 in 57439e0
| TEST(TestFixedSizeListType, Basics) { | |
| std::shared_ptr<DataType> vt = std::make_shared<UInt8Type>(); | |
| FixedSizeListType fixed_size_list_type(vt, 4); | |
| ASSERT_EQ(fixed_size_list_type.id(), Type::FIXED_SIZE_LIST); | |
| ASSERT_EQ(4, fixed_size_list_type.list_size()); | |
| ASSERT_EQ("fixed_size_list", fixed_size_list_type.name()); | |
| ASSERT_EQ("fixed_size_list<item: uint8>[4]", fixed_size_list_type.ToString()); | |
| ASSERT_EQ(fixed_size_list_type.value_type()->id(), vt->id()); | |
| ASSERT_EQ(fixed_size_list_type.value_type()->id(), vt->id()); | |
| std::shared_ptr<DataType> st = std::make_shared<StringType>(); | |
| std::shared_ptr<DataType> lt = std::make_shared<FixedSizeListType>(st, 3); | |
| ASSERT_EQ("fixed_size_list<item: string>[3]", lt->ToString()); | |
| FixedSizeListType lt2(lt, 7); | |
| ASSERT_EQ("fixed_size_list<item: fixed_size_list<item: string>[3]>[7]", lt2.ToString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We don't need
data_type.idbecausedata_type.to_sincludesfixed_size_list. - It's OK with
#to_sin this PR but we should have readers ofvalue_fieldandlist_sizein the future.
We can move garrow_list_data_type_get_field() to garrow_base_list_data_type_get_field() for value_field:
diff --git a/c_glib/arrow-glib/composite-data-type.cpp b/c_glib/arrow-glib/composite-data-type.cpp
index 8af1b0c862..ac5ed6809f 100644
--- a/c_glib/arrow-glib/composite-data-type.cpp
+++ b/c_glib/arrow-glib/composite-data-type.cpp
@@ -65,6 +65,25 @@ garrow_base_list_data_type_class_init(GArrowBaseListDataTypeClass *klass)
{
}
+/**
+ * garrow_base_list_data_type_get_field:
+ * @base_list_data_type: A #GArrowBaseListDataType.
+ *
+ * Returns: (transfer full): The field of value.
+ *
+ * Since: 21.0.0
+ */
+GArrowField *
+garrow_base_list_data_type_get_field(GArrowBaseListDataType *base_list_data_type)
+{
+ auto data_type = GARROW_DATA_TYPE(base_list_data_type);
+ auto arrow_data_type = garrow_data_type_get_raw(data_type);
+ auto arrow_base_list_data_type = std::static_pointer_cast<arrow::BaseListType>(arrow_data_type);
+
+ auto arrow_field = arrow_base_list_data_type->value_field();
+ return garrow_field_new_raw(&arrow_field, nullptr);
+}
+
G_DEFINE_TYPE(GArrowListDataType, garrow_list_data_type, GARROW_TYPE_BASE_LIST_DATA_TYPE)
static void
@@ -116,16 +135,14 @@ garrow_list_data_type_get_value_field(GArrowListDataType *list_data_type)
* Returns: (transfer full): The field of value.
*
* Since: 0.13.0
+ *
+ * Deprecated: 21.0.0:
+ * Use garrow_base_list_data_type_get_field() instead.
*/
GArrowField *
garrow_list_data_type_get_field(GArrowListDataType *list_data_type)
{
- auto data_type = GARROW_DATA_TYPE(list_data_type);
- auto arrow_data_type = garrow_data_type_get_raw(data_type);
- auto arrow_list_data_type = static_cast<arrow::ListType *>(arrow_data_type.get());
-
- auto arrow_field = arrow_list_data_type->value_field();
- return garrow_field_new_raw(&arrow_field, nullptr);
+ return garrow_base_list_data_type_get_field(GARROW_BASE_LIST_DATA_TYPE(list_data_type));
}
G_DEFINE_TYPE(GArrowLargeListDataType, garrow_large_list_data_type, GARROW_TYPE_DATA_TYPE)
diff --git a/c_glib/arrow-glib/composite-data-type.h b/c_glib/arrow-glib/composite-data-type.h
index de9449c41c..02de84ec50 100644
--- a/c_glib/arrow-glib/composite-data-type.h
+++ b/c_glib/arrow-glib/composite-data-type.h
@@ -38,6 +38,10 @@ struct _GArrowBaseListDataTypeClass
GArrowDataTypeClass parent_class;
};
+GARROW_AVAILABLE_IN_21_0
+GArrowField *
+garrow_base_list_data_type_get_field(GArrowBaseListDataType *base_list_data_type);
+
#define GARROW_TYPE_LIST_DATA_TYPE (garrow_list_data_type_get_type())
GARROW_AVAILABLE_IN_ALL
G_DECLARE_DERIVABLE_TYPE(GArrowListDataType,We can add garrow_fixed_size_list_data_type_get_list_size() (or list-size property) as the binding of
Line 1389 in 9133598
| int32_t list_size() const { return list_size_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks I simplified tests. I'll Create a separate PR to add garrow_base_list_data_type_get_field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented list_size property.
Shall I update this PR or create another PR?
def setup
@value_type = Arrow::BooleanDataType.new
@list_size = 5
@field_name = "bool_field"
end
def test_field
field = Arrow::Field.new(@field_name, @value_type)
data_type = Arrow::FixedSizeListDataType.new(field, @list_size)
assert_equal(["bool_field", @value_type, @list_size],
[data_type.field.name,
data_type.field.data_type,
data_type.list_size])
enddiff --git a/c_glib/arrow-glib/composite-data-type.cpp b/c_glib/arrow-glib/composite-data-type.cpp
index 035846bad1..1193ca9b42 100644
--- a/c_glib/arrow-glib/composite-data-type.cpp
+++ b/c_glib/arrow-glib/composite-data-type.cpp
@@ -67,6 +67,25 @@ garrow_base_list_data_type_class_init(GArrowBaseListDataTypeClass *klass)
{
}
+/**
+ * garrow_base_list_data_type_get_field:
+ * @base_list_data_type: A #GArrowBaseListDataType.
+ *
+ * Returns: (transfer full): The field of value.
+ *
+ * Since: 21.0.0
+ */
+GArrowField *
+garrow_base_list_data_type_get_field(GArrowBaseListDataType *base_list_data_type)
+{
+ auto data_type = GARROW_DATA_TYPE(base_list_data_type);
+ auto arrow_data_type = garrow_data_type_get_raw(data_type);
+ auto arrow_base_list_data_type = std::static_pointer_cast<arrow::BaseListType>(arrow_data_type);
+
+ auto arrow_field = arrow_base_list_data_type->value_field();
+ return garrow_field_new_raw(&arrow_field, nullptr);
+}
+
G_DEFINE_TYPE(GArrowListDataType, garrow_list_data_type, GARROW_TYPE_BASE_LIST_DATA_TYPE)
static void
@@ -118,16 +137,14 @@ garrow_list_data_type_get_value_field(GArrowListDataType *list_data_type)
* Returns: (transfer full): The field of value.
*
* Since: 0.13.0
+ *
+ * Deprecated: 21.0.0:
+ * Use garrow_base_list_data_type_get_field() instead.
*/
GArrowField *
garrow_list_data_type_get_field(GArrowListDataType *list_data_type)
{
- auto data_type = GARROW_DATA_TYPE(list_data_type);
- auto arrow_data_type = garrow_data_type_get_raw(data_type);
- auto arrow_list_data_type = static_cast<arrow::ListType *>(arrow_data_type.get());
-
- auto arrow_field = arrow_list_data_type->value_field();
- return garrow_field_new_raw(&arrow_field, nullptr);
+ return garrow_base_list_data_type_get_field(GARROW_BASE_LIST_DATA_TYPE(list_data_type));
}
G_DEFINE_TYPE(GArrowLargeListDataType, garrow_large_list_data_type, GARROW_TYPE_DATA_TYPE)
@@ -769,17 +786,55 @@ garrow_run_end_encoded_data_type_get_value_data_type(
return garrow_data_type_new_raw(&arrow_value_data_type);
}
+enum {
+ PROP_LIST_SIZE = 1
+};
+
G_DEFINE_TYPE(GArrowFixedSizeListDataType,
garrow_fixed_size_list_data_type,
GARROW_TYPE_BASE_LIST_DATA_TYPE)
static void
-garrow_fixed_size_list_data_type_init(GArrowFixedSizeListDataType *object)
-{
+garrow_fixed_size_list_data_type_get_property(GObject *object,
+ guint prop_id,
+ GValue *value,
+ GParamSpec *pspec)
+{
+ auto arrow_data_type = garrow_data_type_get_raw(GARROW_DATA_TYPE(object));
+ const auto arrow_fixed_size_list_type =
+ std::static_pointer_cast<arrow::FixedSizeListType>(arrow_data_type);
+
+ switch (prop_id) {
+ case PROP_LIST_SIZE:
+ g_value_set_int(value, arrow_fixed_size_list_type->list_size());
+ break;
+ default:
+ G_OBJECT_WARN_INVALID_PROPERTY_ID(object, prop_id, pspec);
+ break;
+ }
}
static void
garrow_fixed_size_list_data_type_class_init(GArrowFixedSizeListDataTypeClass *klass)
+{
+ GObjectClass *gobject_class;
+ GParamSpec *spec;
+
+ gobject_class = G_OBJECT_CLASS(klass);
+ gobject_class->get_property = garrow_fixed_size_list_data_type_get_property;
+
+ spec = g_param_spec_int("list-size",
+ "List size",
+ "The list size of the elements",
+ G_MININT,
+ G_MAXINT,
+ 0,
+ G_PARAM_READABLE);
+ g_object_class_install_property(gobject_class, PROP_LIST_SIZE, spec);
+}
+
+static void
+garrow_fixed_size_list_data_type_init(GArrowFixedSizeListDataType *object)
{
}
diff --git a/c_glib/arrow-glib/composite-data-type.h b/c_glib/arrow-glib/composite-data-type.h
index 8369ae4987..207647bd46 100644
--- a/c_glib/arrow-glib/composite-data-type.h
+++ b/c_glib/arrow-glib/composite-data-type.h
@@ -38,6 +38,10 @@ struct _GArrowBaseListDataTypeClass
GArrowDataTypeClass parent_class;
};
+GARROW_AVAILABLE_IN_21_0
+GArrowField *
+garrow_base_list_data_type_get_field(GArrowBaseListDataType *base_list_data_type);
+
#define GARROW_TYPE_LIST_DATA_TYPE (garrow_list_data_type_get_type())
GARROW_AVAILABLE_IN_ALL
G_DECLARE_DERIVABLE_TYPE(GArrowListDataType,
diff --git a/c_glib/test/test-fixed-size-list-data-type.rb b/c_glib/test/test-fixed-size-list-data-type.rb
index d8faddb9d9..9546d27405 100644
--- a/c_glib/test/test-fixed-size-list-data-type.rb
+++ b/c_glib/test/test-fixed-size-list-data-type.rb
@@ -26,20 +26,26 @@ class TestFixedSizeListDataType < Test::Unit::TestCase
def test_field
field = Arrow::Field.new(@field_name, @value_type)
data_type = Arrow::FixedSizeListDataType.new(field, @list_size)
- # TODO: check value_field and list_size separately.
- assert_equal("fixed_size_list<bool_field: bool>[5]", data_type.to_s)
+ assert_equal(["bool_field", @value_type, @list_size],
+ [data_type.field.name,
+ data_type.field.data_type,
+ data_type.list_size])
end
def test_data_type
data_type = Arrow::FixedSizeListDataType.new(@value_type, @list_size)
- # TODO: check value_field and list_size separately.
- assert_equal("fixed_size_list<item: bool>[5]", data_type.to_s)
+ assert_equal(["item", @value_type, @list_size],
+ [data_type.field.name,
+ data_type.field.data_type,
+ data_type.list_size])
end
end
sub_test_case("instance_methods") do
def setup
- @data_type = Arrow::FixedSizeListDataType.new(Arrow::BooleanDataType.new, 5)
+ @list_size = 5
+ @value_type =Arrow::BooleanDataType.new
+ @data_type = Arrow::FixedSizeListDataType.new(@value_type, @list_size)
end
def test_name
@@ -49,5 +55,14 @@ class TestFixedSizeListDataType < Test::Unit::TestCase
def test_to_s
assert_equal("fixed_size_list<item: bool>[5]", @data_type.to_s)
end
+
+ def test_list_size
+ assert_equal(@list_size, @data_type.list_size)
+ end
+
+ def test_field
+ field = Arrow::Field.new("item", @value_type)
+ assert_equal(field, @data_type.field)
+ end
end
endThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I updated this PR. Please take a look when you get a chance.
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
|
Thanks. All suggested change applied. |
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 93cae03. There were 119 benchmark results with an error:
There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
### Rationale for this change GLib should be able to use `arrow::FixedSizeListType`. ### What changes are included in this PR? Add `GArrowFixedSizeListDataType`. ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. * GitHub Issue: apache#46773 Lead-authored-by: Hiroyuki Sato <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
Rationale for this change
GLib should be able to use
arrow::FixedSizeListType.What changes are included in this PR?
Add
GArrowFixedSizeListDataType.Are these changes tested?
Yes.
Are there any user-facing changes?
Yes.