@hiroyuki-sato hiroyuki-sato commented Jun 11, 2025

Rationale for this change

GLib should be able to use arrow::FixedSizeListType.

What changes are included in this PR?

Add GArrowFixedSizeListDataType.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.

@hiroyuki-sato hiroyuki-sato requested a review from kou as a code owner June 11, 2025 01:04
@github-actions

⚠️ GitHub issue #46773 has been automatically assigned in GitHub to PR creator.

@kou kou requested a review from Copilot June 12, 2025 01:57
Copilot AI left a comment

Pull Request Overview

Adds support for FixedSizeListType in the GLib binding by introducing a new GArrowFixedSizeListDataType.

  • Declare a new GArrowFixedSizeListDataType and its constructors in the header.
  • Implement the new data type in the C++ binding.
  • Add Ruby tests covering basic construction and string representation.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| c_glib/arrow-glib/composite-data-type.h | Declare GArrowFixedSizeListDataType and its new_data_type/new_field functions |
| c_glib/arrow-glib/composite-data-type.cpp | Define and register the new type; implement the constructor methods |
| c_glib/test/test-fixed-size-list-data-type.rb | Add tests for FixedSizeListDataType.new, #id, #name, and #to_s |

Comments suppressed due to low confidence (2)

c_glib/test/test-fixed-size-list-data-type.rb:48

  • It would be helpful to add a test for list_size (e.g. assert_equal(5, @data_type.list_size)) to ensure the size is exposed correctly.
assert_equal("fixed_size_list<item: bool>[5]", @data_type.to_s)

c_glib/arrow-glib/composite-data-type.cpp:787

  • The description "The size of value" is a bit vague; consider "The number of elements in each fixed-size list" for clarity.
* @list_size: The size of value.

@github-actions github-actions bot added awaiting changes and removed awaiting review labels Jun 12, 2025
@github-actions github-actions bot added awaiting change review and removed awaiting changes labels Jun 12, 2025
hiroyuki-sato and others added 14 commits June 12, 2025 11:27
def test_field
field = Arrow::Field.new(@field_name, @value_type)
data_type = Arrow::FixedSizeListDataType.new(field, @list_size);
assert_equal(Arrow::Type::FIXED_SIZE_LIST, data_type.id);
Member

We should also check data_type.field.

Collaborator Author

@hiroyuki-sato hiroyuki-sato Jun 12, 2025

Does this mean we need to implement this method?

p data_type.field
Error: test_field(TestFixedSizeListDataType::.new): NoMethodError: undefined method 'field' for an instance of Arrow::FixedSizeListDataType

Or does this mean assert_equal(Arrow::Type::BOOLEAN, @value_type.id)?

If it means we should check the item data type inside data_type, I don't know how to get that data type from an arrow::FixedSizeListType instance.

Member

If it doesn't exist, yes.

Member

This test must check whether the given field is used or not.

Collaborator Author

What do you think of this change?

+      data_type = Arrow::FixedSizeListDataType.new(field, @list_size)
+      assert_equal([Arrow::Type::FIXED_SIZE_LIST, "fixed_size_list<bool_field: bool>[5]"],
+                   [data_type.id, data_type.to_s])

This lets us check the size (5), the field name (bool_field), and the type (bool).

It seems the FixedSizeListDataType class can't return the size and type separately.

TEST(TestFixedSizeListType, Basics) {
std::shared_ptr<DataType> vt = std::make_shared<UInt8Type>();
FixedSizeListType fixed_size_list_type(vt, 4);
ASSERT_EQ(fixed_size_list_type.id(), Type::FIXED_SIZE_LIST);
ASSERT_EQ(4, fixed_size_list_type.list_size());
ASSERT_EQ("fixed_size_list", fixed_size_list_type.name());
ASSERT_EQ("fixed_size_list<item: uint8>[4]", fixed_size_list_type.ToString());
ASSERT_EQ(fixed_size_list_type.value_type()->id(), vt->id());
ASSERT_EQ(fixed_size_list_type.value_type()->id(), vt->id());
std::shared_ptr<DataType> st = std::make_shared<StringType>();
std::shared_ptr<DataType> lt = std::make_shared<FixedSizeListType>(st, 3);
ASSERT_EQ("fixed_size_list<item: string>[3]", lt->ToString());
FixedSizeListType lt2(lt, 7);
ASSERT_EQ("fixed_size_list<item: fixed_size_list<item: string>[3]>[7]", lt2.ToString());
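
A plain-Ruby sketch of the two constructor forms discussed here may help show what a to_s-only assertion does and does not verify. SketchField and FixedSizeListSketch are hypothetical stand-ins written for this illustration, not Arrow GLib classes; the "item" default name and the to_s format are copied from the strings quoted above.

```ruby
# Hypothetical stand-ins for illustration only; these are NOT Arrow GLib classes.
SketchField = Struct.new(:name, :data_type)

class FixedSizeListSketch
  attr_reader :field, :list_size

  # Accepts either a SketchField or a bare type name, mirroring the two
  # constructor forms exercised by test_field and test_data_type.
  def initialize(field_or_type, list_size)
    @field =
      if field_or_type.is_a?(SketchField)
        field_or_type # the given field is used as-is
      else
        SketchField.new("item", field_or_type) # default field name
      end
    @list_size = list_size
  end

  # Same layout as the strings asserted in the tests.
  def to_s
    "fixed_size_list<#{@field.name}: #{@field.data_type}>[#{@list_size}]"
  end
end

from_field = FixedSizeListSketch.new(SketchField.new("bool_field", "bool"), 5)
from_type = FixedSizeListSketch.new("bool", 5)

puts from_field.to_s # prints fixed_size_list<bool_field: bool>[5]
puts from_type.to_s  # prints fixed_size_list<item: bool>[5]
```

Because to_s embeds the field name, the type, and the size, one string comparison covers all three. Separate field and list_size readers would let a test point at exactly the part that is wrong.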

Member

  • We don't need data_type.id because data_type.to_s includes fixed_size_list.
  • It's OK with #to_s in this PR but we should have readers of value_field and list_size in the future.

We can move garrow_list_data_type_get_field() to garrow_base_list_data_type_get_field() for value_field:

diff --git a/c_glib/arrow-glib/composite-data-type.cpp b/c_glib/arrow-glib/composite-data-type.cpp
index 8af1b0c862..ac5ed6809f 100644
--- a/c_glib/arrow-glib/composite-data-type.cpp
+++ b/c_glib/arrow-glib/composite-data-type.cpp
@@ -65,6 +65,25 @@ garrow_base_list_data_type_class_init(GArrowBaseListDataTypeClass *klass)
 {
 }
 
+/**
+ * garrow_base_list_data_type_get_field:
+ * @base_list_data_type: A #GArrowBaseListDataType.
+ *
+ * Returns: (transfer full): The field of value.
+ *
+ * Since: 21.0.0
+ */
+GArrowField *
+garrow_base_list_data_type_get_field(GArrowBaseListDataType *base_list_data_type)
+{
+  auto data_type = GARROW_DATA_TYPE(base_list_data_type);
+  auto arrow_data_type = garrow_data_type_get_raw(data_type);
+  auto arrow_base_list_data_type = std::static_pointer_cast<arrow::BaseListType>(arrow_data_type);
+
+  auto arrow_field = arrow_base_list_data_type->value_field();
+  return garrow_field_new_raw(&arrow_field, nullptr);
+}
+
 G_DEFINE_TYPE(GArrowListDataType, garrow_list_data_type, GARROW_TYPE_BASE_LIST_DATA_TYPE)
 
 static void
@@ -116,16 +135,14 @@ garrow_list_data_type_get_value_field(GArrowListDataType *list_data_type)
  * Returns: (transfer full): The field of value.
  *
  * Since: 0.13.0
+ *
+ * Deprecated: 21.0.0:
+ *   Use garrow_base_list_data_type_get_field() instead.
  */
 GArrowField *
 garrow_list_data_type_get_field(GArrowListDataType *list_data_type)
 {
-  auto data_type = GARROW_DATA_TYPE(list_data_type);
-  auto arrow_data_type = garrow_data_type_get_raw(data_type);
-  auto arrow_list_data_type = static_cast<arrow::ListType *>(arrow_data_type.get());
-
-  auto arrow_field = arrow_list_data_type->value_field();
-  return garrow_field_new_raw(&arrow_field, nullptr);
+  return garrow_base_list_data_type_get_field(GARROW_BASE_LIST_DATA_TYPE(list_data_type));
 }
 
 G_DEFINE_TYPE(GArrowLargeListDataType, garrow_large_list_data_type, GARROW_TYPE_DATA_TYPE)
diff --git a/c_glib/arrow-glib/composite-data-type.h b/c_glib/arrow-glib/composite-data-type.h
index de9449c41c..02de84ec50 100644
--- a/c_glib/arrow-glib/composite-data-type.h
+++ b/c_glib/arrow-glib/composite-data-type.h
@@ -38,6 +38,10 @@ struct _GArrowBaseListDataTypeClass
   GArrowDataTypeClass parent_class;
 };
 
+GARROW_AVAILABLE_IN_21_0
+GArrowField *
+garrow_base_list_data_type_get_field(GArrowBaseListDataType *base_list_data_type);
+
 #define GARROW_TYPE_LIST_DATA_TYPE (garrow_list_data_type_get_type())
 GARROW_AVAILABLE_IN_ALL
 G_DECLARE_DERIVABLE_TYPE(GArrowListDataType,

We can add garrow_fixed_size_list_data_type_get_list_size() (or a list-size property) as the binding of:

int32_t list_size() const { return list_size_; }
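
As a rough model of the layout suggested above (one field reader shared through the base class, plus a size reader on the fixed-size type only), here is a plain-Ruby sketch. BaseListModel, ListModel, FixedSizeListModel, and legacy_field are hypothetical names invented for this illustration; the actual change is the C++ diff above, and the string fields below merely stand in for real field objects.

```ruby
# Hypothetical model of the class layout; none of these are Arrow GLib classes.
class BaseListModel
  attr_reader :field # shared reader, defined once on the base class

  def initialize(field)
    @field = field
  end
end

class ListModel < BaseListModel
  # Older type-specific reader, kept as a thin wrapper around the shared one.
  def legacy_field
    field
  end
end

class FixedSizeListModel < BaseListModel
  attr_reader :list_size # extra reader specific to fixed-size lists

  def initialize(field, list_size)
    super(field)
    @list_size = list_size
  end
end

list = ListModel.new("item: bool")
fixed = FixedSizeListModel.new("item: bool", 5)

puts list.legacy_field # old entry point still works
puts fixed.field       # new type gets the reader without extra code
puts fixed.list_size
```

The point of the layout is that a new list-like type inherits the field reader, so only the size reader has to be written for it.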

Collaborator Author

Thanks. I simplified the tests. I'll create a separate PR to add garrow_base_list_data_type_get_field().

Collaborator Author

I implemented the list_size property.
Should I update this PR or create another one?

    def setup
      @value_type = Arrow::BooleanDataType.new
      @list_size = 5
      @field_name = "bool_field"
    end

    def test_field
      field = Arrow::Field.new(@field_name, @value_type)
      data_type = Arrow::FixedSizeListDataType.new(field, @list_size)
      assert_equal(["bool_field", @value_type, @list_size],
                   [data_type.field.name,
                    data_type.field.data_type,
                    data_type.list_size])
    end
diff --git a/c_glib/arrow-glib/composite-data-type.cpp b/c_glib/arrow-glib/composite-data-type.cpp
index 035846bad1..1193ca9b42 100644
--- a/c_glib/arrow-glib/composite-data-type.cpp
+++ b/c_glib/arrow-glib/composite-data-type.cpp
@@ -67,6 +67,25 @@ garrow_base_list_data_type_class_init(GArrowBaseListDataTypeClass *klass)
 {
 }
 
+/**
+ * garrow_base_list_data_type_get_field:
+ * @base_list_data_type: A #GArrowBaseListDataType.
+ *
+ * Returns: (transfer full): The field of value.
+ *
+ * Since: 21.0.0
+ */
+GArrowField *
+garrow_base_list_data_type_get_field(GArrowBaseListDataType *base_list_data_type)
+{
+  auto data_type = GARROW_DATA_TYPE(base_list_data_type);
+  auto arrow_data_type = garrow_data_type_get_raw(data_type);
+  auto arrow_base_list_data_type = std::static_pointer_cast<arrow::BaseListType>(arrow_data_type);
+
+  auto arrow_field = arrow_base_list_data_type->value_field();
+  return garrow_field_new_raw(&arrow_field, nullptr);
+}
+
 G_DEFINE_TYPE(GArrowListDataType, garrow_list_data_type, GARROW_TYPE_BASE_LIST_DATA_TYPE)
 
 static void
@@ -118,16 +137,14 @@ garrow_list_data_type_get_value_field(GArrowListDataType *list_data_type)
  * Returns: (transfer full): The field of value.
  *
  * Since: 0.13.0
+ *
+ * Deprecated: 21.0.0:
+ *   Use garrow_base_list_data_type_get_field() instead.
  */
 GArrowField *
 garrow_list_data_type_get_field(GArrowListDataType *list_data_type)
 {
-  auto data_type = GARROW_DATA_TYPE(list_data_type);
-  auto arrow_data_type = garrow_data_type_get_raw(data_type);
-  auto arrow_list_data_type = static_cast<arrow::ListType *>(arrow_data_type.get());
-
-  auto arrow_field = arrow_list_data_type->value_field();
-  return garrow_field_new_raw(&arrow_field, nullptr);
+  return garrow_base_list_data_type_get_field(GARROW_BASE_LIST_DATA_TYPE(list_data_type));
 }
 
 G_DEFINE_TYPE(GArrowLargeListDataType, garrow_large_list_data_type, GARROW_TYPE_DATA_TYPE)
@@ -769,17 +786,55 @@ garrow_run_end_encoded_data_type_get_value_data_type(
   return garrow_data_type_new_raw(&arrow_value_data_type);
 }
 
+enum {
+  PROP_LIST_SIZE = 1
+};
+
 G_DEFINE_TYPE(GArrowFixedSizeListDataType,
               garrow_fixed_size_list_data_type,
               GARROW_TYPE_BASE_LIST_DATA_TYPE)
 
 static void
-garrow_fixed_size_list_data_type_init(GArrowFixedSizeListDataType *object)
-{
+garrow_fixed_size_list_data_type_get_property(GObject *object,
+                                              guint prop_id,
+                                              GValue *value,
+                                              GParamSpec *pspec)
+{
+  auto arrow_data_type = garrow_data_type_get_raw(GARROW_DATA_TYPE(object));
+  const auto arrow_fixed_size_list_type =
+    std::static_pointer_cast<arrow::FixedSizeListType>(arrow_data_type);
+
+  switch (prop_id) {
+  case PROP_LIST_SIZE:
+    g_value_set_int(value, arrow_fixed_size_list_type->list_size());
+    break;
+  default:
+    G_OBJECT_WARN_INVALID_PROPERTY_ID(object, prop_id, pspec);
+    break;
+  }
 }
 
 static void
 garrow_fixed_size_list_data_type_class_init(GArrowFixedSizeListDataTypeClass *klass)
+{
+  GObjectClass *gobject_class;
+  GParamSpec *spec;
+
+  gobject_class = G_OBJECT_CLASS(klass);
+  gobject_class->get_property = garrow_fixed_size_list_data_type_get_property;
+
+  spec =  g_param_spec_int("list-size",
+                           "List size",
+                           "The list size of the elements",
+                           G_MININT,
+                           G_MAXINT,
+                           0,
+                           G_PARAM_READABLE);
+  g_object_class_install_property(gobject_class, PROP_LIST_SIZE, spec);
+}
+
+static void
+garrow_fixed_size_list_data_type_init(GArrowFixedSizeListDataType *object)
 {
 }
 
diff --git a/c_glib/arrow-glib/composite-data-type.h b/c_glib/arrow-glib/composite-data-type.h
index 8369ae4987..207647bd46 100644
--- a/c_glib/arrow-glib/composite-data-type.h
+++ b/c_glib/arrow-glib/composite-data-type.h
@@ -38,6 +38,10 @@ struct _GArrowBaseListDataTypeClass
   GArrowDataTypeClass parent_class;
 };
 
+GARROW_AVAILABLE_IN_21_0
+GArrowField *
+garrow_base_list_data_type_get_field(GArrowBaseListDataType *base_list_data_type);
+
 #define GARROW_TYPE_LIST_DATA_TYPE (garrow_list_data_type_get_type())
 GARROW_AVAILABLE_IN_ALL
 G_DECLARE_DERIVABLE_TYPE(GArrowListDataType,
diff --git a/c_glib/test/test-fixed-size-list-data-type.rb b/c_glib/test/test-fixed-size-list-data-type.rb
index d8faddb9d9..9546d27405 100644
--- a/c_glib/test/test-fixed-size-list-data-type.rb
+++ b/c_glib/test/test-fixed-size-list-data-type.rb
@@ -26,20 +26,26 @@ class TestFixedSizeListDataType < Test::Unit::TestCase
     def test_field
       field = Arrow::Field.new(@field_name, @value_type)
       data_type = Arrow::FixedSizeListDataType.new(field, @list_size)
-      # TODO: check value_field and list_size separately.
-      assert_equal("fixed_size_list<bool_field: bool>[5]", data_type.to_s)
+      assert_equal(["bool_field", @value_type, @list_size],
+                   [data_type.field.name,
+                    data_type.field.data_type,
+                    data_type.list_size])
     end
 
     def test_data_type
       data_type = Arrow::FixedSizeListDataType.new(@value_type, @list_size)
-      # TODO: check value_field and list_size separately.
-      assert_equal("fixed_size_list<item: bool>[5]", data_type.to_s)
+      assert_equal(["item", @value_type, @list_size],
+                   [data_type.field.name,
+                    data_type.field.data_type,
+                    data_type.list_size])
     end
   end
 
   sub_test_case("instance_methods") do
     def setup
-      @data_type = Arrow::FixedSizeListDataType.new(Arrow::BooleanDataType.new, 5)
+      @list_size = 5
+      @value_type = Arrow::BooleanDataType.new
+      @data_type = Arrow::FixedSizeListDataType.new(@value_type, @list_size)
     end
 
     def test_name
@@ -49,5 +55,14 @@ class TestFixedSizeListDataType < Test::Unit::TestCase
     def test_to_s
       assert_equal("fixed_size_list<item: bool>[5]", @data_type.to_s)
     end
+
+    def test_list_size
+      assert_equal(@list_size, @data_type.list_size)
+    end
+
+    def test_field
+      field = Arrow::Field.new("item", @value_type)
+      assert_equal(field, @data_type.field)
+    end
   end
 end

Member

We can use this PR.

Collaborator Author

Thanks. I updated this PR. Please take a look when you get a chance.

@github-actions github-actions bot added awaiting changes and removed awaiting change review labels Jun 16, 2025
@github-actions github-actions bot added awaiting change review and removed awaiting changes labels Jun 16, 2025
@github-actions github-actions bot added awaiting changes and removed awaiting change review labels Jun 17, 2025
@github-actions github-actions bot added awaiting change review and removed awaiting changes labels Jun 17, 2025
@hiroyuki-sato
Collaborator Author

Thanks. All suggested changes have been applied.

Member

@kou kou left a comment

+1

@kou kou merged commit 93cae03 into apache:main Jun 17, 2025
10 checks passed
@kou kou removed the awaiting change review label Jun 17, 2025
@github-actions github-actions bot added the awaiting merge label Jun 17, 2025
@hiroyuki-sato hiroyuki-sato deleted the topic/fixed-size-list-data-type branch June 17, 2025 03:47
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 93cae03.

There were 119 benchmark results with an error:

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

alinaliBQ pushed a commit to Bit-Quill/arrow that referenced this pull request Jun 17, 2025
### Rationale for this change

GLib should be able to use `arrow::FixedSizeListType`.

### What changes are included in this PR?

Add `GArrowFixedSizeListDataType`.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* GitHub Issue: apache#46773

Lead-authored-by: Hiroyuki Sato <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>