Description
Describe the bug, including details regarding any error messages, version, and platform.
In pyarrow versions 11.0.0 and 10.0.1, if I create a dense union array with some null elements, pa.compute.is_null() reports that those elements are not null. Repro:
import pyarrow as pa
types = pa.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], type=pa.int8())
value_offsets = pa.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0], type=pa.int32())
array1 = pa.array([1, 2, 3, 1, None, 3, None, None, None, None, 1])
array2 = pa.array(["b"])
dense = pa.UnionArray.from_dense(types, value_offsets, [array1, array2])
print(dense)
# <pyarrow.lib.UnionArray object at 0x285dfbc40>
# -- is_valid: all not null
# -- type_ids: [
# 0,
# 0,
# 0,
# 0,
# 0,
# 0,
# 0,
# 0,
# 0,
# 0,
# 0,
# 1
# ]
# -- value_offsets: [
# 0,
# 1,
# 2,
# 3,
# 4,
# 5,
# 6,
# 7,
# 8,
# 9,
# 10,
# 0
# ]
# -- child 0 type: int64
# [
# 1,
# 2,
# 3,
# 1,
# null,
# 3,
# null,
# null,
# null,
# null,
# 1
# ]
# -- child 1 type: string
# [
# "b"
# ]
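Union arrays in the Arrow format carry no top-level validity bitmap, which is why the output above shows `is_valid: all not null`. The nullness of each slot comes from the child value it points to: slot `i` resolves to `children[type_ids[i]][value_offsets[i]]`. The pure-Python model below shows how the expected mask in the first issue follows from that rule. It assumes type code `k` selects child `k`, which holds here because `from_dense` is called without explicit type codes. `dense_union_nulls` is a hypothetical helper, not a pyarrow API.

```python
from typing import List, Optional, Sequence


def dense_union_nulls(type_ids: Sequence[int],
                      value_offsets: Sequence[int],
                      children: Sequence[Sequence[Optional[object]]]) -> List[bool]:
    # Each slot points at one value in one child; the slot is null exactly
    # when that child value is null (modelled here as None).
    # Assumes type code k selects children[k].
    return [children[t][o] is None for t, o in zip(type_ids, value_offsets)]


# Same layout as the repro: eleven slots in the int child, one in the string child.
type_ids = [0] * 11 + [1]
value_offsets = list(range(11)) + [0]
children = [[1, 2, 3, 1, None, 3, None, None, None, None, 1], ["b"]]
print(dense_union_nulls(type_ids, value_offsets, children))
# [False, False, False, False, True, False, True, True, True, True, False, False]
```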
Illustration of the first issue:
pa.compute.is_null(dense)
# expected: BooleanArray [false, false, false, false, true, false, true, true, true, true, false, false]
# actual:
# <pyarrow.lib.BooleanArray object at 0x285dfbdc0>
# [
# false,
# false,
# false,
# false,
# false,
# false,
# false,
# false,
# false,
# false,
# false,
# false
# ]
Illustration of the second issue:
If I call pa.compute.is_null() on a null element of the array, I get a segfault:
null_element = dense[4]
print(null_element)
# <pyarrow.UnionScalar: None>
pa.compute.is_null(null_element)
# Fatal Python error: Segmentation fault
Component(s)
Python