-
Notifications
You must be signed in to change notification settings - Fork 4k
Description
Describe the enhancement requested
Certain types don't have the validity defined by the bitmap buffer's presence and bit values:
NASPARSE_UNIONDENSE_UNIONRUN_END_ENCODED
This means that recently inline bool ArrayData::IsValid() had to grow:
arrow/cpp/src/arrow/array/data.h
Lines 201 to 216 in cd607d0
| bool IsValid(int64_t i) const { | |
| if (buffers[0] != NULLPTR) { | |
| return bit_util::GetBit(buffers[0]->data(), i + offset); | |
| } | |
| const auto type = this->type->id(); | |
| if (type == Type::SPARSE_UNION) { | |
| return !internal::IsNullSparseUnion(*this, i); | |
| } | |
| if (type == Type::DENSE_UNION) { | |
| return !internal::IsNullDenseUnion(*this, i); | |
| } | |
| if (type == Type::RUN_END_ENCODED) { | |
| return !internal::IsNullRunEndEncoded(*this, i); | |
| } | |
| return null_count.load() != length; | |
| } |
Many type-specific loops in Arrow don't use IsValid and check the bitmap directly instead (good to reduce the branching in hot loops or to do lower-level operations with the bitmaps).
To be future-proof, these loops would have to migrate to using IsValid, but since doing that would be overkill for most cases (unions and REEs are rare), I propose the creation of a function close to the special validity checking functions that checks if a type allows for validity checks to be performed solely through the bitmap:
constexpr bool is_validity_defined_by_bitmap(Type::type id) {
switch (id) {
case Type::NA:
case Type::SPARSE_UNION:
case Type::DENSE_UNION:
case Type::RUN_END_ENCODED:
return false;
default:
return true;
}
}This means that if a template or function that only looks at the validity bitmap starts being used for these special types we can get static_assert, assert and DCHECK violations.
Component(s)
C++