ARROW-82: Initial IPC support for ListArray #59
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ipc-adapter-test.cc |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -28,6 +28,7 @@ | |
| namespace arrow { | ||
|
|
||
| class Buffer; | ||
| class Status; | ||
|
|
||
| // Immutable data array with some logical type and some length. Any memory is | ||
| // owned by the respective Buffer instance (or its parents). | ||
|
|
@@ -39,7 +40,7 @@ class Array { | |
| Array(const std::shared_ptr<DataType>& type, int32_t length, int32_t null_count = 0, | ||
| const std::shared_ptr<Buffer>& null_bitmap = nullptr); | ||
|
|
||
| virtual ~Array() {} | ||
| virtual ~Array() = default; | ||
|
|
||
| // Determine if a slot is null. For inner loops. Does *not* boundscheck | ||
| bool IsNull(int i) const { | ||
|
|
@@ -58,6 +59,9 @@ class Array { | |
|
|
||
| bool EqualsExact(const Array& arr) const; | ||
| virtual bool Equals(const std::shared_ptr<Array>& arr) const = 0; | ||
| // Determines if the array is internally consistent. Defaults to always | ||
| // returning Status::OK. This can be an expensive check. | ||
| virtual Status Validate() const; | ||
|
Pardon my dumb question: from seeing how Status is used in the code, I understood it to signal whether an operation succeeded or failed, but Validate seems to imply that the array is either in a valid or an invalid state (like a bool). Why not just use bool? Or can Status also encode error information?
Contributor
Author
Status can encode error information.
Member
There was some work last year in Impala in which they analyzed the generated x86 instructions to see how to minimize the CPU cycles associated with verifying Status::OK: apache/impala@1afe728 |
||
|
|
||
| protected: | ||
| std::shared_ptr<DataType> type_; | ||
|
|
||
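The review thread on `Validate()` above turns on the point that a `Status` return can say why an array is inconsistent, where a `bool` can only say that it is. A minimal sketch of that idea follows. The `Status` class here is a simplified, hypothetical stand-in rather than Arrow's actual class, and `ValidateOffsets` is an invented helper showing the kind of consistency check a list array might perform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for arrow::Status (not the real class).
class Status {
 public:
  static Status OK() { return Status(); }
  static Status Invalid(std::string msg) {
    assert(!msg.empty());  // an error should always explain itself
    return Status(std::move(msg));
  }
  bool ok() const { return ok_; }
  const std::string& message() const { return message_; }

 private:
  Status() : ok_(true) {}
  explicit Status(std::string msg) : ok_(false), message_(std::move(msg)) {}
  bool ok_;
  std::string message_;
};

// Hypothetical check: list offsets must start at 0 and never decrease.
// A bool could only report "invalid"; Status also reports which slot failed.
Status ValidateOffsets(const std::vector<int32_t>& offsets) {
  if (offsets.empty()) { return Status::Invalid("offsets buffer is empty"); }
  if (offsets[0] != 0) {
    return Status::Invalid("first offset is " + std::to_string(offsets[0]) +
                           ", expected 0");
  }
  for (std::size_t i = 1; i < offsets.size(); ++i) {
    if (offsets[i] < offsets[i - 1]) {
      return Status::Invalid("offset at slot " + std::to_string(i) + " decreases");
    }
  }
  return Status::OK();
}
```

A caller can branch on `ok()` exactly as it would on a `bool`, and surface `message()` only on the failure path.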
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,6 +25,25 @@ | |
|
|
||
| namespace arrow { | ||
|
|
||
| Status ArrayBuilder::AppendToBitmap(bool is_valid) { | ||
| if (length_ == capacity_) { | ||
| // If the capacity was not already a multiple of 2, do so here | ||
| // TODO(emkornfield) doubling isn't great default allocation practice | ||
| // see https://github.com/facebook/folly/blob/master/folly/docs/FBVector.md | ||
| // for discussion | ||
| RETURN_NOT_OK(Resize(util::next_power2(capacity_ + 1))); | ||
| } | ||
| UnsafeAppendToBitmap(is_valid); | ||
| return Status::OK(); | ||
| } | ||
|
|
||
| Status ArrayBuilder::AppendToBitmap(const uint8_t* valid_bytes, int32_t length) { | ||
| RETURN_NOT_OK(Reserve(length)); | ||
|
|
||
| UnsafeAppendToBitmap(valid_bytes, length); | ||
| return Status::OK(); | ||
| } | ||
|
|
||
| Status ArrayBuilder::Init(int32_t capacity) { | ||
| capacity_ = capacity; | ||
| int32_t to_alloc = util::ceil_byte(capacity) / 8; | ||
|
|
@@ -36,6 +55,7 @@ Status ArrayBuilder::Init(int32_t capacity) { | |
| } | ||
|
|
||
| Status ArrayBuilder::Resize(int32_t new_bits) { | ||
|
Contributor
Author
By the way, I was thinking it would be nice to make Resize and Init virtual methods. I think it would reduce code repetition and potential bugs for classes that don't have any reason to override Reserve. Thoughts?
Member
Sounds good to me, go ahead |
||
| if (!null_bitmap_) { return Init(new_bits); } | ||
| int32_t new_bytes = util::ceil_byte(new_bits) / 8; | ||
| int32_t old_bytes = null_bitmap_->size(); | ||
| RETURN_NOT_OK(null_bitmap_->Resize(new_bytes)); | ||
|
|
@@ -56,10 +76,46 @@ Status ArrayBuilder::Advance(int32_t elements) { | |
|
|
||
| Status ArrayBuilder::Reserve(int32_t elements) { | ||
| if (length_ + elements > capacity_) { | ||
| // TODO(emkornfield) power of 2 growth is potentially suboptimal | ||
|
Member
Aside: should we do 1.5x growth everywhere? (This is the folly approach, IIRC. Is there more research on this subject?)
Contributor
Author
Yes, this is what folly uses. It seems like 1.5 edges out 2 in the very light survey done here: https://en.wikipedia.org/wiki/Dynamic_array#Growth_factor Ideally, we would benchmark once we have some real data in place. In the absence of that, 1.5 seems like a good default. |
||
| int32_t new_capacity = util::next_power2(length_ + elements); | ||
| return Resize(new_capacity); | ||
| } | ||
| return Status::OK(); | ||
| } | ||
|
|
||
| Status ArrayBuilder::SetNotNull(int32_t length) { | ||
| RETURN_NOT_OK(Reserve(length)); | ||
| UnsafeSetNotNull(length); | ||
| return Status::OK(); | ||
| } | ||
|
|
||
| void ArrayBuilder::UnsafeAppendToBitmap(bool is_valid) { | ||
| if (is_valid) { | ||
| util::set_bit(null_bitmap_data_, length_); | ||
| } else { | ||
| ++null_count_; | ||
| } | ||
| ++length_; | ||
| } | ||
|
|
||
| void ArrayBuilder::UnsafeAppendToBitmap(const uint8_t* valid_bytes, int32_t length) { | ||
| if (valid_bytes == nullptr) { | ||
| UnsafeSetNotNull(length); | ||
| return; | ||
| } | ||
| for (int32_t i = 0; i < length; ++i) { | ||
| // TODO(emkornfield) Optimize for large values of length? | ||
| UnsafeAppendToBitmap(valid_bytes[i] > 0); | ||
| } | ||
| } | ||
|
|
||
| void ArrayBuilder::UnsafeSetNotNull(int32_t length) { | ||
| const int32_t new_length = length + length_; | ||
| // TODO(emkornfield) Optimize for large values of length? | ||
| for (int32_t i = length_; i < new_length; ++i) { | ||
| util::set_bit(null_bitmap_data_, i); | ||
| } | ||
| length_ = new_length; | ||
| } | ||
|
|
||
| } // namespace arrow | ||
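The builder changes above pair a bit-packed validity bitmap with a `Reserve` step that grows capacity on demand, and the review discusses replacing power-of-two growth with a 1.5x factor. The sketch below illustrates both ideas together. It is not the PR's implementation: `ValidityBitmapBuilder` and `GrowCapacity` are hypothetical names, a `std::vector<uint8_t>` stands in for Arrow's buffer, and the 1.5x policy is the alternative floated in the comments rather than what the diff does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical growth policy: roughly 1.5x, but never less than `needed`.
inline int32_t GrowCapacity(int32_t current, int32_t needed) {
  const int32_t grown = current + current / 2;
  return grown > needed ? grown : needed;
}

// Hypothetical builder that tracks validity bits (1 = valid, 0 = null).
class ValidityBitmapBuilder {
 public:
  // Ensure room for `elements` more slots, growing the bitmap if required.
  void Reserve(int32_t elements) {
    if (length_ + elements > capacity_) {
      capacity_ = GrowCapacity(capacity_, length_ + elements);
      // One byte holds eight slots; new bytes start zeroed (all null).
      bits_.resize(static_cast<std::size_t>((capacity_ + 7) / 8), 0);
    }
  }

  // Checked append: reserves first, then writes the bit.
  void Append(bool is_valid) {
    Reserve(1);
    UnsafeAppend(is_valid);
  }

  // Unchecked append: the caller must have reserved space already.
  void UnsafeAppend(bool is_valid) {
    assert(length_ < capacity_);
    if (is_valid) {
      bits_[length_ / 8] |= static_cast<uint8_t>(1u << (length_ % 8));
    } else {
      ++null_count_;
    }
    ++length_;
  }

  bool IsNull(int32_t i) const { return (bits_[i / 8] & (1u << (i % 8))) == 0; }
  int32_t length() const { return length_; }
  int32_t capacity() const { return capacity_; }
  int32_t null_count() const { return null_count_; }

 private:
  std::vector<uint8_t> bits_;
  int32_t length_ = 0;
  int32_t capacity_ = 0;
  int32_t null_count_ = 0;
};
```

Splitting the checked `Append` from `UnsafeAppend` mirrors the diff's `AppendToBitmap` / `UnsafeAppendToBitmap` pair: bulk callers can reserve once and then append without a capacity check per element.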
Aside: I tried to find definitive documentation on when to use `= default` vs `= 0` in C++11, if you have any pointers.
My understanding is that `= default` is equivalent to `{}`, and that a pure virtual destructor (`= 0`) should be used rarely. From http://en.cppreference.com/w/cpp/language/destructor: "A destructor may be declared pure virtual, for example in a base class which needs to be made abstract, but has no other suitable functions that could be declared pure virtual. Such destructor must have a definition, since all base class destructors are always called when the derived class is destroyed." Although the second sentence seems to contradict the first.
I changed to `= default` purely for style (I likely should have done this as a separate PR and discussed it first); for some reason it reads more elegantly to me.
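To make the `= default` versus `= 0` discussion concrete, here is a small sketch using hypothetical classes (`Base`, `Derived`, `Tag`, `ConcreteTag`), none of which come from the Arrow code base. It shows a defaulted virtual destructor behaving like the `{}` form for deletion through a base pointer, and a pure virtual destructor that makes a class abstract yet still needs a definition, which is the point the cppreference quote makes.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Defaulted virtual destructor: deleting through Base* still runs ~Derived().
struct Base {
  virtual ~Base() = default;
};

struct Derived : Base {
  explicit Derived(bool* destroyed) : destroyed_(destroyed) {}
  ~Derived() override { *destroyed_ = true; }
  bool* destroyed_;
};

// Pure virtual destructor: makes Tag abstract without any other pure virtual
// member function. It is still called when a derived object is destroyed,
// so a definition must be provided.
struct Tag {
  virtual ~Tag() = 0;
};
inline Tag::~Tag() {}

struct ConcreteTag : Tag {};

static_assert(std::has_virtual_destructor<Base>::value,
              "= default keeps the destructor virtual");
static_assert(std::is_abstract<Tag>::value,
              "a pure virtual destructor makes the class abstract");
static_assert(!std::is_abstract<ConcreteTag>::value,
              "the implicit derived destructor overrides the pure virtual one");

// Returns true if destroying a Derived through a Base pointer ran ~Derived().
inline bool DestroysDerivedThroughBase() {
  bool destroyed = false;
  { std::unique_ptr<Base> p(new Derived(&destroyed)); }
  return destroyed;
}
```

The two cppreference sentences are therefore compatible: `= 0` only prevents instantiating the base class directly, while the required definition covers the call made during derived-class destruction.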