Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
cc7f851
add Validate method to array and implementation for ListArray
emkornfield Apr 13, 2016
01c50be
Add utility methods for managing null bitmap directly to ArrayBuilder
emkornfield Apr 13, 2016
3895d34
Refactor list builder to use ArrayBuilders bitmap methods and a separ…
emkornfield Apr 13, 2016
20f984b
refactor primitive builders to use parent builders bitmap
emkornfield Apr 13, 2016
1374485
augment python unittest to have null element in list
emkornfield Apr 13, 2016
45e41c0
Make BufferBuilder more useable for appending primitives
emkornfield Apr 13, 2016
5f87aef
Refactor ipc-adapter-test to make it paramaterizable. add unit test …
emkornfield Apr 13, 2016
61b0481
small fixes to naming/style for c++ and potential bugs
emkornfield Apr 13, 2016
a2e1e52
native popcount
emkornfield Apr 13, 2016
39c57ed
add potentially useful methods for generative arrays to ipc test-common
emkornfield Apr 13, 2016
aa0602c
add potentially useful pool factories to test utils
emkornfield Apr 13, 2016
8e464b5
Fixes per tidy and lint
emkornfield Apr 13, 2016
53d37bc
filter out ipc-adapter-test from tidy
emkornfield Apr 13, 2016
8ab5315
make clang tidy ignore a little bit less hacky
emkornfield Apr 17, 2016
e71810b
make Resize and Init virtual on builder
emkornfield Apr 17, 2016
2e6c477
add missing RETURN_NOT_OK
emkornfield Apr 17, 2016
3b219a1
Make append is_null parameter is_valid for api consistency
emkornfield Apr 18, 2016
10e6651
add in maximum recursion depth, surfaced possible recursion issue wit…
emkornfield Apr 18, 2016
be04b3e
add unit tests for zero length row batches and non-null batches. fix…
emkornfield Apr 18, 2016
8982723
remaining style cleanup
emkornfield Apr 18, 2016
5e15815
fix make lint
emkornfield Apr 18, 2016
6e57728
make format fixes
emkornfield Apr 18, 2016
7789205
make clang-format-3.7 happy
emkornfield Apr 18, 2016
0af558b
remove a now unnecessary NOLINT, but mostly to trigger another travis…
emkornfield Apr 18, 2016
0c5162d
another format fix
emkornfield Apr 18, 2016
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 56 additions & 0 deletions cpp/src/arrow/builder.cc
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,25 @@

namespace arrow {

Status ArrayBuilder::AppendToBitmap(bool is_null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a matter of consistency, should this be is_valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I did it this way because it was closer to how the code previously worked. But I will change it.

if (length_ == capacity_) {
// If the capacity was not already a multiple of 2, do so here
// TODO(emkornfield) doubling isn't great default allocation practice
// see https://github.com/facebook/folly/blob/master/folly/docs/FBVector.md
// fo discussion
RETURN_NOT_OK(Resize(util::next_power2(capacity_ + 1)));
}
UnsafeAppendToBitmap(is_null);
return Status::OK();
}

Status ArrayBuilder::AppendToBitmap(const uint8_t* valid_bytes, int32_t length) {
Reserve(length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reserve may return error status

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I wonder how hard it would be write a clang-tidy rule to always ensure methods with a Status field are checked.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


UnsafeAppendToBitmap(valid_bytes, length);
return Status::OK();
}

Status ArrayBuilder::Init(int32_t capacity) {
capacity_ = capacity;
int32_t to_alloc = util::ceil_byte(capacity) / 8;
Expand All @@ -36,6 +55,7 @@ Status ArrayBuilder::Init(int32_t capacity) {
}

Status ArrayBuilder::Resize(int32_t new_bits) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way I was thinking it would be nice to make Resize and Init virtual methods, I think it would reduce code repetition and potential bugs for classes the don't have any reason to override Reserve. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, go ahead

if (!null_bitmap_) { return Init(new_bits); }
int32_t new_bytes = util::ceil_byte(new_bits) / 8;
int32_t old_bytes = null_bitmap_->size();
RETURN_NOT_OK(null_bitmap_->Resize(new_bytes));
Expand All @@ -56,10 +76,46 @@ Status ArrayBuilder::Advance(int32_t elements) {

Status ArrayBuilder::Reserve(int32_t elements) {
if (length_ + elements > capacity_) {
// TODO(emkornfield) power of 2 growth is potentially suboptimal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: should we do 1.5x growth everywhere (this is the folly approach IIRC -- is there more research on this subject?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is what folly uses. Seems like 1.5 edges out 2 on the very light survey done here: https://en.wikipedia.org/wiki/Dynamic_array#Growth_factor

Ideally, we would benchmark once we have some real data in place. In the absence of that 1.5 seems like a good default.

int32_t new_capacity = util::next_power2(length_ + elements);
return Resize(new_capacity);
}
return Status::OK();
}

Status ArrayBuilder::SetNotNull(int32_t length) {
RETURN_NOT_OK(Reserve(length));
UnsafeSetNotNull(length);
return Status::OK();
}

void ArrayBuilder::UnsafeAppendToBitmap(bool is_null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I think so. I will make the change.

if (is_null) {
++null_count_;
} else {
util::set_bit(null_bitmap_data_, length_);
}
++length_;
}

void ArrayBuilder::UnsafeAppendToBitmap(const uint8_t* valid_bytes, int32_t length) {
if (valid_bytes == nullptr) {
UnsafeSetNotNull(length);
return;
}
for (int32_t i = 0; i < length; ++i) {
// TODO(emkornfield) Optimize for large values of length?
AppendToBitmap(valid_bytes[i] == 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be UnsafeAppendToBitmap? This presumes that the user has called Reserve(...) beforehand

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct. I will change.

}
}

void ArrayBuilder::UnsafeSetNotNull(int32_t length) {
const int32_t new_length = length + length_;
// TODO(emkornfield) Optimize for large values of length?
for (int32_t i = length_; i < new_length; ++i) {
util::set_bit(null_bitmap_data_, i);
}
length_ = new_length;
}

} // namespace arrow
27 changes: 25 additions & 2 deletions cpp/src/arrow/builder.h
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,10 @@ class PoolBuffer;

static constexpr int32_t MIN_BUILDER_CAPACITY = 1 << 5;

// Base class for all data array builders
// Base class for all data array builders.
// This class provides a facilities for incrementally building the null bitmap
// (see Append methods) and as a side effect the current number of slots and
// the null count.
class ArrayBuilder {
public:
explicit ArrayBuilder(MemoryPool* pool, const TypePtr& type)
Expand All @@ -58,6 +61,14 @@ class ArrayBuilder {
int32_t null_count() const { return null_count_; }
int32_t capacity() const { return capacity_; }

// Append to null bitmap
Status AppendToBitmap(bool is_null);
// Vector append. Treat each zero byte as a null. If valid_bytes is null
// assume all of length bits are valid.
Status AppendToBitmap(const uint8_t* valid_bytes, int32_t length);
// Set the next length bits to not null (i.e. valid).
Status SetNotNull(int32_t length);

// Allocates requires memory at this level, but children need to be
// initialized independently
Status Init(int32_t capacity);
Expand All @@ -75,7 +86,7 @@ class ArrayBuilder {
const std::shared_ptr<PoolBuffer>& null_bitmap() const { return null_bitmap_; }

// Creates new array object to hold the contents of the builder and transfers
// ownership of the data
// ownership of the data. This resets all variables on the builder.
virtual std::shared_ptr<Array> Finish() = 0;

const std::shared_ptr<DataType>& type() const { return type_; }
Expand All @@ -97,6 +108,18 @@ class ArrayBuilder {
// Child value array builders. These are owned by this class
std::vector<std::unique_ptr<ArrayBuilder>> children_;

//
// Unsafe operations (don't check capacity/don't resize)
//

// Append to null bitmap.
void UnsafeAppendToBitmap(bool is_null);
// Vector append. Treat each zero byte as a nullzero. If valid_bytes is null
// assume all of length bits are valid.
void UnsafeAppendToBitmap(const uint8_t* valid_bytes, int32_t length);
// Set the next length bits to not null (i.e. valid).
void UnsafeSetNotNull(int32_t length);

private:
DISALLOW_COPY_AND_ASSIGN(ArrayBuilder);
};
Expand Down