ARROW-655: [C++/Python] Implement DecimalArray #403
Conversation
Force-pushed from 6e49399 to 49a5780.
Nice! I will review this in more detail when I can. I did some refactoring in #404 which will introduce some minor rebase conflicts under arrow/ipc, but it shouldn't be too bad to fix.
cpp/src/arrow/util/decimal.h
MSVC doesn't seem to have int128_t, does boost have something to backfill this? https://msdn.microsoft.com/en-us/library/cc953fe1.aspx
When I first started this patch, I was using boost::multiprecision::int128_t, which would work across platforms. It's significantly more complex. Of course "more complex and working" is infinitely better than "less complex and not working at all" :)
gotcha, i reckon that can be tackled in a follow up patch. as long as this patch compiles in Appveyor we are good
Going to redo parts of the patch with boost. Lots of issues are showing up with
Sorry about the rebase conflicts. They shouldn't be too bad, so to summarize
cpp/src/arrow/util/decimal.h
Create int128 atoi128(std::string s) utility function?
I'm going to refactor this to use boost multiprecision, which has conversion functions so there's no need for this.
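For readers following the atoi128 suggestion, here is a minimal sketch of what such a helper might look like. It is not code from this patch: the name ParseInt128 is hypothetical, it assumes a compiler that provides the __int128 extension (GCC or Clang, not MSVC, which is the portability problem raised earlier in this thread), and it does not check for overflow.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Arrow: parses an optionally signed
// base-10 integer string into a 128-bit integer.
// Assumes the __int128 compiler extension (GCC/Clang).
// Returns false on empty input or on any non-digit character.
// Overflow is NOT detected in this sketch.
bool ParseInt128(const std::string& s, __int128* out) {
  std::size_t i = 0;
  bool negative = false;
  if (!s.empty() && (s[0] == '-' || s[0] == '+')) {
    negative = (s[0] == '-');
    i = 1;
  }
  if (i == s.size()) return false;  // no digits at all

  unsigned __int128 value = 0;
  for (; i < s.size(); ++i) {
    if (s[i] < '0' || s[i] > '9') return false;
    value = value * 10 + static_cast<unsigned>(s[i] - '0');
  }
  *out = negative ? -static_cast<__int128>(value)
                  : static_cast<__int128>(value);
  return true;
}
```

A library type such as boost::multiprecision::int128_t already ships with string conversion, which is why a hand-written helper like this becomes unnecessary after the refactor.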
Force-pushed from be209e1 to 390be34.
Not quite ready for review, need to make sure I'm not including anything from previous commits and fix some of the styling crap.
We need to use at least boost 1.60 for this PR to work.
@wesm Is there a hard requirement to maintain compatibility with boost < 1.60?
Actually, I think I can work around it for now.
hm, boost on Ubuntu 14.04 is 1.57, which would be useful to support, but i understand there's been ongoing work in boost::multiprecision in recent releases
Force-pushed from dbb08ef to 7fd7185.
cpp/src/arrow/array-decimal-test.cc
I need to redo this test, because we don't actually check here that values are all compatible with their respective types (i.e., same precision and scale).
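One part of the compatibility check described here can be sketched as a digit-count test on the unscaled integer. This is an illustration, not the test that ended up in the patch: FitsInPrecision is a hypothetical name, __int128 is assumed to be available (GCC/Clang), and only precision is checked, not scale.

```cpp
// Hypothetical check, not part of Arrow: returns true if the unscaled
// integer `value` can be written with at most `precision` base-10 digits.
// Precision is limited to 1..38 because 10^38 is the largest power of ten
// that fits in a signed 128-bit integer.
bool FitsInPrecision(__int128 value, int precision) {
  if (precision < 1 || precision > 38) return false;

  // Take the magnitude in unsigned arithmetic so the most negative
  // value does not overflow on negation.
  unsigned __int128 magnitude =
      value < 0 ? -static_cast<unsigned __int128>(value)
                : static_cast<unsigned __int128>(value);

  unsigned __int128 limit = 1;
  for (int i = 0; i < precision; ++i) limit *= 10;  // limit = 10^precision

  return magnitude < limit;
}
```

A test could call a check like this for every generated value against the precision of the array's type before comparing the arrays themselves.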
cpp/src/arrow/builder.cc
I think this is wrong, and needs to be fixed. I need to poke around to see if there are some functions lying around to reserve memory for bitmaps.
I don't think so, but that would be useful -- there's other places in the builders where bitmaps need to get expanded
Yep, I'm right in the middle of writing them :)
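As a rough picture of what "reserving memory for a bitmap" involves, the sketch below rounds a bit count up to whole bytes and grows a zero-filled buffer. It is an assumption-laden illustration: std::vector<uint8_t> stands in for Arrow's buffer types, and the names BytesForBits, ReserveBitmap, SetBit and GetBit are chosen here for clarity rather than taken from the functions written for this patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bytes needed to store `bits` bits, rounded up.
inline int64_t BytesForBits(int64_t bits) { return (bits + 7) / 8; }

// Hypothetical helper: make sure `bitmap` can hold at least `capacity`
// bits. Newly added bytes are zeroed, i.e. the new slots start as "null".
inline void ReserveBitmap(std::vector<uint8_t>* bitmap, int64_t capacity) {
  const std::size_t needed = static_cast<std::size_t>(BytesForBits(capacity));
  if (bitmap->size() < needed) bitmap->resize(needed, 0);
}

// Mark slot `i` as valid (least-significant-bit-first within each byte).
inline void SetBit(std::vector<uint8_t>* bitmap, int64_t i) {
  (*bitmap)[static_cast<std::size_t>(i / 8)] |=
      static_cast<uint8_t>(1u << (i % 8));
}

// Read the validity bit for slot `i`.
inline bool GetBit(const std::vector<uint8_t>& bitmap, int64_t i) {
  return (bitmap[static_cast<std::size_t>(i / 8)] >> (i % 8)) & 1u;
}
```

Sharing one such routine across builders avoids each builder repeating the round-up-and-zero logic when its null bitmap has to expand.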
This needs a rebase, sorry about that.
Force-pushed from 4efcfcd to 57ef5de.
wesm left a comment
Wow, this was hairy, nice work. I mostly noted C++ style nitpicks, questions around symbol visibility (we'll see what Travis CI / Appveyor have to say about that), toolchain questions, with some error checking stuff
ci/travis_before_script_cpp.sh
Does this do something other than -std=c++11?
OK, we should either
a) set this variable inside our CMakeLists.txt so it doesn't have to be specified on the command line
b) rely on the existing -std=c++11 being set in the CXX_COMMON_FLAGS https://github.com/apache/arrow/blob/master/cpp/cmake_modules/SetupCxxFlags.cmake#L47
but not both
agreed. I will remove it here and add a follow up patch to fix it more generically, and enforce that we must use c++11
done
ci/travis_script_python.sh
Same questions as above, in theory this should be handled internally in the CMAKE_CXX_FLAGS
See the above link
removing, see above comment.
done
Hm I thought I removed this.
cpp/src/arrow/array-decimal-test.cc
auto expected_data = std::make_shared<Buffer>(...) is a bit more typical
done
cpp/src/arrow/array-decimal-test.cc
same with assignment here (auto foo = test::...)
done
cpp/src/arrow/array-decimal-test.cc
is this std::dynamic_pointer_cast needed?
nope, fixed.
cpp/src/arrow/util/decimal-test.cc
status
fixed
cpp/src/arrow/util/decimal.cc
const backend_type& backend = decimal_value.backend()?
i can't do that because i'm mutating the backend with resize and memcpy below.
cpp/src/arrow/util/decimal.cc
you could also assign to *reinterpret_cast<int32_t*>(*bytes)
fixed
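For comparison with the reinterpret_cast assignment suggested above, a different technique is to copy the value with std::memcpy. Writing through a reinterpret_cast<int32_t*> can run into alignment and strict-aliasing rules when the destination is a plain byte buffer, while memcpy of a fixed size has neither issue and is typically compiled to a single store. The sketch below is not the code from this patch, and StoreInt32 and LoadInt32 are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers, not part of Arrow.

// Write a 32-bit value into a byte buffer in native byte order.
// `bytes` must point to at least sizeof(int32_t) writable bytes;
// it does not need to be 4-byte aligned.
inline void StoreInt32(int32_t value, uint8_t* bytes) {
  std::memcpy(bytes, &value, sizeof(value));
}

// Read a 32-bit value back from a byte buffer in native byte order.
inline int32_t LoadInt32(const uint8_t* bytes) {
  int32_t value;
  std::memcpy(&value, bytes, sizeof(value));
  return value;
}
```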
cpp/src/arrow/util/decimal.h
use using instead of typedef
fixed
python/pyarrow/schema.pyx
you could use box_data_type here
Oops, looks like 1c66097 broke glib, fixing now
cpp/src/arrow/CMakeLists.txt
You shouldn't have to link this library here, or in any library that links to libarrow. I think you only need to add boost_regex to these lists of link libs: https://github.com/apache/arrow/blob/master/cpp/CMakeLists.txt#L695
cpp/src/arrow/array.h
You're leaking boost/multiprecision headers in the public API here, can this be removed?
Yep I can fix that.
fixed
cpp/src/arrow/builder.h
Leaking boost headers in public API, we can't do that
fixed
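One common way to keep a third-party header out of a public API is to forward-declare an implementation struct in the header and define it only in the source file (the pimpl idiom). The sketch below shows the shape of that approach in a single file. It is not necessarily how this patch resolved the issue: DecimalValue is a hypothetical class, and a plain long long stands in for the boost::multiprecision type.

```cpp
#include <memory>
#include <string>

// ---- What would live in the public header ----
// No third-party include is needed here: Impl is only forward-declared.
class DecimalValue {
 public:
  explicit DecimalValue(long long unscaled);
  ~DecimalValue();  // defined where Impl is a complete type
  std::string ToString() const;

 private:
  struct Impl;
  std::unique_ptr<Impl> impl_;
};

// ---- What would live in the .cc file ----
// The third-party header would be included only here. A long long is a
// stand-in for the real multiprecision type in this sketch.
struct DecimalValue::Impl {
  long long value;
};

DecimalValue::DecimalValue(long long unscaled) : impl_(new Impl{unscaled}) {}

DecimalValue::~DecimalValue() = default;

std::string DecimalValue::ToString() const {
  return std::to_string(impl_->value);
}
```

Code that includes only the header part never sees the implementation type, so users of the library do not need the third-party headers on their include path.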
cpp/src/arrow/ipc/CMakeLists.txt
Shouldn't be needed in any of these linker statements
Based on whether https://github.com/apache/arrow/blob/master/cpp/CMakeLists.txt#L798
Force-pushed from f6aed22 to ef5c159.
cpp/src/arrow/type.h
Sadly, this is leaking boost/multiprecision
ah ok, will fix
Looks like I'm leaking in
indeed, I am OK with making helpers.h an internal header also, maybe remove it from api.h https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/api.h
cool, removing.
This reverts commit 7f6ea7e4ef95063471b4037fc7614bcf15e59864.
I'll remove the
wesm left a comment
+1, really great to get this done. Will merge once the build passes
Build passed except for conda s3 timeout. Any idea why this keeps happening?
Hm, not sure. I wonder if this might be a symptom of rate limiting.
This depends on:
- [x] [ARROW-1607](apache#1128)
- [x] [ARROW-1656](apache#1184)
- [x] [ARROW-1588](apache#1211)
- [x] Add tests for writing different sizes of values

Author: Phillip Cloud <[email protected]>
Author: Wes McKinney <[email protected]>

Closes apache#403 from cpcloud/PARQUET-1095 and squashes the following commits:

8c3d222 [Phillip Cloud] Remove loop from BytesToInteger
63018bc [Wes McKinney] Suppress C4996 due to arrow/util/variant.h
e4b02d3 [Phillip Cloud] Refactor types.h
83948ec [Phillip Cloud] Add last_value_ init
51965cd [Phillip Cloud] Min commit that contains the unique kernel in arrow
e25c59b [Phillip Cloud] Fix reader writer test for unique kernel addition
da0a7eb [Phillip Cloud] Update for ARROW-1811
16935de [Phillip Cloud] Reverse operand order and explicit cast
6036ca5 [Phillip Cloud] ARROW-1811
c5c4294 [Phillip Cloud] Fix issues
32a4abe [Phillip Cloud] Cleanup iteration a bit
920832a [Phillip Cloud] Update arrow version
9f97c1d [Phillip Cloud] Update for ARROW-1794: rename DecimalArray to Decimal128Array
b2e0290 [Phillip Cloud] IWYU
64748a8 [Phillip Cloud] Copy from arrow for now
6c9e2a7 [Phillip Cloud] Reduce the number of decimal test cases
7ab2e5c [Phillip Cloud] Parameterize on precision
30655d6 [Phillip Cloud] Use arrow random_decimals
9ff7eb4 [Phillip Cloud] Remove specific template parameters
1eee6a9 [Phillip Cloud] Remove specific randint call
8808e4c [Phillip Cloud] Bump arrow version
659fbc1 [Phillip Cloud] Fix deprecated API call
e162ca1 [Phillip Cloud] Allocate scratch space to hold the byteswapped values
5c9292b [Phillip Cloud] Proper dcheck call
1782da0 [Phillip Cloud] Use arrow
3d243d5 [Phillip Cloud] Checkpoint [ci skip]
028fb03 [Phillip Cloud] Remove garbage values
46dff15 [Phillip Cloud] Clean up uint32 test
613255e [Phillip Cloud] Do not use std::copy when reinterpret_cast will suffice
2917a62 [Phillip Cloud] PARQUET-1095: [C++] Read and write Arrow decimal values

Change-Id: Ibe81cd5a5961bbe86c66db811ec8b770ae48c38b
Adds Decimal support for C++ and Python.
TODOs:
__int128_t are not exported: https://bugs.llvm.org//show_bug.cgi?id=26156.