-
Notifications
You must be signed in to change notification settings - Fork 191
PARQUET-712: Add library to read into Arrow memory #158
Conversation
|
The failing BooleanType test is not because of arrow but there seems to be a general problem inside parquet-cpp. Will make a PR with a separate failing test. |
|
I think we can leave item 3 for a follow up JIRA. The output for OS X (all the Thrift linker warnings) looks worrisome, not sure how to fix that off hand |
| PROPERTIES | ||
| BUILD_WITH_INSTALL_RPATH ON | ||
| INSTALL_NAME_DIR "@rpath") | ||
| endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: building static libparquet_arrow.a
|
@xhochy my understanding from google searches is that throwing a shared library's exception types outside its boundaries can be unreliable. So we may need to come up with a different way for Parquet library implementers to signal failure of an interface extension type's methods to parquet-cpp |
|
@xhochy as a workaround you can add a protected API to the base classes that allow subclasses in other shared libraries to throw ParquetException without using it directly. e.g. in subclass if (... /* error */) {
RaiseException("it failed");
}in ParquetAllocator (etc.) void RaiseException(const std::string& msg) {
throw ParquetException(msg);
} |
|
Or maybe add the exception thrower function to the public API |
|
Why is the output of the unit tests not visible? We should fix the Travis CI here to be like Arrow so we can see which test failed. Does my solution to the exception issue work? I think this is the only offending line that needs to be replaced with a function call from |
|
Just have to add https://github.com/apache/arrow/blob/master/ci/travis_script_cpp.sh#L19 |
|
Locally I encounter #161 now with a static build for |
|
I can merge this whenever the build is green |
|
There are still open issues I have to fix, so please wait until I remove the WIP prefix. |
|
With correct static linkage, we don't have the ParquetException problems. I'll open a JIRA for faster reads and a benchmarks for Arrow-Parquet I/O. This PR is so far done. @wesm conda builds are broken. Not sure if we can fix them / we want to fix them as they probably involve a cyclic Arrow<->Parquet dependency. |
|
@xhochy I'm going to set up conda builds in conda-forge with pinned git revisions -- the builds here were a stop gap so I'll remove them once conda-forge is all working |
|
+1, thank you! let's wait to remove arrow_parquet from Arrow until ARROW-280 is merged |
| break; | ||
| case ArrowType::UINT32: | ||
| if (properties.version() == ::parquet::ParquetVersion::PARQUET_1_0) { | ||
| type = ParquetType::INT64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xhochy Can you explain why UINT_32 is not emitted for Parquet version 1.0? There's zero comment and I can't find any explanation otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, I saw the comment elsewhere:
// Parquet 1.0 reader cannot read the UINT_32 logical type. Thus we need
// to use the larger Int64Type to store them lossless.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's still not very informative. Which "Parquet 1.0 reader" is it? parquet-mr? parquet-cpp? A particular version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems it is unclear about when it's acceptable to use the unsigned integer types (whether the "1.0" format version versus "2.0" format version -- here 1.0 refers to the Parquet format version).
At the moment this is just move of the existing code into the state where it compiles. Outstanding work includes: