-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-15709: [C++] Compilation of ARROW_ENGINE fails if doing an "inline" build #12457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-15709: [C++] Compilation of ARROW_ENGINE fails if doing an "inline" build #12457
Conversation
|
@kou @pitrou We could use input on this approach. The Substrait integration is a little unique because the Protobuf is also a bit tricky because the version of the code generation tool (protoc) must match the version of the linked protobuf library exactly (this is different from flatbuffers which doesn't require a runtime dependency). Previously we were downloading the proto files using git and running protoc within the engine directory. This had some negative side effects (#12454) mainly due to our use of git. Even beyond git however it was rather strange that this was resulting in a call to Any suggestions on a better approach or feedback on our current approach would be very much welcomed. |
a049bdd to
fd6e5d4
Compare
|
How about using just an OBJECT library https://cmake.org/cmake/help/latest/command/add_library.html#object-lib instead of diff --git a/cpp/cmake_modules/BuildUtils.cmake b/cpp/cmake_modules/BuildUtils.cmake
index 391c43e0ac..3afecfc562 100644
--- a/cpp/cmake_modules/BuildUtils.cmake
+++ b/cpp/cmake_modules/BuildUtils.cmake
@@ -297,6 +297,14 @@ function(ADD_ARROW_LIB LIB_NAME)
target_precompile_headers(${LIB_NAME}_objlib PRIVATE ${ARG_PRECOMPILED_HEADERS})
endif()
set(LIB_DEPS $<TARGET_OBJECTS:${LIB_NAME}_objlib>)
+ # We need to specify $<TARGET_OBJECTS:XXX> explicitly to SHARED
+ # library and STATIC library because $<TARGET_OBJECTS:XXX> in
+ # OBJECT isn't propagated to SHARED library and STATIC library.
+ foreach(ARG_SOURCE ${ARG_SOURCES})
+ if(ARG_SOURCE MATCHES "^\\$<TARGET_OBJECTS:")
+ list(APPEND LIB_DEPS ${ARG_SOURCE})
+ endif()
+ endforeach()
set(LIB_INCLUDES)
set(EXTRA_DEPS)
diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 8b53272452..9143550df9 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -1687,22 +1687,13 @@ macro(build_substrait)
add_custom_target(substrait_gen ALL DEPENDS ${SUBSTRAIT_PROTO_GEN_ALL})
- add_arrow_lib(arrow_substrait
- SOURCES
- ${SUBSTRAIT_SOURCES}
- DEPENDENCIES
- substrait_gen
- SHARED_LINK_LIBS
- ${ARROW_PROTOBUF_LIBPROTOBUF}
- STATIC_LINK_LIBS
- ${ARROW_PROTOBUF_LIBPROTOBUF}
- PRIVATE_INCLUDES
- ${SUBSTRAIT_CPP_DIR})
-
- set(SUBSTRAIT_DEPENDENCIES substrait_gen)
- set(SUBSTRAIT_SHARED arrow_substrait_shared)
- set(SUBSTRAIT_STATIC arrow_substrait_static)
- set(SUBSTRAIT_INCLUDES ${SUBSTRAIT_CPP_DIR})
+ add_library(substrait OBJECT ${SUBSTRAIT_SOURCES})
+ set_target_properties(substrait
+ PROPERTIES
+ INCLUDE_DIRECTORIES ${SUBSTRAIT_CPP_DIR}
+ POSITION_INDEPENDENT_CODE ON)
+ add_dependencies(substrait substrait_gen)
+ get_target_property(SUBSTRAIT_INCLUDES substrait INCLUDE_DIRECTORIES)
endmacro()
if(ARROW_WITH_SUBSTRAIT)
diff --git a/cpp/src/arrow/engine/CMakeLists.txt b/cpp/src/arrow/engine/CMakeLists.txt
index 50a84464c0..ff88f47e4a 100644
--- a/cpp/src/arrow/engine/CMakeLists.txt
+++ b/cpp/src/arrow/engine/CMakeLists.txt
@@ -36,9 +36,10 @@ add_arrow_lib(arrow_engine
OUTPUTS
ARROW_ENGINE_LIBRARIES
DEPENDENCIES
- ${SUBSTRAIT_DEPENDENCIES}
+ substrait
SOURCES
${ARROW_ENGINE_SRCS}
+ $<TARGET_OBJECTS:substrait>
PRECOMPILED_HEADERS
"$<$<COMPILE_LANGUAGE:CXX>:arrow/engine/pch.h>"
SHARED_LINK_FLAGS
@@ -46,11 +47,9 @@ add_arrow_lib(arrow_engine
SHARED_LINK_LIBS
arrow_shared
arrow_dataset_shared
- ${SUBSTRAIT_SHARED}
STATIC_LINK_LIBS
arrow_static
arrow_dataset_static
- ${SUBSTRAIT_STATIC}
PRIVATE_INCLUDES
${SUBSTRAIT_INCLUDES})
|
|
@kou Thanks for the input! It's certainly a lot nicer to do it that way, but unfortunately, we're not quite there yet... I got it working in most build environments using your diff with some minor changes (had to add protobuf to the include list for the object library, and add its runtime library to arrow_engine), but it fails here: I feel like the addition to BuildUtils.cmake is either causing this or supposed to resolve this (and failing to do so in this particular case), but other than that, my skill level for CMake is insufficient to see what exactly is going on here. |
| set(SUBSTRAIT_SOURCE_URL "$ENV{ARROW_SUBSTRAIT_URL}") | ||
| else() | ||
| set_urls(SUBSTRAIT_SOURCE_URL | ||
| "https://github.com/substrait-io/substrait/archive/${ARROW_SUBSTRAIT_BUILD_VERSION}.tar.gz" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, this will be much better than involving some git commands.
cpp/src/arrow/engine/CMakeLists.txt
Outdated
| substrait | ||
| SOURCES | ||
| ${ARROW_ENGINE_SRCS} | ||
| $<TARGET_OBJECTS:substrait> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... is there a particular reason for this? Should we instead create a static library containing these objects? Or perhaps I'm entirely misunderstanding this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use a static library instead of an object library without using add_arrow_lib(). We can just use raw add_library().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like that's working, 577979b. I wasn't sure how generally-applicable linking static libraries with -fPIC into shared libraries would be. I'm also not sure why I still need to specify PRIVATE_INCLUDES ${SUBSTRAIT_INCLUDES} for the add_arrow_lib call (shouldn't CMake be propagating public include directories to dependent libraries?), but I couldn't get it to work without it.
| # under the License. | ||
|
|
||
| # Currently, we can only build Substrait from source. | ||
| set(SUBSTRAIT_FOUND NO) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this file?
It seems that this file is never used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cpp/src/arrow/engine/CMakeLists.txt
Outdated
| DEPENDENCIES | ||
| substrait |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this?
This will be automatically added by using substrait target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't build (multithreaded) without it. I think it ensures that substrait is built completely prior to compilation of arrow_engine, rather than just prior to linking, which matters because arrow_engine uses generated code. Making it depend on substrait_gen would also work, if you prefer that option (IMO, the less arrow_engine depends on implementation details of ThirdpartyToolchain.cmake the better, so I set it to just substrait).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this is a problem of our add_arrow_lib():
diff --git a/cpp/cmake_modules/BuildUtils.cmake b/cpp/cmake_modules/BuildUtils.cmake
index 391c43e0ac..7517e4e806 100644
--- a/cpp/cmake_modules/BuildUtils.cmake
+++ b/cpp/cmake_modules/BuildUtils.cmake
@@ -374,6 +374,13 @@ function(ADD_ARROW_LIB LIB_NAME)
"$<INSTALL_INTERFACE:${ARG_SHARED_INSTALL_INTERFACE_LIBS}>"
LINK_PRIVATE
${ARG_SHARED_PRIVATE_LINK_LIBS})
+ if(USE_OBJLIB)
+ foreach(ARG_SHARED_LINK_LIB ${ARG_SHARED_LINK_LIBS})
+ if(TARGET ${ARG_SHARED_LINK_LIB})
+ add_dependencies(${LIB_NAME}_objlib ${ARG_SHARED_LINK_LIB})
+ endif()
+ endforeach()
+ endif()
if(ARROW_RPATH_ORIGIN)
if(APPLE)
@@ -449,6 +456,13 @@ function(ADD_ARROW_LIB LIB_NAME)
if(ARG_STATIC_LINK_LIBS)
target_link_libraries(${LIB_NAME}_static LINK_PRIVATE
"$<BUILD_INTERFACE:${ARG_STATIC_LINK_LIBS}>")
+ if(USE_OBJLIB)
+ foreach(ARG_STATIC_LINK_LIB ${ARG_STATIC_LINK_LIBS})
+ if(TARGET ${ARG_STATIC_LINK_LIB})
+ add_dependencies(${LIB_NAME}_objlib ${ARG_STATIC_LINK_LIB})
+ endif()
+ endforeach()
+ endif()
endif()
install(TARGETS ${LIB_NAME}_static ${INSTALL_IS_OPTIONAL}
diff --git a/cpp/src/arrow/engine/CMakeLists.txt b/cpp/src/arrow/engine/CMakeLists.txt
index 13d50b4244..edb878939f 100644
--- a/cpp/src/arrow/engine/CMakeLists.txt
+++ b/cpp/src/arrow/engine/CMakeLists.txt
@@ -35,8 +35,6 @@ add_arrow_lib(arrow_engine
arrow-engine
OUTPUTS
ARROW_ENGINE_LIBRARIES
- DEPENDENCIES
- substrait
SOURCES
${ARROW_ENGINE_SRCS}
PRECOMPILED_HEADERSThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that fixes it on my end, too. fffa65c
| set_target_properties(substrait PROPERTIES POSITION_INDEPENDENT_CODE ON) | ||
| target_include_directories(substrait PUBLIC ${SUBSTRAIT_INCLUDES}) | ||
| target_link_libraries(substrait INTERFACE ${ARROW_PROTOBUF_LIBPROTOBUF}) | ||
| add_dependencies(substrait substrait_gen) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add substrait to ARROW_BUNDLED_STATIC_LIBS?
| add_dependencies(substrait substrait_gen) | |
| list(APPEND ARROW_BUNDLED_STATIC_LIBS substrait) |
If we don't add this, users can't link to libarrow_engine.a statically with libarrow_bundled_dependencies.a.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 3d7bac9
The removal of add_dependencies(substrait substrait_gen) seems unrelated. I suppose it's technically redundant, since CMake will generate the .cc files by way of needing them as input for compiling substrait, and the .h files just so happen to be generated by the same command. If for whatever reason only the generated .h files are removed though, CMake won't know to recreate them without that rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry. It's my fault.
|
Failure of AMD64 Ubuntu 20.04 GLib & Ruby for |
|
@github-actions crossbow submit python-sdist |
|
Revision: fffa65c Submitted crossbow builds: ursacomputing/crossbow @ actions-1676
|
|
@kou @pitrou @westonpace I believe this is ready to be merged, or am I still missing something? |
| LINK_PRIVATE | ||
| ${ARG_SHARED_PRIVATE_LINK_LIBS}) | ||
|
|
||
| if(USE_OBJLIB) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment explaining this snippet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right.
cpp/cmake_modules/BuildUtils.cmake
Outdated
| ${ARG_SHARED_PRIVATE_LINK_LIBS}) | ||
|
|
||
| if(USE_OBJLIB) | ||
| foreach(ARG_SHARED_LINK_LIB ${ARG_SHARED_LINK_LIBS}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor, but naming ARG_SOMETHING a variable that is not one of the original function arguments is confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| add_custom_command(OUTPUT "${SUBSTRAIT_PROTO_GEN}.${EXT}" | ||
| COMMAND ${ARROW_PROTOBUF_PROTOC} "-I${SUBSTRAIT_LOCAL_DIR}/proto" | ||
| "--cpp_out=${SUBSTRAIT_CPP_DIR}" | ||
| "${SUBSTRAIT_LOCAL_DIR}/proto/substrait/${SUBSTRAIT_PROTO}.proto" | ||
| DEPENDS ${PROTO_DEPENDS} substrait_ep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it compiles each proto file twice (once per extension)? Is that desired?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, probably not. I noticed it too when I inherited this code, but incorrectly assumed it was written like that because add_custom_command only supported a single output file. 7ad2e1f
|
Perhaps @kou can also double-check the CMake changes. |
westonpace
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have nothing further to add. I think this looks much nicer than what we had before. We might try and take some snippets from here and push them upstream into the Substrait project's documentation in some kind of "getting started with C++" section.
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
| LINK_PRIVATE | ||
| ${ARG_SHARED_PRIVATE_LINK_LIBS}) | ||
|
|
||
| if(USE_OBJLIB) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right.
|
Benchmark runs are scheduled for baseline = acfd1d2 and contender = 89cc6b3. 89cc6b3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
…ine" build This should fix: - inline builds in general (ARROW-15709); - [weird stuff with inline builds causing non-tracked files to be deleted](https://github.com/apache/arrow;/pull/12444#issuecomment-1043143303) that the [previous fix](apache#12444) for the above [caused](apache#12454) - dependencies on git for downloading dependencies (ARROW-15760); - the build process for Substrait previously being treated as something too special to use Arrow's normal method for dealing with third-party dependencies (i.e. `ThirdpartyToolchain.cmake`) --- Initial attempt at making something functional to solve this issue properly. The use of `add_arrow_lib` in `ThirdpartyToolchain.cmake` is certainly odd, and I'm sure I'm not following best practices in that file in general. I could use some advice on what the proper way to do this would be. Some of the issues: - The CMake property specifying that a path refers to a generated file is scoped only to the current CMake file, so only moving the `externalproject_add` over to `ThirdpartyToolchain.cmake` resulted in the `add_arrow_lib` in `src/arrow/engine` failing due to missing source files. An object library didn't seem to resolve it either, and there are probably portability issues with that anyway, so that's why I ended up just using `add_arrow_lib`. - Unlike all the other third-party dependencies (AFAICT), Substrait can't currently be installed, so `Substrait_SOURCE=SYSTEM` makes no sense. In the end I just decided to override it, but that's probably not ideal. - Substrait doesn't have releases yet, so I had to resort to a git hash instead. </details> Closes apache#12457 from jvanstraten/ARROW-15709-Compilation-of-ARROW-ENGINE-fails-if-doi Authored-by: Jeroen van Straten <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]> (cherry picked from commit 89cc6b3)
* ARROW-15238: [C++] ARROW_ENGINE module with substrait consumer Continuation of apache#11707. I'm taking over from @bkietz for now because he's unavailable right now for personal reasons. Closes apache#12279 from jvanstraten/substrait-consumer Lead-authored-by: Benjamin Kietzman <[email protected]> Co-authored-by: Jeroen van Straten <[email protected]> Co-authored-by: Weston Pace <[email protected]> Signed-off-by: Weston Pace <[email protected]> * ARROW-15700: [C++] Compilation error on Ubuntu 18.04 Modified Substrait consumer interaction with libprotobuf to support versions down to 3.0.0, which is the minimum required due to Substrait's usage of proto3 syntax. Tested locally with: ``` export ARROW_PROTOBUF_URL=https://github.com/protocolbuffers/protobuf/releases/download/v3.0.0/protobuf-cpp-3.0.0.tar.gz cmake \ --preset ninja-debug \ -DProtobuf_SOURCE=BUNDLED \ -DARROW_PROTOBUF_BUILD_VERSION=v3.0.0 \ -DARROW_PROTOBUF_BUILD_SHA256_CHECKSUM=318e8f375fb4e5333975a40e0d1215e855b4a8c581d692eb0eb7df70db1a8d4e ``` (Is there an easier way to do this without modifying versions.txt or 751fb9d? Also, the env var is needed only because Google isn't at all consistent with their release file naming that far back.) It'd also be nice to add this to CI, but it's probably excessive to always run for a PR, unless combined with some other run. Closes apache#12448 from jvanstraten/ARROW-15700-Compilation-error-on-Ubuntu-18-04 Authored-by: Jeroen van Straten <[email protected]> Signed-off-by: Weston Pace <[email protected]> (cherry picked from commit 7d16a78) * ARROW-15258: [C++] Easy options to create a source node from a table This PR includes the addition of `TableSourceNode` to create a `ExecNode` easily using a table as the data source. ### TODO - [x] Fix test case for chunk_size Closes apache#12267 from vibhatha/arrow-15258-rb Authored-by: Vibhatha Abeykoon <[email protected]> Signed-off-by: Weston Pace <[email protected]> (cherry picked from commit fffdca2) * ARROW-15709: [C++] Compilation of ARROW_ENGINE fails if doing an "inline" build This should fix: - inline builds in general (ARROW-15709); - [weird stuff with inline builds causing non-tracked files to be deleted](https://github.com/apache/arrow;/pull/12444#issuecomment-1043143303) that the [previous fix](apache#12444) for the above [caused](apache#12454) - dependencies on git for downloading dependencies (ARROW-15760); - the build process for Substrait previously being treated as something too special to use Arrow's normal method for dealing with third-party dependencies (i.e. `ThirdpartyToolchain.cmake`) --- Initial attempt at making something functional to solve this issue properly. The use of `add_arrow_lib` in `ThirdpartyToolchain.cmake` is certainly odd, and I'm sure I'm not following best practices in that file in general. I could use some advice on what the proper way to do this would be. Some of the issues: - The CMake property specifying that a path refers to a generated file is scoped only to the current CMake file, so only moving the `externalproject_add` over to `ThirdpartyToolchain.cmake` resulted in the `add_arrow_lib` in `src/arrow/engine` failing due to missing source files. An object library didn't seem to resolve it either, and there are probably portability issues with that anyway, so that's why I ended up just using `add_arrow_lib`. - Unlike all the other third-party dependencies (AFAICT), Substrait can't currently be installed, so `Substrait_SOURCE=SYSTEM` makes no sense. In the end I just decided to override it, but that's probably not ideal. - Substrait doesn't have releases yet, so I had to resort to a git hash instead. </details> Closes apache#12457 from jvanstraten/ARROW-15709-Compilation-of-ARROW-ENGINE-fails-if-doi Authored-by: Jeroen van Straten <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]> (cherry picked from commit 89cc6b3) * ARROW-15830: [C++] Ensure target directory exists before running Substrait generation Closes apache#12548 from pitrou/ARROW-15830-ubuntu-cpp-bundled Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]> (cherry picked from commit e989fb3) * ARROW-15850: [C++] Engine substrait headers missing from install Closes apache#12569 from westonpace/feature/ARROW-15850--substrait-headers-missing Authored-by: Weston Pace <[email protected]> Signed-off-by: David Li <[email protected]> (cherry picked from commit 8fce593) * Enable TPCH Q6 & Q1 Co-authored-by: Benjamin Kietzman <[email protected]> Co-authored-by: Jeroen van Straten <[email protected]> Co-authored-by: Weston Pace <[email protected]> Co-authored-by: Vibhatha Abeykoon <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]>
This should fix:
ThirdpartyToolchain.cmake)Initial attempt at making something functional to solve this issue properly.
The use of
add_arrow_libinThirdpartyToolchain.cmakeis certainly odd, and I'm sure I'm not following best practices in that file in general. I could use some advice on what the proper way to do this would be. Some of the issues:externalproject_addover toThirdpartyToolchain.cmakeresulted in theadd_arrow_libinsrc/arrow/enginefailing due to missing source files. An object library didn't seem to resolve it either, and there are probably portability issues with that anyway, so that's why I ended up just usingadd_arrow_lib.Substrait_SOURCE=SYSTEMmakes no sense. In the end I just decided to override it, but that's probably not ideal.