@shyambits2004

Fix all the issues reported by valgrind and also enable option ARROW_TRAVIS_VALGRIND.

Member

Is it really being left uninitialized?

FWIW we zero the memory in arrow::BooleanBuilder because valgrind doesn't like in-place modifications of uninitialized bytes

https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/builder_primitive.cc#L167

Contributor

Gandiva doesn't use the builders. It allocates the buffers directly in C++ code (for the batch) and updates them in IR code. Using builders is tricky, since they expect updates to also happen through the builder APIs (e.g. for tracking length).

Is it really being left uninitialized?

Gandiva only updates the relevant bits. E.g. for a projector with the expression "a < b" and a batch of 6 elements, Gandiva will update 6 bits in the output boolean vector (to either 0 or 1, depending on the values of a and b). The remaining 2 bits of that byte are left uninitialized.
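To make the arithmetic concrete, here is a minimal sketch (not Gandiva's actual code) of a 6-element "a < b" result landing in a 1-byte bitmap. `BytesForBits`, `SetBitSketch` and `LessThan` are hypothetical names written for this illustration, and the bitmap is zero-filled up front so the 2 trailing bits are defined.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bytes needed to hold num_bits bits (rounds up).
inline std::size_t BytesForBits(std::size_t num_bits) { return (num_bits + 7) / 8; }

// Hypothetical helper: set or clear bit i via a read-modify-write of its byte.
// With a zero-filled bitmap, this read never sees uninitialized memory.
inline void SetBitSketch(std::uint8_t* bits, std::size_t i, bool value) {
  const std::uint8_t mask = static_cast<std::uint8_t>(1u << (i % 8));
  const std::uint8_t old = bits[i / 8];
  bits[i / 8] = value ? static_cast<std::uint8_t>(old | mask)
                      : static_cast<std::uint8_t>(old & ~mask);
}

// Evaluates "a < b" element-wise into a bitmap (least-significant bit first).
// Assumes a and b have the same length.
inline std::vector<std::uint8_t> LessThan(const std::vector<std::int32_t>& a,
                                          const std::vector<std::int32_t>& b) {
  std::vector<std::uint8_t> out(BytesForBits(a.size()), 0);  // zero-filled
  for (std::size_t i = 0; i < a.size(); ++i) {
    SetBitSketch(out.data(), i, a[i] < b[i]);
  }
  return out;
}
```

With 6 inputs the output is a single byte in which only bits 0 to 5 are ever written; with 12 inputs it would be 2 bytes, with 4 trailing bits untouched by the loop.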

Member

I wasn't suggesting that you use the builders, just noting that we've also experienced valgrind issues with boolean arrays

Uninitialized bits are not an issue. I was curious why a whole byte is uninitialized

Member

I'm still curious about the answer to this

Member

I think what it means is that this is to avoid Valgrind errors with uninitialized bits.

Contributor

Maybe valgrind does require us to zero out the entire bitmap. @shyambits2004, can you please check whether we have a unit test that projects more than 8 elements?

Member

@pitrou agreed, we zero out all of our bytes in BooleanBuilder for example. I'm trying to understand why the last byte. That is what seems weird to me -- per @pravindra it may be that the testing is not comprehensive enough

Member

I'm not sure, but Valgrind is able to detect individual uninitialized bits. So only the trailing bits in the last byte would be a problem.

It also depends which exact operation is used for setting or clearing the bits.

Member

I wasn't aware that valgrind had bit-level precision (http://valgrind.org/docs/memcheck2005.pdf) so this is probably it.

Contributor

@wesm, you are right. Valgrind complained when I modified the test to have 12 output elements. Gandiva uses arrow::util::SetBitTo() to update bitmaps, and there's a comment in that function noting that it confuses valgrind.

I've opened ARROW-4115 for this.

Member

std::vector<uint8_t> bitmap(bitmap_capacity) might be a bit more idiomatic

@pravindra
Contributor

@shyambits2004 there still seem to be some valgrind failures in the CI. Maybe the CI is using different flags?

@shyambits2004
Author

I used "cmake .. -DARROW_GANDIVA=ON -DARROW_VALGRIND=on -DARROW_GANDIVA_BUILD_TESTS=ON -DARROW_TEST_MEMCHECK=ON" to fix all the valgrind errors. But the run on Travis seems to be failing with a new error. I am checking the Travis scripts to see if I missed a flag.

@pitrou
Member

pitrou commented Dec 17, 2018

The errors are legitimate and point to the same pattern: "Mismatched free() / delete / delete []".

As @wesm said, it would be more idiomatic and less error-prone to use std::vector<T> when allocating an array of data, rather than std::unique_ptr<T[]>.

Member

Hmm... Before we add suppressions for these, perhaps it would be possible to instead reclaim the memory automatically? (for example by using unique_ptr or shared_ptr instead of raw pointers).

@pitrou
Member

pitrou commented Dec 17, 2018

For example, in eval_batch.h you have:

std::unique_ptr<uint8_t*> buffers_array_;

But also:

buffers_array_.reset(new uint8_t*[num_buffers]);

This means the declaration should really be:

std::unique_ptr<uint8_t*[]> buffers_array_;

This mistake mostly goes unnoticed, but not always.
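As a minimal sketch of the corrected pattern (not the actual eval_batch.h), the array form of std::unique_ptr pairs new[] with delete[]. `BufferTable` and `BufferTableVec` are hypothetical class names used only for illustration; the second one shows the std::vector alternative suggested earlier in this thread. Both assume num_buffers is positive.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a class that owns an array of buffer pointers.
class BufferTable {
 public:
  explicit BufferTable(int num_buffers)
      : num_buffers_(num_buffers),
        // new[] is paired with unique_ptr<T[]>, whose deleter calls delete[].
        // The trailing () value-initializes every pointer to nullptr.
        buffers_array_(new std::uint8_t*[num_buffers]()) {}

  int num_buffers() const { return num_buffers_; }
  std::uint8_t* buffer(int i) const { return buffers_array_[i]; }
  void set_buffer(int i, std::uint8_t* buf) { buffers_array_[i] = buf; }

 private:
  int num_buffers_;
  std::unique_ptr<std::uint8_t*[]> buffers_array_;
};

// Alternative: let std::vector own the array, so no new/delete pairing exists
// to get wrong. The elements are value-initialized to nullptr.
class BufferTableVec {
 public:
  explicit BufferTableVec(int num_buffers)
      : buffers_(static_cast<std::size_t>(num_buffers)) {}

  std::uint8_t** data() { return buffers_.data(); }
  std::size_t size() const { return buffers_.size(); }

 private:
  std::vector<std::uint8_t*> buffers_;
};
```

With the non-array std::unique_ptr<uint8_t*>, the default deleter calls scalar delete on memory obtained from new[], which is exactly the "Mismatched free() / delete / delete []" pattern reported above.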

@shyambits2004
Author

I was more interested in reproducing the issue in my environment and then fixing it; I did not want to use CI for testing. Did I miss a flag? Not that I can see so far. Please let me know.

@pitrou
Member

pitrou commented Dec 17, 2018

@shyambits2004 I don't know. This might simply be a different compiler or libstdc++ version.

@wesm
Member

wesm commented Dec 17, 2018

The CI environment is Ubuntu 14.04 / gcc 4.9 I think

@shyambits2004
Author

shyambits2004 commented Dec 17, 2018

The reason I am more interested in reproducing the issue is that I do not want to use CI as a test bed.

My environment:
Description: Ubuntu 14.04.5 LTS
Release: 14.04

gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4
libstdc++ : 3.4

@shyambits2004
Author

I tried all the flags from the Travis job, but was unable to reproduce the problem.

cmake .. -DARROW_TEST_INCLUDE_LABELS=gandiva -DARROW_NO_DEPRECATED_API=ON -DARROW_EXTRA_ERROR_CONTEXT=ON -DARROW_JEMALLOC=ON -DARROW_GANDIVA=ON -DARROW_GANDIVA_JAVA=ON -DARROW_BUILD_TESTS=ON -DARROW_TEST_MEMCHECK=ON -DCMAKE_BUILD_TYPE=debug -DBUILD_WARNING_LEVEL=CHECKIN -DARROW_GANDIVA_BUILD_TESTS=ON

Any help on local repro is appreciated.

@wesm
Member

wesm commented Dec 17, 2018

Here are the valgrind errors I get on Ubuntu 14.04 building with clang-6

https://gist.github.com/wesm/b83a83b954a4bc77ce245f4f29ceff7f

Here are my local options:

CC: clang-6.0
CXX: clang++-6.0
ARROW_CXXFLAGS: 
ARROW_OPTIONS: -GNinja -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_INSTALL_PREFIX=/home/wesm/test-install -DCMAKE_BUILD_TYPE=debug -DCMAKE_PREFIX_PATH=/home/wesm/local -DCMAKE_CXX_FLAGS='-D_GLIBCXX_USE_CXX11_ABI=0' -DARROW_OPTIONAL_INSTALL=ON -DARROW_VERBOSE_THIRDPARTY_BUILD=off -DARROW_NO_DEPRECATED_API=on -DARROW_EXTRA_ERROR_CONTEXT=on -DARROW_BOOST_USE_SHARED=on -DARROW_WITH_BZ2=ON -DARROW_WITH_ZSTD=ON -DARROW_BUILD_BENCHMARKS=ON -DARROW_BUILD_TESTS=ON -DARROW_FLIGHT=OFF -DARROW_GANDIVA=ON -DARROW_GANDIVA_JAVA=OFF -DARROW_GANDIVA_STATIC_LIBSTDCPP=OFF -DARROW_HDFS=on -DARROW_HIVESERVER2=on -DARROW_USE_GLOG=ON -DARROW_JEMALLOC=ON -DARROW_ORC=ON -DARROW_PARQUET=on -DPARQUET_BUILD_EXECUTABLES=on -DPARQUET_BUILD_EXAMPLES=on -DARROW_PYTHON=ON -DARROW_CUDA=ON -DBOOST_ROOT=/home/wesm/cpp-toolchain -DARROW_FUZZING=OFF -DARROW_TEST_MEMCHECK=ON -DARROW_USE_ASAN=OFF

@wesm
Member

wesm commented Dec 18, 2018

Please let me and @pitrou know if we can assist -- this PR is now on the critical path for #3208

@shyambits2004
Author

I fell sick yesterday. I will make sure this is done today.

I was unable to reproduce it on my local setup, even with @wesm's options. So I will use CI as a test bed in these circumstances. If the turnaround takes too long, we can collaborate tonight IST / tomorrow morning PST to complete it. Hopefully it will be done today.

@wesm
Member

wesm commented Dec 19, 2018

So when you run the following you get no errors? I'm on the same Linux distribution as you so it is disturbing to me that the results are different:

$ valgrind --tool=memcheck --suppressions=../valgrind.supp debug/gandiva-projector_test
==2177== Memcheck, a memory error detector
==2177== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==2177== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==2177== Command: debug/gandiva-projector_test
==2177== 
Running main() from gtest_main.cc
[==========] Running 14 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 14 tests from TestProjector
[ RUN      ] TestProjector.TestProjectCache
[       OK ] TestProjector.TestProjectCache (9027 ms)
[ RUN      ] TestProjector.TestProjectCacheFieldNames
[       OK ] TestProjector.TestProjectCacheFieldNames (1081 ms)
[ RUN      ] TestProjector.TestProjectCacheDouble
[       OK ] TestProjector.TestProjectCacheDouble (1147 ms)
[ RUN      ] TestProjector.TestProjectCacheFloat
[       OK ] TestProjector.TestProjectCacheFloat (918 ms)
[ RUN      ] TestProjector.TestIntSumSub
==2177== Mismatched free() / delete / delete []
==2177==    at 0x4C2C2BC: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2177==    by 0x545B03D: std::default_delete<unsigned char*>::operator()(unsigned char**) const (unique_ptr.h:76)
==2177==    by 0x545AD42: std::unique_ptr<unsigned char*, std::default_delete<unsigned char*> >::~unique_ptr() (unique_ptr.h:236)
==2177==    by 0x545CFA2: gandiva::EvalBatch::~EvalBatch() (in /home/wesm/code/arrow/cpp/build/debug/libgandiva.so.12.0.0)
==2177==    by 0x545CF58: void __gnu_cxx::new_allocator<gandiva::EvalBatch>::destroy<gandiva::EvalBatch>(gandiva::EvalBatch*) (in /home/wesm/code/arrow/cpp/build/debug/libgandiva.so.12.0.0)
==2177==    by 0x545CEF7: void std::allocator_traits<std::allocator<gandiva::EvalBatch> >::destroy<gandiva::EvalBatch>(std::allocator<gandiva::EvalBatch>&, gandiva::EvalBatch*) (in /home/wesm/code/arrow/cpp/build/debug/libgandiva.so.12.0.0)
==2177==    by 0x545A44B: std::_Sp_counted_ptr_inplace<gandiva::EvalBatch, std::allocator<gandiva::EvalBatch>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:524)
==2177==    by 0x4677AB: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:149)
==2177==    by 0x467759: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:666)
==2177==    by 0x54563C8: std::__shared_ptr<gandiva::EvalBatch, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (in /home/wesm/code/arrow/cpp/build/debug/libgandiva.so.12.0.0)
==2177==    by 0x5455DD4: std::shared_ptr<gandiva::EvalBatch>::~shared_ptr() (in /home/wesm/code/arrow/cpp/build/debug/libgandiva.so.12.0.0)
==2177==    by 0x54BD78C: gandiva::LLVMGenerator::Execute(arrow::RecordBatch const&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > > const&) (llvm_generator.cc:118)
==2177==  Address 0xd7324c0 is 0 bytes inside a block of size 64 alloc'd
==2177==    at 0x4C2B800: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2177==    by 0x545A69F: gandiva::EvalBatch::EvalBatch(long, int, int) (eval_batch.h:39)
==2177==    by 0x545A5EC: void __gnu_cxx::new_allocator<gandiva::EvalBatch>::construct<gandiva::EvalBatch, long, int&, int&>(gandiva::EvalBatch*, long&&, int&, int&) (in /home/wesm/code/arrow/cpp/build/debug/libgandiva.so.12.0.0)
==2177==    by 0x545A356: void std::allocator_traits<std::allocator<gandiva::EvalBatch> >::construct<gandiva::EvalBatch, long, int&, int&>(std::allocator<gandiva::EvalBatch>&, gandiva::EvalBatch*, long&&, int&, int&) (in /home/wesm/code/arrow/cpp/build/debug/libgandiva.so.12.0.0)
==2177==    by 0x545A277: std::_Sp_counted_ptr_inplace<gandiva::EvalBatch, std::allocator<gandiva::EvalBatch>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<long, int&, int&>(std::allocator<gandiva::EvalBatch>, long&&, int&, int&) (shared_ptr_base.h:515)
==2177==    by 0x545A141: void __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<gandiva::EvalBatch, std::allocator<gandiva::EvalBatch>, (__gnu_cxx::_Lock_policy)2> >::construct<std::_Sp_counted_ptr_inplace<gandiva::EvalBatch, std::allocator<gandiva::EvalBatch>, (__gnu_cxx::_Lock_policy)2>, std::allocator<gandiva::EvalBatch> const, long, int&, int&>(std::_Sp_counted_ptr_inplace<gandiva::EvalBatch, std::allocator<gandiva::EvalBatch>, (__gnu_cxx::_Lock_policy)2>*, std::allocator<gandiva::EvalBatch> const&&, long&&, int&, int&) (in /home/wesm/code/arrow/cpp/build/debug/libgandiva.so.12.0.0)
==2177==    by 0x5459FBB: void std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<gandiva::EvalBatch, std::allocator<gandiva::EvalBatch>, (__gnu_cxx::_Lock_policy)2> > >::construct<std::_Sp_counted_ptr_inplace<gandiva::EvalBatch, std::allocator<gandiva::EvalBatch>, (__gnu_cxx::_Lock_policy)2>, std::allocator<gandiva::EvalBatch> const, long, int&, int&>(std::allocator<std::_Sp_counted_ptr_inplace<gandiva::EvalBatch, std::allocator<gandiva::EvalBatch>, (__gnu_cxx::_Lock_policy)2> >&, std::_Sp_counted_ptr_inplace<gandiva::EvalBatch, std::allocator<gandiva::EvalBatch>, (__gnu_cxx::_Lock_policy)2>*, std::allocator<gandiva::EvalBatch> const&&, long&&, int&, int&) (in /home/wesm/code/arrow/cpp/build/debug/libgandiva.so.12.0.0)
==2177==    by 0x5459E16: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<gandiva::EvalBatch, std::allocator<gandiva::EvalBatch>, long, int&, int&>(std::_Sp_make_shared_tag, gandiva::EvalBatch*, std::allocator<gandiva::EvalBatch> const&, long&&, int&, int&) (shared_ptr_base.h:619)
==2177==    by 0x5459CFE: std::__shared_ptr<gandiva::EvalBatch, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<gandiva::EvalBatch>, long, int&, int&>(std::_Sp_make_shared_tag, std::allocator<gandiva::EvalBatch> const&, long&&, int&, int&) (shared_ptr_base.h:1089)
==2177==    by 0x5459C76: std::shared_ptr<gandiva::EvalBatch>::shared_ptr<std::allocator<gandiva::EvalBatch>, long, int&, int&>(std::_Sp_make_shared_tag, std::allocator<gandiva::EvalBatch> const&, long&&, int&, int&) (shared_ptr.h:316)
==2177==    by 0x5459BA8: std::shared_ptr<gandiva::EvalBatch> std::allocate_shared<gandiva::EvalBatch, std::allocator<gandiva::EvalBatch>, long, int&, int&>(std::allocator<gandiva::EvalBatch> const&, long&&, int&, int&) (in /home/wesm/code/arrow/cpp/build/debug/libgandiva.so.12.0.0)
==2177==    by 0x5455AC9: std::shared_ptr<gandiva::EvalBatch> std::make_shared<gandiva::EvalBatch, long, int&, int&>(long&&, int&, int&) (shared_ptr.h:609)
==2177== 
[       OK ] TestProjector.TestIntSumSub (311 ms)
[ RUN      ] TestProjector.TestIntSumSubCustomConfig
[       OK ] TestProjector.TestIntSumSubCustomConfig (722 ms)
[ RUN      ] TestProjector.TestAllIntTypes
[       OK ] TestProjector.TestAllIntTypes (12382 ms)
[ RUN      ] TestProjector.TestExtendedMath
[       OK ] TestProjector.TestExtendedMath (1285 ms)
[ RUN      ] TestProjector.TestFloatLessThan
[       OK ] TestProjector.TestFloatLessThan (537 ms)
[ RUN      ] TestProjector.TestIsNotNull
[       OK ] TestProjector.TestIsNotNull (520 ms)
[ RUN      ] TestProjector.TestZeroCopy
[       OK ] TestProjector.TestZeroCopy (589 ms)
[ RUN      ] TestProjector.TestZeroCopyNegative
[       OK ] TestProjector.TestZeroCopyNegative (52 ms)
[ RUN      ] TestProjector.TestDivideZero
[       OK ] TestProjector.TestDivideZero (574 ms)
[ RUN      ] TestProjector.TestModZero
[       OK ] TestProjector.TestModZero (533 ms)
[----------] 14 tests from TestProjector (29687 ms total)

[----------] Global test environment tear-down
[==========] 14 tests from 1 test case ran. (29731 ms total)
[  PASSED  ] 14 tests.
==2177== 
==2177== HEAP SUMMARY:
==2177==     in use at exit: 235,479 bytes in 1,497 blocks
==2177==   total heap usage: 1,173,684 allocs, 1,172,187 frees, 320,521,179 bytes allocated
==2177== 
==2177== LEAK SUMMARY:
==2177==    definitely lost: 0 bytes in 0 blocks
==2177==    indirectly lost: 0 bytes in 0 blocks
==2177==      possibly lost: 3,488 bytes in 56 blocks
==2177==    still reachable: 231,943 bytes in 1,435 blocks
==2177==         suppressed: 48 bytes in 6 blocks
==2177== Rerun with --leak-check=full to see details of leaked memory
==2177== 
==2177== For counts of detected and suppressed errors, rerun with: -v
==2177== ERROR SUMMARY: 17 errors from 1 contexts (suppressed: 416 from 29)

@shyambits2004
Author

$ valgrind --tool=memcheck --suppressions=../valgrind.supp debug/gandiva-projector_test
==7850== Memcheck, a memory error detector
==7850== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==7850== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==7850== Command: debug/gandiva-projector_test
==7850==
Running main() from gtest_main.cc
[==========] Running 14 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 14 tests from TestProjector
[ RUN ] TestProjector.TestProjectCache
[ OK ] TestProjector.TestProjectCache (12301 ms)
[ RUN ] TestProjector.TestProjectCacheFieldNames
[ OK ] TestProjector.TestProjectCacheFieldNames (1617 ms)
[ RUN ] TestProjector.TestProjectCacheDouble
[ OK ] TestProjector.TestProjectCacheDouble (1700 ms)
[ RUN ] TestProjector.TestProjectCacheFloat
[ OK ] TestProjector.TestProjectCacheFloat (1393 ms)
[ RUN ] TestProjector.TestIntSumSub
[ OK ] TestProjector.TestIntSumSub (424 ms)
[ RUN ] TestProjector.TestIntSumSubCustomConfig
[ OK ] TestProjector.TestIntSumSubCustomConfig (1113 ms)
[ RUN ] TestProjector.TestAllIntTypes
[ OK ] TestProjector.TestAllIntTypes (17879 ms)
[ RUN ] TestProjector.TestExtendedMath
[ OK ] TestProjector.TestExtendedMath (1819 ms)
[ RUN ] TestProjector.TestFloatLessThan
[ OK ] TestProjector.TestFloatLessThan (768 ms)
[ RUN ] TestProjector.TestIsNotNull
[ OK ] TestProjector.TestIsNotNull (767 ms)
[ RUN ] TestProjector.TestZeroCopy
[ OK ] TestProjector.TestZeroCopy (838 ms)
[ RUN ] TestProjector.TestZeroCopyNegative
[ OK ] TestProjector.TestZeroCopyNegative (68 ms)
[ RUN ] TestProjector.TestDivideZero
[ OK ] TestProjector.TestDivideZero (815 ms)
[ RUN ] TestProjector.TestModZero
[ OK ] TestProjector.TestModZero (767 ms)
[----------] 14 tests from TestProjector (42290 ms total)

[----------] Global test environment tear-down
[==========] 14 tests from 1 test case ran. (42346 ms total)
[ PASSED ] 14 tests.
==7850==
==7850== HEAP SUMMARY:
==7850== in use at exit: 227,647 bytes in 1,384 blocks
==7850== total heap usage: 1,170,674 allocs, 1,169,290 frees, 320,098,093 bytes allocated
==7850==
==7850== LEAK SUMMARY:
==7850== definitely lost: 0 bytes in 0 blocks
==7850== indirectly lost: 0 bytes in 0 blocks
==7850== possibly lost: 448 bytes in 8 blocks
==7850== still reachable: 227,151 bytes in 1,370 blocks
==7850== suppressed: 48 bytes in 6 blocks
==7850== Rerun with --leak-check=full to see details of leaked memory
==7850==
==7850== For counts of detected and suppressed errors, rerun with: -v
==7850== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 836 from 47)

@shyambits2004
Author

The only difference I found is that more errors are suppressed on my side (836 from 47, versus 416 from 29 in your run). So I took out the suppressions file, in the hope that the mismatched delete was being suppressed, but I still did not hit the mismatched error (other errors were reported).

@shyambits2004
Author

This is actually frustrating. Just one more thing: what is your kernel version?

$ uname -a
Linux gandiva-hyd 4.4.0-140-generic #166~14.04.1-Ubuntu SMP Sat Nov 17 01:52:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

@wesm
Member

wesm commented Dec 19, 2018

Nearly the same kernel

$ uname -a
Linux badgerpad16 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

@wesm
Member

wesm commented Dec 19, 2018

I'm building locally with clang-6.0. In CI it's gcc 4.9 though

@shyambits2004
Author

OK. I am pretty sure it is a very minute difference, but it is hard to uncover at this point.

It looks like CI does not want to act as a test bed either.

~/build/apache/arrow/cpp-build ~/build/apache/arrow
Test project /home/travis/build/apache/arrow/cpp-build
No tests were found!!!
~/build/apache/arrow
The command "$TRAVIS_BUILD_DIR/ci/travis_script_gandiva_cpp.sh" exited with 0.

Did something change?

@shyambits2004 shyambits2004 force-pushed the master branch 7 times, most recently from 3ebc184 to ef380da Compare December 19, 2018 09:41
@shyambits2004 shyambits2004 force-pushed the master branch 4 times, most recently from c772d48 to a66c288 Compare December 19, 2018 11:20
Contributor

@pravindra pravindra left a comment

lgtm

Member

@pitrou pitrou left a comment

Looks mostly good to me, just a few comments.

.travis.yml Outdated
Member

Removing --only-library probably makes building slower. Is it necessary?

Member

This will go away with ARROW-3803

Author

Without this change, the Gandiva tests were not running as part of CI. I guess that in the Travis Gandiva script, the Gandiva tests option is enabled only if "--only-library" is not set. But if ARROW-3803 takes care of it, I am fine with removing it.

Member

According to the Valgrind documentation:

Performance overhead: origin tracking is expensive. It halves Memcheck's speed and increases memory use by a minimum of 100MB, and possibly more. Nevertheless it can drastically reduce the effort required to identify the root cause of uninitialised value errors, and so is often a programmer productivity win, despite running more slowly.

I'm not sure we want to make CI builds slower than they need to be. Is this necessary in your opinion?

Contributor

I'd say enable this only locally if it doesn't increase coverage for finding new bugs.

Member

Yes. I'll remove this

Author

This should be fine. It can be a local env addition, if valgrind reports some errors.

Member

Why is it "gandiva-tests" rather than "gandiva"?

Member

This is the result of 9fcce64

Member

It sounds bizarre to have the name "tests" in test labels. Is it because Gandiva has microbenchmarks in its tests as well?

Contributor

no, we removed the microbenchmarks from the tests.

Member

Look more closely at 9fcce64

We now have the targets gandiva (libraries), gandiva-tests (tests), and gandiva-benchmarks (benchmarks). The label matches the target name.

Member

Ah, ok. I guess it doesn't make a difference when running e.g. ctest -L arrow.

Member

bitmap.data() is more idiomatic, though both are ok.

Contributor

Data should be zero initialized in the vector.

Member

Oh, right.

Contributor

Which also implies that all memset are useless with this change.

Member

I removed them
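For reference, a minimal sketch of why the memset calls become redundant: the std::vector<T>(n) constructor value-initializes its elements, which for uint8_t means zero. `MakeBitmap` and `AllZero` are hypothetical helpers written only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: allocate a bitmap of num_bytes bytes.
// std::vector<T>(n) value-initializes its elements, so every byte starts at 0
// and no separate memset is needed.
inline std::vector<std::uint8_t> MakeBitmap(std::size_t num_bytes) {
  return std::vector<std::uint8_t>(num_bytes);
}

// Hypothetical helper: true when every byte in the bitmap is zero.
inline bool AllZero(const std::vector<std::uint8_t>& bitmap) {
  return std::all_of(bitmap.begin(), bitmap.end(),
                     [](std::uint8_t byte) { return byte == 0; });
}
```

By contrast, new uint8_t[n] without a trailing () leaves the bytes uninitialized, so a zero fill would still be needed there.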

@wesm
Member

wesm commented Dec 19, 2018

I will tweak this and then merge so I can get a passing build in ARROW-3803 and then merge that. @pitrou could you have a look at that, since the only thing that needs to change there is to re-enable valgrind

Member

@wesm wesm left a comment

+1. Waiting for CI and then will merge and rebase ARROW-3803 with valgrind re-enabled so we can hopefully get that merged soon today

@codecov-io

Codecov Report

Merging #3201 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3201      +/-   ##
==========================================
+ Coverage   86.42%   86.42%   +<.01%     
==========================================
  Files         508      508              
  Lines       70077    70080       +3     
==========================================
+ Hits        60566    60570       +4     
+ Misses       9404     9403       -1     
  Partials      107      107

Impacted Files | Coverage Δ
cpp/src/gandiva/eval_batch.h | 100% <ø> (ø) ⬆️
cpp/src/gandiva/projector.cc | 57.14% <100%> (+0.73%) ⬆️
cpp/src/gandiva/local_bitmaps_holder.h | 100% <100%> (ø) ⬆️
cpp/src/gandiva/exported_funcs_registry.h | 100% <100%> (ø) ⬆️
cpp/src/gandiva/exported_funcs_registry.cc | 100% <0%> (ø) ⬆️
cpp/src/gandiva/exported_funcs.h | 100% <0%> (+16.66%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b8d4477...1713b8d.

@fsaintjacques
Contributor

\o/

6 participants