Skip to content

Conversation

@github-actions
Copy link

github-actions bot commented Nov 3, 2022

@github-actions
Copy link

github-actions bot commented Nov 3, 2022

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@paleolimbot
Copy link
Member Author

@github-actions crossbow submit test-r-linux-valgrind homebrew-r-autobrew homebrew-r-brew

@github-actions
Copy link

github-actions bot commented Nov 3, 2022

Revision: 3972e22c3ae343d6ed9528d8c9501b7c12adad9a

Submitted crossbow builds: ursacomputing/crossbow @ actions-bd233fcb8e

Task Status
homebrew-r-autobrew Github Actions
homebrew-r-brew Github Actions
test-r-linux-valgrind Azure

@paleolimbot
Copy link
Member Author

@github-actions crossbow submit homebrew-r-autobrew homebrew-r-brew

@github-actions
Copy link

Revision: 8d4f884f2bcc2d2ea86dee74fcf9b60e06a4b406

Submitted crossbow builds: ursacomputing/crossbow @ actions-e12a4af9f0

Task Status
homebrew-r-autobrew Github Actions
homebrew-r-brew Github Actions

@paleolimbot
Copy link
Member Author

@github-actions crossbow submit homebrew-r-autobrew homebrew-r-brew

@github-actions
Copy link

Revision: 8d4f884f2bcc2d2ea86dee74fcf9b60e06a4b406

Submitted crossbow builds: ursacomputing/crossbow @ actions-129ee54503

Task Status
homebrew-r-autobrew Github Actions
homebrew-r-brew Github Actions

@paleolimbot
Copy link
Member Author

I updated the JIRA too, but the problem here is I think that the MacOS runners have 1 CPU. I can reproduce on Linux and Mac by doing set_cpu_count(1):

library(arrow, warn.conflicts = FALSE)

dataset_dir <- tempfile()
write_dataset(mtcars, dataset_dir, )

# change to set_cpu_count(1) to reproduce the deadlock
set_cpu_count(2)

open_dataset(dataset_dir) |> 
  dplyr::filter(mpg > 30) |> 
  head() |> 
  dplyr::collect()

@assignUser
Copy link
Member

The GitHub Mac runners are supposed to have 3 cores https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources

Still good to be able to reproduce the issue :D

@paleolimbot
Copy link
Member Author

Oh no! Maybe I reproduced a different issue 😬 (but either way it seems like fixing the deadlock will force some design change that may help!)

@paleolimbot
Copy link
Member Author

paleolimbot commented Dec 16, 2022

A more detailed investigation of the threads present when the hang occurs. I'm guessing what's happening is that the CPU thread pool needs to create a thread somewhere but can't because there aren't enough of them. Where that happens isn't obvious in the thread backtraces, however.

The R code. Run this from a terminal (not from RStudio). If you want to launch R using a debugger you can run R -d gdb (for example) instead of just R. My workflow is to run it normally and attach LLDB from VSCode (you can get the PID in R by running Sys.getpid()).

library(arrow, warn.conflicts = FALSE)
# Information you need to attach LLDB
Sys.getpid()

dataset_dir <- tempfile()
write_dataset(mtcars, dataset_dir)

# change to set_cpu_count(1) to reproduce the deadlock
set_cpu_count(1)

open_dataset(dataset_dir) |> 
  dplyr::filter(mpg > 30) |> 
  head() |> 
  dplyr::collect()

Conceptually I expect this to create two exec plans:

  • open_dataset() |> filter()
  • <The RecordBatchReader Resulting from the first ExecPlan>. This is run through an ExecPlan to make sure that it's collected under RunWithCapturedR.
  • The fact that the head() is not turned into a topK sink node is a bit weird but also shouldn't result in a hang. The fact that a second ExecPlan exists in the first place could also be fixed: we could run every collection of a RecordBatchReader to Table via RunWithCapturedR. The idea there was to limit the surface of RunWithCapturedR to the places it was actually required since it's complicated and results in stuff like this happening. Again, chaining ExecPlans should be possible and shouldn't hang.

The final RecordBatchReader will get read into a Table; no evaluation of the ExecPlan should occur before the reader->ToTable(); call. That call should occur on a non main-R thread launched by RunWithCapturedR(). Currently that thread is borrowed from the IO thread pool.

Thread 1

The main R thread, waiting for SafeCallIntoR jobs to be submitted. This seems normal.

Details
bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001a1cba5e4 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001a1cf6638 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x00000001a1c43ac4 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
    frame #3: 0x0000000280806520 libarrow.1100.0.0.dylib`void std::__1::condition_variable::wait<arrow::internal::SerialExecutor::RunLoop()::$_0>(this=0x0000600000a88208, __lk=0x000000016afcece8, __pred=(0x000000016afced58))::$_0) at __mutex_base:404:9
    frame #4: 0x0000000280805b74 libarrow.1100.0.0.dylib`arrow::internal::SerialExecutor::RunLoop(this=0x000000016afced58) at thread_pool.cc:174:28
    frame #5: 0x000000010948a324 arrow.so`arrow::Future<std::__1::shared_ptr<arrow::Table>> arrow::internal::SerialExecutor::Run<std::__1::shared_ptr<arrow::Table>, arrow::Result<std::__1::shared_ptr<arrow::Table>>>(this=0x000000016afced58, initial_task=arrow::internal::SerialExecutor::TopLevelTask<std::__1::shared_ptr<arrow::Table> > @ 0x000000016afced50)>) at thread_pool.h:392:5 [opt]
    frame #6: 0x0000000109489efc arrow.so`arrow::Result<std::__1::shared_ptr<arrow::Table>> arrow::internal::SerialExecutor::RunInSerialExecutor<std::__1::shared_ptr<arrow::Table>, arrow::Future<std::__1::shared_ptr<arrow::Table>>, arrow::Result<std::__1::shared_ptr<arrow::Table>>>(initial_task=arrow::internal::SerialExecutor::TopLevelTask<std::__1::shared_ptr<arrow::Table> > @ 0x000000016afcedb0)>) at thread_pool.h:289:38 [opt]
    frame #7: 0x0000000109489ca4 arrow.so`arrow::Result<std::__1::shared_ptr<arrow::Table>> RunWithCapturedR<std::__1::shared_ptr<arrow::Table>>(make_arrow_call=function<arrow::Future<std::__1::shared_ptr<arrow::Table> > ()> @ 0x000000016afcee48)>) at safe-call-into-r.h:287:29 [opt]
    frame #8: 0x000000010947a8dc arrow.so`arrow::Result<std::__1::shared_ptr<arrow::Table>> RunWithCapturedRIfPossible<std::__1::shared_ptr<arrow::Table>>(make_arrow_call=function<arrow::Result<std::__1::shared_ptr<arrow::Table> > ()> @ 0x000000016afceea0)>) at safe-call-into-r.h:314:12 [opt]
    frame #9: 0x000000010947a784 arrow.so`Table__from_ExecPlanReader(reader=<unavailable>) at compute-exec.cpp:195:17 [opt]
    
    (omitting higher up frames involved in the R code that calls this C++ function)


Thread 2

This one seems like a signal handler of some kind but doesn't have any Arrow (R or C++) in the backtrace. It doesn't seem suspicious.

Details
thread #2
    frame #0: 0x00000001a1cba3e0 libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x00000001a1bc45b0 libsystem_c.dylib`nanosleep + 220
    frame #2: 0x000000010764bf70 cli.so`clic_thread_func(arg=<unavailable>) at thread.c:37:5 [opt]
    frame #3: 0x00000001a1cf606c libsystem_pthread.dylib`_pthread_start + 148

Thread 3

Has some calls that I would expect to occur. Notably, RecordBatchReaderHead::ReadNext(), which in turn calls ExecPlanReader::ReadNext(). The top level of the call stack seems to indicate that there is a BackgroundGenerator involved and so I suspect this is getting consumed as input to an ExecPlan.

Details
bt
  thread #3
    frame #0: 0x00000001a1cba5e4 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001a1cf6638 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x00000001a1c43ac4 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
    frame #3: 0x000000028079288c libarrow.1100.0.0.dylib`void std::__1::condition_variable::wait<arrow::ConcreteFutureImpl::DoWait()::'lambda'()>(this=0x0000600000a9a908, __lk=0x000000016b3226e8, __pred=(0x0000600000a9a880))::'lambda'()) at __mutex_base:404:9
    frame #4: 0x0000000280790ef8 libarrow.1100.0.0.dylib`arrow::ConcreteFutureImpl::DoWait(this=0x0000600000a9a880) at future.cc:154:9
    frame #5: 0x0000000280790e84 libarrow.1100.0.0.dylib`arrow::FutureImpl::Wait(this=0x0000600000a9a880) at future.cc:189:52
    frame #6: 0x000000010948c478 arrow.so`ExecPlanReader::ReadNext(std::__1::shared_ptr<arrow::RecordBatch>*) [inlined] arrow::Future<std::__1::optional<arrow::compute::ExecBatch>>::Wait(this=0x000000016b322740) const at future.h:385:12 [opt]
    frame #7: 0x000000010948c470 arrow.so`ExecPlanReader::ReadNext(std::__1::shared_ptr<arrow::RecordBatch>*) [inlined] arrow::Future<std::__1::optional<arrow::compute::ExecBatch>>::result(this=0x000000016b322740) const & at future.h:356:5 [opt]
    frame #8: 0x000000010948c470 arrow.so`ExecPlanReader::ReadNext(this=0x00006000011bc518, batch_out=0x000000016b322880) at compute-exec.cpp:118:28 [opt]
    frame #9: 0x000000010952f5c4 arrow.so`RecordBatchReaderHead::ReadNext(this=0x0000600001cbcf18, batch_out=0x000000016b322880) at recordbatchreader.cpp:142:5 [opt]
    frame #10: 0x000000010cbb839c libarrow_dataset.1100.0.0.dylib`arrow::RecordBatchReader::Next(this=0x0000600001cbcf18) at record_batch.h:243:5
    frame #11: 0x000000010cdb47d8 libarrow_dataset.1100.0.0.dylib`arrow::Iterator<arrow::RecordBatchReader::ValueType> arrow::MakeIteratorFromReader<arrow::RecordBatchReader>(this=(std::__1::shared_ptr<arrow::RecordBatchReader>::element_type @ 0x0000600001cbcf18 strong=2 weak=1))::'lambda'()::operator()() const at iterator.h:565:57
    frame #12: 0x000000010cdb47a4 libarrow_dataset.1100.0.0.dylib`arrow::FunctionIterator<arrow::Iterator<arrow::RecordBatchReader::ValueType> arrow::MakeIteratorFromReader<arrow::RecordBatchReader>(std::__1::shared_ptr<arrow::RecordBatchReader> const&)::'lambda'(), std::__1::shared_ptr<arrow::RecordBatch>>::Next(this=0x0000600003da1e00) at iterator.h:346:29
    frame #13: 0x000000010cdb4744 libarrow_dataset.1100.0.0.dylib`arrow::Result<std::__1::shared_ptr<arrow::RecordBatch>> arrow::Iterator<std::__1::shared_ptr<arrow::RecordBatch>>::Next<arrow::FunctionIterator<arrow::Iterator<arrow::RecordBatchReader::ValueType> arrow::MakeIteratorFromReader<arrow::RecordBatchReader>(ptr=0x0000600003da1e00)::'lambda'(), std::__1::shared_ptr<arrow::RecordBatch>>>(void*) at iterator.h:200:40
    frame #14: 0x000000010cbdf628 libarrow_dataset.1100.0.0.dylib`arrow::Iterator<std::__1::shared_ptr<arrow::RecordBatch>>::Next(this=0x0000600003f9cde8) at iterator.h:110:29
    frame #15: 0x0000000280c819dc libarrow.1100.0.0.dylib`arrow::MapIterator<arrow::compute::MakeReaderGenerator(std::__1::shared_ptr<arrow::RecordBatchReader>, arrow::internal::Executor*, int, int)::$_5, std::__1::shared_ptr<arrow::RecordBatch>, std::__1::optional<arrow::compute::ExecBatch>>::Next(this=0x0000600003f9cde0) at iterator.h:428:5
    frame #16: 0x0000000280c81950 libarrow.1100.0.0.dylib`arrow::Result<std::__1::optional<arrow::compute::ExecBatch>> arrow::Iterator<std::__1::optional<arrow::compute::ExecBatch>>::Next<arrow::MapIterator<arrow::compute::MakeReaderGenerator(ptr=0x0000600003f9cde0)::$_5, std::__1::shared_ptr<arrow::RecordBatch>, std::__1::optional<arrow::compute::ExecBatch>>>(void*) at iterator.h:200:40
    frame #17: 0x0000000280c5e784 libarrow.1100.0.0.dylib`arrow::Iterator<std::__1::optional<arrow::compute::ExecBatch>>::Next(this=0x0000600000a9ed28) at iterator.h:110:29
    frame #18: 0x0000000280c8a448 libarrow.1100.0.0.dylib`arrow::BackgroundGenerator<std::__1::optional<arrow::compute::ExecBatch>>::WorkerTask(state=std::__1::shared_ptr<arrow::BackgroundGenerator<std::__1::optional<arrow::compute::ExecBatch> >::State>::element_type @ 0x0000600000a9ed18 strong=3 weak=1) at async_generator.h:1759:29
    frame #19: 0x0000000280c8a3a0 libarrow.1100.0.0.dylib`arrow::BackgroundGenerator<std::__1::optional<arrow::compute::ExecBatch>>::State::DoRestartTask(this=(std::__1::shared_ptr<arrow::BackgroundGenerator<std::__1::optional<arrow::compute::ExecBatch> >::State>::element_type @ 0x0000600000a9ed18 strong=3 weak=1))::'lambda'()::operator()() const at async_generator.h:1666:23
    frame #20: 0x0000000280c8a31c libarrow.1100.0.0.dylib`arrow::internal::FnOnce<void ()>::FnImpl<arrow::BackgroundGenerator<std::__1::optional<arrow::compute::ExecBatch>>::State::DoRestartTask(this=0x0000600003f9cd60)::'lambda'()>::invoke() at functional.h:152:42
    frame #21: 0x00000002808063f0 libarrow.1100.0.0.dylib`arrow::internal::FnOnce<void ()>::operator(this=0x000000016b322e38)() && at functional.h:140:17
    frame #22: 0x0000000280818ef0 libarrow.1100.0.0.dylib`arrow::internal::WorkerLoop(state=std::__1::shared_ptr<arrow::internal::ThreadPool::State>::element_type @ 0x00000001508b32e8 strong=3 weak=2, it=std::__1::list<std::__1::thread, std::__1::allocator<std::__1::thread> >::iterator @ 0x000000016b322ed8) at thread_pool.cc:262:11
    frame #23: 0x0000000280818bfc libarrow.1100.0.0.dylib`arrow::internal::ThreadPool::LaunchWorkersUnlocked(this=(0x00006000031b8420, std::__1::shared_ptr<arrow::internal::ThreadPool::State>::element_type @ 0x00000001508b32e8 strong=3 weak=2, {...}))::$_6::operator()() const at thread_pool.cc:423:7
    frame #24: 0x0000000280818b58 libarrow.1100.0.0.dylib`decltype(__f=(0x00006000031b8420, std::__1::shared_ptr<arrow::internal::ThreadPool::State>::element_type @ 0x00000001508b32e8 strong=3 weak=2, {...}))::$_6>(fp)()) std::__1::__invoke<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6&&) at type_traits:3918:1
    frame #25: 0x0000000280818af4 libarrow.1100.0.0.dylib`void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(__t=size=2, (null)=__tuple_indices<> @ 0x000000016b322f7f)::$_6>&, std::__1::__tuple_indices<>) at thread:287:5

frame #26: 0x0000000280818290 libarrow.1100.0.0.dylib`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>>(__vp=0x0000600003185c50) at thread:298:5
    frame #27: 0x00000001a1cf606c libsystem_pthread.dylib`_pthread_start + 148

Thread 4

This is the signal stop source waiting for input from a signal handler? I don't know the details but I am guessing this stop source would always be waiting for input as long as there is a signal handler registered.

Details
bt
  thread #4
    frame #0: 0x00000001a1cb79ac libsystem_kernel.dylib`read + 8
    frame #1: 0x00000002807d2844 libarrow.1100.0.0.dylib`arrow::internal::(anonymous namespace)::SelfPipeImpl::Wait(this=0x0000600001cbcbf8) at io_util.cc:1268:24
    frame #2: 0x000000028076fc04 libarrow.1100.0.0.dylib`arrow::(anonymous namespace)::SignalStopState::ReceiveSignals(self_pipe=std::__1::shared_ptr<arrow::internal::SelfPipe>::element_type @ 0x0000600001cbcbf8 strong=2 weak=3) at cancel.cc:246:39
    frame #3: 0x0000000280770ba4 libarrow.1100.0.0.dylib`decltype(__f=0x0000600003f9c6a8, __args=nullptr)(static_cast<std::__1::shared_ptr<arrow::internal::SelfPipe>>(fp0))) std::__1::__invoke<void (*)(std::__1::shared_ptr<arrow::internal::SelfPipe>), std::__1::shared_ptr<arrow::internal::SelfPipe>>(void (*&&)(std::__1::shared_ptr<arrow::internal::SelfPipe>), std::__1::shared_ptr<arrow::internal::SelfPipe>&&) at type_traits:3918:1
    frame #4: 0x0000000280770b1c libarrow.1100.0.0.dylib`void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(std::__1::shared_ptr<arrow::internal::SelfPipe>), std::__1::shared_ptr<arrow::internal::SelfPipe>, 2ul>(__t=size=3, (null)=__tuple_indices<2> @ 0x000000016b552f7f)(std::__1::shared_ptr<arrow::internal::SelfPipe>), std::__1::shared_ptr<arrow::internal::SelfPipe>>&, std::__1::__tuple_indices<2ul>) at thread:287:5
    frame #5: 0x00000002807701bc libarrow.1100.0.0.dylib`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(std::__1::shared_ptr<arrow::internal::SelfPipe>), std::__1::shared_ptr<arrow::internal::SelfPipe>>>(__vp=0x0000600003f9c6a0) at thread:298:5
    frame #6: 0x00000001a1cf606c libsystem_pthread.dylib`_pthread_start + 148

Thread 5

A thread pool worker loop. This seems like it's looping and sleeping which seems fine?

Details
bt
  thread #5
    frame #0: 0x00000001a1cba5e4 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001a1cf6638 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x00000001a1c43ac4 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
    frame #3: 0x0000000280819040 libarrow.1100.0.0.dylib`arrow::internal::WorkerLoop(state=std::__1::shared_ptr<arrow::internal::ThreadPool::State>::element_type @ 0x000000012f694238 strong=7 weak=2, it=std::__1::list<std::__1::thread, std::__1::allocator<std::__1::thread> >::iterator @ 0x000000016b5deed8) at thread_pool.cc:280:16
    frame #4: 0x0000000280818bfc libarrow.1100.0.0.dylib`arrow::internal::ThreadPool::LaunchWorkersUnlocked(this=(0x00006000031bc3c0, std::__1::shared_ptr<arrow::internal::ThreadPool::State>::element_type @ 0x000000012f694238 strong=7 weak=2, {...}))::$_6::operator()() const at thread_pool.cc:423:7
    frame #5: 0x0000000280818b58 libarrow.1100.0.0.dylib`decltype(__f=(0x00006000031bc3c0, std::__1::shared_ptr<arrow::internal::ThreadPool::State>::element_type @ 0x000000012f694238 strong=7 weak=2, {...}))::$_6>(fp)()) std::__1::__invoke<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6&&) at type_traits:3918:1
    frame #6: 0x0000000280818af4 libarrow.1100.0.0.dylib`void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(__t=size=2, (null)=__tuple_indices<> @ 0x000000016b5def7f)::$_6>&, std::__1::__tuple_indices<>) at thread:287:5
    frame #7: 0x0000000280818290 libarrow.1100.0.0.dylib`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>>(__vp=0x00006000031bc510) at thread:298:5

frame #8: 0x00000001a1cf606c libsystem_pthread.dylib`_pthread_start + 148

Thread 6

This is the call to RecordBatchReader::ToTable()...probably the one that was launched by Thread #1. This seems to be reading an ExecPlanReader, which is the dummy ExecPlan that does nothing. The ExecPlanReader that it's reading seems like it's the ExecPlan that's doing the real work (reading the dataset plus filtering).

Details
bt
  thread #6
    frame #0: 0x00000001a1cba5e4 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001a1cf6638 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x00000001a1c43ac4 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
    frame #3: 0x000000028079288c libarrow.1100.0.0.dylib`void std::__1::condition_variable::wait<arrow::ConcreteFutureImpl::DoWait()::'lambda'()>(this=0x0000600000a88a48, __lk=0x000000016b66aa28, __pred=(0x0000600000a889c0))::'lambda'()) at __mutex_base:404:9
    frame #4: 0x0000000280790ef8 libarrow.1100.0.0.dylib`arrow::ConcreteFutureImpl::DoWait(this=0x0000600000a889c0) at future.cc:154:9
    frame #5: 0x0000000280790e84 libarrow.1100.0.0.dylib`arrow::FutureImpl::Wait(this=0x0000600000a889c0) at future.cc:189:52
    frame #6: 0x000000010948c478 arrow.so`ExecPlanReader::ReadNext(std::__1::shared_ptr<arrow::RecordBatch>*) [inlined] arrow::Future<std::__1::optional<arrow::compute::ExecBatch>>::Wait(this=0x000000016b66aa80) const at future.h:385:12 [opt]
    frame #7: 0x000000010948c470 arrow.so`ExecPlanReader::ReadNext(std::__1::shared_ptr<arrow::RecordBatch>*) [inlined] arrow::Future<std::__1::optional<arrow::compute::ExecBatch>>::result(this=0x000000016b66aa80) const & at future.h:356:5 [opt]
    frame #8: 0x000000010948c470 arrow.so`ExecPlanReader::ReadNext(this=0x00006000011bc618, batch_out=0x000000016b66ab88) at compute-exec.cpp:118:28 [opt]
    frame #9: 0x00000002804b56c4 libarrow.1100.0.0.dylib`arrow::RecordBatchReader::ToRecordBatches(this=0x00006000011bc618) at record_batch.cc:334:5
    frame #10: 0x00000002804b5bcc libarrow.1100.0.0.dylib`arrow::RecordBatchReader::ToTable(this=0x00006000011bc618) at record_batch.cc:348:3
    frame #11: 0x000000010948b750 arrow.so`arrow::internal::FnOnce<void ()>::FnImpl<std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<std::__1::shared_ptr<arrow::Table>>&, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>>>::invoke() [inlined] std::__1::__function::__value_func<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt]
    frame #12: 0x000000010948b738 arrow.so`arrow::internal::FnOnce<void ()>::FnImpl<std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<std::__1::shared_ptr<arrow::Table>>&, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>>>::invoke() [inlined] std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>::operator(this=<unavailable>)() const at function.h:1182:12 [opt]
    frame #13: 0x000000010948b738 arrow.so`arrow::internal::FnOnce<void ()>::FnImpl<std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<std::__1::shared_ptr<arrow::Table>>&, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>>>::invoke() [inlined] std::__1::enable_if<!std::is_void<arrow::Result<std::__1::shared_ptr<arrow::Table>>>::value && !is_future<arrow::Result<std::__1::shared_ptr<arrow::Table>>>::value && (!arrow::Future<std::__1::shared_ptr<arrow::Table>>::is_empty || std::is_same<arrow::Result<std::__1::shared_ptr<arrow::Table>>, arrow::Status>::value), void>::type arrow::detail::ContinueFuture::operator(this=<unavailable>, next=Future<std::__1::shared_ptr<arrow::Table> > @ 0x000000016b66aca0, f=<unavailable>)<std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>&, arrow::Result<std::__1::shared_ptr<arrow::Table>>, arrow::Future<std::__1::shared_ptr<arrow::Table>>>(arrow::Future<std::__1::shared_ptr<arrow::Table>>, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>&) const at future.h:150:23 [opt]
    frame #14: 0x000000010948b738 arrow.so`arrow::internal::FnOnce<void ()>::FnImpl<std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<std::__1::shared_ptr<arrow::Table>>&, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>>>::invoke() [inlined] decltype(__f=<unavailable>, __args=<unavailable>, __args=<unavailable>)(static_cast<arrow::Future<std::__1::shared_ptr<arrow::Table>>&>(fp0), static_cast<std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>&>(fp0))) std::__1::__invoke<arrow::detail::ContinueFuture&, arrow::Future<std::__1::shared_ptr<arrow::Table>>&, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>&>(arrow::detail::ContinueFuture&, arrow::Future<std::__1::shared_ptr<arrow::Table>>&, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>&) at type_traits:3918:1 [opt]
    frame #15: 0x000000010948b720 arrow.so`arrow::internal::FnOnce<void ()>::FnImpl<std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<std::__1::shared_ptr<arrow::Table>>&, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>>>::invoke() [inlined] std::__1::__bind_return<arrow::detail::ContinueFuture, std::__1::tuple<arrow::Future<std::__1::shared_ptr<arrow::Table>>, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>>, std::__1::tuple<>, __is_valid_bind_return<arrow::detail::ContinueFuture, std::__1::tuple<arrow::Future<std::__1::shared_ptr<arrow::Table>>, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>>, std::__1::tuple<>>::value>::type std::__1::__apply_functor<arrow::detail::ContinueFuture, std::__1::tuple<arrow::Future<std::__1::shared_ptr<arrow::Table>>, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> (__f=<unavailable>, __bound_args=<unavailable>, (null)=<unavailable>, __args=<unavailable>)>>, 0ul, 1ul, std::__1::tuple<>>(arrow::detail::ContinueFuture&, std::__1::tuple<arrow::Future<std::__1::shared_ptr<arrow::Table>>, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>>&, std::__1::__tuple_indices<0ul, 1ul>, std::__1::tuple<>&&) at bind.h:257:12 [opt]
    frame #16: 0x000000010948b720 arrow.so`arrow::internal::FnOnce<void ()>::FnImpl<std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<std::__1::shared_ptr<arrow::Table>>&, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>>>::invoke() [inlined] std::__1::__bind_return<arrow::detail::ContinueFuture, std::__1::tuple<arrow::Future<std::__1::shared_ptr<arrow::Table>>, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>>, std::__1::tuple<>, __is_valid_bind_return<arrow::detail::ContinueFuture, std::__1::tuple<arrow::Future<std::__1::shared_ptr<arrow::Table>>, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> ()>>, std::__1::tuple<>>::value>::type std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<std::__1::shared_ptr<arrow::Table>>&, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> (this=<unavailable>)>>::operator()<>() at bind.h:292:20 [opt]
    frame #17: 0x000000010948b720 arrow.so`arrow::internal::FnOnce<void ()>::FnImpl<std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<std::__1::shared_ptr<arrow::Table>>&, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::Table>> (this=<unavailable>)>>>::invoke() at functional.h:152:42 [opt]
    frame #18: 0x00000002808063f0 libarrow.1100.0.0.dylib`arrow::internal::FnOnce<void ()>::operator(this=0x000000016b66ae38)() && at functional.h:140:17
    frame #19: 0x0000000280818ef0 libarrow.1100.0.0.dylib`arrow::internal::WorkerLoop(state=std::__1::shared_ptr<arrow::internal::ThreadPool::State>::element_type @ 0x000000012f694238 strong=7 weak=2, it=std::__1::list<std::__1::thread, std::__1::allocator<std::__1::thread> >::iterator @ 0x000000016b66aed8) at thread_pool.cc:262:11
    frame #20: 0x0000000280818bfc libarrow.1100.0.0.dylib`arrow::internal::ThreadPool::LaunchWorkersUnlocked(this=(0x00006000031bc3c0, std::__1::shared_ptr<arrow::internal::ThreadPool::State>::element_type @ 0x000000012f694238 strong=7 weak=2, {...}))::$_6::operator()() const at thread_pool.cc:423:7
    frame #21: 0x0000000280818b58 libarrow.1100.0.0.dylib`decltype(__f=(0x00006000031bc3c0, std::__1::shared_ptr<arrow::internal::ThreadPool::State>::element_type @ 0x000000012f694238 strong=7 weak=2, {...}))::$_6>(fp)()) std::__1::__invoke<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6&&) at type_traits:3918:1
    frame #22: 0x0000000280818af4 libarrow.1100.0.0.dylib`void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(__t=size=2, (null)=__tuple_indices<> @ 0x000000016b66af7f)::$_6>&, std::__1::__tuple_indices<>) at thread:287:5
    frame #23: 0x0000000280818290 libarrow.1100.0.0.dylib`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>>(__vp=0x0000600003186040) at thread:298:5
    frame #24: 0x00000001a1cf606c libsystem_pthread.dylib`_pthread_start + 148


Thread 7

Another thread pool worker loop. Also looping and sleeping.

Details
bt
  thread #7
    frame #0: 0x00000001a1cba5e4 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001a1cf6638 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x00000001a1c43ac4 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
    frame #3: 0x0000000280819040 libarrow.1100.0.0.dylib`arrow::internal::WorkerLoop(state=std::__1::shared_ptr<arrow::internal::ThreadPool::State>::element_type @ 0x000000012f694238 strong=7 weak=2, it=std::__1::list<std::__1::thread, std::__1::allocator<std::__1::thread> >::iterator @ 0x000000016b6f6ed8) at thread_pool.cc:280:16
    frame #4: 0x0000000280818bfc libarrow.1100.0.0.dylib`arrow::internal::ThreadPool::LaunchWorkersUnlocked(this=(0x00006000031bc3c0, std::__1::shared_ptr<arrow::internal::ThreadPool::State>::element_type @ 0x000000012f694238 strong=7 weak=2, {...}))::$_6::operator()() const at thread_pool.cc:423:7
    frame #5: 0x0000000280818b58 libarrow.1100.0.0.dylib`decltype(__f=(0x00006000031bc3c0, std::__1::shared_ptr<arrow::internal::ThreadPool::State>::element_type @ 0x000000012f694238 strong=7 weak=2, {...}))::$_6>(fp)()) std::__1::__invoke<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6&&) at type_traits:3918:1
    frame #6: 0x0000000280818af4 libarrow.1100.0.0.dylib`void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(__t=size=2, (null)=__tuple_indices<> @ 0x000000016b6f6f7f)::$_6>&, std::__1::__tuple_indices<>) at thread:287:5
    frame #7: 0x0000000280818290 libarrow.1100.0.0.dylib`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>>(__vp=0x00006000031bc570) at thread:298:5

frame #8: 0x00000001a1cf606c libsystem_pthread.dylib`_pthread_start + 148

@paleolimbot
Copy link
Member Author

@github-actions crossbow submit homebrew-r-autobrew homebrew-r-brew test-r-linux-valgrind

@github-actions
Copy link

Revision: 1f8dcb5e3867db85a7d7ec25b29a367f4c386bf6

Submitted crossbow builds: ursacomputing/crossbow @ actions-02c0f76e1f

Task Status
homebrew-r-autobrew Github Actions
homebrew-r-brew Github Actions
test-r-linux-valgrind Azure

@paleolimbot
Copy link
Member Author

Those nightly test builds are still failing; however, they are no longer failing because of a crash when head() is called. Horray!

Comment on lines 33 to 40
// Can't find the header for this?
namespace arrow {
namespace io {
namespace internal {
arrow::internal::ThreadPool* GetIOThreadPool();
}
} // namespace io
} // namespace arrow
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitrou this doesn't seem to be in any public header; however, its use is required here for something that isn't R-specific (creating a source node from a record batch reader). Should this be added to arrow/io/type_fwd.h instead of declared here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made #15151 for this!

@paleolimbot
Copy link
Member Author

I'm going to merge this because it's blocking a few PRs and I want to get our MacOS jobs passing again in case there have been other failures obscured by this one/the pyarrow one during the many months that it's been failing!

@paleolimbot paleolimbot merged commit b1a48c7 into apache:master Jan 2, 2023
@paleolimbot paleolimbot deleted the r-more-head-fixes branch January 2, 2023 16:57
@assignUser
Copy link
Member

+1

@ursabot
Copy link

ursabot commented Jan 3, 2023

Benchmark runs are scheduled for baseline = 1aa8f35 and contender = b1a48c7. b1a48c7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️1.11% ⬆️2.78%] test-mac-arm
[Finished ⬇️0.26% ⬆️0.51%] ursa-i9-9960x
[Failed ⬇️5.71% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] b1a48c78 ec2-t3-xlarge-us-east-2
[Failed] b1a48c78 test-mac-arm
[Finished] b1a48c78 ursa-i9-9960x
[Failed] b1a48c78 ursa-thinkcentre-m75q
[Finished] 1aa8f355 ec2-t3-xlarge-us-east-2
[Failed] 1aa8f355 test-mac-arm
[Finished] 1aa8f355 ursa-i9-9960x
[Failed] 1aa8f355 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants