-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-18240: [R] head() is crashing on some nightly builds #14582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
|
@github-actions crossbow submit test-r-linux-valgrind homebrew-r-autobrew homebrew-r-brew |
|
Revision: 3972e22c3ae343d6ed9528d8c9501b7c12adad9a Submitted crossbow builds: ursacomputing/crossbow @ actions-bd233fcb8e
|
3972e22 to
519059b
Compare
519059b to
8d4f884
Compare
|
@github-actions crossbow submit homebrew-r-autobrew homebrew-r-brew |
|
Revision: 8d4f884f2bcc2d2ea86dee74fcf9b60e06a4b406 Submitted crossbow builds: ursacomputing/crossbow @ actions-e12a4af9f0
|
|
@github-actions crossbow submit homebrew-r-autobrew homebrew-r-brew |
|
Revision: 8d4f884f2bcc2d2ea86dee74fcf9b60e06a4b406 Submitted crossbow builds: ursacomputing/crossbow @ actions-129ee54503
|
|
I updated the JIRA too, but the problem here is I think that the MacOS runners have 1 CPU. I can reproduce on Linux and Mac by doing library(arrow, warn.conflicts = FALSE)
dataset_dir <- tempfile()
write_dataset(mtcars, dataset_dir, )
# change to set_cpu_count(1) to reproduce the deadlock
set_cpu_count(2)
open_dataset(dataset_dir) |>
dplyr::filter(mpg > 30) |>
head() |>
dplyr::collect() |
|
The GitHub Mac runners are supposed to have 3 cores https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources Still good to be able to reproduce the issue :D |
|
Oh no! Maybe I reproduced a different issue 😬 (but either way it seems like fixing the deadlock will force some design change that may help!) |
8d4f884 to
f915bdf
Compare
|
A more detailed investigation of the threads present when the hang occurs. I'm guessing what's happening is that the CPU thread pool needs to create a thread somewhere but can't because there aren't enough of them. Where that happens isn't obvious in the thread backtraces, however. The R code. Run this from a terminal (not from RStudio). If you want to launch R using a debugger you can run library(arrow, warn.conflicts = FALSE)
# Information you need to attach LLDB
Sys.getpid()
dataset_dir <- tempfile()
write_dataset(mtcars, dataset_dir)
# change to set_cpu_count(1) to reproduce the deadlock
set_cpu_count(1)
open_dataset(dataset_dir) |>
dplyr::filter(mpg > 30) |>
head() |>
dplyr::collect()Conceptually I expect this to create two exec plans:
The final Thread 1The main R thread, waiting for SafeCallIntoR jobs to be submitted. This seems normal. DetailsThread 2This one seems like a signal handler of some kind but doesn't have any Arrow (R or C++) in the backtrace. It doesn't seem suspicious. DetailsThread 3Has some calls that I would expect to occur. Notably, DetailsThread 4This is the signal stop source waiting for input from a signal handler? I don't know the details but I am guessing this stop source would always be waiting for input as long as there is a signal handler registered. DetailsThread 5A thread pool worker loop. This seems like it's looping and sleeping which seems fine? DetailsThread 6This is the call to DetailsThread 7Another thread pool worker loop. Also looping and sleeping. Details |
|
@github-actions crossbow submit homebrew-r-autobrew homebrew-r-brew test-r-linux-valgrind |
|
Revision: 1f8dcb5e3867db85a7d7ec25b29a367f4c386bf6 Submitted crossbow builds: ursacomputing/crossbow @ actions-02c0f76e1f
|
|
Those nightly test builds are still failing; however, they are no longer failing because of a crash when |
1f8dcb5 to
cb2031d
Compare
r/src/compute-exec.cpp
Outdated
| // Can't find the header for this? | ||
| namespace arrow { | ||
| namespace io { | ||
| namespace internal { | ||
| arrow::internal::ThreadPool* GetIOThreadPool(); | ||
| } | ||
| } // namespace io | ||
| } // namespace arrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pitrou this doesn't seem to be in any public header; however, its use is required here for something that isn't R-specific (creating a source node from a record batch reader). Should this be added to arrow/io/type_fwd.h instead of declared here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made #15151 for this!
|
I'm going to merge this because it's blocking a few PRs and I want to get our MacOS jobs passing again in case there have been other failures obscured by this one/the pyarrow one during the many months that it's been failing! |
|
+1 |
|
Benchmark runs are scheduled for baseline = 1aa8f35 and contender = b1a48c7. b1a48c7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
) Trying to address these two failures: - https://github.com/ursacomputing/crossbow/actions/runs/3368026889/jobs/5586109693#step:10:3813 - https://github.com/ursacomputing/crossbow/actions/runs/3368024633/jobs/5586105172#step:9:3813 Authored-by: Dewey Dunnington <[email protected]> Signed-off-by: Dewey Dunnington <[email protected]>
Trying to address these two failures: