@nealrichardson
Member

Also has a fix for the check NOTE about union_all and distinct.

@github-actions

github-actions bot commented Jul 8, 2022

Member

@paleolimbot paleolimbot left a comment

Looks good!

There seems to be a CI failure (a crash on 32-bit Windows only, on RTools 40?), but I don't see anything in this PR that could have introduced it.

This also might be a good opportunity to run devtools::document() with the latest roxygen2 since you're touching DESCRIPTION + a bunch of .Rd files.

@nealrichardson
Member Author

Windows CI crash looks like it may be related to #13521 cc @wesm

@wesm
Member

wesm commented Jul 8, 2022

Sorry about that -- I saw the error in #13521 but misread it as a flake

https://github.com/apache/arrow/runs/7243906864?check_suite_focus=true

What's the best way to diagnose this so I can try to fix it?

2022-07-08T15:24:46.8810386Z -- R CMD check results ----------------------------------- arrow 8.0.0.9000 ----
2022-07-08T15:24:46.8831977Z Duration: 13m 46.7s
2022-07-08T15:24:46.8832256Z 
2022-07-08T15:24:46.8844886Z > running examples for arch 'i386' ... ERROR
2022-07-08T15:24:46.8845746Z ##[error]  Running examples in 'arrow-Ex.R' failed
2022-07-08T15:24:46.8846746Z   The error most likely occurred in:
2022-07-08T15:24:46.8847125Z   
2022-07-08T15:24:46.8847625Z   > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
2022-07-08T15:24:46.8848050Z   > ### Name: to_arrow
2022-07-08T15:24:46.8848826Z   > ### Title: Create an Arrow object from others
2022-07-08T15:24:46.8849266Z   > ### Aliases: to_arrow
2022-07-08T15:24:46.8849593Z   > 
2022-07-08T15:24:46.8849939Z   > ### ** Examples
2022-07-08T15:24:46.8850499Z   > 
2022-07-08T15:24:46.8850822Z   > ## Don't show: 
2022-07-08T15:24:46.8851492Z   > if (getFromNamespace("run_duckdb_examples", "arrow")()) (if (getRversion() >= "3.4") withAutoprint else force)({ # examplesIf
2022-07-08T15:24:46.8852006Z   + ## End(Don't show)
2022-07-08T15:24:46.8852378Z   + library(dplyr)
2022-07-08T15:24:46.8852697Z   + 
2022-07-08T15:24:46.8853111Z   + ds <- InMemoryDataset$create(mtcars)
2022-07-08T15:24:46.8853471Z   + 
2022-07-08T15:24:46.8853775Z   + ds %>%
2022-07-08T15:24:46.8854245Z   +   filter(mpg < 30) %>%
2022-07-08T15:24:46.8855417Z   +   to_duckdb() %>%
2022-07-08T15:24:46.8855793Z   +   group_by(cyl) %>%
2022-07-08T15:24:46.8856257Z   +   summarize(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
2022-07-08T15:24:46.8856665Z   +   to_arrow() %>%
2022-07-08T15:24:46.8857014Z   +   collect()
2022-07-08T15:24:46.8857490Z   + ## Don't show: 
2022-07-08T15:24:46.8857874Z   + }) # examplesIf
2022-07-08T15:24:46.8858223Z   > library(dplyr)
2022-07-08T15:24:46.8858540Z   
2022-07-08T15:24:46.8858911Z   Attaching package: 'dplyr'
2022-07-08T15:24:46.8859246Z   
2022-07-08T15:24:46.8859673Z   The following objects are masked from 'package:stats':
2022-07-08T15:24:46.8860063Z   
2022-07-08T15:24:46.8860414Z       filter, lag
2022-07-08T15:24:46.8860726Z   
2022-07-08T15:24:46.8861172Z   The following objects are masked from 'package:base':
2022-07-08T15:24:46.8861529Z   
2022-07-08T15:24:46.8861922Z       intersect, setdiff, setequal, union
2022-07-08T15:24:46.8862255Z   
2022-07-08T15:24:46.8862624Z   > ds <- InMemoryDataset$create(mtcars)
2022-07-08T15:24:46.8863210Z   > ds %>% filter(mpg < 30) %>% to_duckdb() %>% group_by(cyl) %>% summarize(mean_mpg = mean(mpg, 
2022-07-08T15:24:46.8863726Z   +     na.rm = TRUE)) %>% to_arrow() %>% collect()

@nealrichardson
Member Author

I looked at the end of the verbose test output, where you can see test-chunked-array running before it dies after the assertion on L260: https://github.com/apache/arrow/runs/7243906864?check_suite_focus=true#step:11:6360

So I checked what's happening in the test. By my reading of the output, the crash does not always occur, but when it does, it happens when you Filter an empty ChunkedArray with another (boolean) empty ChunkedArray: https://github.com/apache/arrow/blob/master/r/tests/testthat/test-chunked-array.R#L265

@wesm
Member

wesm commented Jul 8, 2022

OK, I think that gives me enough to go on. I'll try to fix it.

@paleolimbot
Member

I have another PR crashing at this line on Windows:

expect_as_vector(a[rep(c(TRUE, FALSE), 5)], vec[c(1, 3, 5, 7, 9)])

(this also seems to be subsetting with a boolean)

@nealrichardson nealrichardson merged commit a48c09e into apache:master Jul 8, 2022
@nealrichardson nealrichardson deleted the remove-deprecated branch July 8, 2022 17:27
@wesm
Member

wesm commented Jul 8, 2022

I've spent about an hour tinkering on this and I feel somewhat powerless to debug the problem. Are there instructions for debugging the mingw32 RTools 4.0 C++ build on Windows? We don't run the C++ unit tests when building the Arrow libraries, and my guess is that running arrow-compute-vector-test in a debug build will reveal the issue (and with mingw32 we can use gdb to find out where it is coming from). If there's someone better equipped to help identify the issue, I would really appreciate it.

@paleolimbot
Member

I hate to volunteer @wjones127 for what is probably a hard and possibly time-consuming debugging problem, but I personally consider him the master of debugging R packages on Windows.

@wesm
Member

wesm commented Jul 8, 2022

I don't have access to a Windows VM right now; otherwise I would try to do it myself. It should be sufficient to build the C++ library locally with the mingw32 RTools toolchain and run the unit test suite -- building the R package may not be needed.

@wjones127
Member

I haven't yet built the 32-bit version, but I will look into that now. My instructions for 64-bit are here FWIW: https://www.datawill.io/2022/04/02/windows-apache-arrow-development-environment-with-rtools-4-0/

@wesm
Member

wesm commented Jul 10, 2022

Just confirming here also that 88b42ef fixed the issue

kou pushed a commit that referenced this pull request Feb 20, 2023
…Hub issue numbers (#34260)

Rewrite the Jira issue numbers to the GitHub issue numbers, so that the GitHub issue numbers are automatically linked to the issues by pkgdown's auto-linking feature.

Issue numbers have been rewritten based on the following correspondence.
Also, the pkgdown settings have been changed and updated to link to GitHub.

I generated the Changelog page using the `pkgdown::build_news()` function and verified that the links work correctly.

---
ARROW-6338	#5198
ARROW-6364	#5201
ARROW-6323	#5169
ARROW-6278	#5141
ARROW-6360	#5329
ARROW-6533	#5450
ARROW-6348	#5223
ARROW-6337	#5399
ARROW-10850	#9128
ARROW-10624	#9092
ARROW-10386	#8549
ARROW-6994	#23308
ARROW-12774	#10320
ARROW-12670	#10287
ARROW-16828	#13484
ARROW-14989	#13482
ARROW-16977	#13514
ARROW-13404	#10999
ARROW-16887	#13601
ARROW-15906	#13206
ARROW-15280	#13171
ARROW-16144	#13183
ARROW-16511	#13105
ARROW-16085	#13088
ARROW-16715	#13555
ARROW-16268	#13550
ARROW-16700	#13518
ARROW-16807	#13583
ARROW-16871	#13517
ARROW-16415	#13190
ARROW-14821	#12154
ARROW-16439	#13174
ARROW-16394	#13118
ARROW-16516	#13163
ARROW-16395	#13627
ARROW-14848	#12589
ARROW-16407	#13196
ARROW-16653	#13506
ARROW-14575	#13160
ARROW-15271	#13170
ARROW-16703	#13650
ARROW-16444	#13397
ARROW-15016	#13541
ARROW-16776	#13563
ARROW-15622	#13090
ARROW-18131	#14484
ARROW-18305	#14581
ARROW-18285	#14615
* Closes: #33631

Authored-by: SHIMA Tatsuya <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
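
For readers curious how the Jira-to-GitHub rewrite described in the commit message above could be automated, here is a minimal sketch. It is an illustration only, not the script the author used (the commit message does not say how the rewrite was done). It is in Python with only the standard library; the function name `rewrite_issue_refs`, the sample changelog line, and the three-entry mapping (a subset of the correspondence table above) are assumptions made for the example.

```python
import re

# Illustrative subset of the correspondence table in the commit message.
JIRA_TO_GITHUB = {
    "ARROW-6338": "#5198",
    "ARROW-6364": "#5201",
    "ARROW-18131": "#14484",
}

# Match whole Jira keys only, so ARROW-6338 is never matched inside ARROW-63380.
JIRA_KEY = re.compile(r"\bARROW-\d+\b")


def rewrite_issue_refs(text, mapping=JIRA_TO_GITHUB):
    """Replace known Jira keys with GitHub issue numbers.

    Keys that are not in the mapping are left untouched, so an incomplete
    table cannot silently corrupt a reference.
    """
    return JIRA_KEY.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    # Hypothetical changelog line, not taken from the real NEWS file.
    line = "* Fixed a bug in the reader (ARROW-6338, ARROW-99999)."
    print(rewrite_issue_refs(line))
```

Matching the full key and then looking it up, rather than running one substitution per table row, avoids prefix collisions between keys such as ARROW-6338 and ARROW-63380.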

4 participants