6 changes: 4 additions & 2 deletions r/NEWS.md
@@ -29,17 +29,19 @@

## Enhancements

* Table columns can now be added, replaced, or removed by assigning `<-` with either `$` or `[[`
* Table columns can now be added, replaced, or removed by assigning (`<-`) with either `$` or `[[`
* Column names of Tables and RecordBatches can be renamed by assigning `names()`
* Large string types can now be written to Parquet files
* The [pronouns `.data` and `.env`](https://rlang.r-lib.org/reference/tidyeval-data.html) are now fully supported in Arrow-dplyr pipelines.
* The [pronouns `.data` and `.env`](https://rlang.r-lib.org/reference/tidyeval-data.html) are now fully supported in Arrow `dplyr` pipelines.
* Option `arrow.skip_nul` (default `FALSE`, as in `base::scan()`) allows conversion of Arrow string (`utf8()`) type data containing embedded nul `\0` characters to R. If set to `TRUE`, nuls will be stripped and a warning is emitted if any are found.

## Bug fixes

* Fixed a performance regression in converting Arrow string types to R that was present in the 2.0.0 release
* C++ functions now trigger garbage collection when needed
* `write_parquet()` can now write RecordBatches
* Reading a Table from a RecordBatchStreamReader containing 0 batches no longer crashes
* `readr`'s `problems` attribute is removed when converting to an Arrow RecordBatch or Table, to prevent large amounts of metadata from accumulating inadvertently ([ARROW-10624](https://issues.apache.org/jira/browse/ARROW-10624))

## Packaging and installation

5 changes: 5 additions & 0 deletions r/R/record-batch.R
@@ -274,6 +274,11 @@ as.data.frame.RecordBatch <- function(x, row.names = NULL, optional = FALSE, ...
}

.serialize_arrow_r_metadata <- function(x) {
  assert_is(x, "list")

  # drop problems attributes (most likely from readr)
  x[["attributes"]][["problems"]] <- NULL

  rawToChar(serialize(x, NULL, ascii = TRUE))
}

24 changes: 23 additions & 1 deletion r/tests/testthat/test-metadata.R
@@ -73,7 +73,9 @@ test_that("Garbage R metadata doesn't break things", {
    "Invalid metadata$r",
    fixed = TRUE
  )
  tab$metadata$r <- .serialize_arrow_r_metadata("garbage")
  # serialize data like .serialize_arrow_r_metadata does, but don't call that
  # directly since it checks to ensure that the data is a list
  tab$metadata$r <- rawToChar(serialize("garbage", NULL, ascii = TRUE))
  expect_warning(
    expect_identical(as.data.frame(tab), example_data[1:6]),
    "Invalid metadata$r",
@@ -134,3 +136,23 @@ test_that("metadata keeps attribute of top level data frame", {
  expect_identical(attr(as.data.frame(tab), "foo"), "bar")
  expect_identical(as.data.frame(tab), df)
})

test_that("metadata drops readr's problems attribute", {
  readr_like <- tibble::tibble(
    dbl = 1.1,
    not_here = NA_character_
  )
  attributes(readr_like) <- append(
    attributes(readr_like),
    list(problems = tibble::tibble(
      row = 1L,
      col = NA_character_,
      expected = "2 columns",
      actual = "1 columns",
      file = "'test'"
    ))
  )

  tab <- Table$create(readr_like)
  expect_null(attr(as.data.frame(tab), "problems"))
})
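
For reviewers who want to try the behaviour without building the package, here is a minimal base-R sketch of what the patched helper appears to do. It is an illustration, not the package code: `serialize_r_metadata()` is a hypothetical stand-in for `.serialize_arrow_r_metadata()`, `stopifnot(is.list(x))` stands in for `assert_is(x, "list")`, and the shape of `meta` (an `attributes` list next to a `columns` list) is an assumption about the metadata layout.

```r
# Hypothetical stand-in for the patched helper, written with base R only.
serialize_r_metadata <- function(x) {
  # Stand-in for assert_is(x, "list"): refuse anything that is not a list.
  stopifnot(is.list(x))

  # Drop the problems attribute (most likely from readr) before serializing.
  x[["attributes"]][["problems"]] <- NULL

  rawToChar(serialize(x, NULL, ascii = TRUE))
}

# Assumed metadata layout: top-level attributes plus per-column entries.
meta <- list(
  attributes = list(
    class = "data.frame",
    problems = data.frame(
      row = 1L,
      expected = "2 columns",
      actual = "1 columns"
    )
  ),
  columns = list(dbl = NULL)
)

# Round trip: serialize to an ASCII string, then read it back.
encoded <- serialize_r_metadata(meta)
decoded <- unserialize(charToRaw(encoded))

# Only the non-problems attributes survive the round trip.
names(decoded$attributes)

# A non-list input is now rejected, which is why the "garbage" test
# serializes its payload by hand instead of calling the helper.
rejected <- tryCatch(
  serialize_r_metadata("garbage"),
  error = function(e) "rejected"
)
```

Dropping the attribute at serialization time, rather than when the data frame is read, leaves the user's in-memory object untouched and only affects what is stored in the Arrow schema metadata.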