Skip to content

Conversation

@jonkeane
Copy link
Member

@jonkeane jonkeane commented Jan 6, 2021

No description provided.

@jonkeane
Copy link
Member Author

jonkeane commented Jan 6, 2021

@github-actions crossbow submit test-r-version-compatibility

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve this. is this for testing that we can read files written by older versions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is that a problem? testthat evals in the package namespace so you can call unexported functions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This workaround is for reading files written by 2.0 in 1.0, right? IIUC that should be tested in the extra-tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this file isn't what it needs to be for the golden tests we were thinking about — that's my next step. This is basically as it was when I was using it for forward compatibility

@jonkeane
Copy link
Member Author

jonkeane commented Jan 7, 2021

@github-actions crossbow submit test-r-version-compatibility

1 similar comment
@jonkeane
Copy link
Member Author

jonkeane commented Jan 7, 2021

@github-actions crossbow submit test-r-version-compatibility

@jonkeane jonkeane changed the title [WIP] Arrow 10623: [R] Version 1.0.1 breaks data.frame attributes when reading file written by 2.0.0 [WIP] Arrow 10623: [CI][R] Version 1.0.1 breaks data.frame attributes when reading file written by 2.0.0 Jan 7, 2021
@jonkeane
Copy link
Member Author

jonkeane commented Jan 7, 2021

@github-actions crossbow submit test-r-version-compatibility

@github-actions
Copy link

github-actions bot commented Jan 7, 2021

Revision: 1c58ce2e42fe9b1bf3ace430ca965b52fa079d09

Submitted crossbow builds: ursa-labs/crossbow @ actions-835

Task Status
test-r-version-compatibility Github Actions

@jonkeane
Copy link
Member Author

jonkeane commented Jan 7, 2021

@github-actions crossbow submit test-r-version-compatibility

@github-actions
Copy link

github-actions bot commented Jan 7, 2021

Revision: 1c58ce2e42fe9b1bf3ace430ca965b52fa079d09

Submitted crossbow builds: ursa-labs/crossbow @ actions-836

Task Status
test-r-version-compatibility Github Actions

@jonkeane
Copy link
Member Author

jonkeane commented Jan 7, 2021

@github-actions crossbow submit test-r-version-compatibility

@github-actions
Copy link

github-actions bot commented Jan 7, 2021

Revision: c92939bfc102618d1e683ea192962061b41fa56e

Submitted crossbow builds: ursa-labs/crossbow @ actions-837

Task Status
test-r-version-compatibility Github Actions

Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you're moving in the right direction

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This op is a little non-obvious. I'd rather have explicit skip function names (skip_if_version_equal, skip_if_version_less_than, etc.)

@jonkeane
Copy link
Member Author

jonkeane commented Jan 7, 2021

@github-actions crossbow submit test-r-version-compatibility

@github-actions
Copy link

github-actions bot commented Jan 7, 2021

Revision: 1d7788cb1466438b143f8bd2d24b9338fd0c7ac9

Submitted crossbow builds: ursa-labs/crossbow @ actions-839

Task Status
test-r-version-compatibility Github Actions

@jonkeane
Copy link
Member Author

jonkeane commented Jan 7, 2021

@github-actions crossbow submit test-r-version-compatibility

@github-actions
Copy link

github-actions bot commented Jan 7, 2021

Revision: 0cc1d259342f30c1e353f2680b7778c7c89d1f4e

Submitted crossbow builds: ursa-labs/crossbow @ actions-840

Task Status
test-r-version-compatibility Github Actions

@jonkeane
Copy link
Member Author

jonkeane commented Jan 7, 2021

@github-actions crossbow submit test-r-version-compatibility

@github-actions
Copy link

github-actions bot commented Jan 7, 2021

Revision: 2d9cdf3042b2a2763a5d0e0f0d67685dc3def499

Submitted crossbow builds: ursa-labs/crossbow @ actions-841

Task Status
test-r-version-compatibility Github Actions

@jonkeane
Copy link
Member Author

jonkeane commented Jan 8, 2021

@github-actions crossbow submit test-r-version-compatibility

@github-actions
Copy link

github-actions bot commented Jan 8, 2021

Revision: da48c83b1587e49b92c074096bae7e74db2e712f

Submitted crossbow builds: ursa-labs/crossbow @ actions-848

Task Status
test-r-version-compatibility Github Actions

@jonkeane jonkeane changed the title [WIP] Arrow 10623: [CI][R] Version 1.0.1 breaks data.frame attributes when reading file written by 2.0.0 Arrow 10623: [CI][R] Version 1.0.1 breaks data.frame attributes when reading file written by 2.0.0 Jan 8, 2021
@jonkeane
Copy link
Member Author

jonkeane commented Jan 8, 2021

Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few notes. Can you please update the JIRA in the PR title to be all caps etc.?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not packageVersion("arrow")?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test_that(paste0("Can read feather version 1"), {
test_that("Can read feather version 1", {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test_that("Can see the metadata (feather)", {
test_that("Can see the metadata (parquet)", {

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be failing on master per ARROW-10850?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like 1.0.1 works fine, but earlier versions (e.g. 0.17.0) saves files with the wrong case and does fail, but 1.0.1 (and 1.0.0) doesn't seem to have this issue. I need to dig more to inspect the files saved by 1.0.1 / through the source to find if there was a tolower() or the like that was introduced after 0.17

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, AFAIK, the bug only occurs for files written with 0.17, and when reading with 2.0.0.
In Arrow 0.17, we had experimental compression support specifically for feather (using some metadata string), while starting from Arrow 1.0, compression is part of the official format spec. So it's only those older pre-1.0.0 files that are the problem. And it only started failing to read them in Arrow 2.0, because we simply changed how the compressions are coded internally from lower case (as it was stored in the feather file) to upper case, in an unrelated commit (3694794#r43851633), forgetting the compatibility issue with older feather files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming this. I went searching on Friday and hadn't yet fully reconstructed that time log from git history :)

@jorisvandenbossche jorisvandenbossche changed the title Arrow 10623: [CI][R] Version 1.0.1 breaks data.frame attributes when reading file written by 2.0.0 ARROW-10623: [CI][R] Version 1.0.1 breaks data.frame attributes when reading file written by 2.0.0 Jan 11, 2021
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So once #9128 is merged, this should be fixed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I'll change these to skips and than add the Jira to the message then.

@github-actions
Copy link

@jonkeane jonkeane force-pushed the ARROW-10623-attr-writing-test branch from 64f69c7 to 5c9fe3f Compare January 11, 2021 15:52
@jonkeane
Copy link
Member Author

@github-actions crossbow submit test-r-version-compatibility

Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, just a note about consolidating some test code. Is there anything left to do here otherwise?

Comment on lines +36 to +61
expect_s3_class(df, "tbl")

expect_equal(
attributes(df),
list(
names = letters[1:4],
row.names = 1L,
top_level = list(
field_one = 12,
field_two = "more stuff"
),
class = c("tbl_df", "tbl", "data.frame")
)
)

# column-level attributes
expect_equal(attributes(df$a), list(class = "special_string"))
expect_equal(
attributes(df$c),
list(
row.names = 1L,
names = c("c1", "c2", "c3"),
class = c("tbl_df", "tbl", "data.frame")
)
)
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about factoring this out to an expect_something() helper in this file, such that your tests here are just expect_something(df)?

Of course, the simplest one would be expect_identical(df, example_with_metadata), right? Though there are tradeoffs with that.

# 1.0.1 didn't save top-level metadata, so we need to remove it.
example_with_metadata_sans_toplevel <- example_with_metadata
attributes(example_with_metadata_sans_toplevel)$top_level <- NULL
expect_equal(df, example_with_metadata_sans_toplevel)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just expect_identical and skip the rest of the assertions?

@jonkeane
Copy link
Member Author

I'll consolidate those tests some more like you mentioned. The only thing left to do (if we want to) would be to add backwards compatibility tests for stream-to-filesystem. I can add them if they add substantially more coverage than we have with feather files themselves.

@github-actions
Copy link

Revision: 5c9fe3f

Submitted crossbow builds: ursa-labs/crossbow @ actions-870

Task Status
test-r-version-compatibility Github Actions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants