Skip to content

Conversation

@wjones127
Copy link
Member

@wjones127 wjones127 commented Jul 7, 2022

This PR:

  • Moves minio integration tests into a generic suite that is now run on minio (S3 emulator) and GCS testbench (GCS emulator). This is run in CI.
  • Move Minio and GCS test server initialization to within the tests. This makes it easier to setup the background processes in a cross-platform way.
  • MinIO and GCS tests are now run on R Ubuntu CI. MinIO is now run on Windows CI. I couldn't get GCS to run on Windows CI yet, due to some issue where the tests hang (I believe this is an issue with the test setup and not the functionality). See follow up at: ARROW-17149: [R] Enable GCS tests for Windows
  • Sets the default retry timeout to 15 seconds to mitigate issue described by ARROW-17020. This affects explicitly-created fs with GcsFileSystem$create() (and gs_bucket() introducted in ARROW-16887: [R][Docs] Update Filesystem Vignette for GCS #13601), but not URIs.

@github-actions
Copy link

github-actions bot commented Jul 7, 2022

@github-actions
Copy link

github-actions bot commented Jul 7, 2022

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@wjones127
Copy link
Member Author

Tests seem to be failing right now because of a difference in how GCS and S3 handle paths:

library(arrow)
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp

example_data <- arrow_table(x = Array$create(c(1, 2, 3)))

testbench_port <- Sys.getenv("TESTBENCH_PORT", "9001")
fs <- GcsFileSystem$create(
  endpoint_override = sprintf("localhost:%s", testbench_port),
  retry_limit_seconds = 1,
  scheme = "http",
  anonymous = TRUE # Will fail to resolve host name if anonymous isn't TRUE
)
fs$CreateDir("test")
write_parquet(example_data, fs$path("test/test.parquet"))
write_parquet(example_data, fs$path("test/test.parquet/"))
# GCS seems to handle them as separate paths
fs$ls("test")
#> [1] "test/test.parquet" "test/test.parquet"


minio_key <- Sys.getenv("MINIO_ACCESS_KEY", "minioadmin")
minio_secret <- Sys.getenv("MINIO_SECRET_KEY", "minioadmin")
minio_port <- Sys.getenv("MINIO_PORT", "9000")
fs <- S3FileSystem$create(
  access_key = minio_key,
  secret_key = minio_secret,
  scheme = "http",
  endpoint_override = paste0("localhost:", minio_port),
  allow_bucket_creation = TRUE,
  allow_bucket_deletion = TRUE
)
fs$CreateDir("test")
write_parquet(example_data, fs$path("test/test.parquet"))
write_parquet(example_data, fs$path("test/test.parquet/"))
# S3 implementation seems to remove the last slash
fs$ls("test")
#> [1] "test/test.parquet"

Created on 2022-07-11 by the reprex package (v2.0.1)

@wjones127
Copy link
Member Author

I've included the change set from #13577. The C++ changes will go away when I rebase after merging that PR.

@wjones127 wjones127 force-pushed the ARROW-16879-r-gcs-testbench branch 3 times, most recently from f059ed4 to c4fd663 Compare July 20, 2022 04:45
@wjones127 wjones127 closed this Jul 20, 2022
@wjones127 wjones127 reopened this Jul 20, 2022
@wjones127 wjones127 marked this pull request as ready for review July 20, 2022 19:59
Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool, thanks for doing this.

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few notes! Looks great!

@wjones127 wjones127 force-pushed the ARROW-16879-r-gcs-testbench branch from 609a166 to 66c3c84 Compare August 1, 2022 20:40
@wjones127 wjones127 force-pushed the ARROW-16879-r-gcs-testbench branch from 66c3c84 to f107de9 Compare September 19, 2022 19:54
@wjones127 wjones127 force-pushed the ARROW-16879-r-gcs-testbench branch from 8fe2c67 to b42042a Compare September 28, 2022 18:06
Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failure is unrelated, will merge

@nealrichardson nealrichardson merged commit b7f9dfc into apache:master Oct 4, 2022
@ursabot
Copy link

ursabot commented Oct 4, 2022

Benchmark runs are scheduled for baseline = 4660180 and contender = b7f9dfc. b7f9dfc is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.0% ⬆️0.0%] test-mac-arm
[Failed ⬇️0.27% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.96% ⬆️0.07%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] b7f9dfc2 ec2-t3-xlarge-us-east-2
[Failed] b7f9dfc2 test-mac-arm
[Failed] b7f9dfc2 ursa-i9-9960x
[Finished] b7f9dfc2 ursa-thinkcentre-m75q
[Finished] 46601808 ec2-t3-xlarge-us-east-2
[Failed] 46601808 test-mac-arm
[Failed] 46601808 ursa-i9-9960x
[Finished] 46601808 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants