
Conversation

@nealrichardson (Member)

No description provided.

@jonkeane (Member) left a comment


Looks good, thanks for the quick PR + a bit of cleanup along the way.

Comment on lines +210 to +211
# For backwards compatibility with Scanner-based writer (arrow <= 7.0.0):
# retain metadata from source dataset

If we had one already, a jira would be nice here, but I'm sure we'll remember this is where it's going even without it, so let's not.
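
For anyone reading along, a rough, hypothetical sketch (not the PR's actual code) of the behaviour that comment refers to: schema-level metadata set on the input should survive write_dataset() and come back when the dataset is re-opened. The "geo" key here is just a stand-in for the kind of metadata a package like sfarrow stores.

library(arrow, warn.conflicts = FALSE)

tbl <- Table$create(x = 1:6, group = c(1L, 1L, 2L, 2L, 3L, 3L))
tbl$metadata$geo <- "stand-in spatial metadata"  # hypothetical metadata key

tf <- tempfile()
write_dataset(tbl, tf, partitioning = "group")

# With this change, the metadata should round-trip:
open_dataset(tf)$schema$metadata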

@paleolimbot (Member) left a comment


Looks great! I checked the sfarrow example that failed and it works on this branch (and fails on the master branch, as expected):

# remotes::install_github("apache/arrow/r#13105")
library(arrow, warn.conflicts = FALSE)
library(sfarrow)

# read spatial object
nc <- sf::st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)

# create random grouping
nc$group <- sample(1:3, nrow(nc), replace = TRUE)

# use dplyr to group the dataset. %>% also allowed
nc_g <- dplyr::group_by(nc, group)

# write out to parquet datasets
tf <- tempfile()  # create temporary location

# partitioning determined by dplyr 'group_vars'
write_sf_dataset(nc_g, path = tf)
#> Warning: This is an initial implementation of Parquet/Feather file support and
#> geo metadata. This is tracking version 0.1.0 of the metadata
#> (https://github.com/geopandas/geo-arrow-spec). This metadata
#> specification may change and does not yet make stability promises.  We
#> do not yet recommend using this in a production setting unless you are
#> able to rewrite your Parquet/Feather files.

list.files(tf, recursive = TRUE)
#> [1] "group=1/part-0.parquet" "group=2/part-0.parquet" "group=3/part-0.parquet"

# open parquet files from dataset
ds <- arrow::open_dataset(tf)

# create a query. %>% also allowed
q <- dplyr::filter(ds, group == 1)

# read the dataset (piping syntax also works)
read_sf_dataset(dataset = q)
#> Simple feature collection with 31 features and 15 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -83.73952 ymin: 33.88199 xmax: -75.7637 ymax: 36.55716
#> Geodetic CRS:  NAD27
#> First 10 features:
#>     AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID BIR74 SID74
#> 1  0.070     2.968  1831    1831   Currituck 37053  37053       27   508     1
#> 2  0.153     2.206  1832    1832 Northampton 37131  37131       66  1421     9
#> 3  0.109     1.325  1841    1841      Person 37145  37145       73  1556     4
#> 4  0.190     2.204  1846    1846     Halifax 37083  37083       42  3608    18
#> 5  0.081     1.288  1880    1880     Watauga 37189  37189       95  1323     1
#> 6  0.086     1.267  1893    1893      Yadkin 37197  37197       99  1269     1
#> 7  0.111     1.392  1904    1904    Alamance 37001  37001        1  4672    13
#> 8  0.059     1.319  1927    1927    Mitchell 37121  37121       61   671     0
#> 9  0.122     1.516  1932    1932    Caldwell 37027  37027       14  3609     6
#> 10 0.080     1.307  1936    1936      Yancey 37199  37199      100   770     0
#>    NWBIR74 BIR79 SID79 NWBIR79 group                       geometry
#> 1      123   830     2     145     1 MULTIPOLYGON (((-76.00897 3...
#> 2     1066  1606     3    1197     1 MULTIPOLYGON (((-77.21767 3...
#> 3      613  1790     4     650     1 MULTIPOLYGON (((-78.8068 36...
#> 4     2365  4463    17    2980     1 MULTIPOLYGON (((-77.33221 3...
#> 5       17  1775     1      33     1 MULTIPOLYGON (((-81.80622 3...
#> 6       65  1568     1      76     1 MULTIPOLYGON (((-80.49554 3...
#> 7     1243  5767    11    1397     1 MULTIPOLYGON (((-79.24619 3...
#> 8        1   919     2       4     1 MULTIPOLYGON (((-82.11885 3...
#> 9      309  4249     9     360     1 MULTIPOLYGON (((-81.32813 3...
#> 10      12   869     1      10     1 MULTIPOLYGON (((-82.27921 3...

Created on 2022-05-09 by the reprex package (v2.0.1)

github-actions bot commented May 9, 2022

nealrichardson added a commit to nealrichardson/arrow that referenced this pull request May 9, 2022
Closes apache#13105 from nealrichardson/write-dataset-metadata

Authored-by: Neal Richardson <[email protected]>
Signed-off-by: Neal Richardson <[email protected]>
@nealrichardson deleted the write-dataset-metadata branch May 9, 2022 19:49
ursabot commented May 11, 2022

Benchmark runs are scheduled for baseline = 214135d and contender = d00caa9. d00caa9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.43% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.36% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.16% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] d00caa94 ec2-t3-xlarge-us-east-2
[Finished] d00caa94 test-mac-arm
[Finished] d00caa94 ursa-i9-9960x
[Finished] d00caa94 ursa-thinkcentre-m75q
[Finished] 214135d8 ec2-t3-xlarge-us-east-2
[Finished] 214135d8 test-mac-arm
[Finished] 214135d8 ursa-i9-9960x
[Finished] 214135d8 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

kou pushed a commit that referenced this pull request Feb 20, 2023
…Hub issue numbers (#34260)

Rewrite the Jira issue numbers to GitHub issue numbers so that pkgdown's auto-linking feature links them to the corresponding GitHub issues.

The issue numbers have been rewritten according to the correspondence below, and the pkgdown settings have been updated to link to GitHub.

I generated the Changelog page using the `pkgdown::build_news()` function and verified that the links work correctly.
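
As a hypothetical way to reproduce that check locally (assuming pkgdown is installed and the command is run from the r/ subdirectory of the arrow checkout):

# Not part of the commit itself: rebuild the changelog page and inspect the
# auto-generated links for the "#13105"-style issue numbers.
pkgdown::build_news(pkg = ".")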

---
ARROW-6338	#5198
ARROW-6364	#5201
ARROW-6323	#5169
ARROW-6278	#5141
ARROW-6360	#5329
ARROW-6533	#5450
ARROW-6348	#5223
ARROW-6337	#5399
ARROW-10850	#9128
ARROW-10624	#9092
ARROW-10386	#8549
ARROW-6994	#23308
ARROW-12774	#10320
ARROW-12670	#10287
ARROW-16828	#13484
ARROW-14989	#13482
ARROW-16977	#13514
ARROW-13404	#10999
ARROW-16887	#13601
ARROW-15906	#13206
ARROW-15280	#13171
ARROW-16144	#13183
ARROW-16511	#13105
ARROW-16085	#13088
ARROW-16715	#13555
ARROW-16268	#13550
ARROW-16700	#13518
ARROW-16807	#13583
ARROW-16871	#13517
ARROW-16415	#13190
ARROW-14821	#12154
ARROW-16439	#13174
ARROW-16394	#13118
ARROW-16516	#13163
ARROW-16395	#13627
ARROW-14848	#12589
ARROW-16407	#13196
ARROW-16653	#13506
ARROW-14575	#13160
ARROW-15271	#13170
ARROW-16703	#13650
ARROW-16444	#13397
ARROW-15016	#13541
ARROW-16776	#13563
ARROW-15622	#13090
ARROW-18131	#14484
ARROW-18305	#14581
ARROW-18285	#14615
* Closes: #33631

Authored-by: SHIMA Tatsuya <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>