Skip to content

Conversation

@romainfrancois
Copy link
Contributor

codec = NULL means choose the right default for the current platform. Default is set for SNAPPY on windows and GZIP otherwise, instead of universally using GZIP as the default and abort on windows.

@pitrou
Copy link
Member

pitrou commented Sep 9, 2019

I don't want to opiniate too much on this (not a R developer), but I'm curious why the choice of snappy (which AFAIK is little used nowadays, as there are better options). Also, why would gzip fail on Windows? Gzip is so universal that hopefully it should be supported on Windows as well.

@nealrichardson
Copy link
Member

It all depends on what the C++ library is built with on Windows. According to https://github.com/apache/arrow/blob/master/ci/PKGBUILD#L88, gzip should be there.

Snappy support was added specifically because it was (reported to be) common with Parquet files.

Speaking of Parquet, as the ticket said it would be nice to support writing to parquet with compression (cf. #5192, which further added support to the C++ library for specifying a compression level).

@pitrou
Copy link
Member

pitrou commented Sep 9, 2019

Right, so personally I would make gzip the default everywhere, if you do need a default.

@nealrichardson
Copy link
Member

Agreed.

@wesm
Copy link
Member

wesm commented Sep 9, 2019

Snappy should be the default compression when writing Parquet files. Make sure it isn't gzip or benchmarks will suffer (almost every library out there defaults to Snappy AFAIK)

But anywhere else where there is a need for a compression gzip seems like the reasonable default

@nealrichardson nealrichardson force-pushed the ARROW-6360-compression-update branch from 3fb788b to c8021a5 Compare September 10, 2019 23:44
Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebased and pushed a few changes. I guess this R code is covered by test-compression.R, and the real business is covered in the C++ library elsewhere.

If it makes sense, you could separate the parquet encryption support to a different ticket and merge this.

# more concisely
# rap( ~ decor:::parse_cpp_function(context))
mutate(functions = map(context, decor:::parse_cpp_function)) %>%
{ bind_cols(., bind_rows(pull(., functions))) } %>%
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? If so, please add vctrs to the README where the codegen script dependencies are discussed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I see we already Suggest vctrs.

@codecov-io
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (master@9dec79b). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #5329   +/-   ##
=========================================
  Coverage          ?   76.09%           
=========================================
  Files             ?       56           
  Lines             ?     3572           
  Branches          ?        0           
=========================================
  Hits              ?     2718           
  Misses            ?      854           
  Partials          ?        0

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9dec79b...88fbf00. Read the comment docs.

@nealrichardson
Copy link
Member

pprudhvi pushed a commit to pprudhvi/arrow that referenced this pull request Sep 16, 2019
codec = NULL means choose the right default for the current platform. Default is set for `SNAPPY` on windows and `GZIP` otherwise, instead of universally using `GZIP` as the default and abort on windows.

Closes apache#5329 from romainfrancois/ARROW-6360-compression-update and squashes the following commits:

88fbf00 <Neal Richardson> Remove windows test skip
7d83d2b <Neal Richardson> Move compression_codec to Codec; add docs; allow lowercase compression types
c8021a5 <Romain Francois> Making GZIP default everywhere
9cc2b43 <Romain Francois> using vctrs::vec_bind instead of dplyr::bind_
3727946 <Romain Francois> codec = NULL means choose the right default for the current platform.

Lead-authored-by: Romain Francois <[email protected]>
Co-authored-by: Neal Richardson <[email protected]>
Signed-off-by: Neal Richardson <[email protected]>
@romainfrancois romainfrancois deleted the ARROW-6360-compression-update branch September 20, 2019 11:39
kou pushed a commit that referenced this pull request Feb 20, 2023
…Hub issue numbers (#34260)

Rewrite the Jira issue numbers to the GitHub issue numbers, so that the GitHub issue numbers are automatically linked to the issues by pkgdown's auto-linking feature.

Issue numbers have been rewritten based on the following correspondence.
Also, the pkgdown settings have been changed and updated to link to GitHub.

I generated the Changelog page using the `pkgdown::build_news()` function and verified that the links work correctly.

---
ARROW-6338	#5198
ARROW-6364	#5201
ARROW-6323	#5169
ARROW-6278	#5141
ARROW-6360	#5329
ARROW-6533	#5450
ARROW-6348	#5223
ARROW-6337	#5399
ARROW-10850	#9128
ARROW-10624	#9092
ARROW-10386	#8549
ARROW-6994	#23308
ARROW-12774	#10320
ARROW-12670	#10287
ARROW-16828	#13484
ARROW-14989	#13482
ARROW-16977	#13514
ARROW-13404	#10999
ARROW-16887	#13601
ARROW-15906	#13206
ARROW-15280	#13171
ARROW-16144	#13183
ARROW-16511	#13105
ARROW-16085	#13088
ARROW-16715	#13555
ARROW-16268	#13550
ARROW-16700	#13518
ARROW-16807	#13583
ARROW-16871	#13517
ARROW-16415	#13190
ARROW-14821	#12154
ARROW-16439	#13174
ARROW-16394	#13118
ARROW-16516	#13163
ARROW-16395	#13627
ARROW-14848	#12589
ARROW-16407	#13196
ARROW-16653	#13506
ARROW-14575	#13160
ARROW-15271	#13170
ARROW-16703	#13650
ARROW-16444	#13397
ARROW-15016	#13541
ARROW-16776	#13563
ARROW-15622	#13090
ARROW-18131	#14484
ARROW-18305	#14581
ARROW-18285	#14615
* Closes: #33631

Authored-by: SHIMA Tatsuya <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
@codecov-commenter
Copy link

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.09%. Comparing base (1f57ba1) to head (88fbf00).

❗ There is a different number of reports uploaded between BASE (1f57ba1) and HEAD (88fbf00). Click for more details.

HEAD has 5 uploads less than BASE
Flag BASE (1f57ba1) HEAD (88fbf00)
6 1
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5329       +/-   ##
===========================================
- Coverage   88.72%   76.09%   -12.64%     
===========================================
  Files         945       56      -889     
  Lines      124147     3572   -120575     
  Branches     1437        0     -1437     
===========================================
- Hits       110146     2718   -107428     
+ Misses      13639      854    -12785     
+ Partials      362        0      -362     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants