Skip to content

Conversation

@rok
Copy link
Member

@rok rok commented Jul 4, 2022

This is to resolve ARROW-16653.

Please note the intent here is to map out currently available formats via a new test. That is to support ARROW-16395 which will add ymd_hms() ymd_hm() ymd_h() dmy_hms() dmy_hm() dmy_h() mdy_hms() mdy_hm() mdy_h() ydm_hms() ydm_hm() ydm_h().

Currently most lubridate supported flags are available and this adds a test. Remaining are %q and %Op that we don't need to resolve ARROW-16395. We could open a ticket for adding support for the two remaining flags to C++ strptime or wait for users to request them.

@github-actions
Copy link

github-actions bot commented Jul 4, 2022

@rok rok marked this pull request as ready for review July 22, 2022 05:26
@rok rok force-pushed the ARROW-16653 branch 2 times, most recently from 9dde78a to 7cb9c69 Compare July 22, 2022 05:46
@rok
Copy link
Member Author

rok commented Jul 22, 2022

@dragosmg

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tiny things...looks great!

@dragosmg
Copy link
Contributor

dragosmg commented Jul 22, 2022

I think the scope of the Jira is to also test/ check/ make sure all formats are supported with parse_date_time(). This means we might need to get rid of the supported_orders list and test they actually work.

@rok
Copy link
Member Author

rok commented Jul 22, 2022

I think the scope of the Jira is to also test/ check/ make sure all formats are supported with parse_date_time(). This means we might need to get rid of the supported_orders list and test they actually work.

I forgot about that! 😄
Please check 772ee78.

@dragosmg
Copy link
Contributor

I forgot about that! 😄 Please check 772ee78.

Could you add a test/ tests for parse_date_time() with some of the formats / orders this enables?

@dragosmg
Copy link
Contributor

dragosmg commented Jul 22, 2022

LGTM once we have a test with the additional formats we now support & the CI passes. Thanks for your work on lubridate, @rok! 🙏🏻

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like @dragosmg was looking for a parse_date_time() test but I see a number of those tests in #13627 (am I reading that correctly?). In any case, I trust you to either add those tests to #13627 or create a follow-up ticket to add them in their own PR.

@dragosmg
Copy link
Contributor

dragosmg commented Jul 22, 2022

Once we remove the guardrails, it would be good to test orders that aren't covered by #13627, such as: parse_date_time("12/17/1996 04:00:00 PM", "mdYHMSp"). If my memory serves me well, we have test covering the following formats: y, Y, m, d, D, H, M, S, I, b, B. I don't think we have anything covering j, p, u, v, w etc.

@paleolimbot
Copy link
Member

@dragosmg Can you create a follow-up ticket for that?

@dragosmg
Copy link
Contributor

dragosmg commented Jul 22, 2022

I can, but @rok might be close to a unit test for those additional formats. If he is, I'd rather we have the tests in this PR as they go together.

@dragosmg
Copy link
Contributor

to clarify, that test would be nice to have, but it doesn't mean that without it the features aren't tested. ultimately, parse_date_time() calls strptime so, as long as those formats are tested with strptime they should work just fine with parse_date_time().

@rok
Copy link
Member Author

rok commented Jul 22, 2022

to clarify, that test would be nice to have, but it doesn't mean that without it the features aren't tested. ultimately, parse_date_time() calls strptime so, as long as those formats are tested with strptime they should work just fine with parse_date_time().

👍
The thing is I started looking into frankenformats (e.g. Y-%b-d) which are supported by lubridate, but they brake our orders parsing here. I'll try to make it work and make it reject frankenformats if I can't. Does that seem ok?

@rok rok force-pushed the ARROW-16653 branch 4 times, most recently from 9257a41 to ee856ba Compare July 25, 2022 06:47
Copy link
Contributor

@dragosmg dragosmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work. I have some minor comments, and a larger concern. I am not convinced the build_formats() now works correctly:

# all the formatting characters should be supported
build_formats("wIpz")
#> Error: "wIpz" `orders` not supported in Arrow

supported_passed_orders <- intersect(orders, supported_orders)
formats_list <- map(orders, build_format_from_order)
formats_length <- map(map(formats_list, nchar), max)
invalid_orders <- formats_length < 6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding some comments here would help for our future selves. For example, on why you think 6 is a good threshold.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems unlikely that < 6 characters is the only threshold for an invalid order?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure how to check if an order is valid as it really comes down to strptime implementation used. I'm switching to detecting if no valid formats are present or not and will check what lubridate does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lubridate checks against an example of it's input, which is something we can't/won't do.
I'd say erroring on invalid formats and returning NA on unsuccessful parsing would be the thing to do.

Copy link
Contributor

@dragosmg dragosmg Jul 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. That sounds like a good plan

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I comment everywhere I saw some R code that looked out of place...in general I have little idea what's going on here and I think some comments would help.

supported_passed_orders <- intersect(orders, supported_orders)
formats_list <- map(orders, build_format_from_order)
formats_length <- map(map(formats_list, nchar), max)
invalid_orders <- formats_length < 6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems unlikely that < 6 characters is the only threshold for an invalid order?

Co-authored-by: Dewey Dunnington <[email protected]>
@rok rok force-pushed the ARROW-16653 branch 3 times, most recently from b68ad44 to 1cc88c9 Compare July 28, 2022 02:01
@rok
Copy link
Member Author

rok commented Jul 28, 2022

# all the formatting characters should be supported
build_formats("wIpz")
#> Error: "wIpz" `orders` not supported in Arrow

Added this to tests.

@rok
Copy link
Member Author

rok commented Jul 28, 2022

Thanks for the reviews @dragosmg @paleolimbot !
I've pushed a new set of changes, could you please review them? I'm still hoping to get this into the release 🤞 .

@rok rok requested a review from paleolimbot July 28, 2022 02:46
Copy link
Contributor

@dragosmg dragosmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for sticking with it!

@rok
Copy link
Member Author

rok commented Jul 28, 2022

@github-actions crossbow submit -g r

@github-actions
Copy link

Revision: 17b46ab

Submitted crossbow builds: ursacomputing/crossbow @ actions-156d8fedd0

Task Status
conda-linux-gcc-py37-cpu-r40 Azure
conda-linux-gcc-py37-cpu-r41 Azure
conda-osx-clang-py37-r40 Azure
conda-osx-clang-py37-r41 Azure
conda-win-vs2017-py37-r40 Azure
conda-win-vs2017-py37-r41 Azure
homebrew-r-autobrew Github Actions
homebrew-r-brew Github Actions
r-binary-packages Github Actions
test-fedora-r-clang-sanitizer Azure
test-r-arrow-backwards-compatibility Github Actions
test-r-depsource-bundled Azure
test-r-depsource-system Github Actions
test-r-dev-duckdb Github Actions
test-r-devdocs Github Actions
test-r-gcc-11 Github Actions
test-r-gcc-12 Github Actions
test-r-install-local Github Actions
test-r-linux-as-cran Github Actions
test-r-linux-rchk Github Actions
test-r-linux-valgrind Azure
test-r-minimal-build Azure
test-r-offline-maximal Github Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-debian-gcc-release-custom-ccache Azure
test-r-rhub-ubuntu-gcc-release-latest Azure
test-r-rocker-r-base-latest Azure
test-r-rstudio-r-base-4.1-opensuse153 Azure
test-r-rstudio-r-base-4.2-centos7-devtoolset-8 Azure
test-r-rstudio-r-base-4.2-focal Azure
test-r-ubuntu-22.04 Github Actions
test-r-versions Github Actions
test-ubuntu-18.04-r-sanitizer Azure

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With these changes it's much more readable...thank you!

@paleolimbot
Copy link
Member

(The sanitizer error was fixed already and valgrind will likely fail when it completes but that is a separate issue)

@paleolimbot paleolimbot merged commit 3e87c66 into apache:master Jul 29, 2022
@rok
Copy link
Member Author

rok commented Jul 29, 2022

Thank you for fast reviews and feedback @dragosmg and @paleolimbot !

@ursabot
Copy link

ursabot commented Jul 29, 2022

Benchmark runs are scheduled for baseline = f645ffa and contender = 3e87c66. 3e87c66 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Failed ⬇️3.77% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.65% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.39% ⬆️0.07%] ursa-thinkcentre-m75q
Buildkite builds:
[Failed] 3e87c66a ec2-t3-xlarge-us-east-2
[Finished] 3e87c66a test-mac-arm
[Finished] 3e87c66a ursa-i9-9960x
[Finished] 3e87c66a ursa-thinkcentre-m75q
[Failed] f645ffa2 ec2-t3-xlarge-us-east-2
[Finished] f645ffa2 test-mac-arm
[Finished] f645ffa2 ursa-i9-9960x
[Finished] f645ffa2 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

kszucs pushed a commit that referenced this pull request Jul 29, 2022
…date_time` binding (#13506)

This is to resolve [ARROW-16653](https://issues.apache.org/jira/browse/ARROW-16653).

Please note the intent here is to map out currently available formats via a new test. That is to support [ARROW-16395](https://issues.apache.org/jira/browse/ARROW-16395) which will add `ymd_hms() ymd_hm() ymd_h() dmy_hms() dmy_hm() dmy_h() mdy_hms() mdy_hm() mdy_h() ydm_hms() ydm_hm() ydm_h()`.

Currently most [lubridate supported flags](https://lubridate.tidyverse.org/reference/parse_date_time.html#details) are available and this adds a test. Remaining are `%q` and `%Op` that we don't need to resolve [ARROW-16395](https://issues.apache.org/jira/browse/ARROW-16395). We could open a ticket for adding support for the two remaining flags to C++ `strptime` or wait for users to request them.

Lead-authored-by: Rok <[email protected]>
Co-authored-by: Rok Mihevc <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
paleolimbot added a commit that referenced this pull request Jan 16, 2023
Changes introduced by [ARROW-16653](#13506) were not written up by NEWS.md.

Lead-authored-by: Rok <[email protected]>
Co-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Rok Mihevc <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
kou pushed a commit that referenced this pull request Feb 20, 2023
…Hub issue numbers (#34260)

Rewrite the Jira issue numbers to the GitHub issue numbers, so that the GitHub issue numbers are automatically linked to the issues by pkgdown's auto-linking feature.

Issue numbers have been rewritten based on the following correspondence.
Also, the pkgdown settings have been changed and updated to link to GitHub.

I generated the Changelog page using the `pkgdown::build_news()` function and verified that the links work correctly.

---
ARROW-6338	#5198
ARROW-6364	#5201
ARROW-6323	#5169
ARROW-6278	#5141
ARROW-6360	#5329
ARROW-6533	#5450
ARROW-6348	#5223
ARROW-6337	#5399
ARROW-10850	#9128
ARROW-10624	#9092
ARROW-10386	#8549
ARROW-6994	#23308
ARROW-12774	#10320
ARROW-12670	#10287
ARROW-16828	#13484
ARROW-14989	#13482
ARROW-16977	#13514
ARROW-13404	#10999
ARROW-16887	#13601
ARROW-15906	#13206
ARROW-15280	#13171
ARROW-16144	#13183
ARROW-16511	#13105
ARROW-16085	#13088
ARROW-16715	#13555
ARROW-16268	#13550
ARROW-16700	#13518
ARROW-16807	#13583
ARROW-16871	#13517
ARROW-16415	#13190
ARROW-14821	#12154
ARROW-16439	#13174
ARROW-16394	#13118
ARROW-16516	#13163
ARROW-16395	#13627
ARROW-14848	#12589
ARROW-16407	#13196
ARROW-16653	#13506
ARROW-14575	#13160
ARROW-15271	#13170
ARROW-16703	#13650
ARROW-16444	#13397
ARROW-15016	#13541
ARROW-16776	#13563
ARROW-15622	#13090
ARROW-18131	#14484
ARROW-18305	#14581
ARROW-18285	#14615
* Closes: #33631

Authored-by: SHIMA Tatsuya <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants