@spencerkclark
Collaborator

Closes #171.

This is a cleaned-up implementation of the num2date_exact function illustrated in #171, complete with tests and support for masked input arrays.

I did a little more profiling. It turns out that the speed depends on how far the times are from the reference date. My initial tests were fairly extreme -- I was testing with timedeltas on the order of a million days, which resulted in some dates in the year 4000 or later. For more typical use cases, the new method is actually faster than the old one (I'm not sure if this changes the perspective on the default):

In [1]: import cftime

In [2]: import numpy as np

In [3]: UNITS = "microseconds since 1900-01-01"

In [4]: CALENDAR = "proleptic_gregorian"

In [5]: times = np.random.randint(0, 86400000000 * 100000, size=(10000, ))

In [6]: %timeit cftime.num2date(times, UNITS, CALENDAR)
97.6 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [7]: %timeit cftime.num2date_exact(times, UNITS, CALENDAR)
83.2 ms ± 882 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [8]: cftime.num2date_exact(times, UNITS, CALENDAR)
Out[8]:
array([cftime.DatetimeProlepticGregorian(2082-11-09 05:54:21.915404),
       cftime.DatetimeProlepticGregorian(1939-01-12 18:24:47.502144),
       cftime.DatetimeProlepticGregorian(2072-01-05 15:25:01.058681), ...,
       cftime.DatetimeProlepticGregorian(2079-10-28 20:44:07.104899),
       cftime.DatetimeProlepticGregorian(1932-11-23 18:29:45.244291),
       cftime.DatetimeProlepticGregorian(1916-03-13 14:29:15.970997)],
      dtype=object)
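
For readers unfamiliar with the approach, here is a minimal sketch of decoding by timedelta addition. It is not the cftime implementation: it uses only the standard-library datetime (so it covers just the proleptic Gregorian calendar), and the names MICROSECONDS_PER_UNIT and decode_exact are hypothetical. The point is that integer offsets are scaled to integer microseconds and added to the reference date, so no floating-point rounding enters the result.

```python
from datetime import datetime, timedelta

# Hypothetical lookup table (not cftime's): integer microseconds per unit,
# so that scaling an integer offset stays in exact integer arithmetic.
MICROSECONDS_PER_UNIT = {
    "microseconds": 1,
    "milliseconds": 1_000,
    "seconds": 1_000_000,
    "minutes": 60 * 1_000_000,
    "hours": 3_600 * 1_000_000,
    "days": 86_400 * 1_000_000,
}


def decode_exact(values, unit, reference):
    """Decode integer offsets by adding exact timedeltas to a reference date."""
    factor = MICROSECONDS_PER_UNIT[unit]
    # int(v) * factor is an exact Python integer; timedelta stores it exactly.
    return [reference + timedelta(microseconds=int(v) * factor) for v in values]


reference = datetime(1900, 1, 1)
decoded = decode_exact([0, 1, 365], "days", reference)
one_microsecond = decode_exact([1], "microseconds", reference)[0]
```

Because the arithmetic is done on integers, an offset of a single microsecond survives decoding regardless of how large the other offsets are.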

This is obviously a fairly important part of cftime. I think I got to everything in the test coverage, but please let me know if there's anything missing. Happy to iterate as long as needed; I'm in no rush to get this in.

Note this leverages a version of the xarray.coding.times.cast_to_int_if_safe function, which I think @shoyer added to xarray a while ago. Is it ok if we use that here?
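
As a rough illustration of what such a helper does, here is a sketch under the assumption that it casts to int64 only when no value changes in the cast. The name cast_to_int_if_safe_sketch is hypothetical, and the version in this PR or in xarray may differ in its details.

```python
import numpy as np


def cast_to_int_if_safe_sketch(num):
    """Return an int64 copy of `num` if the cast changes no value."""
    num = np.asarray(num)
    int_num = num.astype(np.int64)
    # Only switch to integers when every element round-trips unchanged.
    if (num == int_num).all():
        return int_num
    return num


whole = cast_to_int_if_safe_sketch(np.array([1.0, 2.0, 3.0]))
fractional = cast_to_int_if_safe_sketch(np.array([1.0, 2.5]))
```

Whole-valued float input comes back as int64, which allows exact integer arithmetic downstream; input with fractional parts is returned with its original dtype.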

@jswhit
Collaborator

jswhit commented May 28, 2020

This looks fantastic @spencerkclark, thank you.
I suggest we get this in a 1.1.4 release, and make it the default if all goes well for the release after.

@jswhit
Collaborator

jswhit commented May 29, 2020

Perhaps a better name would be num2date_int (then the existing implementation can be changed to num2date_float, and num2date can point to num2date_float for now, and switch to num2date_int in the next release).

@spencerkclark
Collaborator Author

> Perhaps a better name would be num2date_int (then the existing implementation can be changed to num2date_float, and num2date can point to num2date_float for now, and switch to num2date_int in the next release).

Thanks @jswhit, I think that is a good idea -- I updated things accordingly.

If @shoyer is ok with us using xarray.coding.times.cast_to_int_if_safe then I think this may be ready to go. I guess beyond just that function, the approach I've taken here is largely inspired by how times are decoded in xarray with pandas, which seems to work pretty well.

@jswhit
Collaborator

jswhit commented May 30, 2020

I updated the date2num function to use the UNIT_CONVERSION_FACTORS.

@jswhit
Collaborator

jswhit commented Jun 1, 2020

> Perhaps a better name would be num2date_int (then the existing implementation can be changed to num2date_float, and num2date can point to num2date_float for now, and switch to num2date_int in the next release).

Changed my mind on this - let's make it the default now as long as all tests pass. That way people will actually use it and we'll get more feedback.

raise ValueError("Units of months only valid for 360_day calendar.")

factor = UNIT_CONVERSION_FACTORS[unit]
scaled_times = factor * times
Collaborator

Maybe cast times to np.longdouble? (It is an 80-bit float on x86_64, but not on Windows.) This would increase the interval over which microsecond accuracy is preserved from ~285 years to > 290000 years (see the comment in cast_to_int_if_safe). I suggest changing this to np.asanyarray(times, dtype=np.longdouble); without casting to an array this will fail with list input.
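
The ~285 year figure can be checked with a short calculation. This is only a sketch: it assumes the limit is set by the number of significand bits reported by np.finfo, and the helper name exact_integer_span_years is hypothetical. The longdouble result is platform dependent, so no value is quoted for it here.

```python
import numpy as np

MICROSECONDS_PER_YEAR = 365.25 * 86_400 * 1_000_000


def exact_integer_span_years(dtype):
    """Years of microseconds a float dtype can count without skipping integers."""
    # A float with p bits of precision represents every integer up to 2**p;
    # np.finfo reports p - 1 as nmant for both float64 and x86 longdouble.
    precision_bits = int(np.finfo(dtype).nmant) + 1
    return float(2 ** precision_bits) / MICROSECONDS_PER_YEAR


float64_years = exact_integer_span_years(np.float64)
longdouble_years = exact_integer_span_years(np.longdouble)
print(f"float64:    {float64_years:.1f} years")
print(f"longdouble: {longdouble_years:.1f} years")
```

On platforms where np.longdouble is just an alias for a 64-bit double, both numbers come out the same, which is why the benefit of the cast is platform specific.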

Collaborator Author

Good idea to cast float times to np.longdouble prior to scaling. I also added some code to cast_to_int_if_safe which prevents integer casting if the values are outside the longdouble integer range on the particular platform. Let me know if that looks correct.

I make sure to cast any integer inputs to int64 before doing any multiplication too.

> without casting to an array this will fail with list input.

Thanks for pointing this out -- I added a test case for list input and cast times as an array before doing anything else.
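
A minimal sketch of the casting order described in this reply, assuming integer input goes to int64 and everything else goes to np.longdouble before scaling. The function name scale_times_sketch is hypothetical, and the merged code may be organized differently.

```python
import numpy as np


def scale_times_sketch(times, factor):
    """Scale `times` by an integer `factor` after a dtype-dependent cast."""
    # Convert first, so that list input works; asanyarray keeps subclasses
    # such as masked arrays intact.
    times = np.asanyarray(times)
    if np.issubdtype(times.dtype, np.integer):
        # Integers: widen to int64 so the product is computed in 64 bits.
        times = times.astype(np.int64)
    else:
        # Floats: use the widest float available on this platform.
        times = times.astype(np.longdouble)
    return factor * times


scaled_ints = scale_times_sketch([1, 2], 1_000)
scaled_floats = scale_times_sketch([0.5], 1_000)
```

Both calls accept plain lists; the first returns an int64 array and the second a longdouble array.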

do not contain a time-zone offset, even if the specified `units`
contains one.
"""
return num2date_float(
Collaborator

Let's go ahead, bite the bullet, and use num2date_int as long as all the tests pass.

Collaborator Author

Sure thing; it looks like they all do now.

@jswhit
Collaborator

jswhit commented Jun 15, 2020

@spencerkclark I'd like to get this merged, can you take a look at my review?

@spencerkclark
Collaborator Author

Thanks for the careful review and for the ping @jswhit; sorry for letting this slip. I'll try and fix this up by the end of the week.

@spencerkclark
Collaborator Author

Thanks again @jswhit -- when you get a chance, I think things should be ready for another pass.

@jswhit jswhit merged commit 76e2b10 into Unidata:master Jun 22, 2020
@jswhit
Collaborator

jswhit commented Jun 22, 2020

Looks good @spencerkclark - merging now

@spencerkclark spencerkclark deleted the exact branch June 23, 2020 00:48
@spencerkclark spencerkclark changed the title from "Add num2date_exact function to decode times exactly using timedelta addition" to "Update num2date function to decode times exactly using timedelta addition" Jan 4, 2021
Development

Successfully merging this pull request may close these issues.

Decoding times in num2date exactly with timedelta arithmetic

2 participants