-
Notifications
You must be signed in to change notification settings - Fork 43
Update num2date function to decode times exactly using timedelta addition #176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This looks fantastic @spencerkclark, thank you. |
|
Perhaps a better name would be |
Thanks @jswhit I think that is a good idea -- I updated things accordingly. If @shoyer is ok with us using |
|
I updated the |
Changed my mind on this - let's make it the default now as long as all tests pass. That way people will actually use it and we'll get more feedback. |
cftime/_cftime.pyx
Outdated
| raise ValueError("Units of months only valid for 360_day calendar.") | ||
|
|
||
| factor = UNIT_CONVERSION_FACTORS[unit] | ||
| scaled_times = factor * times |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe cast times to np.longdouble? (80 bit float in x86_64, but not on Windows) - this would increase the interval over which microsecond accuracy is preserved from ~285 years to > 290000 years (see comment in cast_to_int_if_safe). Suggest using changing to np.asanyarray(times, dtype=np.longdouble) - without casting to an array this will fail with list input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea to cast float times to np.longdouble prior to scaling. I also added some code to cast_to_int_if_safe which prevents integer casting if the values are outside the longdouble integer range on the particular platform. Let me know if that looks correct.
I make sure to cast any integer inputs to int64 before doing any multiplication too.
without casting to an array this will fail with list input.
Thanks for pointing this out -- I added a test case for list input and cast times as an array before doing anything else.
cftime/_cftime.pyx
Outdated
| do not contain a time-zone offset, even if the specified `units` | ||
| contains one. | ||
| """ | ||
| return num2date_float( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go ahead, bite the bullet, and use numdate_int as long as all the tests pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing; it looks like they all do now.
|
@spencerkclark I'd like to get this merged, can you take a look at my review? |
|
Thanks for the careful review and for the ping @jswhit; sorry for letting this slip. I'll try and fix this up by the end of the week. |
|
Thanks again @jswhit -- when you get a chance, I think things should be ready for another pass. |
|
Looks good @spencerkclark - merging now |
Closes #171.
This is a cleaned up implementation of
num2date_exactillustrated in #171, complete with tests and support for masked input arrays.I did a little more profiling. It happens that the speed depends on how distant the times are from the reference date. My initial tests were fairly extreme -- I was testing with timedeltas on the order of a million days, which resulted in some dates in year 4000 or higher. For more typical use-cases, this new method is actually faster than the old method (I'm not sure if this changes the perspective on the default):
This is obviously a fairly important part of cftime. I think I got to everything in the test coverage, but please let me know if there's anything missing. Happy to iterate as long as needed; I'm in no rush to get this in.
Note this leverages a version of the
xarray.coding.times.cast_to_int_if_safefunction, which I think @shoyer added to xarray a while ago. Is it ok if we use that here?