Skip to content

Conversation

gbrail
Copy link
Collaborator

@gbrail gbrail commented Oct 10, 2025

Our existing code to convert numbers to decimal is based on a very old
set of C code from 2000. It was enhanced a while back with some code from
V8 that is faster but still must fall back to the old code. All of this code
is very complicated.

Recent developments in computer science have resulted in a new generation
of algorithms for this task. The "Schubfach" algorithm, for example, is
fast and does not take much code. Many libraries, including the JDK, use it
already.

We have to have our own floating-point conversion code because JavaScript
specifies slighty different behavior than Java, but we can use the same
algorithm.

This PR replaces the two generations of DtoA code with an implementation of
Schubfach for the general-purpose case of converting doubles to strings.
The new code is much shorter than the old code. It is faster for most, but
not all, use cases, although this does not affect our standard benchmark
results unfortunately.

Subsequent PRs will also replace the Number functions toExponential,
toPrecision, and toFixed, with code based on this same base, so we'll
be able to retire yet more of the old stuff. Finally, we'll see if there
are other options for simplifying the case of converting numbers to
strings with non-base-10 representations.

I've verified this code with a pretty good set of test cases and also
compared intermedate steps in the output with the C++ code that it's based
on and it has been correct so far.

gbrail added 2 commits October 9, 2025 20:28
Number-to-string conversions in Rhino have historically been made using
the DtoA class, which is based on 20-year-old reference C code. A while
back we added support for a faster algorithm (the Grisu algorithm).

This change switches to the Schubfach algorithm, which is a relatively
new algorithm used in many standard libraries including Java.

The resulting code is much simpler, while still matching the output of
the original code and also the output of V8. In a few cases, it is also
a bit faster.
@p-bakker
Copy link
Collaborator

Nice! Where does this leave #254?

@aardvark179
Copy link
Contributor

I'm a little nervous of doing this, partly because this is replacing one implementation which we likely will not maintain with another that , and partly because introducing an additional licence means there will be some pain in adopting these changes in forks. Is there a reason we cannot use a Java DecimalFormat to do this?

@gbrail
Copy link
Collaborator Author

gbrail commented Oct 11, 2025

Those signed integers in Java will get you every time!

Thanks for the pointer to the Nashorn test data. That data (which originated in V8) helped find some bugs.

I'm going to switch to an implementation much closer to the one from the inventor of Schubfach, which will take me a few days, so I'm going to close this PR and open a new one that's better. Thanks!

@gbrail gbrail closed this Oct 11, 2025
@gbrail
Copy link
Collaborator Author

gbrail commented Oct 11, 2025

I get the concerns about maintenance, although I do say that in the end we're going to end up with quite a bit less code in this area, and in a way I kind of think the concerns about maintenance apply to the whole project...

As for the license -- the code I'm currently looking at has an MIT license, and we already have MIT-license code in this project from V8. It would require people who adopt Rhino to update their NOTICE file, yes, but TBH mature companies shipping commercial software that relies on Rhino ought to be doing that.

Or, if maintaining a fork is a big burden, they can join those of you already on this thread by helping or funding people to work on the project.

@gbrail
Copy link
Collaborator Author

gbrail commented Oct 11, 2025

And I haven't been able to crack the code on using DecimalFormat instead of all this -- JavaScript has very specific formatting rules that require different formats depending on the exponent and I'm not sure how simple they are -- I'll keep experimenting, though. Plus DecimalFormat has other difficulties like it's unusual relationship with threads.

After this particular PR, however, in whatever form it eventually takes, the implementation of toExponential, toDecimal, and toFixed is basically a few lines of code that creates a BigDecimal instance, rounds the number, and from then on it's purely string manipulation to create the output, so again a lot of complexity will go away.

@aardvark179
Copy link
Contributor

I think associating a formatter with context would be sufficient, we already make lots of assumptions about a context being associated with a single thread even when we relax those assumptions around individual objects.

having more test vectors is good. Most of the bugs I’ve found in the old code are to do with sign extensions, so I recommend adding extra cases round MSBs and similar. Happy to help in generating additional test data if you need it.

@gbrail
Copy link
Collaborator Author

gbrail commented Oct 11, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants