Separate immutable code and descriptors from function objects. #2091

aardvark179 · 2025-09-17T17:05:55Z

Separate function objects and the code they execute.

What's the problem we're trying to solve here?

The way we have traditionally represented JavaScript functions has a few drawbacks which we've encountered.:

We have separated functions into a class hierarchy based both on the type of the function (arrow, bound, etc.) and on the way the function was compiled (interpreter, class compilation, and built-in (LambdaFunction and similar)). This class hierarchy starts to become especially strained if we consider what would be needed to support class natively (where the constructor carries extra information about fields and methods), or start to support additional compilation levels or new interpreters.
Although top level scripts are effectively immutable, and can thus be cached to by frameworks which use Rhino, they are built as function objects, and can carry a lot of unintended state which may be mutable.
We have inconsistent behaviour between interpreted and compiled functions.
Encoding more complex metadata such as activation frame shape proved needlessly difficult, as have other things such as lazy source providers which we've added in our fork which might be worth upstreaming.
For large scripts the way we compiled them to function objects produced extremely large utility methods (e.g. param and var name accessors) which create a large optimisation barrier.
The way we handled compiled functions made correct generator function behaviour impossible to achieve.
We have no good framework for introducing just in time compilation of JS functions.

The overall solution

This change introduces JSCode objects which have an execute and a resume method. These objects are then used to represent the call, or construct actions on an object, and are held as part of a JSDescriptor which carries all the meta data related to JSFunctions or JSScripts created from these descriptors.

The JSFunction then only need to inherit from BaseFunction and hold a home object and a lexically bound this if they represent an arrow function.

InterpreterData has been refactored to be a subclass of JSCode, and its metadata moved to a descriptor, and the class compiler now produces static methods which are invoked by instances of OptJSCode objects.

The construction of the descriptors has been factored out into common code used by both the interpreter and the class compiler, and the now unnecessary classes (ArrowFunction, InterpretedFunction, and NativeFunction) removed.

Direct calls and function creation within compiled classes

Direct calls within a compiled script were done using an instanceof check (to ensure the callee was compiled from this class) and a comparison of its function index. This has been changed to a check that the descriptor of the callee matches the descriptors injected into the class as part of its initialisation., these descriptors are also used for the creation of function objects within the class.

Creating the `OptJSCode` instances

I tried multiple techniques for creating instances of OptJSCode objects to invoke the static compiled methods (reflect, method handle invokes, method handle proxies, and horrible abuses of LambdaMetaFactory) and found the best to the compilation of subclasses separate from the main script compilation. It might be possible to ease meatspace pressure in large systems by using hidden classes but these are only supported on Java 15+ so I haven't done that as part of this change.

Performance and scalability

Most benchmarks remain largely unchanged, though earley (which creates an extremely large number of activation frames) shows a 35% to 40% speed up as the compiled overrides of NativeFunction methods averaged over 2000 bytes and posed a substantial optimisation barrier.

With this change we are able to compile considerably larger scripts which previously generated NativeFunction overrides exceeding the 64K method byte code limit. We do still see issues with some large scripts due to bugs and limitations in our constant pool handling.

Loose ends and future work

There are a few loose ends which I have not tackled as part of this PR, but will tackle in followup work:

Our current Evaluator interface is poorly designed internally and passes opaque objects about which can then be turned into either Scripts or Function. I'd like to refactor that, and this would also help to remove the remaining suppression of unchecked casts we need in this area.
Class compilation has been complex to work on because any errors in type signatures are only detectable at run time, and sometimes only on very specific code paths (e.g. object literals with more than 10 fields). @ZZZank attempted to start refactoring Signatures.java in Compute (instead of define) method descriptor in .optimizer.Signatures #1948, but that would still push errors to run time. I think we actually do this at class initialisation time and catch errors much earlier, and simplify work on class compilation in the process.
Our constant pool generation has a bug with large numbers of int32 constants (the underlying index is an unsigned short, but we treat it as a signed short and get bitten by sign extension when it is cast back to an int) but we also seem to generate a large number of duplicate entries. The former is a one line fix but the latter will need more work to resolve.
I believe a large number of functions may no longer require activation frames as the function is always available to compiled or interpreted code, and can provide the home object.
~~Debugger interface testing. I am far from sure the debug interface is still working, but I don't appear to have broken any tests.~~ Smoke test added.

Future enhancements that might be made based on these changes

I think we could replace the current direct call logic with an invokeDynamic based solution that would work across compiled classes. We would need to expand OptJSCode to provide a mechanism for getting the right method handle, but this would not be too hard to implement, with the immutable descriptors provided a simple check that the function being invoked is the expected one.
Unification of our generator implementation, and improvement in our spec compliance.
We may no longer need the separate LambdaFunction and LambdaConstructor for built in functions. I've tried converting a small number of native functions over as proof of concept and the idea appears to work well. Future work might allow us to combine the replacement of direct call linking with the movement of built in functions to JSFunctions to allow optimised linking of built in functions.
Optimised creation and access to activation frames. Internally we have done a couple of prototypes around property maps and activation frames, and these are much more tractable with the new descriptors.

aardvark179 · 2025-09-17T17:09:41Z

@gbrail, let me know if you'd like this split into smaller PRs. The first 3 commits are all laying the ground work for the changes to the interpreter and class compilation, and the tidy up and test properties update could reasonably be split out as well.

aardvark179 · 2025-09-17T17:50:18Z

Sigh, that will teach me not to rest locally on all Java versions.

ZZZank · 2025-09-17T17:58:46Z

The SecurityControllerTest, if I remember correctly, is a problemetic test that happens to be working for current Rhino. At least 2 PRs tried to fix it, but none get merged. For example: https://github.com/FOCONIS/rhino/blob/better-type-support/tests%2Fsrc%2Ftest%2Fjava%2Forg%2Fmozilla%2Fjavascript%2Ftests%2FSecurityControllerTest.java#L91-L91

aardvark179 · 2025-09-17T18:25:02Z

In this case it's just that I just wasn't setting the security domain stuff on descriptors, and JDK21 really doesn't care. I'll fix it tomorrow.

aardvark179 · 2025-09-17T18:58:03Z

Also, we do appear to have at least a smoke test for the debugger in our fork that should be easy to upstream. I’ll add that to this PR tomorrow as well.

gbrail · 2025-09-18T00:40:36Z

If this means that you got rid of the giant switch table in every compiled class, and it looks like you did, I'm quite excited! I will spend some time reading over the next few days, thanks for this so far and let me try to understand it better!

aardvark179 · 2025-09-18T03:33:26Z

Yes, the giant switch tables are all gone.

ZZZank · 2025-09-18T17:09:18Z

Class compilation has been complex to work on because any errors in type signatures are only detectable at run time

On this I have a rough idea. We can make type signatures auto-generated from actual class/method on class init, so that it can use runtime info and will always reflect the actual class/method signature. Some sort of mark-scan approach, like the one used in #1950 , should be capable of doing this.

Another thing is matching actual object types with static arg types, I believe this will require ClassWriter to have the ability to track on-stack object type and method param/return type, some high-level bytecode manipulation framework might be required.

aardvark179 · 2025-09-18T20:09:36Z

Yeah, I was planning on resolving the actual methods so that a a type mismatch would fail as early as possible. Trying to be too clever with types on the stack can be actively unhelpful in my experience.

aardvark179 · 2025-09-18T20:12:30Z

I also almost have a nice refactoring of our compiler API To make it type safe. Not quite happy with that yet so I think it should wait to go into a follow up PR. We’ve survived with casting from object up till now.

gbrail · 2025-09-27T18:55:42Z

I really appreciate this and I'm especially excited about cleaning up the bytecode.

You've also added a bunch of new interfaces which are pretty fundamental to how the new structure works. For the sake of future maintainers, can you please add some docs? We don't need anything crazy, but top-level comments on "JSCode" and "JSScript" and other new interfaces like that, along with a bit on what the methods mean, is going to help a lot of people (and me!). I won't be offended if some of the docs are written by AI either. Thanks!

gbrail · 2025-09-27T18:57:30Z

One other thing in the interest of future maintenance -- the JSDescriptor constructor, with its long long list of parameters, seems like a great candidate for a "Builder" pattern. With that said, I'll look more at who has to actually use it, but if it's humans then that might be a more flexible long-term pattern.

aardvark179 · 2025-10-03T14:18:23Z

Heh @gbrail, I meant to reply to this earlier. The descriptors do have a builder object, with mutable fields, but I had not created it as classic builder with withX style methods as it's only used internally, and mostly in just one place. We also can't make them opaque because we need to use all that builder state in the ClassCompiler, as I discovered yesterday.

I realised while tidying up other things on my scope branch branch was that I needed to touch the ClassCompiler, that there were NO tests for it, and that I had broken it on this branch. Oops.

I've now added a smoke test and support for class compilation. One thing I'm not sure about is the expected interface provided here. As far as I can tell it's simply that one of the classes generated by the compiler will contain a main that will create the script object and defer to the appropriate method. Are there more constraints than that?

I'll separate the debug and compiler smoke tests into a separate PR because they are likely useful to others.

While writing the smoke test I also discovered that we had one bug in the byte code compilation that prevented arrow functions being compiled in one edge case, and this wasn't showing up as a test failure because we fallback silently to the interpreter. I tried disabling the fallback mode after fixing that and found 18 failing Mozilla suite tests (failing due to large script size). The main branch has the same failing tests, along with 4 test 262 cases which also failed due to large method size. Should we make a separate change to disable fallback in test 262 tests to ensure we do not regress on this?

aardvark179 · 2025-10-03T16:58:15Z

The occasional build failure seems to be a bug in Jacobo when dealing with large numbers of generated classes and hash collisions. I'll see if I can turn off coverage for the generated OptJSCode stuff.

aardvark179 · 2025-10-03T19:27:19Z

Right, I've excluded the small generated accessors from coverage. That roughly halves the size of the generated coverage file and will mean we are much less likely to hit a collision.

gbrail · 2025-10-04T17:36:35Z

I like the idea of finding a way to test when we regress due to large script size -- maybe a way to mark tests that we know will cause a fallback to interpreted mode and disable it for others. (Perhaps the RhinoConfig mechanism could be used to set a flag we enable when running the tests?)

gbrail · 2025-10-04T17:48:13Z

OK, I get it on the big constructor thing, this is making sense and I like how it's finding limitations in other areas.

I'd still like JavaDoc for classes like JSCode, JSFunction, JSScript so that future generations will understand what they're doing when they try to maintain this!

aardvark179 · 2025-10-04T18:12:17Z

Sure, that seems entirely reasonable.

gbrail · 2025-10-10T00:27:28Z

Awesome this is great -- Thanks!

aardvark179 force-pushed the aardvark179-function-refactor branch 3 times, most recently from c29720e to c26776f Compare September 18, 2025 14:10

aardvark179 mentioned this pull request Sep 21, 2025

Implement ES2024 Array.fromAsync #2076

Open

aardvark179 force-pushed the aardvark179-function-refactor branch from c26776f to 0ffdb64 Compare October 3, 2025 14:05

aardvark179 force-pushed the aardvark179-function-refactor branch 2 times, most recently from a0cc743 to 36f620e Compare October 3, 2025 15:03

aardvark179 force-pushed the aardvark179-function-refactor branch from bfbbac2 to e196fa5 Compare October 4, 2025 16:32

aardvark179 force-pushed the aardvark179-function-refactor branch from e196fa5 to 93f827a Compare October 6, 2025 12:59

aardvark179 added 3 commits October 7, 2025 10:37

Change array literal related methods to be static.

35cb0e5

Introduce function and script descriptor infrastructure.

8e25e89

Create common code gen utils for creating descriptors.

ce14a5d

Convert interpreter to emit JSFunctions.

a9bbd94

aardvark179 force-pushed the aardvark179-function-refactor branch from 93f827a to 388a4d6 Compare October 7, 2025 17:38

aardvark179 added 5 commits October 7, 2025 11:09

Convert class compilation to emit JSFunctions.

eda2349

Tidy up unused classes and functions.

5d2a97c

Update test 262 properties.

6d0574d

Exclude generated OptJSCode subclasses from coverage instrumentation.

35abbe7

Tidy up and documentation.

dde7488

aardvark179 force-pushed the aardvark179-function-refactor branch from 388a4d6 to dde7488 Compare October 7, 2025 18:09

gbrail merged commit 8e69a72 into mozilla:master Oct 10, 2025
3 checks passed

rbri mentioned this pull request Oct 10, 2025

remove a no longer used call and make an if statement looking not that depressive #2118

Open

Separate immutable code and descriptors from function objects. #2091

Separate immutable code and descriptors from function objects. #2091

Uh oh!

Conversation

aardvark179 commented Sep 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Separate function objects and the code they execute.

What's the problem we're trying to solve here?

The overall solution

Direct calls and function creation within compiled classes

Creating the OptJSCode instances

Performance and scalability

Loose ends and future work

Uh oh!

aardvark179 commented Sep 17, 2025

Uh oh!

aardvark179 commented Sep 17, 2025

Uh oh!

ZZZank commented Sep 17, 2025

Uh oh!

aardvark179 commented Sep 17, 2025

Uh oh!

aardvark179 commented Sep 17, 2025

Uh oh!

gbrail commented Sep 18, 2025

Uh oh!

aardvark179 commented Sep 18, 2025

Uh oh!

ZZZank commented Sep 18, 2025

Uh oh!

aardvark179 commented Sep 18, 2025

Uh oh!

aardvark179 commented Sep 18, 2025

Uh oh!

gbrail commented Sep 27, 2025

Uh oh!

gbrail commented Sep 27, 2025

Uh oh!

aardvark179 commented Oct 3, 2025

Uh oh!

aardvark179 commented Oct 3, 2025

Uh oh!

aardvark179 commented Oct 3, 2025

Uh oh!

gbrail commented Oct 4, 2025

Uh oh!

gbrail commented Oct 4, 2025

Uh oh!

aardvark179 commented Oct 4, 2025

Uh oh!

gbrail commented Oct 10, 2025

Uh oh!

Uh oh!

Uh oh!

aardvark179 commented Sep 17, 2025 •

edited

Loading

Creating the `OptJSCode` instances