Description
It seems that J2CL is unable to resolve Character.getType or the Unicode category constant fields (like Character.COMBINING_SPACING_MARK), and throws a "symbol not found" error.
For reference, see google/closure-compiler#3639, where Closure Compiler was unable to interpret a composite Unicode sequence as a valid IdentifierPart. Part of CC's parsing process relies on Scanner.java, a class that determines whether a given token is an IdentifierStart or IdentifierPart in compliance with the ECMAScript spec. All Unicode category checks on tokens are currently done by testing whether the character falls within one of several hard-coded Unicode ranges (see below). I replicated that approach for this fix, but it is neither as future-proof nor as legible as Character.getType(char) == Character.COMBINING_SPACING_MARK, which will keep working as the Unicode standard evolves over time.
private static boolean isCombiningMark(char ch) {
  return (
      // 0300-036F
      (0x0300 <= ch & ch <= 0x036F) |
      // 1AB0–1AFF
      (0x1AB0 <= ch & ch <= 0x1AFF) |
      // 1DC0–1DFF
      (0x1DC0 <= ch & ch <= 0x1DFF) |
      // 20D0–20FF
      (0x20D0 <= ch & ch <= 0x20FF) |
      // FE20–FE2F
      (0xFE20 <= ch & ch <= 0xFE2F)
  );
  // TODO (ctjl): Implement in a more reliable and future-proofed way, i.e.:
  // return Character.getType(ch) == Character.NON_SPACING_MARK;
}
This hard-coded, manual approach is taken for every Unicode category check in the jsComp library because the J2CL compile must succeed in order to push a release (code using Character.getType() compiles with Maven, but not with Bazel). It would be beneficial for the CC library if J2CL could support these APIs.
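To illustrate the difference, here is a minimal sketch that runs on a standard JVM (not under J2CL, which is the point of this issue). The class name CombiningMarkCheck and both method names are hypothetical and not taken from Closure Compiler. It also assumes the category check should accept both NON_SPACING_MARK (Mn) and COMBINING_SPACING_MARK (Mc), which may differ from what Scanner.java needs.

```java
public class CombiningMarkCheck {
    // Hard-coded block ranges, mirroring the snippet above (hypothetical name).
    static boolean isCombiningMarkByRange(char ch) {
        return (0x0300 <= ch && ch <= 0x036F)
            || (0x1AB0 <= ch && ch <= 0x1AFF)
            || (0x1DC0 <= ch && ch <= 0x1DFF)
            || (0x20D0 <= ch && ch <= 0x20FF)
            || (0xFE20 <= ch && ch <= 0xFE2F);
    }

    // Category-based check using the JDK's Unicode data (hypothetical name).
    // Assumption: both Mn and Mc count as combining marks here.
    static boolean isCombiningMarkByCategory(char ch) {
        int type = Character.getType(ch);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK;
    }

    public static void main(String[] args) {
        char combiningAcute = (char) 0x0301; // COMBINING ACUTE ACCENT (Mn)
        char devanagariAa = (char) 0x093E;   // DEVANAGARI VOWEL SIGN AA (Mc)

        // Inside the hard-coded blocks: both checks agree.
        System.out.println(isCombiningMarkByRange(combiningAcute));
        System.out.println(isCombiningMarkByCategory(combiningAcute));

        // Outside the hard-coded blocks: only the category check matches.
        System.out.println(isCombiningMarkByRange(devanagariAa));
        System.out.println(isCombiningMarkByCategory(devanagariAa));
    }
}
```

A mark such as U+093E sits outside the five listed blocks, so the range-based check rejects it while the category-based check accepts it. That kind of gap is what the Character.getType approach would avoid if J2CL supported it.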