I have not analysed this further...
In Unicode mode, FindRepeated calls IncUnicode2, which advances by 2 code units when it is positioned on a surrogate pair. This is a problem for every opcode that can match a surrogate pair.
OP_STAR/... in MatchPrim then iterates over the returned range in steps of one ReChar (one code unit): regInput := save + no;
Also, the count returned by FindRepeated is not consistent. Depending on the opcode, it is:
- the number of code units for OP_ANY (a surrogate pair counts as 2)
- the number of "Chars" (full code points) for any of the OP_NOT... opcodes (a surrogate pair counts as 1)
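To make the mismatch concrete, here is a small Python model. It is not TRegExpr code, and all names are hypothetical. The text is treated as a list of UTF-16 code units. A count in code points, as the OP_NOT... branches reportedly return, lands in the middle of a surrogate pair when it is used as a code-unit offset the way regInput := save + no; uses it.

```python
# Hypothetical model, not TRegExpr code: the text is a list of UTF-16 code units.

def to_utf16(s):
    """Encode a Python str as a list of UTF-16 code units."""
    b = s.encode("utf-16-le")
    return [int.from_bytes(b[i:i + 2], "little") for i in range(0, len(b), 2)]


def is_pair_at(units, i):
    """True if a high surrogate at i is followed by a low surrogate."""
    return (0xD800 <= units[i] <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF)


def count_code_points(units):
    """Count code points (a pair counts as 1), like the OP_NOT... case."""
    count, i = 0, 0
    while i < len(units):
        # Step over a whole pair at once (modeled on the described IncUnicode2).
        i += 2 if is_pair_at(units, i) else 1
        count += 1
    return count


units = to_utf16("a\U0001F600")   # 'a' followed by U+1F600 (a surrogate pair)
save = 0
no = count_code_points(units)     # 2 code points, but 3 code units
reg_input = save + no             # used as a code-unit offset: lands on the low surrogate
print(len(units), no, hex(units[reg_input]))   # 3 2 0xde00
```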
One way I can think of to trigger this is the pattern "(.+)." (a capture group followed by one more any-char):
- If the last character in the text is a surrogate pair, the capture ends with half of it (the high surrogate).
- If the text is exactly one character and that character is a surrogate pair, the pattern matches even though it should not. The pattern needs 2 characters, and each half of the pair is taken as a full character.
OP_STAR backtracks by half a surrogate pair, and OP_ANY then does not check whether it is matching the second half of a pair.
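Both failures can be reproduced with a simplified Python model of the backtracking loop. This is a sketch under assumptions, not TRegExpr's MatchPrim:
- The text is a list of UTF-16 code units.
- The greedy count is in code units, as described for OP_ANY.
- Backtracking steps back one code unit at a time.
- The trailing OP_ANY accepts any code unit.
- Matching is tried at a single start position.

The function names are hypothetical.

```python
# Hypothetical model of the reported bug, not TRegExpr's MatchPrim.

def to_utf16(s):
    """Encode a Python str as a list of UTF-16 code units."""
    b = s.encode("utf-16-le")
    return [int.from_bytes(b[i:i + 2], "little") for i in range(0, len(b), 2)]


def match_plus_then_any_buggy(units, start=0):
    """Match "(.+)." at `start`; return the capture (code units) or None."""
    n = len(units) - start        # greedy ".+": count in code units, as for OP_ANY
    for no in range(n, 0, -1):    # backtrack ONE code unit at a time
        pos = start + no          # like: regInput := save + no;
        if pos < len(units):      # trailing OP_ANY: any code unit, even a low surrogate
            return units[start:pos]
    return None


emoji = "\U0001F600"              # one code point, two code units: D83D DE00
print([hex(u) for u in match_plus_then_any_buggy(to_utf16("ab" + emoji))])
# ['0x61', '0x62', '0xd83d']  -> capture ends with half of the pair
print([hex(u) for u in match_plus_then_any_buggy(to_utf16(emoji))])
# ['0xd83d']  -> matches a 1-character text, although "(.+)." needs 2
```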
This may be fixable as follows (untested):
- OP_STAR... in MatchPrim must check whether regInput := save + no; points to the second half of a surrogate pair (and skip that position if it does).
- FindRepeated must always return the number of code units (ReChars), i.e. always count a surrogate pair as 2.
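Here is a sketch of the proposed fix in the same kind of simplified Python model. It is untested against TRegExpr, and the names are hypothetical. The model makes three changes:
- The greedy count stays in code units.
- Backtracking skips any position that points at the low half of a surrogate pair.
- OP_ANY consumes a whole pair.

The OP_NOT... branches are not modeled.

```python
# Hypothetical sketch of the proposed fix, not TRegExpr code.

def to_utf16(s):
    """Encode a Python str as a list of UTF-16 code units."""
    b = s.encode("utf-16-le")
    return [int.from_bytes(b[i:i + 2], "little") for i in range(0, len(b), 2)]


def is_high(u):
    return 0xD800 <= u <= 0xDBFF


def is_low(u):
    return 0xDC00 <= u <= 0xDFFF


def splits_pair(units, pos):
    """True if pos points at the second half (low surrogate) of a pair."""
    return 0 < pos < len(units) and is_low(units[pos]) and is_high(units[pos - 1])


def op_any(units, pos):
    """Consume one code point; return the new position, or None at end of text."""
    if pos >= len(units):
        return None
    if is_high(units[pos]) and pos + 1 < len(units) and is_low(units[pos + 1]):
        return pos + 2
    return pos + 1


def match_plus_then_any_fixed(units, start=0):
    """Match "(.+)." at `start`; return the capture (code units) or None."""
    n = len(units) - start          # count always in code units (a pair counts as 2)
    for no in range(n, 0, -1):
        pos = start + no            # regInput := save + no;
        if splits_pair(units, pos):
            continue                # never resume between the halves of a pair
        if op_any(units, pos) is not None:
            return units[start:pos]
    return None


emoji = "\U0001F600"
print(match_plus_then_any_fixed(to_utf16("ab" + emoji)))   # [97, 98]
print(match_plus_then_any_fixed(to_utf16(emoji)))          # None
```

Skipping the split position, rather than changing how `no` is computed, keeps the shape of the regInput := save + no; loop unchanged.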
Alexey-T