Commit f55b180

rbuckton authored and ljharb committed
Normative: Add RegExp Modifiers (tc39#3221)
1 parent 10cc1b7 commit f55b180

1 file changed, +74 -4 lines changed

spec.html

Lines changed: 74 additions & 4 deletions
@@ -35788,7 +35788,15 @@ <h2>Syntax</h2>
   `\` AtomEscape[?UnicodeMode, ?NamedCaptureGroups]
   CharacterClass[?UnicodeMode, ?UnicodeSetsMode]
   `(` GroupSpecifier[?UnicodeMode]? Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] `)`
-  `(?:` Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] `)`
+  `(?` RegularExpressionModifiers `:` Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] `)`
+  `(?` RegularExpressionModifiers `-` RegularExpressionModifiers `:` Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] `)`
+
+RegularExpressionModifiers ::
+  [empty]
+  RegularExpressionModifiers RegularExpressionModifier
+
+RegularExpressionModifier :: one of
+  `i` `m` `s`

 SyntaxCharacter :: one of
   `^` `$` `\` `.` `*` `+` `?` `(` `)` `[` `]` `{` `}` `|`
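The new productions above can be recognised with a small prefix matcher. The following is a minimal sketch, not part of the commit: `ModifierPrefix` and `parseModifierPrefix` are hypothetical names, and the helper checks only the grammar shown in this hunk, not the early-error rules added elsewhere in the change.

```typescript
// Result of recognising the start of a modifier group such as `(?i-m:`.
// Hypothetical type, introduced only for this illustration.
interface ModifierPrefix {
  add: string;           // modifiers before `-` (may be empty)
  remove: string | null; // modifiers after `-`, or null when no `-` is present
  length: number;        // characters consumed, including the trailing `:`
}

// Hypothetical helper: tries to read
//   `(?` RegularExpressionModifiers [`-` RegularExpressionModifiers] `:`
// at position `pos`. Returns null when the text does not fit the grammar.
function parseModifierPrefix(pattern: string, pos: number = 0): ModifierPrefix | null {
  const m = /^\(\?([ims]*)(?:-([ims]*))?:/.exec(pattern.slice(pos));
  if (m === null) return null;
  return { add: m[1] ?? "", remove: m[2] ?? null, length: m[0].length };
}

console.log(parseModifierPrefix("(?i-m:x)")); // add "i", remove "m", length 6
console.log(parseModifierPrefix("(?:a)"));    // add "", remove null, length 3
console.log(parseModifierPrefix("(?=a)"));    // null (a lookahead, not a modifier group)
```

Note that the plain non-capturing group `(?:` falls out as the case where |RegularExpressionModifiers| is empty, which is why the old `(?:` alternative could be replaced rather than kept alongside the new ones.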
@@ -36033,6 +36041,27 @@ <h1>Static Semantics: Early Errors</h1>
       It is a Syntax Error if the MV of the first |DecimalDigits| is strictly greater than the MV of the second |DecimalDigits|.
     </li>
   </ul>
+  <emu-grammar>Atom :: `(?` RegularExpressionModifiers `:` Disjunction `)`</emu-grammar>
+  <ul>
+    <li>
+      It is a Syntax Error if the source text matched by |RegularExpressionModifiers| contains the same code point more than once.
+    </li>
+  </ul>
+  <emu-grammar>Atom :: `(?` RegularExpressionModifiers `-` RegularExpressionModifiers `:` Disjunction `)`</emu-grammar>
+  <ul>
+    <li>
+      It is a Syntax Error if the source text matched by the first |RegularExpressionModifiers| and the source text matched by the second |RegularExpressionModifiers| are both empty.
+    </li>
+    <li>
+      It is a Syntax Error if the source text matched by the first |RegularExpressionModifiers| contains the same code point more than once.
+    </li>
+    <li>
+      It is a Syntax Error if the source text matched by the second |RegularExpressionModifiers| contains the same code point more than once.
+    </li>
+    <li>
+      It is a Syntax Error if any code point in the source text matched by the first |RegularExpressionModifiers| is also contained in the source text matched by the second |RegularExpressionModifiers|.
+    </li>
+  </ul>
   <emu-grammar>AtomEscape :: `k` GroupName</emu-grammar>
   <ul>
     <li>
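The early-error rules added in this hunk can be sketched as a single validation function. This is an illustration under assumptions, not the spec's mechanism: `checkModifierEarlyErrors` is a hypothetical name, the two modifier lists are assumed to be already extracted as strings, and `remove` is `null` for the form without a `-`.

```typescript
// Hypothetical helper: returns a description of the first violated rule,
// or null when the modifier lists pass all early-error checks.
function checkModifierEarlyErrors(add: string, remove: string | null): string | null {
  // True when some character occurs more than once in `s`.
  const hasDuplicate = (s: string): boolean => {
    for (let i = 0; i < s.length; i++) {
      if (s.indexOf(s[i]) !== i) return true;
    }
    return false;
  };

  if (remove === null) {
    // Form `(?` Modifiers `:` ... `)`: only the duplicate rule applies.
    return hasDuplicate(add) ? "modifier repeated" : null;
  }
  // Form `(?` Modifiers `-` Modifiers `:` ... `)`.
  if (add === "" && remove === "") return "both modifier lists are empty";
  if (hasDuplicate(add)) return "modifier repeated in first list";
  if (hasDuplicate(remove)) return "modifier repeated in second list";
  for (let i = 0; i < add.length; i++) {
    if (remove.indexOf(add[i]) !== -1) return "modifier appears in both lists";
  }
  return null;
}

console.log(checkModifierEarlyErrors("im", "s")); // null, as in `(?im-s:...)`
console.log(checkModifierEarlyErrors("", ""));    // "both modifier lists are empty", as in `(?-:...)`
console.log(checkModifierEarlyErrors("i", "i"));  // "modifier appears in both lists", as in `(?i-i:...)`
```

The "both empty" rule applies only to the form with `-`, so `(?:...)` remains valid while `(?-:...)` is rejected.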
@@ -37230,9 +37259,19 @@ <h1>
 <emu-note>
   <p>Parentheses of the form `(` |Disjunction| `)` serve both to group the components of the |Disjunction| pattern together and to save the result of the match. The result can be used either in a backreference (`\\` followed by a non-zero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching Abstract Closure. To inhibit the capturing behaviour of parentheses, use the form `(?:` |Disjunction| `)` instead.</p>
 </emu-note>
-<emu-grammar>Atom :: `(?:` Disjunction `)`</emu-grammar>
+<emu-grammar>Atom :: `(?` RegularExpressionModifiers `:` Disjunction `)`</emu-grammar>
 <emu-alg>
-  1. Return CompileSubpattern of |Disjunction| with arguments _rer_ and _direction_.
+  1. Let _addModifiers_ be the source text matched by |RegularExpressionModifiers|.
+  1. Let _removeModifiers_ be the empty String.
+  1. Let _modifiedRer_ be UpdateModifiers(_rer_, CodePointsToString(_addModifiers_), _removeModifiers_).
+  1. Return CompileSubpattern of |Disjunction| with arguments _modifiedRer_ and _direction_.
+</emu-alg>
+<emu-grammar>Atom :: `(?` RegularExpressionModifiers `-` RegularExpressionModifiers `:` Disjunction `)`</emu-grammar>
+<emu-alg>
+  1. Let _addModifiers_ be the source text matched by the first |RegularExpressionModifiers|.
+  1. Let _removeModifiers_ be the source text matched by the second |RegularExpressionModifiers|.
+  1. Let _modifiedRer_ be UpdateModifiers(_rer_, CodePointsToString(_addModifiers_), CodePointsToString(_removeModifiers_)).
+  1. Return CompileSubpattern of |Disjunction| with arguments _modifiedRer_ and _direction_.
 </emu-alg>

 <!-- AtomEscape -->
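Both algorithms in this hunk compile only the enclosed |Disjunction| with the modified RegExp Record, so the modifiers do not leak outside the group. The sketch below illustrates the observable effect, assuming a host engine that may or may not implement this change yet. `tryCompile` is a hypothetical helper; it builds the pattern from a string so that an engine without modifier support returns `null` instead of failing when the file is parsed.

```typescript
// Hypothetical helper: returns null when the engine rejects the pattern.
function tryCompile(source: string, flags: string = ""): RegExp | null {
  try {
    return new RegExp(source, flags);
  } catch {
    return null;
  }
}

// `(?i:a)b`: ignore-case is switched on for `a` only.
const scopedOn = tryCompile("(?i:a)b");
// `(?-i:a)b` with the `i` flag: ignore-case is switched off for `a` only.
const scopedOff = tryCompile("(?-i:a)b", "i");

if (scopedOn !== null && scopedOff !== null) {
  console.log(scopedOn.test("Ab"));  // true
  console.log(scopedOn.test("aB"));  // false, `b` is outside the group
  console.log(scopedOff.test("aB")); // true
  console.log(scopedOff.test("Ab")); // false, `a` is case-sensitive inside the group
} else {
  console.log("RegExp modifiers are not supported by this engine");
}
```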
@@ -37384,6 +37423,34 @@ <h1>
       <p>In case-insignificant matches when HasEitherUnicodeFlag(_rer_) is *false*, the mapping is based on Unicode Default Case Conversion algorithm toUppercase rather than toCasefold, which results in some subtle differences. For example, `Ω` (U+2126 OHM SIGN) is mapped by toUppercase to itself but by toCasefold to `ω` (U+03C9 GREEK SMALL LETTER OMEGA) along with `Ω` (U+03A9 GREEK CAPITAL LETTER OMEGA), so *"\u2126"* is matched by `/[ω]/ui` and `/[\u03A9]/ui` but not by `/[ω]/i` or `/[\u03A9]/i`. Also, no code point outside the Basic Latin block is mapped to a code point within it, so strings such as *"\u017F ſ"* and *"\u212A K"* are not matched by `/[a-z]/i`.</p>
     </emu-note>
   </emu-clause>
+
+  <emu-clause id="sec-updatemodifiers" type="abstract operation">
+    <h1>
+      UpdateModifiers (
+        _rer_: a RegExp Record,
+        _add_: a String,
+        _remove_: a String,
+      ): a RegExp Record
+    </h1>
+    <dl class="header">
+    </dl>
+    <emu-alg>
+      1. Assert: _add_ and _remove_ have no elements in common.
+      1. Let _ignoreCase_ be _rer_.[[IgnoreCase]].
+      1. Let _multiline_ be _rer_.[[Multiline]].
+      1. Let _dotAll_ be _rer_.[[DotAll]].
+      1. Let _unicode_ be _rer_.[[Unicode]].
+      1. Let _unicodeSets_ be _rer_.[[UnicodeSets]].
+      1. Let _capturingGroupsCount_ be _rer_.[[CapturingGroupsCount]].
+      1. If _remove_ contains *"i"*, set _ignoreCase_ to *false*.
+      1. Else if _add_ contains *"i"*, set _ignoreCase_ to *true*.
+      1. If _remove_ contains *"m"*, set _multiline_ to *false*.
+      1. Else if _add_ contains *"m"*, set _multiline_ to *true*.
+      1. If _remove_ contains *"s"*, set _dotAll_ to *false*.
+      1. Else if _add_ contains *"s"*, set _dotAll_ to *true*.
+      1. Return the RegExp Record { [[IgnoreCase]]: _ignoreCase_, [[Multiline]]: _multiline_, [[DotAll]]: _dotAll_, [[Unicode]]: _unicode_, [[UnicodeSets]]: _unicodeSets_, [[CapturingGroupsCount]]: _capturingGroupsCount_ }.
+    </emu-alg>
+  </emu-clause>
 </emu-clause>

 <emu-clause id="sec-compilecharacterclass" type="sdo" oldids="sec-characterclass">
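The UpdateModifiers steps above translate almost line for line into code. The following is a minimal sketch for illustration only: the `RegExpRecord` interface and the camel-cased field names are assumptions standing in for the spec's RegExp Record and its internal slots.

```typescript
// Assumed stand-in for the spec's RegExp Record.
interface RegExpRecord {
  ignoreCase: boolean;
  multiline: boolean;
  dotAll: boolean;
  unicode: boolean;
  unicodeSets: boolean;
  capturingGroupsCount: number;
}

function updateModifiers(rer: RegExpRecord, add: string, remove: string): RegExpRecord {
  // Step 1: `add` and `remove` must have no elements in common.
  for (let i = 0; i < add.length; i++) {
    if (remove.indexOf(add[i]) !== -1) {
      throw new Error("add and remove must be disjoint");
    }
  }
  // Removal takes precedence, then addition, otherwise the value is inherited.
  const pick = (flag: string, current: boolean): boolean => {
    if (remove.indexOf(flag) !== -1) return false;
    if (add.indexOf(flag) !== -1) return true;
    return current;
  };
  // A new record is returned; the input record is left untouched.
  return {
    ignoreCase: pick("i", rer.ignoreCase),
    multiline: pick("m", rer.multiline),
    dotAll: pick("s", rer.dotAll),
    unicode: rer.unicode,
    unicodeSets: rer.unicodeSets,
    capturingGroupsCount: rer.capturingGroupsCount,
  };
}

// Usage: the record for the body of `(?i-m:...)` inside a pattern with flags `mu`.
const outer: RegExpRecord = {
  ignoreCase: false, multiline: true, dotAll: false,
  unicode: true, unicodeSets: false, capturingGroupsCount: 2,
};
const inner = updateModifiers(outer, "i", "m");
console.log(inner.ignoreCase, inner.multiline, inner.dotAll); // true false false
```

Returning a fresh record rather than mutating `rer` mirrors the spec text, where the enclosing pattern keeps compiling with the original record once the group ends.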
@@ -50858,6 +50925,8 @@ <h1>Regular Expressions</h1>
 <emu-prodref name="Quantifier"></emu-prodref>
 <emu-prodref name="QuantifierPrefix"></emu-prodref>
 <emu-prodref name="Atom"></emu-prodref>
+<emu-prodref name="RegularExpressionModifiers"></emu-prodref>
+<emu-prodref name="RegularExpressionModifier"></emu-prodref>
 <emu-prodref name="SyntaxCharacter"></emu-prodref>
 <emu-prodref name="PatternCharacter"></emu-prodref>
 <emu-prodref name="AtomEscape"></emu-prodref>
@@ -51021,7 +51090,8 @@ <h2>Syntax</h2>
   `\` [lookahead == `c`]
   CharacterClass[~UnicodeMode, ~UnicodeSetsMode]
   `(` GroupSpecifier[~UnicodeMode]? Disjunction[~UnicodeMode, ~UnicodeSetsMode, ?NamedCaptureGroups] `)`
-  `(?:` Disjunction[~UnicodeMode, ~UnicodeSetsMode, ?NamedCaptureGroups] `)`
+  `(?` RegularExpressionModifiers `:` Disjunction[~UnicodeMode, ~UnicodeSetsMode, ?NamedCaptureGroups] `)`
+  `(?` RegularExpressionModifiers `-` RegularExpressionModifiers `:` Disjunction[~UnicodeMode, ~UnicodeSetsMode, ?NamedCaptureGroups] `)`
   InvalidBracedQuantifier
   ExtendedPatternCharacter
