-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Description
Large max-k and ambiguity
The grammar for Rust contains a number of large max-k's.
$ bash /c/Users/Kenne/Documents/GitHub/g4-scripts/wip/find-low-k.sh ../examples/long/*.rs
14 ../examples/long/issue_1985_raw_string.rs
56 ../examples/long/literal.rs
162 ../examples/long/inlinepython_example.rs
1865 ../examples/long/intellijrust_test_allinone.rs
2052 ../examples/long/ssrust_ssserver.rs
2291 ../examples/long/rustls_quic.rs
5154 ../examples/long/ssrust_config.rs
6255 ../examples/long/deno_core_runtime.rs
8212 ../examples/long/leaf_vmessstream.rs
11/13-19:10:30 ~/issues/g4-4675/rust/Generated-CSharp
It also contains many ambiguities.
$ bash /c/Users/Kenne/Documents/GitHub/g4-scripts/wip/find-low-ambiguity.sh ../examples/long/*.rs
6 ../examples/long/issue_1985_raw_string.rs
12 ../examples/long/inlinepython_example.rs
20 ../examples/long/literal.rs
486 ../examples/long/leaf_vmessstream.rs
494 ../examples/long/ssrust_ssserver.rs
908 ../examples/long/ssrust_config.rs
1055 ../examples/long/rustls_quic.rs
1934 ../examples/long/intellijrust_test_allinone.rs
3760 ../examples/long/deno_core_runtime.rs
The ambiguity for expression is particularly large but it is not the only problem.
$ trperf ../examples/long/deno_core_runtime.rs -h -c aFdriTkmfaet | ( head -n 1 && tail -n +2 | sort -k1 -n -r; ) | head | column -t
Time to parse: 00:00:05.9877894
Ambiguities File Decision Rule Invocations Time Total-k Max-k Fallback Ambiguities Errors Transitions
1180 ../examples/long/deno_core_runtime.rs 174 expression 1623 10.773379 22999 495 1180 1180 0 7217
533 ../examples/long/deno_core_runtime.rs 179 expression 1047 2.675077 9943 304 533 533 0 3437
460 ../examples/long/deno_core_runtime.rs 244 patternWithoutRange 646 0.613918 2845 7 460 460 0 950
371 ../examples/long/deno_core_runtime.rs 278 typeNoBounds 449 2.228789 2757 27 371 371 0 1689
339 ../examples/long/deno_core_runtime.rs 277 type_ 406 2.793525 2647 27 339 339 0 1655
154 ../examples/long/deno_core_runtime.rs 324 genericArgs 156 1.267268 1842 25 154 154 0 978
104 ../examples/long/deno_core_runtime.rs 155 statement 708 0.971191 4673 58 104 104 0 1531
68 ../examples/long/deno_core_runtime.rs 180 expression 2670 0.739647 4116 44 1047 68 0 1421
29 ../examples/long/deno_core_runtime.rs 186 statements 708 12.210207 31433 1175 29 29 0 6073
11/13-19:17:14 ~/issues/g4-4675/rust/Generated-CSharp
Authorative source for the EBNF
It is never stated in the readme or the .g4's the authorative source for the EBNF. It should because we need to know how to update the grammar for future releases.
This grammar is derived from the official Rust grammar, https://doc.rust-lang.org/reference/, because it makes reference to section numbers.
Refactoring of expression
The expression rule in the Antlr grammar is refactored from expression in the reference. The refactoring places some operators behind compoundAssignOperator and comparisonOperator. Doing so introduces operator precedence bugs. This is a well-known problem in Antlr4. These rules must be unfolded. in expression.
At some point, the grammar should be scraped from the parser implementation because documentation always lags behind implementation.
typeNoBounds vs traitObjectType and traitObjectTypeOneBound vs typePath
Consider this Rust input:
#[empty_attr()]
const T: i32 = 92;
This is parsed two ways:
../examples/long/x.d=277.a=1: (crate (item (outerAttribute (POUND "#") (LSQUAREBRACKET "[") (attr (simplePath (simplePathSegment (identifier (NON_KEYWORD_IDENTIFIER "empty_attr")))) (attrInput (delimTokenTree (LPAREN "(") (RPAREN ")")))) (RSQUAREBRACKET "]")) (visItem (constantItem (KW_CONST "const") (identifier (NON_KEYWORD_IDENTIFIER "T")) (COLON ":") (type_ (typeNoBounds (traitObjectTypeOneBound (traitBound (typePath (typePathSegment (pathIdentSegment (identifier (NON_KEYWORD_IDENTIFIER "i32"))))))))) (EQ "=") (expression (literalExpression (INTEGER_LITERAL "92"))) (SEMI ";")))) (EOF ""))
../examples/long/x.d=277.a=3: (crate (item (outerAttribute (POUND "#") (LSQUAREBRACKET "[") (attr (simplePath (simplePathSegment (identifier (NON_KEYWORD_IDENTIFIER "empty_attr")))) (attrInput (delimTokenTree (LPAREN "(") (RPAREN ")")))) (RSQUAREBRACKET "]")) (visItem (constantItem (KW_CONST "const") (identifier (NON_KEYWORD_IDENTIFIER "T")) (COLON ":") (type_ (traitObjectType (typeParamBounds (typeParamBound (traitBound (typePath (typePathSegment (pathIdentSegment (identifier (NON_KEYWORD_IDENTIFIER "i32")))))))))) (EQ "=") (expression (literalExpression (INTEGER_LITERAL "92"))) (SEMI ";")))) (EOF ""))
Here we see ambiguity in Decision State 277 (NFA state 2048 in rule _type).
In addition, there is further ambiguity between traitObjectTypeOneBound vs typePath:
../examples/long/x.d=278.a=3: (crate (item (outerAttribute (POUND "#") (LSQUAREBRACKET "[") (attr (simplePath (simplePathSegment (identifier (NON_KEYWORD_IDENTIFIER "empty_attr")))) (attrInput (delimTokenTree (LPAREN "(") (RPAREN ")")))) (RSQUAREBRACKET "]")) (visItem (constantItem (KW_CONST "const") (identifier (NON_KEYWORD_IDENTIFIER "T")) (COLON ":") (type_ (typeNoBounds (traitObjectTypeOneBound (traitBound (typePath (typePathSegment (pathIdentSegment (identifier (NON_KEYWORD_IDENTIFIER "i32"))))))))) (EQ "=") (expression (literalExpression (INTEGER_LITERAL "92"))) (SEMI ";")))) (EOF ""))
../examples/long/x.d=278.a=4: (crate (item (outerAttribute (POUND "#") (LSQUAREBRACKET "[") (attr (simplePath (simplePathSegment (identifier (NON_KEYWORD_IDENTIFIER "empty_attr")))) (attrInput (delimTokenTree (LPAREN "(") (RPAREN ")")))) (RSQUAREBRACKET "]")) (visItem (constantItem (KW_CONST "const") (identifier (NON_KEYWORD_IDENTIFIER "T")) (COLON ":") (type_ (typeNoBounds (typePath (typePathSegment (pathIdentSegment (identifier (NON_KEYWORD_IDENTIFIER "i32"))))))) (EQ "=") (expression (literalExpression (INTEGER_LITERAL "92"))) (SEMI ";")))) (EOF ""))
Here we see ambiguity in Decision State 278 (NFA state 2064 in rule typeNoBounds).
Input i32 is not a trait, but a primitive type. But, the parser cannot decide what it is. The problem occurs in several rules, as shown above.
The grammar needs disambiguating predicates for symbol types.