Skip to content

[rust] The grammar for Rust is very slow, contains other problems. #4677

@kaby76

Description

@kaby76

Large max-k and ambiguity

The grammar for Rust contains a number of large max-k's.

$ bash /c/Users/Kenne/Documents/GitHub/g4-scripts/wip/find-low-k.sh ../examples/long/*.rs
14 ../examples/long/issue_1985_raw_string.rs
56 ../examples/long/literal.rs
162 ../examples/long/inlinepython_example.rs
1865 ../examples/long/intellijrust_test_allinone.rs
2052 ../examples/long/ssrust_ssserver.rs
2291 ../examples/long/rustls_quic.rs
5154 ../examples/long/ssrust_config.rs
6255 ../examples/long/deno_core_runtime.rs
8212 ../examples/long/leaf_vmessstream.rs
11/13-19:10:30 ~/issues/g4-4675/rust/Generated-CSharp

It also contains many ambiguities.

$ bash /c/Users/Kenne/Documents/GitHub/g4-scripts/wip/find-low-ambiguity.sh ../examples/long/*.rs
6 ../examples/long/issue_1985_raw_string.rs
12 ../examples/long/inlinepython_example.rs
20 ../examples/long/literal.rs
486 ../examples/long/leaf_vmessstream.rs
494 ../examples/long/ssrust_ssserver.rs
908 ../examples/long/ssrust_config.rs
1055 ../examples/long/rustls_quic.rs
1934 ../examples/long/intellijrust_test_allinone.rs
3760 ../examples/long/deno_core_runtime.rs

The ambiguity for expression is particularly large but it is not the only problem.

$ trperf ../examples/long/deno_core_runtime.rs  -h -c aFdriTkmfaet | ( head -n 1 && tail -n +2 | sort -k1 -n -r; ) | head | column -t
Time to parse: 00:00:05.9877894
Ambiguities  File                                   Decision  Rule                 Invocations  Time       Total-k  Max-k  Fallback  Ambiguities  Errors  Transitions
1180         ../examples/long/deno_core_runtime.rs  174       expression           1623         10.773379  22999    495    1180      1180         0       7217
533          ../examples/long/deno_core_runtime.rs  179       expression           1047         2.675077   9943     304    533       533          0       3437
460          ../examples/long/deno_core_runtime.rs  244       patternWithoutRange  646          0.613918   2845     7      460       460          0       950
371          ../examples/long/deno_core_runtime.rs  278       typeNoBounds         449          2.228789   2757     27     371       371          0       1689
339          ../examples/long/deno_core_runtime.rs  277       type_                406          2.793525   2647     27     339       339          0       1655
154          ../examples/long/deno_core_runtime.rs  324       genericArgs          156          1.267268   1842     25     154       154          0       978
104          ../examples/long/deno_core_runtime.rs  155       statement            708          0.971191   4673     58     104       104          0       1531
68           ../examples/long/deno_core_runtime.rs  180       expression           2670         0.739647   4116     44     1047      68           0       1421
29           ../examples/long/deno_core_runtime.rs  186       statements           708          12.210207  31433    1175   29        29           0       6073
11/13-19:17:14 ~/issues/g4-4675/rust/Generated-CSharp

Authorative source for the EBNF

It is never stated in the readme or the .g4's the authorative source for the EBNF. It should because we need to know how to update the grammar for future releases.

This grammar is derived from the official Rust grammar, https://doc.rust-lang.org/reference/, because it makes reference to section numbers.

Refactoring of expression

The expression rule in the Antlr grammar is refactored from expression in the reference. The refactoring places some operators behind compoundAssignOperator and comparisonOperator. Doing so introduces operator precedence bugs. This is a well-known problem in Antlr4. These rules must be unfolded. in expression.

At some point, the grammar should be scraped from the parser implementation because documentation always lags behind implementation.

typeNoBounds vs traitObjectType and traitObjectTypeOneBound vs typePath

Consider this Rust input:

#[empty_attr()]
const T: i32 = 92;

This is parsed two ways:

../examples/long/x.d=277.a=1: (crate (item (outerAttribute (POUND "#") (LSQUAREBRACKET "[") (attr (simplePath (simplePathSegment (identifier (NON_KEYWORD_IDENTIFIER "empty_attr")))) (attrInput (delimTokenTree (LPAREN "(") (RPAREN ")")))) (RSQUAREBRACKET "]")) (visItem (constantItem (KW_CONST "const") (identifier (NON_KEYWORD_IDENTIFIER "T")) (COLON ":") (type_ (typeNoBounds (traitObjectTypeOneBound (traitBound (typePath (typePathSegment (pathIdentSegment (identifier (NON_KEYWORD_IDENTIFIER "i32"))))))))) (EQ "=") (expression (literalExpression (INTEGER_LITERAL "92"))) (SEMI ";")))) (EOF ""))
../examples/long/x.d=277.a=3: (crate (item (outerAttribute (POUND "#") (LSQUAREBRACKET "[") (attr (simplePath (simplePathSegment (identifier (NON_KEYWORD_IDENTIFIER "empty_attr")))) (attrInput (delimTokenTree (LPAREN "(") (RPAREN ")")))) (RSQUAREBRACKET "]")) (visItem (constantItem (KW_CONST "const") (identifier (NON_KEYWORD_IDENTIFIER "T")) (COLON ":") (type_ (traitObjectType (typeParamBounds (typeParamBound (traitBound (typePath (typePathSegment (pathIdentSegment (identifier (NON_KEYWORD_IDENTIFIER "i32")))))))))) (EQ "=") (expression (literalExpression (INTEGER_LITERAL "92"))) (SEMI ";")))) (EOF ""))

Here we see ambiguity in Decision State 277 (NFA state 2048 in rule _type).

Image

In addition, there is further ambiguity between traitObjectTypeOneBound vs typePath:

../examples/long/x.d=278.a=3: (crate (item (outerAttribute (POUND "#") (LSQUAREBRACKET "[") (attr (simplePath (simplePathSegment (identifier (NON_KEYWORD_IDENTIFIER "empty_attr")))) (attrInput (delimTokenTree (LPAREN "(") (RPAREN ")")))) (RSQUAREBRACKET "]")) (visItem (constantItem (KW_CONST "const") (identifier (NON_KEYWORD_IDENTIFIER "T")) (COLON ":") (type_ (typeNoBounds (traitObjectTypeOneBound (traitBound (typePath (typePathSegment (pathIdentSegment (identifier (NON_KEYWORD_IDENTIFIER "i32"))))))))) (EQ "=") (expression (literalExpression (INTEGER_LITERAL "92"))) (SEMI ";")))) (EOF ""))
../examples/long/x.d=278.a=4: (crate (item (outerAttribute (POUND "#") (LSQUAREBRACKET "[") (attr (simplePath (simplePathSegment (identifier (NON_KEYWORD_IDENTIFIER "empty_attr")))) (attrInput (delimTokenTree (LPAREN "(") (RPAREN ")")))) (RSQUAREBRACKET "]")) (visItem (constantItem (KW_CONST "const") (identifier (NON_KEYWORD_IDENTIFIER "T")) (COLON ":") (type_ (typeNoBounds (typePath (typePathSegment (pathIdentSegment (identifier (NON_KEYWORD_IDENTIFIER "i32"))))))) (EQ "=") (expression (literalExpression (INTEGER_LITERAL "92"))) (SEMI ";")))) (EOF ""))

Here we see ambiguity in Decision State 278 (NFA state 2064 in rule typeNoBounds).

Image

Input i32 is not a trait, but a primitive type. But, the parser cannot decide what it is. The problem occurs in several rules, as shown above.

The grammar needs disambiguating predicates for symbol types.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions