Changing '+' to '*' crashes antlr4 #1203

@SimonStPeter

The following refers to TSQL (MS's dialect of SQL).
Background: TSQL uses semicolons as statement separators, but they may be repeated, may appear with no SQL between them, and may be omitted altogether. Hence this rule to handle them:

sql_items :
        (
            opt_sql_separators
            sql_item
        ) *
        opt_sql_separators
    ;

Further, there is a kind of super-separator which terminates batches (lumps of SQL). It is traditionally given the name 'GO'. The token GO is not TSQL; it is recognised by a trivial scanner that zips through the code, collecting all TSQL into a batch. When the scanner hits GO or the end of the file, it sends the batch, as one lump, to the SQL Server engine to be executed. So

select 1
GO
select 2
select 3
GO
select 4
GO
Go
select 5

is processed as follows:

select 1 is sent to the server to be executed. When it's finished any results are returned.
select 2 select 3 is then sent to be processed as a batch, results returned.
select 4 is then sent, results returned.
An empty batch is sent (the nothingness between the 2 consecutive GOs).
select 5 is then sent.
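
The batching behaviour described above can be sketched outside ANTLR. This is a minimal illustration, not the real client tooling: `split_batches` and `GO_LINE` are hypothetical names, and the separator pattern is an assumption modelled on the BATCH_SEPARATOR rule in this report (GO in any letter case, an optional integer, an optional `--` comment). The optional integer is accepted but ignored here.

```python
import re

# Assumption: a separator is a line holding only GO (any case), an optional
# integer, and an optional "--" comment, as in the BATCH_SEPARATOR rule.
GO_LINE = re.compile(r"^[ \t]*go(?:[ \t]+\d+)?[ \t]*(?:--.*)?$", re.IGNORECASE)


def split_batches(script: str) -> list[str]:
    """Split a script into batches at GO lines (hypothetical helper)."""
    batches = []
    current = []
    for line in script.splitlines():
        if GO_LINE.match(line):
            # GO closes the current batch, even if that batch is empty.
            batches.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    # End of file closes the last batch, if any text is left over.
    tail = "\n".join(current).strip()
    if tail:
        batches.append(tail)
    return batches
```

Run on the nine-line example above, this yields five batches in order: `select 1`, then `select 2` and `select 3` together, then `select 4`, then an empty batch for the two consecutive GOs, then `select 5`.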

This is handled with this:

sql_batches :
        (
            opt_batch_separators
            sql_items
            opt_batch_separators
        ) *
    ;

For convenience I'm treating GO as if it were an SQL language statement, i.e. as part of parsing SQL, even though it's distinctly not. (It also has a few more options, as in the code below, but you can ignore these.)

Anyway, the code above crashes ANTLR. The following works but is incorrect:

sql_items :
        (
            opt_sql_separators
            sql_item
        ) +   // <<<<<<<<<<< here, change to * to crash
        opt_sql_separators
    ;

Note the +. It needs to be * to handle batches the way I want. Change it to * and ANTLR bombs.
The input is

throw

which gives this:

throw
^Z
[@0,0:4='throw',<1>,1:0]
[@1,7:6='<EOF>',<-1>,2:0]
enter   start_parse, LT(1)=throw
enter   sql_batches, LT(1)=throw
enter   opt_batch_separators, LT(1)=throw
exit    opt_batch_separators, LT(1)=throw
exit    sql_batches, LT(1)=throw
exit    start_parse, LT(1)=throw
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.antlr.v4.gui.TestRig.process(TestRig.java:246)
        at org.antlr.v4.gui.TestRig.process(TestRig.java:189)
        at org.antlr.v4.gui.TestRig.main(TestRig.java:143)
Caused by: java.lang.StackOverflowError
        at org.antlr.v4.runtime.atn.SemanticContext$Predicate.hashCode(SemanticContext.java:123)
        at org.antlr.v4.runtime.atn.ATNConfigSet$ConfigEqualityComparator.hashCode(ATNConfigSet.java:75)
        at org.antlr.v4.runtime.atn.ATNConfigSet$ConfigEqualityComparator.hashCode(ATNConfigSet.java:64)
        at org.antlr.v4.runtime.misc.Array2DHashSet.getBucket(Array2DHashSet.java:132)
        at org.antlr.v4.runtime.misc.Array2DHashSet.getOrAddImpl(Array2DHashSet.java:87)
        at org.antlr.v4.runtime.misc.Array2DHashSet.getOrAdd(Array2DHashSet.java:83)
        at org.antlr.v4.runtime.atn.ATNConfigSet.add(ATNConfigSet.java:170)
        at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1551)
        at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1535)
        at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1603)
        at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1535)
        at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1603)
        at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1535)
        at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1603)
        at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1535)
... (lots more here)

Can you reproduce this? I'm using ANTLR 4.5.3 on Windows. The full grammar is below.

grammar MSSQL;

start_parse :
        sql_batches
        EOF
    ;

sql_batches :
        (
            opt_batch_separators
            sql_items
            opt_batch_separators
        ) *
    ;

sql_items :
        (
            opt_sql_separators
            sql_item
        ) *   // <<<<<<<<<<< this * triggers the crash; change to + to avoid it
        opt_sql_separators
    ;

SEMICOLON : ';' ;

opt_sql_separators :
        ( SEMICOLON * )
    ;

sql_item :
        throw_statement
    ;

throw_statement : 'throw' ;

DECDIGITS : [0-9]+ ;

fragment WUnl : ( '\r' ? )  '\n' ;

opt_batch_separators :
        ( batch_separator ) *
    ;

batch_separator : BATCH_SEPARATOR ;

BATCH_SEPARATOR :
        HWS*
        BATCH_SEPARATOR_TOKEN
        HWS*
        OPT_INT_CONST
        HWS*
        OPT_SLCOMMENT_BODY
        HWS*
        WUnl
    ;

BATCH_SEPARATOR_TOKEN : 'GO' | 'Go' | 'gO' | 'go' ;  // blech - todo
fragment OPT_INT_CONST : ( HWS+ DECDIGITS ) ? ;
fragment OPT_SLCOMMENT_BODY : ( '--' .*? ) ? ;
fragment HWS : [ \t] ; // horizontal whitespaces
fragment ALLWSes   : [ \t\r\n]+ ;

SKIPWS : ALLWSes -> skip ; 

(I've just realised that handling GO as an island grammar would possibly be better, as it is a trivial language. Ignore that for now, however.)
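
One difference between the two variants that may be relevant: with `*`, sql_items can match the empty string, so the whole body of the `( ... ) *` loop in sql_batches can match nothing. A loop over a body that consumes no input is a plausible trigger for the unbounded `closure_` recursion in the trace, but that is an assumption and is not confirmed anywhere in this report. The sketch below only checks which rules can match empty. `build_rules` and `nullable_rules` are hypothetical helpers, and `sql_items_group` and `sql_batches_body` are made-up names for the parenthesised groups in the grammar.

```python
# Each rule is a sequence of (symbol, suffix) pairs; suffix is "", "*", "+" or "?".
# Symbols that are not keys of the dict are tokens and never match empty.
def build_rules(items_suffix):
    return {
        "opt_batch_separators": [("batch_separator", "*")],
        "batch_separator": [("BATCH_SEPARATOR", "")],
        "opt_sql_separators": [("SEMICOLON", "*")],
        "sql_item": [("throw_statement", "")],
        "throw_statement": [("'throw'", "")],
        # ( opt_sql_separators sql_item ) inside sql_items
        "sql_items_group": [("opt_sql_separators", ""), ("sql_item", "")],
        # sql_items : ( group ) <suffix> opt_sql_separators ;
        "sql_items": [("sql_items_group", items_suffix),
                      ("opt_sql_separators", "")],
        # body of the ( ... ) * loop in sql_batches
        "sql_batches_body": [("opt_batch_separators", ""),
                             ("sql_items", ""),
                             ("opt_batch_separators", "")],
    }


def nullable_rules(rules):
    """Return the set of rule names that can match the empty string."""
    nullable = set()
    changed = True
    while changed:  # iterate until no new nullable rule is found
        changed = False
        for name, seq in rules.items():
            if name in nullable:
                continue
            # A sequence is nullable if every element is optional or nullable.
            if all(sfx in ("*", "?") or sym in nullable for sym, sfx in seq):
                nullable.add(name)
                changed = True
    return nullable
```

With `build_rules("+")` the loop body is not nullable, because each pass must consume at least one 'throw'. With `build_rules("*")` it is nullable, which matches the variant that crashes.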
