Description
The following refers to TSQL (Microsoft's dialect of SQL).
Background: TSQL uses semicolons as statement separators, but lets them appear anywhere, in runs with no SQL between them, or be omitted altogether. Hence this rule to handle them:
sql_items :
(
opt_sql_separators
sql_item
) *
opt_sql_separators
;
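To make the rule's behaviour concrete, here is a hand-written Java sketch of what sql_items accepts. It assumes a pre-split token list in which ";" stands for SEMICOLON and "throw" is the only statement (as in the reduced grammar further down); SqlItemsSketch and parseSqlItems are hypothetical names, not ANTLR API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SqlItemsSketch {
    // Hypothetical hand-written equivalent of
    //   sql_items : ( opt_sql_separators sql_item )* opt_sql_separators ;
    // Assumption: tokens are pre-split, ";" is SEMICOLON and "throw" is the only sql_item.
    static List<String> parseSqlItems(List<String> tokens) {
        List<String> items = new ArrayList<>();
        int pos = 0;
        while (true) {
            int save = pos;
            // opt_sql_separators: zero or more semicolons
            while (pos < tokens.size() && tokens.get(pos).equals(";")) pos++;
            if (pos < tokens.size() && tokens.get(pos).equals("throw")) {
                items.add(tokens.get(pos++)); // sql_item
            } else {
                pos = save; // no item follows, so leave the semicolons to the trailing opt_sql_separators
                break;
            }
        }
        // trailing opt_sql_separators
        while (pos < tokens.size() && tokens.get(pos).equals(";")) pos++;
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + tokens.get(pos));
        }
        return items;
    }

    public static void main(String[] args) {
        // Scattered and repeated semicolons, and no separator between the last two items
        System.out.println(parseSqlItems(Arrays.asList(";", ";", "throw", ";", ";", "throw", "throw", ";"))); // prints [throw, throw, throw]
        // Because the loop uses *, the rule also matches an empty input
        System.out.println(parseSqlItems(new ArrayList<String>())); // prints []
    }
}
```

The second call shows the property that matters later in this report: with * the whole of sql_items can match nothing.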
Further, there is a kind of super-separator that terminates batches (lumps of SQL), traditionally named 'GO'. GO is not TSQL; it is recognised by a trivial scanner that runs through the script collecting TSQL into a batch, and when it hits GO or the end of the file, it sends the batch, as one lump, to the SQL Server engine to be executed. So
select 1
GO
select 2
select 3
GO
select 4
GO
Go
select 5
is processed as follows:
select 1
is sent to the server to be executed; when it finishes, any results are returned.
select 2 select 3
is then sent as one batch, and its results returned.
select 4
is then sent, and its results returned.
An empty batch is sent (the nothing between the two consecutive GOs).
select 5
is then sent.
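The batching behaviour described above can be sketched in Java as a line-based splitter. This is an illustration under stated assumptions, not how sqlcmd or SSMS are actually implemented: BatchSplitter and split are hypothetical names; a GO line is assumed to be GO in any case with an optional count and an optional "--" comment (mirroring the BATCH_SEPARATOR lexer rule further down); the count is simply ignored (in SQL Server's tools it repeats the batch); and, following the description above, two consecutive GOs produce an empty batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class BatchSplitter {
    // Hypothetical helper, not part of ANTLR or SQL Server tooling.
    // A line holding only GO (any case), an optional count and an optional "--" comment ends the batch.
    private static final Pattern GO_LINE =
        Pattern.compile("^[ \\t]*go([ \\t]+\\d+)?[ \\t]*(--.*)?$", Pattern.CASE_INSENSITIVE);

    public static List<String> split(String script) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : script.split("\\r?\\n", -1)) {
            if (GO_LINE.matcher(line).matches()) {
                // GO sends whatever was collected, even if that is nothing (consecutive GOs)
                batches.add(current.toString().trim());
                current.setLength(0); // the count, if any, is ignored in this sketch
            } else {
                current.append(line).append('\n');
            }
        }
        // End of file also ends a batch, but only a non-blank remainder is sent
        String tail = current.toString().trim();
        if (!tail.isEmpty()) {
            batches.add(tail);
        }
        return batches;
    }

    public static void main(String[] args) {
        String script = "select 1\nGO\nselect 2\nselect 3\nGO\nselect 4\nGO\nGo\nselect 5\n";
        for (String batch : split(script)) {
            System.out.println("send: [" + batch.replace('\n', ' ') + "]");
        }
    }
}
```

For the example script this yields the five batches listed above, including the empty one between GO and Go.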
In the grammar, batching is handled by this rule:
sql_batches :
(
opt_batch_separators
sql_items
opt_batch_separators
) *
;
For convenience I'm treating GO as if it were a SQL statement, i.e. as part of parsing SQL, even though it distinctly is not. (It also accepts a few more options, as the code below shows, but you can ignore these.)
Given the above, the grammar crashes ANTLR. This version runs but is incorrect:
sql_items :
(
opt_sql_separators
sql_item
) + // <<<<<<<<<<< here, change to * to crash
opt_sql_separators
;
Note the +. It needs to be * to handle batches as I want, but changing it to * makes ANTLR crash.
The input is
throw
and the console shows this:
throw
^Z
[@0,0:4='throw',<1>,1:0]
[@1,7:6='<EOF>',<-1>,2:0]
enter start_parse, LT(1)=throw
enter sql_batches, LT(1)=throw
enter opt_batch_separators, LT(1)=throw
exit opt_batch_separators, LT(1)=throw
exit sql_batches, LT(1)=throw
exit start_parse, LT(1)=throw
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.antlr.v4.gui.TestRig.process(TestRig.java:246)
at org.antlr.v4.gui.TestRig.process(TestRig.java:189)
at org.antlr.v4.gui.TestRig.main(TestRig.java:143)
Caused by: java.lang.StackOverflowError
at org.antlr.v4.runtime.atn.SemanticContext$Predicate.hashCode(SemanticContext.java:123)
at org.antlr.v4.runtime.atn.ATNConfigSet$ConfigEqualityComparator.hashCode(ATNConfigSet.java:75)
at org.antlr.v4.runtime.atn.ATNConfigSet$ConfigEqualityComparator.hashCode(ATNConfigSet.java:64)
at org.antlr.v4.runtime.misc.Array2DHashSet.getBucket(Array2DHashSet.java:132)
at org.antlr.v4.runtime.misc.Array2DHashSet.getOrAddImpl(Array2DHashSet.java:87)
at org.antlr.v4.runtime.misc.Array2DHashSet.getOrAdd(Array2DHashSet.java:83)
at org.antlr.v4.runtime.atn.ATNConfigSet.add(ATNConfigSet.java:170)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1551)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1535)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1603)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1535)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1603)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1535)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1603)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1535)
... (lots more here)
Can anyone reproduce this? ANTLR 4.5.3 on Windows. Full grammar below:
grammar MSSQL;
start_parse :
sql_batches
EOF
;
sql_batches :
(
opt_batch_separators
sql_items
opt_batch_separators
) *
;
sql_items :
(
opt_sql_separators
sql_item
) * // <<<<<<<<<<< here: * crashes, + works
opt_sql_separators
;
SEMICOLON : ';' ;
opt_sql_separators :
( SEMICOLON * )
;
sql_item :
throw_statement
;
throw_statement : 'throw' ;
DECDIGITS : [0-9]+ ;
fragment WUnl : ( '\r' ? ) '\n' ;
opt_batch_separators :
( batch_separator ) *
;
batch_separator : BATCH_SEPARATOR ;
BATCH_SEPARATOR :
HWS*
BATCH_SEPARATOR_TOKEN
HWS*
OPT_INT_CONST
HWS*
OPT_SLCOMMENT_BODY
HWS*
WUnl
;
BATCH_SEPARATOR_TOKEN : 'GO' | 'Go' | 'gO' | 'go' ; // blech - todo
fragment OPT_INT_CONST : ( HWS+ DECDIGITS ) ? ;
fragment OPT_SLCOMMENT_BODY : ( '--' .*? ) ? ;
fragment HWS : [ \t] ; // horizontal whitespaces
fragment ALLWSes : [ \t\r\n]+ ;
SKIPWS : ALLWSes -> skip ;
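One possible explanation for the crash, offered as an assumption rather than a confirmed diagnosis: with *, sql_items can match the empty string, so the body of the (...)* loop in sql_batches (opt_batch_separators sql_items opt_batch_separators) can match nothing at all. ANTLR 4 normally rejects a closure whose body can match empty, with an error along the lines of "contains a closure with at least one alternative that can match an empty string", but that check may not look through rule references such as sql_items. An empty-matching loop would be consistent with the runaway recursion in closure_ in the trace above. The Java sketch below computes nullability for a hand-built model of the grammar; NullableCheck, its element classes and BATCH_LOOP_BODY are hypothetical and are not ANTLR's internal representation.

```java
import java.util.HashMap;
import java.util.Map;

public class NullableCheck {
    // Minimal EBNF model (hypothetical; not how ANTLR represents grammars internally).
    interface Elem {}
    static final class Tok implements Elem { final String name; Tok(String n) { name = n; } }
    static final class Ref implements Elem { final String rule; Ref(String r) { rule = r; } }
    static final class Seq implements Elem { final Elem[] parts; Seq(Elem... p) { parts = p; } }
    static final class Star implements Elem { final Elem body; Star(Elem b) { body = b; } }
    static final class Plus implements Elem { final Elem body; Plus(Elem b) { body = b; } }

    final Map<String, Elem> rules = new HashMap<>();

    // Can e derive the empty string? Assumes no recursive rules,
    // which holds for the reduced MSSQL grammar above.
    boolean nullable(Elem e) {
        if (e instanceof Tok) return false;
        if (e instanceof Ref) return nullable(rules.get(((Ref) e).rule));
        if (e instanceof Star) return true;                      // ( ... )* may repeat zero times
        if (e instanceof Plus) return nullable(((Plus) e).body); // ( ... )+ needs one body
        for (Elem part : ((Seq) e).parts) {                      // a sequence is nullable
            if (!nullable(part)) return false;                   // only if every part is
        }
        return true;
    }

    // The parser rules of the reduced grammar; itemsUseStar selects * or + in sql_items.
    static NullableCheck mssql(boolean itemsUseStar) {
        NullableCheck g = new NullableCheck();
        g.rules.put("opt_sql_separators", new Star(new Tok("SEMICOLON")));
        g.rules.put("opt_batch_separators", new Star(new Ref("batch_separator")));
        g.rules.put("batch_separator", new Tok("BATCH_SEPARATOR"));
        g.rules.put("throw_statement", new Tok("'throw'"));
        g.rules.put("sql_item", new Ref("throw_statement"));
        Elem itemLoopBody = new Seq(new Ref("opt_sql_separators"), new Ref("sql_item"));
        Elem itemLoop = itemsUseStar ? new Star(itemLoopBody) : new Plus(itemLoopBody);
        g.rules.put("sql_items", new Seq(itemLoop, new Ref("opt_sql_separators")));
        return g;
    }

    // Body of the ( ... )* loop in sql_batches.
    static final Elem BATCH_LOOP_BODY = new Seq(
        new Ref("opt_batch_separators"), new Ref("sql_items"), new Ref("opt_batch_separators"));

    public static void main(String[] args) {
        System.out.println("with +: batch loop body nullable = " + mssql(false).nullable(BATCH_LOOP_BODY));
        System.out.println("with *: batch loop body nullable = " + mssql(true).nullable(BATCH_LOOP_BODY));
    }
}
```

With + the loop body of sql_batches must consume at least one sql_item, which fits the observation that the + version does not crash; with * it can consume nothing.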
(I just realised that handling GO as an island grammar might be better, since it is a trivial language; however, ignore that for now.)