Update grammar format versioning rules et al. (#471)
* Refine grammar format versioning rules; a too new minor version now makes the grammar unparsable.
* Clean-up grammar format spec.
* Implement new versioning rules.
* Add tests for `Grammar.IsUnparsable` and `HasUnknownData`.
* Update release notes.
* Do not emit group end symbols in the main DFA.
This de facto makes the custom DFA group start states mandatory for parsers to understand. The format spec already requires it.
Oriskany
* Add specification for DFA state machines on bytes.
We won't support them in the library right now, but if they are added in the future, they won't be affected by the `Critical` flag.
RELEASE_NOTES.md (1 addition, 0 deletions)
@@ -4,6 +4,7 @@ The following changes were made after 7.0.0-preview.1:
4
4
* __Breaking change:__ Removed the `IParserStateBox` interface and related APIs.
5
5
* Added `BuildSemanticProvider` extension method to `IGrammarBuilder<T>`.
6
6
* Added structures to the grammar file format that can make parsing groups two times faster or more. (https://github.com/teo-tsirpanis/Farkle/issues/153)
7
+
* Updated the grammar file format specification to improve its version compatibility rules and to specify potential future features.
7
8
* Fixed referencing the Farkle package in F# Interactive.
8
9
* Fixed failures when rendering templates from very large grammars.
9
10
* Fixed a bug where the `IsSingleTokenizerInChain` extension method would return wrong results.
designs/7.0/grammar-file-format-spec.md (38 additions, 9 deletions)
@@ -2,12 +2,23 @@
2
2
3
3
This document describes the binary format of Farkle 7's grammars. It is heavily inspired by the Common Language Infrastructure metadata format described in [ECMA-335][ecma].
4
4
5
+
The current version of the format is __7.0__.
6
+
5
7
## Ground rules
6
8
7
9
* The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
10
+
* Non-normative text is written in blockquotes.
8
11
* Numbers are stored in little-endian format.
9
12
* The total size of a grammar file MUST NOT exceed 2<sup>31</sup> - 1 bytes.
10
13
14
+
## Implementation requirements
15
+
16
+
Implementations of this specification are divided into three categories:
17
+
18
+
* _Writers_ create grammar files. They MUST implement the subset of the specification that is necessary to create valid grammar files according to the specification and their use case.
19
+
* _Readers_ read grammar files for the purpose of displaying their content to users. They MAY ignore parts of the specification that dictate the behavior of parsers, or data structure specifications that they don't need to display.
20
+
* _Parsers_ are a subset of readers, and read grammar files for the purpose of parsing input according to the grammar's specification, either directly, or by generating code that performs the parsing. They are REQUIRED to implement the entire specification, unless specified otherwise. They MAY forgo implementing parts of the specification that they don't need, as long as they report an error when encountering data that would require such parts to be implemented.
21
+
11
22
## Abstract file structure
12
23
13
24
From a top-level perspective, a grammar file is made of a _header_ and a sequence of _streams_.
@@ -25,10 +36,10 @@ A grammar file starts with the following data:
25
36
26
37
* If the first eight bytes of the file differ from the __Magic__ field's expected value, readers MUST NOT read past them and MUST report an error.
27
38
> Note that the magic code ends with two zeroes. This prevents [GOLD Parser][gold] grammar file readers from reading any further into the file.
28
-
* If the value of the __MajorVersion__ field is larger than the expected one or smaller than `7`, readers MUST NOT read past the __MinorVersion__ field and MUST report an error.
29
-
* When and if a file version after 7 gets specified, if the version of the file is smaller than the latest the reader supports, it MAY keep compatibility with the older format, or report an error.
39
+
* If the value of the __MajorVersion__ field is outside the range of the major versions the reader supports, readers MUST NOT read past the __MinorVersion__ field and MUST report an error.
40
+
* If the value of the __MajorVersion__ field is equal to the latest supported major version, and the value of the __MinorVersion__ field is larger than the latest supported minor version, the grammar file MUST NOT be used for parsing.
30
41
31
-
A different than expected __MajorVersion__ field indicates that the file cannot be read at all. A different than expected __MinorVersion__ field indicates that the file can be read, but might be incorrectly interpreted.
42
+
> A different than expected __MajorVersion__ field indicates that the file cannot be read at all. A different than expected __MinorVersion__ field indicates that the file can be read, but might be incorrectly interpreted.
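
> As a non-normative illustration, the version rules can be sketched as a small classification function. The supported version constants, the `VersionStatus` enum and the `check_version` name are assumptions made for this example and do not come from the specification or the Farkle library. The sketch treats a minor version newer than the latest one the reader knows as readable but unusable for parsing.

```python
from enum import Enum

# Hypothetical version range of an imaginary reader (assumed for illustration).
MIN_SUPPORTED_MAJOR = 7
MAX_SUPPORTED_MAJOR = 7
MAX_SUPPORTED_MINOR = 0  # latest minor version known for MAX_SUPPORTED_MAJOR


class VersionStatus(Enum):
    UNREADABLE = "unreadable"  # stop after MinorVersion and report an error
    UNPARSABLE = "unparsable"  # may be displayed, must not be used for parsing
    OK = "ok"


def check_version(major: int, minor: int) -> VersionStatus:
    """Classify a grammar file by its MajorVersion and MinorVersion fields."""
    if not (MIN_SUPPORTED_MAJOR <= major <= MAX_SUPPORTED_MAJOR):
        return VersionStatus.UNREADABLE
    if major == MAX_SUPPORTED_MAJOR and minor > MAX_SUPPORTED_MINOR:
        return VersionStatus.UNPARSABLE
    return VersionStatus.OK
```

> A reader would call `check_version` right after reading the two version fields and before touching any stream.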
32
43
33
44
### Stream Definition
34
45
@@ -208,8 +219,8 @@ The following bit values are defined for the __Flags__ column:
208
219
|Bit|Name|Description|
209
220
|---|----|-----------|
210
221
|0|`EndsOnEndOfInput`|The group can also end when the end of the input is reached, without encountering the token symbol specified in the __End__ column.|
211
-
|1|`AdvanceByCharacter`|When inside this group, the parser should read the input without invoking the regular tokenizer.|
212
-
|2|`KeepEndToken`|When the group ends, the parser should keep the token that ended the group in the input stream.|
222
+
|1|`AdvanceByCharacter`|When inside this group, the parser must read the input without invoking the regular tokenizer.|
223
+
|2|`KeepEndToken`|When the group ends, the parser must keep the token that ended the group in the input stream.|
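
> As a non-normative illustration, the __Flags__ column can be decoded with a bit-flag enumeration. The sketch assumes that bit 0 is the least significant bit of the stored integer; the Python names `GroupFlags` and `decode_group_flags` are hypothetical and only mirror the flag names in the table.

```python
from enum import IntFlag


class GroupFlags(IntFlag):
    # Bit positions follow the table above; bit 0 is assumed to be the
    # least significant bit of the stored value.
    ENDS_ON_END_OF_INPUT = 1 << 0
    ADVANCE_BY_CHARACTER = 1 << 1
    KEEP_END_TOKEN = 1 << 2


def decode_group_flags(raw: int) -> GroupFlags:
    """Interpret the raw integer of the Flags column as a set of flags."""
    return GroupFlags(raw)
```

> For example, `GroupFlags.KEEP_END_TOKEN in decode_group_flags(0b101)` tests whether the token that ended the group must stay in the input stream.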
213
224
214
225
The following rules apply to the _Group_ table:
215
226
@@ -315,7 +326,7 @@ The following rules apply to the _Production_ table:
315
326
* The value of the __FirstMember__ column of the first production MUST be equal to one.
316
327
* If a production and all productions after it do not have any members, its __FirstMember__ column MUST be equal to the number of rows in the _ProductionMember_ table plus one.
317
328
318
-
> Before accessing the members of a production, readers MUST ensure that it actually has members.
329
+
> Before accessing the members of a production, readers MUST ensure that the production actually has members.
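
> As a non-normative illustration, the members of a production can be located as follows. The sketch assumes that __FirstMember__ is a one-based row index and that a production's members end where the next production's members begin (or at the end of the _ProductionMember_ table for the last production); this excerpt does not state that rule explicitly. The `member_range` name and its parameters are hypothetical.

```python
def member_range(first_members: list[int], member_count: int, index: int) -> range:
    """Return the zero-based ProductionMember rows owned by production `index`.

    first_members: the FirstMember column, one entry per production (one-based).
    member_count:  the number of rows in the ProductionMember table.
    """
    start = first_members[index]
    if index + 1 < len(first_members):
        # Assumed: the run ends where the next production's run starts.
        end = first_members[index + 1]
    else:
        # The last production runs to the end of the table.
        end = member_count + 1
    # Convert the one-based, end-exclusive pair to a zero-based range.
    return range(start - 1, end - 1)
```

> An empty range means the production has no members, which is the check the note above asks readers to perform before accessing any row.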
319
330
320
331
### _ProductionMember_ table
321
332
@@ -343,15 +354,20 @@ The following values are defined for the __Kind__ column:
343
354
|3|LR(1) state machine.|
344
355
|4|Generalized LR(1) (GLR(1)) state machine.|
345
356
|5|Deterministic Finite Automaton (DFA) group start states on 16-bit character ranges.|
357
+
|6|Deterministic Finite Automaton (DFA) on 8-bit character ranges.|
358
+
|7|Deterministic Finite Automaton (DFA) on 8-bit character ranges with conflicts.|
359
+
|8|Deterministic Finite Automaton (DFA) default transitions on 8-bit character ranges.|
360
+
|9|Deterministic Finite Automaton (DFA) group start states on 8-bit character ranges.|
346
361
|_anything else_|Reserved for future use by the Farkle project.|
347
362
348
363
> Instead of GLR(1) we could have called it "LR(1) state machine with conflicts" for symmetry, but this kind of state machine has an established name. Currently there are no plans to support GLR parsing in the Farkle project.
349
364
350
365
The following rules apply to the _StateMachine_ table:
351
366
352
367
* The __Kind__ column MUST NOT contain duplicate values.
353
-
* If both state machines of __Kind__ 0 and 1, or 3 and 4 exist, they MUST describe the same state machine, with their only difference being in the preferred values in case of conflicts.
354
-
* If a state machine of __Kind__ 2 exists, a state machine of __Kind__ 0 or 1 MUST also exist.
368
+
* If both state machines of __Kind__ 0 and 1, 3 and 4, or 6 and 7 exist, they MUST describe the same state machine, with their only difference being in the preferred values in case of conflicts.
369
+
* If state machines of __Kind__ 2 or 5 exist, a state machine of __Kind__ 0 or 1 MUST also exist.
370
+
* If state machines of __Kind__ 8 or 9 exist, a state machine of __Kind__ 6 or 7 MUST also exist.
355
371
356
372
State machines with no states MUST be treated as if they do not exist.
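
> As a non-normative illustration, the duplicate and dependency rules of the _StateMachine_ table can be checked as below. The function name and error messages are hypothetical. The sketch assumes the caller has already dropped state machines with no states, and it does not verify that paired kinds (0 and 1, 3 and 4, 6 and 7) describe the same state machine.

```python
def validate_state_machine_kinds(kinds: list[int]) -> list[str]:
    """Return a list of rule violations for the given Kind column values."""
    errors = []
    present = set(kinds)
    if len(kinds) != len(present):
        errors.append("the Kind column contains duplicate values")
    # Each entry: (dependent kinds, kinds of which at least one must exist).
    dependencies = [
        ({2, 5}, {0, 1}),  # 16-bit default transitions / group start states
        ({8, 9}, {6, 7}),  # 8-bit default transitions / group start states
    ]
    for dependents, required in dependencies:
        found = present & dependents
        if found and not (present & required):
            errors.append(
                f"Kind {sorted(found)} requires one of Kind {sorted(required)}"
            )
    return errors
```

> An empty result means these two rules hold; it does not prove the table is valid in every other respect.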
357
373
@@ -369,7 +385,7 @@ The following rules apply to the _SpecialName_ table:
369
385
* The __Symbol__ column MUST NOT contain duplicate values.
370
386
* The __Name__ column MUST NOT contain duplicate values.
371
387
372
-
> The use case for this table is to help custom code that integrates with parsers such as tokenizers. Since in Farkle symbols can be renamed and many can have the same name within a grammar, the special name provides a stable way to identify them (it sticks to the symbol's original name and duplicate names would cause build failures).
388
+
> The use case for this table is to help custom code that integrates with parsers such as tokenizers. Since many symbols can have the same name within a grammar, the special name provides a guaranteed unique way to identify them.
373
389
374
390
## State machines
375
391
@@ -512,6 +528,19 @@ Readers SHOULD expose an API that indicates whether a grammar file contains data
512
528
* A state machine whose __Kind__ is not known to the reader.
513
529
* A table whose kind is not specified in this specification.
514
530
531
+
### Summary of extensibility mechanisms
532
+
533
+
The following table summarizes the extensibility mechanisms provided by the format, and whether third parties can use them:
534
+
535
+
|I want to|How to achieve it|Available to third parties|
536
+
|-|-|-|
537
+
|Extend the format with data that does not affect parsing behavior and can be safely ignored|Add a custom stream|Yes|
538
+
|Extend the format with data that might affect parsing behavior|Add a custom state machine|Yes|
539
+
|Extend the format with data that might affect parsing behavior|Add a custom table|No|
540
+
|Instruct parsers to refuse parsing if they do not understand your format extensions|Set the `Critical` flag in the _Grammar_ table|Yes|
541
+
|Instruct parsers to refuse parsing if they do not understand your format extensions, while keeping the `Critical` flag available for third parties to use|Increase the __MinorVersion__ field in the file header|No|
542
+
|Substantially change the format in a way incompatible for both readers and parsers|Increase the __MajorVersion__ field in the file header|No|
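
> As a non-normative illustration, the last three rows of the table suggest how a parser might decide whether to refuse a grammar. The `GrammarInfo` record and its fields are hypothetical; they echo the `IsUnparsable` and `HasUnknownData` names from the commit message but are not the Farkle API. A too new major version is not modeled because such a file cannot be read at all.

```python
from dataclasses import dataclass


@dataclass
class GrammarInfo:
    minor_version_too_new: bool  # MinorVersion is newer than the reader knows
    critical: bool               # the Critical flag is set in the Grammar table
    has_unknown_data: bool       # e.g. an unknown state machine kind or table


def is_unparsable(info: GrammarInfo) -> bool:
    """Decide whether a parser should refuse to parse with this grammar."""
    if info.minor_version_too_new:
        return True
    # Unknown data only blocks parsing when the writer marked it as critical.
    return info.critical and info.has_unknown_data
```

> Under these assumptions, a grammar with unknown data but without the `Critical` flag stays usable for parsing, which matches the first rows of the table.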
<value>The grammar cannot be used for parsing or tokenizing because it contains data that is not recognized by this version of Farkle</value>