@dmarcotte
Contributor

A collection of parser improvements representing a great leap in the maturity and robustness of the Kson parser and summing up to an important milestone:

We are now an RFC8259-compliant Json parser according to JSONTestSuite (modulo the cases noted in JsonTestSuiteEditList that we accept as a superset of Json).

See the individual commits for more detail on these changes.

dmarcotte added 12 commits July 16, 2024 11:14
Having the tests output the Json source under test when something fails
makes these tests easier to work with.
Tackle a large group of untested `SKIP_NEEDS_INVESTIGATION` JsonSuite
tests by verifying they can indeed be set to our tested state of
`ACCEPT_N_FOR_SUPERSET`, enhancing coverage
Implement our rules for control characters in strings.  These are based
closely on Json's rule of disallowing all control characters (i.e. they
may only be included in strings in escaped form).  See
https://www.rfc-editor.org/rfc/rfc8259.html#section-7 for more details.

Our rules are similar, except we allow whitespace control characters to
be embedded in our strings.

Note that this also fixed a bug in the Lexer: we had hijacked the null
byte to mean EOF, but that made strings that actually _contain_ the
null byte parse incorrectly.
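
The control character rule above can be sketched roughly as follows. This is a minimal illustration, not the actual Kson code: the helper names are hypothetical, and the set of "whitespace control characters" is assumed here to be tab, line feed, and carriage return.

```kotlin
// Hypothetical helpers for illustration; not Kson's actual implementation.
// Assumption: the allowed whitespace control characters are tab, LF, and CR.
val ALLOWED_WHITESPACE_CONTROLS = setOf('\t', '\n', '\r')

/** Returns true if [c] may appear unescaped inside a string under the rule sketched here. */
fun isAllowedUnescaped(c: Char): Boolean {
    // RFC 8259 section 7 requires U+0000 through U+001F to be escaped.
    val isControl = c.code <= 0x1F
    return !isControl || c in ALLOWED_WHITESPACE_CONTROLS
}

/** Returns the index of the first disallowed raw control character, or -1 if there is none. */
fun firstIllegalControlChar(content: String): Int =
    content.indexOfFirst { !isAllowedUnescaped(it) }
```

Under these assumptions, a raw tab inside a string passes, while a raw null byte or U+0001 is reported at its index.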
Inspired by b0c1f05, stop hijacking the null byte to mean EOF in
our NumberParser.

Note: because this was private to the scanner in this case and not
actually consulted to check for EOF, there wasn't a true bug in this...
yet.  It's still a bad idea to overload the meaning of the
null byte, so clean this up while we're thinking of it.

Also rename `getCurrentChar` to `peek` while we're in there.
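
One way to avoid overloading the null byte, sketched under assumptions: a scanner whose `peek` returns a nullable `Char`, so end of input is represented by `null` rather than `'\u0000'`. The `CharScanner` class and `scanAll` function are hypothetical and are not taken from the Kson sources.

```kotlin
// Hypothetical scanner for illustration only; Kson's real classes may differ.
class CharScanner(private val source: String) {
    private var index = 0

    /** Returns the current character without consuming it, or null at end of input. */
    fun peek(): Char? = if (index < source.length) source[index] else null

    /** Consumes and returns the current character, or null at end of input. */
    fun advance(): Char? = peek()?.also { index++ }
}

/** Collects every character by scanning, treating '\u0000' as ordinary data. */
fun scanAll(source: String): List<Char> {
    val scanner = CharScanner(source)
    val seen = mutableListOf<Char>()
    while (true) {
        val c = scanner.advance() ?: break // null means end of input
        seen.add(c)
    }
    return seen
}
```

With this shape, input containing an embedded null byte is scanned to the end instead of stopping early at the null.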
Add a configurable guard against excessive nesting in Kson to gracefully
handle even the most egregious attempts to blow out our stack
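
A minimal sketch of what such a guard could look like, assuming a simple depth counter with a configurable limit. All names and the default limit of 128 are illustrative assumptions, and the helper only counts brackets (it ignores string contents).

```kotlin
// Hypothetical names throughout; the default limit is an arbitrary illustrative value.
class NestingTooDeepException(limit: Int) :
    Exception("Nesting exceeds the configured limit of $limit")

/** Tracks open objects/lists and rejects input nested deeper than [maxDepth]. */
class NestingGuard(private val maxDepth: Int = 128) {
    private var depth = 0

    fun enter() {
        if (depth >= maxDepth) throw NestingTooDeepException(maxDepth)
        depth++
    }

    fun exit() {
        if (depth > 0) depth--
    }
}

/** Checks bracket depth only; returns false if the limit is exceeded. */
fun withinNestingLimit(source: String, maxDepth: Int): Boolean {
    val guard = NestingGuard(maxDepth)
    return try {
        for (c in source) {
            when (c) {
                '{', '[' -> guard.enter()
                '}', ']' -> guard.exit()
            }
        }
        true
    } catch (e: NestingTooDeepException) {
        false
    }
}
```

Because the check is an explicit counter rather than reliance on the call stack, pathological input such as many thousands of consecutive `[` characters is rejected with an ordinary error instead of a stack overflow.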
This test is enabled with `ACCEPT_FOR_KSON` since we formalized
in b0c1f05 that we accept unescaped whitespace control characters
Ensure that trying to parse an empty Kson file produces a
helpful/appropriate error.

Also ensure that the empty file error doesn't bubble up to our editor
where it makes no sense to complain about an empty file while someone
is editing it — it's only a problem once they try to use it.
Tighten up parsing to handle encountering a `}` or `]` when no object or
list has been opened.
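
As a rough illustration of the situation being handled, the sketch below reports closing delimiters that have no matching opener instead of failing on them. The function name is hypothetical, it ignores string contents for brevity, and it is not how the Kson parser is actually structured.

```kotlin
/**
 * Returns the indices of `}` or `]` characters that do not close a matching opener.
 * Hypothetical helper for illustration; ignores string contents.
 */
fun findUnmatchedClosers(source: String): List<Int> {
    val open = mutableListOf<Char>()
    val unmatched = mutableListOf<Int>()
    for ((i, c) in source.withIndex()) {
        when (c) {
            '{', '[' -> open.add(c)
            '}', ']' -> {
                val expectedOpener = if (c == '}') '{' else '['
                if (open.lastOrNull() == expectedOpener) {
                    open.removeAt(open.lastIndex)
                } else {
                    // Nothing suitable is open: record an error rather than crash.
                    unmatched.add(i)
                }
            }
        }
    }
    return unmatched
}
```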
We behave well on these, not erroring, so they no longer need to be
skipped
We now parse and validate string escapes very closely to the rules
outlined in [RFC8259](https://www.rfc-editor.org/rfc/rfc8259.html#section-7)

This gets our Json compatibility (as measured by [JSONTestSuite](https://github.com/nst/JSONTestSuite))
to near completion.

Side note: as of this commit, we no longer process any of these escapes
as part of parsing.  We simply validate and store in the resulting AST,
ready to process escapes if/when needed in some future AST transform.
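
A validate-only pass over string escapes, following the grammar in RFC 8259 section 7, might look like the sketch below. The function names are hypothetical and the code is an assumption-laden illustration, not the Kson implementation; like the approach described above, it checks escapes without decoding them.

```kotlin
// Single-character escapes permitted by RFC 8259 section 7.
val SIMPLE_ESCAPES = setOf('"', '\\', '/', 'b', 'f', 'n', 'r', 't')

fun isHexDigit(c: Char): Boolean = c in '0'..'9' || c in 'a'..'f' || c in 'A'..'F'

/**
 * Returns the indices of backslashes that begin an invalid escape in [content]
 * (the text between the quotes). Escapes are validated only, never decoded.
 */
fun findInvalidEscapes(content: String): List<Int> {
    val invalid = mutableListOf<Int>()
    var i = 0
    while (i < content.length) {
        if (content[i] != '\\') {
            i++
            continue
        }
        val next = content.getOrNull(i + 1)
        when {
            next != null && next in SIMPLE_ESCAPES -> i += 2
            next == 'u' -> {
                // A \u escape must be followed by exactly four hex digits.
                val hex = content.substring(
                    minOf(i + 2, content.length),
                    minOf(i + 6, content.length)
                )
                if (hex.length == 4 && hex.all { isHexDigit(it) }) {
                    i += 6
                } else {
                    invalid.add(i)
                    i += 2
                }
            }
            else -> {
                // Unknown escape, or a trailing backslash with nothing after it.
                invalid.add(i)
                i += if (next == null) 1 else 2
            }
        }
    }
    return invalid
}
```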
This is mostly a formality as the [JSONTestSuite](https://github.com/nst/JSONTestSuite)
tests did not change between these commits, but at least now it's clear
at a glance how recently we verified we had all the latest tests
We now properly detect (and report errors for) unexpected/illegal
characters encountered during parsing.  This change means two excellent
things:

- our parsing should now be very robust in the face of nonsense input,
parsing all tokens generated and not crashing on unrecognized chars
- we are an [RFC8259-compliant](https://www.rfc-editor.org/rfc/rfc8259.html)
Json parser according to [JSONTestSuite](https://github.com/nst/JSONTestSuite)
(modulo the cases noted in `JsonTestSuiteEditList` that we accept as
a superset of Json)
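
To illustrate the general idea of reporting illegal characters without crashing, here is a toy tokenizer that emits a dedicated token for each unrecognized character and keeps going. The `Token` class, the `tokenize` function, and the tiny token grammar are all assumptions made for this sketch and do not reflect the real Kson lexer.

```kotlin
// Hypothetical token type; `isIllegal` marks characters the lexer does not recognize.
data class Token(val text: String, val offset: Int, val isIllegal: Boolean)

/** Toy tokenizer: punctuation, alphanumeric words, and illegal-character tokens. */
fun tokenize(source: String): List<Token> {
    val tokens = mutableListOf<Token>()
    var i = 0
    while (i < source.length) {
        val c = source[i]
        when {
            c.isWhitespace() -> i++
            c in "{}[]:," -> {
                tokens.add(Token(c.toString(), i, isIllegal = false))
                i++
            }
            c.isLetterOrDigit() -> {
                val start = i
                while (i < source.length && source[i].isLetterOrDigit()) i++
                tokens.add(Token(source.substring(start, i), start, isIllegal = false))
            }
            else -> {
                // Record the unexpected character as its own token and keep lexing.
                tokens.add(Token(c.toString(), i, isIllegal = true))
                i++
            }
        }
    }
    return tokens
}
```

A later stage can then turn each illegal token into an error message at its offset while still parsing the tokens around it.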
@dmarcotte dmarcotte merged commit e943583 into kson-org:main Aug 1, 2024
@dmarcotte dmarcotte deleted the json-compat-work branch August 1, 2024 00:24