Skip to content

Conversation

@tlively
Copy link
Member

@tlively tlively commented Feb 21, 2024

Catch and report all kinds of WTF-8 encoding errors in the source strings,
including invalid leading bytes, invalid trailing bytes, unexpected ends of
strings, and invalid surrogate sequences. Insert replacement characters into the
output as necessary. Add a TODO about minimizing size by escaping only those
code points mandated to be escaped by the JSON spec. Generally improve
readability of the code.

Catch and report all kinds of WTF-8 encoding errors in the source strings,
including invalid leading bytes, invalid trailing bytes, unexpected ends of
strings, and invalid surrogate sequences. Insert replacement characters into the
output as necessary. Add a TODO about minimizing size by escaping only those
code points mandated to be escaped by the JSON spec. Generally improve
readability of the code.
@tlively tlively requested a review from kripken February 21, 2024 03:26
@tlively
Copy link
Member Author

tlively commented Feb 21, 2024

Current dependencies on/for this PR:

This stack of pull requests is managed by Graphite.

os << "\\b";
continue;
case '\f':
os << "\\f";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are \b and \f necessary here? (What spec would I read that in?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(if you add them to the test without this PR, does the test fail?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See page 4 of the JSON spec: https://ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf. We could also use \uXXXX escapes, which is what the previous code would have done, but these are shorter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks.

Maybe worth adding those to the test?

// Print.cpp would consider the contents unprintable, messing up our test.
bool isNaivelyPrintable = 32 <= u && u < 127;
if (isNaivelyPrintable) {
assert(u < 0x80 && "need additional logic to emit valid UTF-8");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only get here if u < 127, so the only difference with u < 0x80 = 128 is when u == 128? If so perhaps check that directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is meant to guard against future improvement where we would start emitting most code points directly as UTF-8 instead of escaping them. After the initial refactoring to allow most code points to hit this code path by default, this assertion will trigger if we don't add extra logic for code points 0x80 and greater. The assertion would be less robust in the context of that future change if it checked against 128 specifically.

@tlively
Copy link
Member Author

tlively commented Feb 21, 2024

I added tests for all the error cases. Will land, but PTAL if you're interested.

@tlively tlively enabled auto-merge (squash) February 21, 2024 19:54
@tlively tlively merged commit 39ae6cf into main Feb 21, 2024
@tlively tlively deleted the improve-json-encode branch February 21, 2024 20:10
radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
Catch and report all kinds of WTF-8 encoding errors in the source strings,
including invalid leading bytes, invalid trailing bytes, unexpected ends of
strings, and invalid surrogate sequences. Insert replacement characters into the
output as necessary. Add a TODO about minimizing size by escaping only those
code points mandated to be escaped by the JSON spec. Generally improve
readability of the code.
@gkdn gkdn mentioned this pull request Aug 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants