Add a strict parsing mode to the JsonReader and improve its javadoc
By default, the JsonReader accepts unescaped control characters that are
embedded in a string or a name (a name is technically just a string).
According to RFC 8259, RFC 7159, and RFC 4627, this is forbidden.
From RFC 8259 Section 7 "Strings":
"All Unicode characters may be placed within the
quotation marks, except for the characters that MUST be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F)."
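As a minimal illustration of this rule, the following Java sketch scans the
raw text of a single string token (quotes included) for a character in
U+0000..U+001F. The class and method names are hypothetical and not part of
Gson. The sketch assumes the token has already been isolated from the
surrounding JSON, because outside of strings whitespace such as tab and line
feed is allowed. An escaped control character is spelled with printable ASCII
characters (a backslash, `u` and four hex digits), so only the raw form is
reported.

```java
public class ControlCharScan {

    /**
     * Hypothetical helper, not part of Gson: scans the raw text of a single
     * JSON string token (quotes included) and returns the index of the first
     * character in the range U+0000..U+001F, or -1 if there is none.
     * An escaped control character consists only of printable ASCII
     * characters in the raw text, so it is never reported.
     */
    public static int firstUnescapedControlChar(String stringToken) {
        for (int i = 0; i < stringToken.length(); i++) {
            if (stringToken.charAt(i) < 0x20) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Same input as in the NaiveJsonProcessor example: raw ESC characters.
        String raw = "\"foobar\u001b[3D\u001b[K\"";
        // The escaped spelling of the same text: no raw ESC characters.
        String escaped = "\"foobar\\u001b[3D\\u001b[K\"";
        System.out.println(firstUnescapedControlChar(raw));
        System.out.println(firstUnescapedControlChar(escaped));
    }
}
```

Running `main` prints 7 (the position of the first raw ESC character) and
then -1.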
Accepting unescaped control characters can, at the very least, cause
confusion, as the following naive program demonstrates:
marcus@linux:~> cat NaiveJsonProcessor.java
import java.io.IOException;
import java.io.StringReader;
import com.google.gson.stream.JsonReader;
public class NaiveJsonProcessor {
    public static void parseAndLog(String jsonInput) throws IOException {
        JsonReader reader = new JsonReader(new StringReader(jsonInput));
        String parsed = reader.nextString();
        if (parsed.equals("foo")) {
            throw new IllegalStateException("foo is forbidden");
        }
        /*
         * According to the JsonReader's documentation "[...] this parser
         * is strict and only accepts JSON as specified by RFC 4627" (see
         * documentation of setLenient). Hence, we can safely log the
         * raw jsonInput to stdout because it contains no unescaped control
         * characters, which could be interpreted by a terminal.
         * Oops... wrong assumption:)
         */
        System.out.println("Processed: " + jsonInput);
    }
    public static void main(String[] args) {
        String jsonInput = "\"foobar\u001b[3D\u001b[K\"";
        try {
            // the log entry might confuse the user...
            parseAndLog(jsonInput);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
marcus@linux:~> javac -cp /path/to/gson/classes NaiveJsonProcessor.java
marcus@linux:~> java -cp /path/to/gson/classes:. NaiveJsonProcessor
Processed: "foo"
marcus@linux:~>
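Since the terminal interprets the unescaped control characters in the raw
jsonInput, it _looks_ as if we processed the JSON text "foo": ESC [3D moves
the cursor three columns to the left and ESC [K erases everything from the
cursor to the end of the line, so "bar" disappears before the closing quote
is printed. Yet "foo" should have resulted in an IllegalStateException; in
reality, of course, we did _not_ process "foo".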
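Independently of the parser, one way to keep such a log entry from being
misleading is to make control characters visible before printing untrusted
input. The following sketch (the class `SafeLog` and the method
`escapeControlChars` are hypothetical, not part of Gson) replaces every
character in U+0000..U+001F with a backslash-u escape. It deliberately covers
only the range named in RFC 8259, so other characters a terminal might act on
(such as DEL or C1 controls) would need extra handling.

```java
public class SafeLog {

    /**
     * Hypothetical helper, not part of Gson: replaces every character in
     * U+0000..U+001F with a visible six-character escape (backslash, 'u',
     * four lowercase hex digits) so a terminal cannot interpret it.
     * All other characters are copied unchanged.
     */
    public static String escapeControlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String jsonInput = "\"foobar\u001b[3D\u001b[K\"";
        System.out.println("Processed: " + escapeControlChars(jsonInput));
    }
}
```

For the jsonInput from the example, this prints
`Processed: "foobar\u001b[3D\u001b[K"`, so the escape sequences become
visible instead of being executed by the terminal.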
Apart from this, the JsonReader accepts non-lowercase literals (like tRuE,
falSE, NULl). According to the previously mentioned RFCs, this is forbidden.
From RFC 8259 Section 3 "Values":
"[...] or one of
the following three literal names:
false
null
true
The literal names MUST be lowercase."
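The difference between the default and the strict treatment of literals can
be sketched as follows. This is a simplified, hypothetical helper
(`matchesLiteral` is not a Gson method). Its default mode is modeled on the
behavior described above, where each character may be either the lowercase
or the uppercase form of the keyword's character; strict mode requires the
exact lowercase spelling.

```java
import java.util.Locale;

public class KeywordMatch {

    /**
     * Hypothetical sketch, not Gson's implementation: checks whether token
     * spells the literal keyword ("true", "false" or "null").
     * Default mode: each character may be the lowercase or the uppercase
     * form of the keyword's character, so "tRuE" is accepted as "true".
     * Strict mode: only the exact lowercase spelling is accepted.
     */
    public static boolean matchesLiteral(String token, String keyword, boolean strict) {
        if (token.length() != keyword.length()) {
            return false;
        }
        if (strict) {
            return token.equals(keyword);
        }
        String keywordUpper = keyword.toUpperCase(Locale.ROOT);
        for (int i = 0; i < keyword.length(); i++) {
            char c = token.charAt(i);
            if (c != keyword.charAt(i) && c != keywordUpper.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchesLiteral("tRuE", "true", false)); // true
        System.out.println(matchesLiteral("tRuE", "true", true));  // false
        System.out.println(matchesLiteral("NULl", "null", false)); // true
        System.out.println(matchesLiteral("null", "null", true));  // true
    }
}
```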
To address this, a strict mode is added to the JsonReader. In strict mode,
the JsonReader rejects unescaped control characters in strings and names:
nextQuotedValue and skipQuotedValue raise an exception when they encounter
one. It also rejects non-lowercase literals: peekKeyword raises an exception
when it encounters one.
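The string check described above can be sketched with a self-contained,
hypothetical reader method (`readQuotedValue` is not Gson's nextQuotedValue,
and the exception messages are made up). It decodes the standard JSON escapes
and, only when `strict` is true, throws an IOException on a raw character in
U+0000..U+001F; in the default mode such a character is copied through,
matching the old behavior.

```java
import java.io.IOException;

public class QuotedValueSketch {

    // Value of an ASCII hex digit, or -1 for any other character.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /**
     * Hypothetical, simplified stand-in for a quoted-value parser (not
     * Gson's nextQuotedValue). Expects in to start with the opening quote
     * and returns the decoded content up to the closing quote. When strict
     * is true, a raw character in U+0000..U+001F is rejected; otherwise it
     * is copied into the result unchanged.
     */
    public static String readQuotedValue(String in, boolean strict) throws IOException {
        if (in.isEmpty() || in.charAt(0) != '"') {
            throw new IOException("Expected '\"' at index 0");
        }
        StringBuilder out = new StringBuilder();
        int i = 1;
        while (i < in.length()) {
            char c = in.charAt(i++);
            if (c == '"') {
                return out.toString(); // closing quote
            }
            if (c == '\\') {
                if (i >= in.length()) {
                    break; // backslash at end of input: unterminated string
                }
                char e = in.charAt(i++);
                switch (e) {
                    case '"': out.append('"'); break;
                    case '\\': out.append('\\'); break;
                    case '/': out.append('/'); break;
                    case 'b': out.append('\b'); break;
                    case 'f': out.append('\f'); break;
                    case 'n': out.append('\n'); break;
                    case 'r': out.append('\r'); break;
                    case 't': out.append('\t'); break;
                    case 'u': {
                        // Exactly four hex digits follow the 'u'.
                        if (i + 4 > in.length()) {
                            throw new IOException("Truncated unicode escape at index " + (i - 2));
                        }
                        int code = 0;
                        for (int k = 0; k < 4; k++) {
                            int d = hexValue(in.charAt(i + k));
                            if (d < 0) {
                                throw new IOException("Malformed unicode escape at index " + (i - 2));
                            }
                            code = code * 16 + d;
                        }
                        out.append((char) code);
                        i += 4;
                        break;
                    }
                    default:
                        throw new IOException("Invalid escape '\\" + e + "' at index " + (i - 2));
                }
            } else if (strict && c < 0x20) {
                throw new IOException(String.format(
                        "Unescaped control character U+%04X at index %d", (int) c, i - 1));
            } else {
                out.append(c);
            }
        }
        throw new IOException("Unterminated string");
    }

    public static void main(String[] args) throws IOException {
        String raw = "\"foobar\u001b[3D\u001b[K\"";
        // Default mode: the raw ESC characters end up in the result.
        System.out.println(readQuotedValue(raw, false).length()); // 13
        // Escapes are decoded; an escaped control character passes strict mode.
        System.out.println(readQuotedValue("\"foo\\u0062ar\"", true)); // foobar
        System.out.println(readQuotedValue("\"a\\u001bb\"", true).length()); // 3
        try {
            readQuotedValue(raw, true);
        } catch (IOException e) {
            // Unescaped control character U+001B at index 7
            System.out.println(e.getMessage());
        }
    }
}
```

Escaped control characters are still accepted in strict mode in this sketch,
since RFC 8259 only forbids the unescaped form.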
To avoid regressions, strict mode is disabled by default, so the old behavior
is retained. Apart from these two checks (unescaped control characters and
non-lowercase literals), the JsonReader behaves exactly as before in strict
mode. For details, see the new JsonReaderStrictTest test case.
The javadoc of the JsonReader is updated accordingly. As part of this update,
all references to a JSON RFC are changed to RFC 8259, since that is the RFC
the JsonReader conforms to (in strict mode).
Signed-off-by: Marcus Huewe <[email protected]>