Description
It is valid for a JSON text to represent only a single scalar value rather than an object or array (RFC 8259 permits any JSON value at the top level), and Python's json module supports this:
>>> import json
>>> json.loads('true')
True
>>> json.loads('false')
False
>>> json.loads('"an example"')
'an example'

However, a stream containing such texts will not be split correctly by splitstream. The keywords true, false, and null are silently dropped, as are numeric literals:
>>> import io; from splitstream import splitfile
>>> split_buf = lambda data: list(splitfile(io.BytesIO(data), format='json'))
>>> split_buf(b'true false null [5] null true false true false {"a": 6}')
[b'[5]', b'{"a": 6}']
>>> split_buf(b'4 5 6 7 []')
[b'[]']

String literals cause different, but still incorrect, behaviour. If the stream contains no objects or arrays, the strings are silently dropped like the other scalars. If an object or array appears somewhere after a string, however, everything from the start of the stream (or the end of the previous buffer) up to and including that object or array is captured as a single buffer:
>>> split_buf(b'"abc" 56 "def"')
[]
>>> split_buf(b'"abc" 56 "def" {} 3 4')
[b'"abc" 56 "def" {}']
>>> split_buf(b'"abc" 56 "def" {} 3 4 "5" 6 7 []')
[b'"abc" 56 "def" {}', b' 3 4 "5" 6 7 []']

Naturally, these buffers cannot be parsed with json.loads, since each one contains several concatenated values.
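For concreteness, here is what json.loads reports for the two buffers from the last example. This only uses the standard library; the buffers are copied by hand from the output above rather than produced by splitstream.

```python
import json

# Buffers copied from the splitfile() output in the last example.
buffers = [b'"abc" 56 "def" {}', b' 3 4 "5" 6 7 []']

errors = []
for buf in buffers:
    try:
        json.loads(buf)  # json.loads accepts bytes on Python 3.6+
    except json.JSONDecodeError as err:
        # Decoding succeeds for the first value, then trips over the rest.
        errors.append((err.msg, err.pos))

print(errors)  # → [('Extra data', 6), ('Extra data', 3)]
```

In both cases the decoder reads the first value ("abc" and 3 respectively) and then rejects the remainder as extra data.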
The correct behaviour would be to split the stream at every top-level JSON value, producing a separate buffer for each. In other words:
>>> fixed_split_buf(b'true false null 1 "hello world" ["goodbye", "world"] {"a": 12, "b": [null]}')
[b'true', b'false', b'null', b'1', b'"hello world"', b'["goodbye", "world"]', b'{"a": 12, "b": [null]}']
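As a reference point, the expected output above can be reproduced with the standard library alone. The sketch below is not splitstream's implementation and not a proposed patch. It is a hypothetical helper (`split_json_values`) that assumes the whole input is in memory and is valid UTF-8, and it relies on CPython's `json.JSONDecoder.raw_decode`, which parses one value starting at a given index and returns where it ended.

```python
import json
from typing import Iterator

# JSON's insignificant whitespace (RFC 8259): space, tab, LF, CR.
_JSON_WS = " \t\n\r"


def split_json_values(data: bytes) -> Iterator[bytes]:
    """Yield the raw bytes of each top-level JSON value in ``data``.

    Hypothetical helper for illustration only: it assumes the entire
    input fits in memory and is valid UTF-8. Malformed input raises
    json.JSONDecodeError.
    """
    text = data.decode("utf-8")
    decoder = json.JSONDecoder()
    idx = 0
    while True:
        # raw_decode() does not skip leading whitespace, so do it here.
        while idx < len(text) and text[idx] in _JSON_WS:
            idx += 1
        if idx == len(text):
            return
        # CPython's raw_decode(s, idx) parses one value (scalar, array or
        # object) starting at idx and returns (value, end_index).
        _, end = decoder.raw_decode(text, idx)
        yield text[idx:end].encode("utf-8")
        idx = end


stream = b'true false null 1 "hello world" ["goodbye", "world"] {"a": 12, "b": [null]}'
print(list(split_json_values(stream)))
```

Because this decodes every value in full, it is likely slower than a dedicated splitter, but it is handy as an oracle for checking split output on inputs like the ones above.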