JSON scalar texts in the stream are not captured properly

It is valid for a JSON text to consist of a single scalar value rather than an object or array (RFC 7159 and its successor RFC 8259 allow any JSON value at the top level), and Python's `json` module supports this:

``` python
>>> import json
>>> json.loads('true')
True
>>> json.loads('false')
False
>>> json.loads('"an example"')
'an example'
```

However, a stream containing such texts will not be split correctly by `splitstream`. The keywords `true`, `false`, and `null` are silently dropped, as are numeric literals:

``` python
>>> import io; from splitstream import splitfile
>>> split_buf = lambda data: list(splitfile(io.BytesIO(data), format='json'))
>>> split_buf(b'true false null [5] null true false true false {"a": 6}')
[b'[5]', b'{"a": 6}']
>>> split_buf(b'4 5 6 7 []')
[b'[]']
```

String literals trigger different, but still incorrect, behaviour. If the stream contains no objects or arrays, the strings are silently dropped like the other scalars. If an object or array occurs somewhere after a string, however, everything in the stream up to and including that object or array is captured as a single buffer:

``` python
>>> split_buf(b'"abc" 56 "def"')
[]
>>> split_buf(b'"abc" 56 "def" {} 3 4')
[b'"abc" 56 "def" {}']
>>> split_buf(b'"abc" 56 "def" {} 3 4 "5" 6 7 []')
[b'"abc" 56 "def" {}', b' 3 4 "5" 6 7 []']
```

Naturally, these buffers cannot be parsed with `json.loads`, since each one contains several JSON texts rather than exactly one.
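To make the failure concrete, here is a small sketch that feeds the two buffers from the last example above to `json.loads` and collects the error messages. The buffers are copied verbatim from that output; the variable names are only for illustration.

```python
import json

# Buffers copied from the splitstream output in the example above.
buffers = [b'"abc" 56 "def" {}', b' 3 4 "5" 6 7 []']

errors = []
for buf in buffers:
    try:
        json.loads(buf)
    except json.JSONDecodeError as exc:
        # The parser reads the first JSON text in the buffer, then finds
        # more input after it and gives up.
        errors.append(exc.msg)

print(errors)  # ['Extra data', 'Extra data']
```

In both cases the decoder accepts the first value (`"abc"` and `3` respectively) and then rejects the remainder of the buffer.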

The correct behaviour would be to split the stream on _every_ top-level JSON value, producing a separate buffer for each. In other words, with `fixed_split_buf` standing in for a corrected `split_buf`:

``` python
>>> fixed_split_buf(b'true false null 1 "hello world" ["goodbye", "world"] {"a": 12, "b": [null]}')
[b'true', b'false', b'null', b'1', b'"hello world"', b'["goodbye", "world"]', b'{"a": 12, "b": [null]}']
```
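For comparison, the expected split can be reproduced with the standard library alone. The following is a minimal sketch, not a proposed patch for `splitstream`: `split_json_texts` is a hypothetical helper name, it assumes UTF-8 input, and it decodes the whole buffer up front instead of working incrementally on a stream. It relies on `json.JSONDecoder.raw_decode`, which parses one value starting at a given index and returns the index where that value ends.

```python
import json

def split_json_texts(data: bytes) -> list:
    """Hypothetical reference splitter: one buffer per top-level JSON value."""
    text = data.decode('utf-8')  # assumption: the stream is UTF-8 encoded
    decoder = json.JSONDecoder()
    pieces = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < len(text) and text[pos] in ' \t\r\n':
            pos += 1
        if pos == len(text):
            break
        # Parse exactly one JSON value; `end` is the index just past it.
        _, end = decoder.raw_decode(text, pos)
        pieces.append(text[pos:end].encode('utf-8'))
        pos = end
    return pieces

result = split_json_texts(
    b'true false null 1 "hello world" ["goodbye", "world"] {"a": 12, "b": [null]}'
)
print(result)
```

Because it fully parses every value, this approach is far slower than a dedicated splitter, but it can serve as an oracle when writing regression tests for scalar handling.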

