Closed
Labels: bug, enhancement
Description
I've been getting a "bytes object is too large" error when processing a large-ish number of documents using the 01_parse.py script. Creating several smaller doc_bin objects resolves the issue. Full error:
ahalt@xxxxxxxx:~/sense2vec$ python sense2vec/scripts/01_parse.py hindu_complete.txt docbins en_core_web_sm -n 10
ℹ Using spaCy model en_core_web_sm
Preprocessing text...
Docs: 267103 [1:00:38, 73.42/s]
✔ Processed 267103 docs
Traceback (most recent call last):
  File "sense2vec/scripts/01_parse.py", line 47, in <module>
    plac.call(main)
  File "/home/ahalt/anaconda3/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/home/ahalt/anaconda3/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "sense2vec/scripts/01_parse.py", line 39, in main
    doc_bin_bytes = doc_bin.to_bytes()
  File "/home/ahalt/anaconda3/lib/python3.6/site-packages/spacy/tokens/_serialize.py", line 151, in to_bytes
    return zlib.compress(srsly.msgpack_dumps(msg))
  File "/home/ahalt/anaconda3/lib/python3.6/site-packages/srsly/_msgpack_api.py", line 16, in msgpack_dumps
    return msgpack.dumps(data, use_bin_type=True)
  File "/home/ahalt/anaconda3/lib/python3.6/site-packages/srsly/msgpack/__init__.py", line 40, in packb
    return Packer(**kwargs).pack(o)
  File "_packer.pyx", line 285, in srsly.msgpack._packer.Packer.pack
  File "_packer.pyx", line 291, in srsly.msgpack._packer.Packer.pack
  File "_packer.pyx", line 288, in srsly.msgpack._packer.Packer.pack
  File "_packer.pyx", line 235, in srsly.msgpack._packer.Packer._pack
  File "_packer.pyx", line 206, in srsly.msgpack._packer.Packer._pack
ValueError: bytes object is too large
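
For anyone hitting the same thing, here is a rough sketch of the workaround mentioned above: split the docs into batches and serialize one smaller doc_bin per batch instead of one huge one. My understanding is that msgpack cannot pack a single bytes value larger than 2**32 - 1 bytes, which would explain the error, but I have not confirmed that against the srsly source. The names batched, write_doc_bins and make_doc_bin, the part_0000.spacy file naming, and the default batch size of 10000 are all made up for illustration; none of this is taken from 01_parse.py.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def write_doc_bins(docs, out_dir, make_doc_bin, batch_size=10000):
    """Write docs to several smaller files rather than one large blob.

    `make_doc_bin` is a hypothetical zero-argument factory that returns an
    object with .add(doc) and .to_bytes(), for example a spaCy DocBin.
    Returns the list of paths written.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, batch in enumerate(batched(docs, batch_size)):
        doc_bin = make_doc_bin()  # fresh container per batch keeps each blob small
        for doc in batch:
            doc_bin.add(doc)
        path = out_path / f"part_{i:04d}.spacy"  # assumed naming scheme
        path.write_bytes(doc_bin.to_bytes())
        paths.append(path)
    return paths
```

With spaCy this could be called roughly as write_doc_bins(nlp.pipe(texts), "docbins", lambda: DocBin()), with DocBin imported from spacy.tokens. The right batch size depends on document length and on which attributes are stored, so treat 10000 as a starting point only.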