Closed as not planned
Description
Software environment
- paddlepaddle:
- paddlepaddle-gpu: 2.3.0.post111
- paddlenlp: 2.5.0
- paddleocr: 2.6.1.2

Duplicate issues
- I have searched the existing issues
Error description
The ernie-layout inference code fails when the OCR result is empty:

      File "infer.py", line 70, in <module>
        main()
      File "infer.py", line 62, in main
        outputs = predictor.predict(docs)
      File "/Users/xx/yy/PaddleNLP/model_zoo/ernie-layout/deploy/python/predictor.py", line 761, in predict
        example = ppocr2example(ocr_result, doc)
      File "/Users/xx/miniconda3/envs/my_env/lib/python3.8/site-packages/paddlenlp/utils/image_utils.py", line 698, in ppocr2example
        im_w_box = max([seg["bbox"].left + seg["bbox"].width for seg in segments]) + 20
    ValueError: max() arg is an empty sequence

Failing code:
https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/ernie-layout/deploy/python/predictor.py#LL761C53-L761C53
`ppocr2example` raises an error when it receives an empty `ocr_result`:

    def predict(self, docs):
        input_data = []
        for doc in docs:
            ocr_result = self.ocr.ocr(doc, cls=True)
            # Compatible with paddleocr>=2.6.0.2
            ocr_result = ocr_result[0] if len(ocr_result) == 1 else ocr_result
            example = ppocr2example(ocr_result, doc)
            input_data.append(example)

        inputs = collections.defaultdict(list)
        for data in input_data:
            for k in data.keys():
                inputs[k].append(data[k])

        preprocess_result = self.preprocess(inputs)

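The traceback comes down to calling `max()` on an empty list. Below is a minimal sketch of that failure, using a hypothetical stand-in for the width computation. The helper names and the dict-based segment layout are illustrative assumptions, not PaddleNLP's API.

```python
# Hypothetical stand-in for the width computation reported at
# image_utils.py line 698. Segments are plain dicts here, an assumption
# made only to keep the sketch self-contained.
def image_width(segments):
    return max([seg["left"] + seg["width"] for seg in segments]) + 20


def image_width_with_default(segments, fallback=0):
    # max() accepts `default=` for empty iterables, so no exception is raised.
    return max(
        (seg["left"] + seg["width"] for seg in segments), default=fallback
    ) + 20


def raises_value_error(segments):
    # Returns True when image_width() fails the way the traceback shows.
    try:
        image_width(segments)
    except ValueError:
        return True
    return False
```

Passing `default=` only avoids the exception. Whether an example built from zero segments is meaningful downstream is a separate question, which is why skipping such inputs may be the safer choice.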
### Steps to reproduce & code
1. Input: an image that contains no text (or whose text is too blurry for OCR to recognize anything)
2. Run:
Official example code:
https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-layout/deploy/python
```shell
python infer.py \
    --model_path_prefix ../../cls_export/inference \
    --lang "en" \
    --task_type cls \
    --batch_size 8
```

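Proposed solution:
- For inputs whose `ocr_result` is empty, skip `ppocr2example`, treat them as invalid input, and return an empty predict result for them.
- Pass inputs with a non-empty `ocr_result` to `ppocr2example` for normal processing.
- Keep the mapping from each input image in the batch to its predict result, and return the results for both cases above.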
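
The three points above could be sketched roughly as follows. This is not the PaddleNLP implementation: `predict_skipping_empty` and its `run_ocr`, `to_example`, and `run_model` callables are hypothetical stand-ins for `self.ocr.ocr` (after the `paddleocr>=2.6.0.2` unwrapping), `ppocr2example`, and the preprocess/infer/postprocess steps. Using `None` as the "empty" result is also an assumption.

```python
import collections


def predict_skipping_empty(docs, run_ocr, to_example, run_model):
    """Return one result per doc; docs with an empty OCR result get None.

    All three callables are hypothetical placeholders, not PaddleNLP APIs.
    """
    results = [None] * len(docs)  # None marks an invalid (no-text) input
    valid_indices = []            # positions of docs that produced OCR text
    examples = []

    for idx, doc in enumerate(docs):
        ocr_result = run_ocr(doc)
        if not ocr_result:        # empty list or None: skip conversion
            continue
        valid_indices.append(idx)
        examples.append(to_example(ocr_result, doc))

    if examples:
        # Same batching pattern as the original predict(): dict of lists.
        inputs = collections.defaultdict(list)
        for example in examples:
            for key, value in example.items():
                inputs[key].append(value)
        outputs = run_model(inputs)
        # Map each model output back to the position of its source doc.
        for idx, output in zip(valid_indices, outputs):
            results[idx] = output

    return results
```

Because the returned list has the same length and order as `docs`, callers can tell which images were skipped without any extra bookkeeping.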