[Bug]: ernie-layout inference code fails on inputs with no OCR results #5865

@zirui

Software environment

- paddlepaddle:
- paddlepaddle-gpu: 2.3.0.post111
- paddlenlp: 2.5.0
- paddleocr: 2.6.1.2

Duplicate issues

  • I have searched the existing issues

Error description

The ernie-layout inference code raises an error when the OCR result is empty:


  File "infer.py", line 70, in <module>
    main()
  File "infer.py", line 62, in main
    outputs = predictor.predict(docs)
  File "/Users/xx/yy/PaddleNLP/model_zoo/ernie-layout/deploy/python/predictor.py", line 761, in predict
    example = ppocr2example(ocr_result, doc)
  File "/Users/xx/miniconda3/envs/my_env/lib/python3.8/site-packages/paddlenlp/utils/image_utils.py", line 698, in ppocr2example
    im_w_box = max([seg["bbox"].left + seg["bbox"].width for seg in segments]) + 20
ValueError: max() arg is an empty sequence
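
The last frame can be reproduced in isolation. The sketch below copies the expression from the traceback into a hypothetical helper, `right_edge`, and uses a hypothetical `Box` tuple in place of the real bbox objects. It only illustrates why an empty segment list triggers the `ValueError`; it is not the library's code.

```python
from collections import namedtuple

# Hypothetical stand-in for the bbox objects used inside ppocr2example.
Box = namedtuple("Box", ["left", "width"])


def right_edge(segments):
    """Same expression as the failing line in the traceback."""
    return max([seg["bbox"].left + seg["bbox"].width for seg in segments]) + 20


# With no OCR segments the list comprehension is empty, so max() raises.
try:
    right_edge([])
    raised = False
except ValueError:
    raised = True

# With at least one segment the expression works as intended.
one_segment = [{"bbox": Box(left=10, width=30)}]
```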

Failing code:

https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/ernie-layout/deploy/python/predictor.py#LL761C53-L761C53
ppocr2example raises an error when it receives an empty ocr_result:

    def predict(self, docs):
        input_data = []
        for doc in docs:
            ocr_result = self.ocr.ocr(doc, cls=True)
            # Compatible with paddleocr>=2.6.0.2
            ocr_result = ocr_result[0] if len(ocr_result) == 1 else ocr_result
            example = ppocr2example(ocr_result, doc)
            input_data.append(example)

        inputs = collections.defaultdict(list)
        for data in input_data:
            for k in data.keys():
                inputs[k].append(data[k])

        preprocess_result = self.preprocess(inputs)


Steps to reproduce & code

1. Input: an image that contains no text (or text too blurry for OCR to recognize anything).
2. Run the official example code:
https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-layout/deploy/python
```shell
python infer.py \
    --model_path_prefix ../../cls_export/inference \
    --lang "en" \
    --task_type cls \
    --batch_size 8
```

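Proposed solution:

  1. When ocr_result is empty, skip ppocr2example, treat the document as invalid input, and return an empty prediction for it.
  2. When ocr_result is non-empty, pass it to ppocr2example for normal processing.
  3. Maintain the mapping from each input image in the batch to its prediction result, and return the results of both cases 1 and 2.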
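
The three steps above could be sketched roughly as follows. This is a minimal illustration, not a patch for predictor.py: `predict_skipping_empty` and its parameters `run_ocr`, `to_example`, and `predict_batch` are hypothetical names. `run_ocr` is assumed to wrap `self.ocr.ocr(...)` plus the existing paddleocr>=2.6.0.2 compatibility line, `to_example` stands in for `ppocr2example`, and `predict_batch` stands in for the preprocess/infer/postprocess path. It also assumes an empty OCR result is falsy (an empty list or `None`).

```python
def predict_skipping_empty(docs, run_ocr, to_example, predict_batch):
    """Return one result per input doc; None where OCR found no text.

    All three callables are hypothetical stand-ins (see the text above).
    """
    results = [None] * len(docs)  # step 1: empty-OCR docs keep None
    valid_indices = []            # step 3: remember where each example came from
    examples = []

    for index, doc in enumerate(docs):
        ocr_result = run_ocr(doc)
        if not ocr_result:
            # No text detected: skip example conversion for this doc.
            continue
        valid_indices.append(index)
        examples.append(to_example(ocr_result, doc))  # step 2

    if examples:
        outputs = predict_batch(examples)
        # Write each prediction back to the slot of its source doc.
        for index, output in zip(valid_indices, outputs):
            results[index] = output

    return results
```

Returning a list aligned with `docs` lets the caller tell which images were skipped without a separate lookup table. Whether `None` or some other empty value is the right placeholder is a design choice left to the maintainers.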

Labels: bug, stale
