Source Code of our Paper:
Multi-Type-TD-TSR - Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition
TSR for partially bordered tables uses the same erosion algorithm as for bordered tables to detect existing borders. Instead of using the detected borders to build grid cells, it deletes them from the table image, which yields an unbordered table. The algorithm for unbordered tables can then be applied to create the grid-cell image and contours, by analogy to the variants discussed above. A key feature of this approach is that it works with both bordered and unbordered tables: it is type-independent.
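The border-removal step described above can be sketched in plain Python on a tiny binary image. This is only an illustration under stated assumptions, not the code in tsr.py: foreground pixels are 1, borders are found by a morphological opening (erosion followed by dilation) with a straight line element, and the names `erode_rows`, `dilate_rows`, `remove_borders` and the parameter `min_len` are hypothetical. A real implementation would more likely rely on an image library.

```python
def to_grid(rows):
    """Parse strings of '0'/'1' into a list-of-lists binary image."""
    return [[int(c) for c in row] for row in rows]


def transpose(img):
    return [list(col) for col in zip(*img)]


def erode_rows(img, k):
    """Keep a pixel only if it starts a run of k foreground pixels in its row."""
    w = len(img[0])
    return [
        [1 if x + k <= w and all(row[x:x + k]) else 0 for x in range(w)]
        for row in img
    ]


def dilate_rows(img, k):
    """Grow every surviving pixel back into a run of k pixels to its right."""
    w = len(img[0])
    out = [[0] * w for _ in img]
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            if value:
                for dx in range(k):
                    if x + dx < w:
                        out[y][x + dx] = 1
    return out


def open_rows(img, k):
    """Opening with a 1 x k line: only horizontal runs of length >= k survive."""
    return dilate_rows(erode_rows(img, k), k)


def remove_borders(img, min_len):
    """Delete horizontal and vertical lines of at least min_len pixels."""
    horizontal = open_rows(img, min_len)
    vertical = transpose(open_rows(transpose(img), min_len))
    return [
        [0 if horizontal[y][x] or vertical[y][x] else img[y][x]
         for x in range(len(img[0]))]
        for y in range(len(img))
    ]


# Partially bordered toy table: a top border, a left border, short "text" runs.
table = to_grid([
    "111111111",
    "100000000",
    "101100110",
    "100000000",
    "101100010",
    "100000000",
])
cleaned = remove_borders(table, min_len=5)
for row in cleaned:
    print("".join(str(v) for v in row))
```

After this step only the short text runs remain, so the image can be handed to whatever procedure handles unbordered tables.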
| Team | IoU 0.6 | IoU 0.7 | IoU 0.8 | IoU 0.9 | Weighted Average |
|---|---|---|---|---|---|
| CascadeTabNet | 0.438 | 0.354 | 0.19 | 0.036 | 0.232 | 
| NLPR-PAL | 0.365 | 0.305 | 0.195 | 0.035 | 0.206 | 
| Multi-Type-TD-TSR | 0.589 | 0.404 | 0.137 | 0.015 | 0.253 | 
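The README does not define how the weighted average is computed. A minimal sketch, assuming each score is weighted by its IoU threshold (the convention used in the ICDAR 2019 cTDaR competition), agrees with the last column to within about 0.001. The small differences come from the three-decimal rounding of the inputs. The function name `weighted_average` is hypothetical.

```python
def weighted_average(scores_by_iou):
    """IoU-weighted mean: sum(iou * score) / sum(iou).

    Assumption: the weight of each score is its IoU threshold.
    """
    total_weight = sum(scores_by_iou)  # sums the dict keys (IoU thresholds)
    weighted_sum = sum(iou * score for iou, score in scores_by_iou.items())
    return weighted_sum / total_weight


# Scores copied from the table above, keyed by IoU threshold.
results = {
    "CascadeTabNet": {0.6: 0.438, 0.7: 0.354, 0.8: 0.19, 0.9: 0.036},
    "NLPR-PAL": {0.6: 0.365, 0.7: 0.305, 0.8: 0.195, 0.9: 0.035},
    "Multi-Type-TD-TSR": {0.6: 0.589, 0.7: 0.404, 0.8: 0.137, 0.9: 0.015},
}

for team, scores in results.items():
    print(f"{team}: {weighted_average(scores):.3f}")
```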
The source code was developed with the following library dependencies:
- PyTorch = 1.7.0
- Torchvision = 0.8.1
- CUDA = 10.1
- PyYAML = 5.1
The table detection model is based on detectron2; follow the detectron2 installation guide to set it up.
For the image alignment pre-processing step there is one script available:
- deskew.py
To apply the image alignment pre-processing algorithm to all images in one folder, you need to execute:
python3 deskew.py
with the following parameters:
- --folder: the input folder including document images
- --output: the output folder for the deskewed images
For table structure recognition we offer a single script that covers the different approaches:
- tsr.py
To apply a table structure recognition algorithm to all images in one folder, you need to execute:
python3 tsr.py
with the following parameters:
- --folder: path of the input folder including table images
- --type: the table structure recognition type, one of ["borderd", "unbordered", "partially", "partially_color_inv"]
- --img_output: output folder path for the processed images
- --xml_output: output folder path for the xml files including bounding boxes
To apply table detection followed by table structure recognition, use:
- tdtsr.py
To apply table detection and table structure recognition to all images in one folder, you need to execute:
python3 tdtsr.py
with the following parameters:
- --folder: path of the input folder including table images
- --type: the table structure recognition type, one of ["borderd", "unbordered", "partially", "partially_color_inv"]
- --tsr_img_output: output folder path for the processed table images
- --td_img_output: output folder path for the produced table cutouts
- --xml_output: output folder path for the xml files for tables and cells including bounding boxes
- --config: path of detectron2 configuration file for table detection
- --yaml: path of detectron2 yaml file for table detection
- --weights: path of detectron2 model weights for table detection
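With this many parameters it can be convenient to assemble the call programmatically. The following is a minimal sketch that builds the argument list for tdtsr.py from the flags listed above and rejects an unknown type before anything is run. The helper `build_tdtsr_command` and every path in the example are placeholders, not files shipped with the repository.

```python
import shlex

# Accepted values exactly as listed for --type (spelling kept as in the list).
TSR_TYPES = ("borderd", "unbordered", "partially", "partially_color_inv")


def build_tdtsr_command(folder, tsr_type, tsr_img_output, td_img_output,
                        xml_output, config, yaml, weights):
    """Return the argv list for one tdtsr.py run (hypothetical helper)."""
    if tsr_type not in TSR_TYPES:
        raise ValueError(f"--type must be one of {TSR_TYPES}, got {tsr_type!r}")
    return [
        "python3", "tdtsr.py",
        "--folder", folder,
        "--type", tsr_type,
        "--tsr_img_output", tsr_img_output,
        "--td_img_output", td_img_output,
        "--xml_output", xml_output,
        "--config", config,
        "--yaml", yaml,
        "--weights", weights,
    ]


# All paths below are placeholders; replace them with your own locations.
command = build_tdtsr_command(
    folder="path/to/input_images",
    tsr_type="partially",
    tsr_img_output="path/to/tsr_images",
    td_img_output="path/to/table_cutouts",
    xml_output="path/to/xml",
    config="path/to/detectron2_config",
    yaml="path/to/detectron2_yaml",
    weights="path/to/model_weights",
)
print(shlex.join(command))
# To execute it: import subprocess; subprocess.run(command, check=True)
```

Passing the list to `subprocess.run` instead of a single shell string avoids quoting problems with paths that contain spaces.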
To evaluate the table structure recognition algorithm we provide the following script:
- evaluate.py
To run the evaluation, each table image and its label file in .xml format must share the same file name, and all of them should lie in a single folder. The evaluation can be started with:
python3 evaluate.py
with the following parameter:
- --dataset: dataset folder path containing table images and labels in .xml format
- test dataset for table structure recognition including table images and annotations can be downloaded here
- table detection detectron2 model weights and configuration files can be downloaded here
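Since evaluate.py expects every table image and its .xml label to share a name inside one folder, a quick check of the dataset folder before evaluating can save a failed run. The following is a minimal sketch, not part of the repository: the helper `find_unpaired` is hypothetical, and the set of image extensions is an assumption that should be adjusted to the actual dataset.

```python
import tempfile
from pathlib import Path

# Assumption: which file extensions count as table images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def find_unpaired(dataset_dir):
    """Return (images without an .xml label, .xml labels without an image)."""
    files = list(Path(dataset_dir).iterdir())
    images = {p.stem for p in files if p.suffix.lower() in IMAGE_SUFFIXES}
    labels = {p.stem for p in files if p.suffix.lower() == ".xml"}
    return sorted(images - labels), sorted(labels - images)


# Demonstration on a throwaway folder with one complete pair and two strays.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("table_1.png", "table_1.xml", "table_2.png", "table_3.xml"):
        (Path(tmp) / name).touch()
    missing_labels, missing_images = find_unpaired(tmp)

print("images without label:", missing_labels)
print("labels without image:", missing_images)
```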
@misc{fischer2021multitypetdtsr,
    title={Multi-Type-TD-TSR - Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations},
    author={Pascal Fischer and Alen Smajic and Alexander Mehler and Giuseppe Abrami},
    year={2021},
    eprint={2105.11021},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}




