http://dx.doi.org/10.9717/kmms.2022.25.5.695

Structure Recognition Method in Various Table Types for Document Processing Automation  

Lee, Dong-Seok (AI Grand ICT Research Center, Dong-Eui University)
Kwon, Soon-Kak (Dept. of Computer Software Engineering, Dongeui University)
Abstract
In this paper, we propose a table structure recognition method that handles various table types for document processing automation. For a table whose items are enclosed by ruled lines, the structure is recognized by detecting the horizontal and vertical lines. For a table whose items are separated by spaces, the structure is recognized by analyzing the arrangement of the row items. After the table structure is recognized, the area of each table item is passed to an OCR engine, and the character recognition results are written to a text file in a structured format such as CSV or JSON. In simulations, the average accuracy of table item recognition is about 94%.
Keywords
Table structure detection; Document processing automation; Optical character recognition
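
The abstract does not give the algorithm in detail, so the following is only a minimal sketch of how the space-separated case might work: items are grouped into rows by vertical position, column boundaries are inferred from horizontal gaps, and the resulting grid is written as CSV or JSON. The box format `(text, x, y, width, height)`, the function names, and the thresholds `y_tol` and `min_gap` are all assumptions made for illustration, not the authors' implementation. The ruled-line case and the OCR step itself are not covered.

```python
import csv
import io
import json

# Hypothetical input: one tuple per recognized item, (text, x_left, y_top, width, height).

def group_rows(boxes, y_tol=0.5):
    """Group boxes into rows when their vertical centers differ by <= y_tol * height."""
    rows = []
    for box in sorted(boxes, key=lambda b: b[2] + b[4] / 2):
        cy = box[2] + box[4] / 2
        if rows and abs(cy - rows[-1]["cy"]) <= y_tol * box[4]:
            rows[-1]["items"].append(box)
        else:
            rows.append({"cy": cy, "items": [box]})
    # Order the items of each row from left to right.
    return [sorted(r["items"], key=lambda b: b[1]) for r in rows]

def column_spans(rows, min_gap=10):
    """Merge horizontal extents of all boxes; a gap of at least min_gap starts a new column."""
    intervals = sorted((b[1], b[1] + b[3]) for row in rows for b in row)
    spans = []
    for left, right in intervals:
        if spans and left - spans[-1][1] < min_gap:
            spans[-1][1] = max(spans[-1][1], right)
        else:
            spans.append([left, right])
    return spans

def to_grid(rows, spans):
    """Place each item in the column whose span contains its horizontal center."""
    grid = []
    for row in rows:
        cells = [""] * len(spans)
        for text, x, _, w, _ in row:
            cx = x + w / 2
            for i, (left, right) in enumerate(spans):
                if left <= cx <= right:
                    cells[i] = (cells[i] + " " + text).strip()
                    break
        grid.append(cells)
    return grid

def to_csv(grid):
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(grid)
    return buf.getvalue()

def to_json(grid):
    # Assumes the first row holds the column headers.
    header, *body = grid
    return json.dumps([dict(zip(header, r)) for r in body], ensure_ascii=False)

# Made-up example data, not taken from the paper's test set.
boxes = [
    ("Item", 10, 10, 40, 12), ("Qty", 120, 11, 30, 12), ("Price", 220, 10, 50, 12),
    ("Paper", 10, 40, 50, 12), ("3", 125, 41, 10, 12), ("1500", 225, 40, 40, 12),
    ("Ink", 10, 70, 30, 12), ("1", 125, 70, 10, 12), ("9000", 225, 71, 40, 12),
]
rows = group_rows(boxes)
grid = to_grid(rows, column_spans(rows))
print(to_csv(grid))
print(to_json(grid))
```

Inferring columns from the merged horizontal extents of all rows, rather than row by row, keeps a cell in the right column even when another row has an empty cell at that position. It would fail on tables whose columns are closer together than `min_gap`, so that threshold would need tuning per document type.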