http://dx.doi.org/10.22937/IJCSNS.2022.22.11.31

Intelligent Character Recognition System for Account Payable by using SVM and RBF Kernel  

Farooq, Muhammad Umer (Department of Computer Science and Information Technology, NED University of Engineering and Technology)
Kazi, Abdul Karim (Department of Computer Science and Information Technology, NED University of Engineering and Technology)
Latif, Mustafa (Department of Software Engineering, NED University of Engineering and Technology)
Alauddin, Shoaib (Department of Computer Science and Information Technology, NED University of Engineering and Technology)
Kisa-e-Zehra, Kisa-e-Zehra (Department of Computer Science and Information Technology, NED University of Engineering and Technology)
Baig, Mirza Adnan (Department of Computer Science and Information Technology, NED University of Engineering and Technology)
Publication Information
International Journal of Computer Science & Network Security / v.22, no.11, 2022, pp. 213-221
Abstract
Intelligent Character Recognition System for Account Payable (ICRS AP) Automation is the process of capturing text from scanned invoices, extracting the key fields, and storing them in a properly structured document format. ICRS plays a critical role in streamlining invoice data; the fields of interest include Vendor Name, Purchase Order Number, Due Date, Total Amount, and Payee Name. As companies attempt to cut costs and upgrade their processes, accounts payable (A/P) stands out as a paper-intensive procedure, and invoice processing is a natural candidate for digitization. For companies that deal with an enormous number of invoices, manual invoice matching procedures start to show their limitations. Receiving a paper invoice and matching it to a purchase order (PO) and general ledger (GL) code can be difficult for businesses. Lack of automation leads to more serious company issues such as problems with accruals for financial close, excessive labor costs, and a lack of insight into corporate expenditures. The proposed system offers companies tighter control over their invoice processing so that they can make better-informed decisions. AP automation solutions provide tighter controls, quicker clearances, smart payments, and real-time access to transactional data, allowing financial managers to make better decisions for the bottom line of their organizations. The system extracts fields such as Vendor Name, Purchase Order Number, Due Date, Total Amount, and Payee Name based on their x-axis and y-axis position coordinates.
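The abstract gives no implementation details, so the following is only a minimal sketch of what extracting fields by x-axis and y-axis position could look like once an OCR step has produced words with coordinates. It does not cover the character recognition stage (the SVM with RBF kernel named in the title). The `Token` type, the `FIELD_REGIONS` rectangles, and the sample values are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One OCR word with the top-left corner of its bounding box (pixels)."""
    text: str
    x: float
    y: float


# Hypothetical page regions per field: (x_min, y_min, x_max, y_max).
# A real layout would need regions measured from the actual invoice template.
FIELD_REGIONS = {
    "vendor_name": (0, 0, 300, 60),
    "po_number": (400, 0, 700, 60),
    "due_date": (400, 60, 700, 120),
    "total_amount": (400, 700, 700, 760),
}


def extract_fields(tokens, regions=FIELD_REGIONS):
    """Assign each token to the first region containing its (x, y) position."""
    fields = {name: [] for name in regions}
    # Sort top-to-bottom, then left-to-right, so words join in reading order.
    for tok in sorted(tokens, key=lambda t: (t.y, t.x)):
        for name, (x0, y0, x1, y1) in regions.items():
            if x0 <= tok.x < x1 and y0 <= tok.y < y1:
                fields[name].append(tok.text)
                break  # tokens outside every region are simply ignored
    return {name: " ".join(words) for name, words in fields.items()}


# Made-up OCR output for a single invoice page.
tokens = [
    Token("Supplies", 90, 15), Token("Acme", 20, 15),
    Token("PO-10432", 420, 18),
    Token("2022-12-01", 420, 75),
    Token("1,250.00", 430, 715),
    Token("Thank", 20, 900), Token("you", 80, 900),
]
result = extract_fields(tokens)
```

Fixed rectangles are the simplest possible choice and only work when every invoice shares one layout; invoices from different vendors would need a separate region table per template or a learned model.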
Keywords
Account Payable Automation; Intelligent Character Recognition; Invoice processing; Smart payments;