Handwritten Hangul Graphemes Classification Using Three Artificial Neural Networks

  • Aaron Daniel Snowberger (Department of Information and Communication Engineering, Hanbat National University)
  • Choong Ho Lee (Department of Information and Communication Engineering, Hanbat National University)
  • Received: 2023.02.23
  • Accepted: 2023.05.09
  • Published: 2023.06.30

Abstract

Hangul is unique among Asian writing systems because its simple letter forms combine into syllabic blocks. Its 24 basic letters can be combined to form 27 additional complex letters, giving 51 graphemes in total. Hangul optical character recognition has been a research topic for some time; however, handwritten Hangul recognition remains challenging owing to the variety of writing styles, slants, and the cursive-like nature of handwriting. In this study, thousands of samples of the 51 Hangul graphemes were gathered from 110 first-year university students to create a robust, high-variance dataset for training artificial neural networks. The collected dataset included 2200 samples for each consonant grapheme and 1100 samples for each vowel grapheme. The dataset was normalized to match the MNIST digits dataset and used to train three neural networks, and the results obtained were compared.
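As a rough check on the numbers in the abstract, the sketch below enumerates 51 graphemes and adds up the per-grapheme sample counts. It assumes that the 51 graphemes correspond to the Unicode Hangul Compatibility Jamo letters U+3131 to U+3163 (30 consonant forms and 21 vowel forms) and that every grapheme has its full quota of samples. Neither assumption is stated in the abstract, so the resulting total is illustrative and not a figure reported by the authors. The names `CONSONANTS`, `VOWELS`, `LABELS`, and `expected_sample_count` are hypothetical.

```python
# Assumption: the 51 graphemes match the Unicode "Hangul Compatibility Jamo"
# letters U+3131..U+3163. The abstract does not say which encoding was used.
CONSONANTS = [chr(cp) for cp in range(0x3131, 0x314F)]  # U+3131 .. U+314E
VOWELS = [chr(cp) for cp in range(0x314F, 0x3164)]      # U+314F .. U+3163

# Per-grapheme sample counts quoted in the abstract.
SAMPLES_PER_CONSONANT = 2200
SAMPLES_PER_VOWEL = 1100


def expected_sample_count() -> int:
    """Total samples if every grapheme has its full quota (an assumption)."""
    return (len(CONSONANTS) * SAMPLES_PER_CONSONANT
            + len(VOWELS) * SAMPLES_PER_VOWEL)


# Integer class labels 0..50, as a 51-way classifier would need.
LABELS = {grapheme: index for index, grapheme in enumerate(CONSONANTS + VOWELS)}

if __name__ == "__main__":
    print(len(CONSONANTS), len(VOWELS), len(LABELS))
    print(expected_sample_count())
```

Under these assumptions the class split is 30 consonants and 21 vowels, which is consistent with the 51 graphemes mentioned in the abstract.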
