Machine Printed and Handwritten Text Discrimination in Korean Document Images

  • Received : 2016.07.28
  • Accepted : 2016.09.29
  • Published : 2016.09.30

Abstract

Large numbers of Korean documents now exist, and their text often needs to be identified as either machine-printed or handwritten. Early identification methods rely on structural features, which are simple and easy to apply to text in a specific font, but their performance depends on the font type and the characteristics of the text. More recently, the bag-of-words model has been used for this task because it can be invariant to changes in font size and to distortions or modifications of the text. A method based on the bag-of-words model consists of three steps: word segmentation by connected-component grouping, feature extraction, and classification with an SVM (Support Vector Machine). This paper proposes a bag-of-words method that uses SURF (Speeded-Up Robust Features) to discriminate machine-printed from handwritten text in Korean documents. Experiments show that the proposed method outperforms methods based on structural features.
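
As a rough illustration of the feature-encoding idea behind a bag-of-words method, the following minimal sketch turns the local descriptors of one segmented word into a histogram over a visual codebook. It is not the paper's implementation: the function names, the toy 2-D descriptors (real SURF descriptors have 64 dimensions), the three-entry codebook, and the L1 normalization are all assumptions chosen for illustration. In a full pipeline, the resulting histogram would be the feature vector passed to the SVM.

```python
from math import dist


def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor (Euclidean)."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))


def bow_histogram(descriptors, codebook):
    """Encode one word image as an L1-normalized histogram of codeword counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    # A word with no detected keypoints yields an all-zero histogram.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical toy data: 2-D stand-ins for SURF descriptors and a 3-word codebook.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
word_descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]

histogram = bow_histogram(word_descriptors, codebook)
print(histogram)
```

Because the histogram counts codeword occurrences rather than descriptor positions, its length is fixed by the codebook size regardless of how many keypoints a word produces, which is what allows a standard classifier to consume it.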
