http://dx.doi.org/10.5392/IJoC.2018.14.1.001

Deep-Learning Approach for Text Detection Using Fully Convolutional Networks  

Tung, Trieu Son (Dept. of ECE, Chonnam National University)
Lee, Gueesang (Dept. of ECE, Chonnam National University)
Abstract
Text, as one of the most influential inventions of humanity, has played an important role in human life since ancient times. The rich and precise information embodied in text is useful in a wide range of vision-based applications: text extracted from images can support automatic annotation, indexing, language translation, and assistance systems for impaired persons. Natural-scene text detection is therefore an important and active research topic in computer vision and document analysis. Previous methods perform poorly because they produce numerous false-positive and false-negative regions. In this paper, a supervised method based on a fully convolutional network (FCN) is used to localize textual regions. The model was trained directly on images, with pixel values as inputs and binary ground truth as labels. The method was evaluated on the ICDAR-2013 dataset and proved to be comparable to other feature-based methods. It could expedite future research on deep-learning-based text detection.
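As a rough illustration of training with binary ground truth as the label, the sketch below rasterizes word bounding boxes into a text/non-text mask and scores a per-pixel probability map against it. It is not the authors' implementation: the box format (left, top, right, bottom, inclusive), the helper names `boxes_to_mask` and `pixel_bce`, and the choice of per-pixel binary cross-entropy as the loss are all assumptions made for this example, since the abstract does not specify them.

```python
import math


def boxes_to_mask(height, width, boxes):
    """Rasterize axis-aligned word boxes into a binary text/non-text mask.

    boxes: iterable of (left, top, right, bottom) pixel coordinates.
    Treating right and bottom as inclusive is an assumption of this sketch.
    """
    mask = [[0] * width for _ in range(height)]
    for left, top, right, bottom in boxes:
        # Clip each box to the image so out-of-range annotations are safe.
        x0, x1 = max(0, left), min(width - 1, right)
        y0, y1 = max(0, top), min(height - 1, bottom)
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                mask[y][x] = 1
    return mask


def pixel_bce(probs, mask, eps=1e-7):
    """Mean per-pixel binary cross-entropy (hypothetical loss choice)."""
    total, count = 0.0, 0
    for prob_row, mask_row in zip(probs, mask):
        for p, t in zip(prob_row, mask_row):
            # Clamp probabilities away from 0 and 1 to keep log() finite.
            p = min(max(p, eps), 1.0 - eps)
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count


# Example: a 4x6 image with one word box covering 3x2 pixels.
mask = boxes_to_mask(4, 6, [(1, 1, 3, 2)])
uniform = [[0.5] * 6 for _ in range(4)]  # an uninformative prediction
loss = pixel_bce(uniform, mask)
```

In a real FCN pipeline the probability map would come from the network's final layer at the same resolution as the mask; here a constant map stands in for it so the example stays self-contained.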
Keywords
Text Detection; FCN; Deep Learning; Natural Scene Image