http://dx.doi.org/10.3837/tiis.2021.05.011

DP-LinkNet: A convolutional network for historical document image binarization  

Xiong, Wei (School of Electrical and Electronic Engineering, Hubei University of Technology)
Jia, Xiuhong (School of Electrical and Electronic Engineering, Hubei University of Technology)
Yang, Dichun (School of Electrical and Electronic Engineering, Hubei University of Technology)
Ai, Meihui (School of Electrical and Electronic Engineering, Hubei University of Technology)
Li, Lirong (School of Electrical and Electronic Engineering, Hubei University of Technology)
Wang, Song (Department of Computer Science and Engineering, University of South Carolina)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.15, no.5, 2021, pp. 1778-1797
Abstract
Document image binarization is an important pre-processing step in document analysis and archiving. The state-of-the-art models for document image binarization are variants of encoder-decoder architectures, such as FCN (fully convolutional network) and U-Net. Despite their success, they still suffer from three limitations: (1) reduced feature map resolution due to consecutive strided pooling or convolutions, (2) the presence of target objects at multiple scales, and (3) reduced localization accuracy due to the built-in invariance of deep convolutional neural networks (DCNNs). To overcome these three challenges, we propose an improved semantic segmentation model, referred to as DP-LinkNet, which adopts the D-LinkNet architecture as its backbone and inserts the proposed hybrid dilated convolution (HDC) and spatial pyramid pooling (SPP) modules between the encoder and the decoder. Extensive experiments are conducted on the recent Document Image Binarization Competition (DIBCO) and Handwritten Document Image Binarization Competition (H-DIBCO) benchmark datasets. Results show that our proposed DP-LinkNet outperforms other state-of-the-art techniques by a large margin. Our implementation and the pre-trained models are available at https://github.com/beargolden/DP-LinkNet.
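Limitation (1) is what dilated convolutions target: a stride-1 3×3 convolution with dilation rate r and padding r keeps the feature map size while spreading its taps r pixels apart. The sketch below is an illustrative, framework-free receptive-field check, not the authors' code. The abstract does not list DP-LinkNet's dilation rates, so the rates tried here (1, 2, 5; 2, 2, 2; and 1, 2, 4, 8, which we understand to be the cascade in D-LinkNet's center block) are assumptions. It shows the "gridding" problem that the hybrid dilated convolution (HDC) idea is meant to avoid: stacking rates that share a common factor leaves holes in the set of input pixels that reach an output pixel.

```python
from typing import Iterable, Set


def covered_offsets(rates: Iterable[int], kernel_size: int = 3) -> Set[int]:
    """1-D input offsets that reach one output pixel after stacking stride-1
    dilated convolutions with the given dilation rates.

    The taps of a square k x k dilated kernel form a grid, so the 2-D
    footprint is the Cartesian product of this 1-D set with itself.
    """
    half = kernel_size // 2
    offsets = {0}
    for rate in rates:
        taps = [rate * k for k in range(-half, half + 1)]
        # Each layer adds one of its tap offsets to every offset reached so far.
        offsets = {o + t for o in offsets for t in taps}
    return offsets


def receptive_field(rates: Iterable[int], kernel_size: int = 3) -> int:
    """Width (in input pixels) of the span covered by the stacked layers."""
    offs = covered_offsets(list(rates), kernel_size)
    return max(offs) - min(offs) + 1


def has_gridding(rates: Iterable[int], kernel_size: int = 3) -> bool:
    """True if some pixels inside the receptive field never reach the output."""
    rates = list(rates)
    offs = covered_offsets(rates, kernel_size)
    return len(offs) != receptive_field(rates, kernel_size)


if __name__ == "__main__":
    for rates in ([1, 1, 1, 1], [2, 2, 2], [1, 2, 5], [1, 2, 4, 8]):
        print(rates, receptive_field(rates), has_gridding(rates))
```

Running it reports a receptive field of 9 pixels for four ordinary 3×3 layers versus 31 for rates 1, 2, 4, 8 with the same number of weights and no holes, while rates 2, 2, 2 span 13 pixels but only sample every other one.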
Keywords
Degraded document image binarization; semantic segmentation; DP-LinkNet; encoder-decoder architecture; hybrid dilated convolution (HDC); spatial pyramid pooling (SPP)
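Spatial pyramid pooling (SPP) pools the same feature map over several grid sizes so that each position can draw on context at several scales, which addresses limitation (2) in the abstract. The abstract does not describe the internals of DP-LinkNet's SPP module, so the following is a hedged, framework-free sketch of the generic technique from He et al. (reference 2 in the list below): max pooling into n×n bins with floor/ceil bin edges, then either flattening into a fixed-length vector (the original SPP use) or upsampling back to the input size and stacking as extra channels, which is closer to what a module sitting between an encoder and a decoder needs. The pyramid levels (1, 2, 3, 6), the nearest-neighbour upsampling, and all function names are illustrative assumptions; a real network would apply this to multi-channel tensors in a deep learning framework.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]


def adaptive_max_pool(fmap: Grid, n: int) -> Grid:
    """Max-pool an H x W map into n x n bins, whatever H and W are.

    Bin i spans rows floor(i*H/n) .. ceil((i+1)*H/n) - 1 (likewise for columns),
    so every bin is non-empty and neighbouring bins may overlap.
    """
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(n):
        r0, r1 = math.floor(i * h / n), math.ceil((i + 1) * h / n)
        row = []
        for j in range(n):
            c0, c1 = math.floor(j * w / n), math.ceil((j + 1) * w / n)
            row.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


def spp_vector(fmap: Grid, levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Classic SPP: concatenate all pooled bins into a vector of length
    sum(n * n for n in levels), independent of the input size."""
    vec: List[float] = []
    for n in levels:
        for row in adaptive_max_pool(fmap, n):
            vec.extend(row)
    return vec


def upsample_nearest(grid: Grid, h: int, w: int) -> Grid:
    """Nearest-neighbour resize of an n_r x n_c grid to h x w."""
    n_r, n_c = len(grid), len(grid[0])
    return [[grid[r * n_r // h][c * n_c // w] for c in range(w)] for r in range(h)]


def spp_context(fmap: Grid, levels: Sequence[int] = (1, 2, 3, 6)) -> List[Grid]:
    """Segmentation-style pyramid: the input plus one H x W channel per level,
    each carrying context pooled at a different scale."""
    h, w = len(fmap), len(fmap[0])
    channels = [fmap]
    for n in levels:
        channels.append(upsample_nearest(adaptive_max_pool(fmap, n), h, w))
    return channels


if __name__ == "__main__":
    fmap = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(spp_vector(fmap, levels=(1, 2)))  # [15, 5, 7, 13, 15]
```

Because the bin edges scale with H and W, a 5×7 map and a 4×4 map both yield a 21-element vector for levels (1, 2, 4), which is the property that lets SPP accept inputs of arbitrary size.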
References
1 F. Yu, V. Koltun, "Multi-scale context aggregation by dilated convolutions," in Proc. of the 4th International Conference on Learning Representations (ICLR 2016), San Juan, Puerto Rico, 2016.
2 K. He, X. Zhang, S. Ren, J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904-1916, 2015.
3 S. Eskenazi, P. Gomez-Kramer, J.-M. Ogier, "A comprehensive survey of mostly textual document segmentation algorithms since 2008," Pattern Recognition, vol. 64, pp. 1-14, 2017.
4 D. Rivest-Henault, R. F. Moghaddam, M. Cheriet, "A local linear level set method for the binarization of degraded historical document images," International Journal on Document Analysis and Recognition, vol. 15, no. 2, pp. 101-124, 2012.
5 F. Jia, C. Shi, K. He, C. Wang, B. Xiao, "Degraded document image binarization using structural symmetry of strokes," Pattern Recognition, vol. 74, pp. 225-240, 2018.
6 I. Pratikakis, K. Zagoris, X. Karagiannis, L. Tsochatzidis, T. Mondal, I. Marthot-Santaniello, "ICDAR 2019 competition on document image binarization (DIBCO 2019)," in Proc. of the 15th International Conference on Document Analysis and Recognition (ICDAR 2019), Sydney, Australia, 2019.
7 J. Sauvola, M. Pietikainen, "Adaptive document image binarization," Pattern Recognition, vol. 33, no. 2, pp. 225-236, 2000.
8 M. van Herk, "A fast algorithm for local minimum and maximum filters on rectangular and octagonal kernels," Pattern Recognition Letters, vol. 13, no. 7, pp. 517-521, 1992.
9 S. Lu, B. Su, C. L. Tan, "Document image binarization using background estimation and stroke edges," International Journal on Document Analysis and Recognition, vol. 13, no. 4, pp. 303-314, 2010.
10 E. Ahmadi, Z. Azimifar, M. Shams, M. Famouri, M. J. Shafiee, "Document image binarization using a discriminative structural classifier," Pattern Recognition Letters, vol. 63, pp. 36-42, 2015.
11 N. P. Challa, R. V. K. Mehta, "Applications of image processing techniques on palm leaf manuscripts - a survey," Helix, vol. 7, no. 5, pp. 2013-2017, 2017.
12 I. Pratikakis, B. Gatos, K. Ntirogiannis, "ICDAR 2013 document image binarization contest (DIBCO 2013)," in Proc. of the 12th International Conference on Document Analysis and Recognition (ICDAR 2013), Washington, DC, USA, pp. 1471-1476, 2013.
13 L. Zhou, C. Zhang, M. Wu, "D-LinkNet: LinkNet with pretrained encoder and dilated convolution for high resolution satellite imagery road extraction," in Proc. of the 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPR 2018), Salt Lake City, UT, USA, pp. 192-196, 2018.
14 P. V. Bezmaternykh, D. A. Ilin, D. P. Nikolaev, "U-Net-bin: Hacking the document image binarization contest," Computer Optics, vol. 43, no. 5, pp. 825-832, 2019.
15 A. Chaurasia, E. Culurciello, "LinkNet: Exploiting encoder representations for efficient semantic segmentation," in Proc. of the 2017 IEEE Visual Communications and Image Processing (VCIP 2017), St. Petersburg, FL, USA, pp. 1-4, 2017.
16 Z. Hadjadj, M. Cheriet, A. Meziane, Y. Cherfa, "A new efficient binarization method: Application to degraded historical document images," Signal, Image and Video Processing, vol. 11, pp. 1155-1162, 2017.
17 J. Bernsen, "Dynamic thresholding for gray-level images," in Proc. of the 8th International Conference on Pattern Recognition (ICPR 1986), Paris, pp. 1251-1255, 1986.
18 B. Su, S. Lu, C. L. Tan, "Binarization of historical document images using the local maximum and minimum," in Proc. of the 9th IAPR International Workshop on Document Analysis Systems (DAS 2010), Boston, Massachusetts, USA, pp. 159-166, 2010.
19 B. Gatos, K. Ntirogiannis, I. Pratikakis, "ICDAR 2009 document image binarization contest (DIBCO 2009)," in Proc. of the 10th International Conference on Document Analysis and Recognition (ICDAR 2009), Barcelona, Spain, pp. 1375-1382, 2009.
20 I. Pratikakis, B. Gatos, K. Ntirogiannis, "ICDAR 2011 document image binarization contest (DIBCO 2011)," in Proc. of the 11th International Conference on Document Analysis and Recognition (ICDAR 2011), Beijing, China, pp. 1506-1510, 2011.
21 J. Zhao, C. Shi, F. Jia, Y. Wang, B. Xiao, "Document image binarization with cascaded generators of conditional generative adversarial networks," Pattern Recognition, vol. 96, 2019.
22 L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834-848, 2018.
23 N. R. Howe, "A Laplacian energy for document binarization," in Proc. of the 11th International Conference on Document Analysis and Recognition (ICDAR 2011), Beijing, China, pp. 6-10, 2011.
24 B. Su, S. Lu, C. L. Tan, "Robust document image binarization technique for degraded document images," IEEE Transactions on Image Processing, vol. 22, no. 4, pp. 1408-1417, 2013.
25 Q. N. Vo, S. H. Kim, H. J. Yang, G. Lee, "An MRF model for binarization of music scores with complex background," Pattern Recognition Letters, vol. 69, pp. 88-95, 2016.
26 N. R. Howe, "Document binarization with automatic parameter tuning," International Journal on Document Analysis and Recognition, vol. 16, no. 3, pp. 247-258, 2013.
27 I. Pratikakis, K. Zagoris, P. Kaddas, B. Gatos, "ICFHR 2018 competition on handwritten document image binarization (H-DIBCO 2018)," in Proc. of the 16th International Conference on Frontiers in Handwriting Recognition (ICFHR 2018), Niagara Falls, USA, pp. 489-493, 2018.
28 I. Pratikakis, B. Gatos, K. Ntirogiannis, "ICFHR 2012 competition on handwritten document image binarization (H-DIBCO 2012)," in Proc. of the 13th International Conference on Frontiers in Handwriting Recognition (ICFHR 2012), Bari, Italy, pp. 817-822, 2012.
29 K. Ntirogiannis, B. Gatos, I. Pratikakis, "ICFHR 2014 competition on handwritten document image binarization (H-DIBCO 2014)," in Proc. of the 14th International Conference on Frontiers in Handwriting Recognition (ICFHR 2014), Hersonissos, Greece, pp. 809-813, 2014.
30 I. Pratikakis, K. Zagoris, G. Barlas, B. Gatos, "ICFHR 2016 handwritten document image binarization contest (H-DIBCO 2016)," in Proc. of the 15th International Conference on Frontiers in Handwriting Recognition (ICFHR 2016), Shenzhen, China, pp. 619-623, 2016.
31 S. Bhowmik, R. Sarkar, B. Das, D. Doermann, "GiB: A game theory inspired binarization technique for degraded document images," IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1443-1455, 2019.
32 L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proc. of the 15th European Conference on Computer Vision (ECCV 2018), Munich, Germany, pp. 833-851, 2018.
33 G. Wang, W. Li, M. Aertsen, J. Deprest, S. Ourselin, T. Vercauteren, "Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks," Neurocomputing, vol. 338, pp. 34-45, 2019.
34 W. Xiong, J. Xu, Z. Xiong, J. Wang, M. Liu, "Degraded historical document image binarization using local features and support vector machine (SVM)," Optik, vol. 164, pp. 218-223, 2018.
35 I. Pratikakis, K. Zagoris, G. Barlas, B. Gatos, "ICDAR 2017 competition on document image binarization (DIBCO 2017)," in Proc. of the 14th International Conference on Document Analysis and Recognition (ICDAR 2017), Kyoto, Japan, pp. 1395-1403, 2017.
36 I. Pratikakis, B. Gatos, K. Ntirogiannis, "H-DIBCO 2010 - handwritten document image binarization competition," in Proc. of the 12th International Conference on Frontiers in Handwriting Recognition (ICFHR 2010), Kolkata, India, pp. 727-732, 2010.
37 Q. N. Vo, S. H. Kim, H. J. Yang, G. Lee, "Binarization of degraded document images based on hierarchical deep supervised network," Pattern Recognition, vol. 74, pp. 568-586, 2018.
38 E. Shelhamer, J. Long, T. Darrell, "Fully convolutional networks for semantic segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 640-651, 2017.
39 O. Ronneberger, P. Fischer, T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, pp. 234-241, 2015.
40 X. Chen, L. Lin, Y. Gao, "Parallel nonparametric binarization for degraded document images," Neurocomputing, vol. 189, pp. 43-52, 2016.
41 C. Tensmeyer, T. Martinez, "Document image binarization with fully convolutional neural networks," in Proc. of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017), Kyoto, Japan, pp. 99-104, 2017.
42 J. Calvo-Zaragoza, A.-J. Gallego, "A selectional auto-encoder approach for document image binarization," Pattern Recognition, vol. 86, pp. 37-47, 2019.
43 C. Wolf, J.-M. Jolion, "Extraction and recognition of artificial text in multimedia documents," Pattern Analysis and Applications, vol. 6, no. 4, pp. 309-326, 2004.
44 R. D. Lins, E. Kavallieratou, E. B. Smith, R. B. Bernardino, D. M. d. Jesus, "ICDAR 2019 time-quality binarization competition," in Proc. of the 15th International Conference on Document Analysis and Recognition (ICDAR 2019), Sydney, Australia, 2019.
45 X.-J. Mao, C. Shen, Y.-B. Yang, "Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections," in Proc. of the 30th Annual Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, pp. 2810-2818, 2016.
46 X. Peng, C. Wang, H. Cao, "Document binarization via multi-resolutional attention model with DRD loss," in Proc. of the 15th IAPR International Conference on Document Analysis and Recognition (ICDAR 2019), Sydney, NSW, Australia, pp. 45-50, 2019.
47 J. Deng, W. Dong, R. Socher, L. Li, K. Li, F. Li, "ImageNet: A large-scale hierarchical image database," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami Beach, FL, pp. 248-255, 2009.
48 M. Sezgin, B. Sankur, "Survey over image thresholding techniques and quantitative performance evaluation," Journal of Electronic Imaging, vol. 13, no. 1, pp. 146-168, 2004.
49 N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979.
50 W. Niblack, An introduction to digital image processing. Englewood Cliffs, New Jersey: Prentice-Hall International Inc., 1986.