http://dx.doi.org/10.5392/IJoC.2020.16.2.008

Document Image Binarization by GAN with Unpaired Data Training  

Dang, Quang-Vinh (Chonnam National University)
Lee, Guee-Sang (Chonnam National University)
Abstract
Data is critical in deep learning, but data scarcity is common in research, especially in the preparation of paired training data. In this paper, document image binarization with unpaired data is studied by introducing adversarial learning, removing the need for supervised or labeled datasets. However, simply extending previous unpaired training approaches to binarization inevitably leads to poor performance compared with paired-data training. Thus, a new deep learning approach is proposed that introduces a greater diversity of higher-quality generated images. A two-stage model is proposed that comprises a generative adversarial network (GAN) followed by a U-net network. In the first stage, the GAN uses the unpaired image data to create paired image data. In the second stage, the generated paired image data are passed through the U-net network for binarization, so the trained U-net becomes the binarization model at test time. The proposed model has been evaluated on the publicly available DIBCO datasets and outperforms other techniques trained on unpaired data. The paper shows, for the first time in the literature, the potential of using unpaired data for binarization, which can be further improved to replace paired-data training in the future.
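The abstract describes a two-stage pipeline: a GAN that turns the two unpaired sets into paired training data, followed by a U-net that is trained on those generated pairs and then used alone as the binarization model at test time. The following is a minimal PyTorch sketch of that idea; the module sizes, the single training step, and the assumption that the stage-1 generator degrades clean reference patches (so each synthetic degraded patch is paired with its known clean counterpart) are illustrative choices, not the authors' exact architecture or losses.

```python
# Minimal sketch of the two-stage idea in the abstract (PyTorch).
# Assumption: the stage-1 generator degrades clean binary patches so that each
# synthetic degraded patch is paired with its known clean version; sizes and
# the single training step are illustrative only.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Stage 1: maps a clean binary patch to a synthetic degraded patch."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1), nn.Sigmoid(),
        )
    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Scores whether a degraded patch comes from the real (unpaired) degraded set."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
        )
    def forward(self, x):
        return self.net(x)

class UNet(nn.Module):
    """Stage 2: toy-depth U-net trained on the generated (degraded, clean) pairs."""
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.up = nn.Sequential(nn.ConvTranspose2d(32, 32, 2, stride=2), nn.ReLU())
        self.out = nn.Conv2d(32 + 1, 1, 3, padding=1)  # skip connection with the input
    def forward(self, x):
        h = self.up(self.down(x))
        return torch.sigmoid(self.out(torch.cat([h, x], dim=1)))

G, D, U = Generator(), Discriminator(), UNet()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_u = torch.optim.Adam(U.parameters(), lr=1e-3)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

# The two sets are UNPAIRED: degraded pages and clean binary images from different documents.
real_degraded = torch.rand(8, 1, 64, 64)            # stand-in for real degraded patches
clean = (torch.rand(8, 1, 64, 64) > 0.5).float()    # stand-in for clean binary patches

# ---- Stage 1: one adversarial step; G(clean) should look like a real degraded page.
fake_degraded = G(clean)
d_loss = bce(D(real_degraded), torch.ones(8, 1)) + bce(D(fake_degraded.detach()), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()
g_loss = bce(D(fake_degraded), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# ---- Stage 2: each synthetic degraded patch is paired with the clean patch it came from.
with torch.no_grad():
    synthetic = G(clean)                            # generated "paired" input
u_loss = l1(U(synthetic), clean)                    # U-net learns degraded -> binary
opt_u.zero_grad(); u_loss.backward(); opt_u.step()

# At test time only the trained U-net is applied: binary = (U(real_patch) > 0.5).
```

Under these assumptions, only the stage-2 U-net is kept for inference, matching the abstract's statement that the trained U-net becomes the binarization model during testing.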
Keywords
document image binarization; unpaired training data; GAN; unsupervised learning
Citations & Related Records
Reference
1 S. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, R. Young, K. Ashida, H. Nagai, M. Okamoto, H. Yamamoto, H. Miyao, J. Zhu, W. Ou, C. Wolf, J. Jolion, L. Todoran, M. Worring, and X. Lin, "ICDAR 2003 Robust Reading Competitions: Entries, Results, and Future Directions," Int. J. Doc. Anal. Recognit., vol. 7, pp. 105-122, 2005, doi: https://doi.org/10.1007/s10032-004-0134-3.
2 N. Stamatopoulos, B. Gatos, G. Louloudis, U. Pal, and A. Alaei, "ICDAR 2013 Handwriting Segmentation Contest," in 12th International Conference on Document Analysis and Recognition (ICDAR), pp. 1402-1406, 2013, doi: https://doi.org/10.1109/ICDAR.2013.283.
3 I. Pratikakis, K. Zagoris, G. Barlas, and B. Gatos, "ICDAR 2017 Competition on Document Image Binarization (DIBCO 2017)," in 14th IAPR International Conference on Document Analysis and Recognition, 2017, doi: https://doi.org/10.1109/ICDAR.2017.228.
4 C. Tensmeyer and T. Martinez, "Document Image Binarization with Fully Convolutional Neural Networks," in 14th IAPR International Conference on Document Analysis and Recognition, 2017, doi: https://doi.org/10.1109/ICDAR.2017.25.
5 J. Zhu, T. Park, P. Isola, and A. Efros, "Unpaired Image-to-image Translation Using Cycle-Consistent Adversarial Networks," in ICCV, 2017, doi: https://doi.org/10.1109/ICCV.2017.244.
6 Q. N. Vo, S. H. Kim, H. J. Yang, and G. Lee, "Binarization of Degraded Document Images Based on Hierarchical Deep Supervised Network," Pattern Recognition, vol. 74, pp. 568-586, Feb. 2018, doi: https://doi.org/10.1016/j.patcog.2017.08.025.
7 J. Calvo-Zaragoza and A.-J. Gallego, "A Selectional Auto-encoder Approach for Document Image Binarization," Pattern Recognition, vol. 86, 2019, doi: https://doi.org/10.1016/j.patcog.2018.08.011.
8 O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional Networks for Biomedical Image Segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, pp. 234-241, 2015, doi: https://doi.org/10.1007/978-3-319.
9 A. K. Bhunia, A. K. Bhunia, and P. P. Roy, "Improving Document Binarization Via Adversarial Noise-Texture Augmentation," in ICIP, 2019, doi: https://doi.org/10.1109/ICIP.2019.8803348.
10 I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in NIPS'14 Proceedings of the 27th International Conference on Neural Information Processing Systems, vol. 2, pp. 2672-2680, Dec. 08-13, 2014.
11 E. L. Denton, S. Chintala, and R. Fergus, "Deep generative image models using a Laplacian pyramid of adversarial networks," in NIPS, 2015.
12 P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in CVPR, 2017, doi: https://doi.org/10.1109/CVPR.2017.632.
13 A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," in ICLR, 2016.
14 M. Mirza and S. Osindero, "Conditional generative adversarial nets," arXiv preprint arXiv:1411.1784, 2014.
15 G. Perarnau, J. van de Weijer, B. Raducanu and J.M. Alvarez, "Invertible conditional gans for image editing," in NIPS Workshop, 2016.
16 E. Mansimov, E. Parisotto, J. L. Ba, and R. Salakhutdinov, "Generating images from captions with attention," in ICLR, 2016.
17 H. Liu, B. Jiang, Y. Xiao, and C. Yang, "Coherent semantic attention for image inpainting," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), pp. 4170-4179, Oct. 2019.
18 J. Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman, "Toward multimodal image-to-image translation," in NIPS, 2017.
19 X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, "Infogan: Interpretable representation learning by information maximizing generative adversarial nets," in NIPS, 2016.
20 I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner, "beta-VAE: Learning basic visual concepts with a constrained variational framework," in ICLR, 2017.
21 X. Huang, M. Y. Liu, S. Belongie, and J. Kautz, "Multimodal unsupervised image-to-image translation," in ECCV, 2018.
22 L. A. Gatys, M. Bethge, A. Hertzmann, and E. Shechtman, "Preserving color in neural artistic style transfer," arXiv preprint arXiv:1606.05897, 2016.
23 L. A. Gatys, A. S. Ecker, and M. Bethge, "Image style transfer using convolutional neural networks," in CVPR, pp. 2414-2423, 2016, doi: https://doi.org/10.1109/CVPR.2016.265.
24 J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual losses for real-time style transfer and super-resolution," in ECCV, 2016.
25 D. Ulyanov, V. Lebedev, A. Vedaldi, and V. Lempitsky, "Texture networks: Feed-forward synthesis of textures and stylized images," in ICML, 2016.
26 I. Pratikakis, B. Gatos, and K. Ntirogiannis, "H-DIBCO 2010 - handwritten document image binarization competition," in ICFHR, pp. 727-732, 2010, doi: https://doi.org/10.1109/ICFHR.2010.118.
27 B. Gatos, K. Ntirogiannis, and I. Pratikakis, "ICDAR 2009 document image binarization contest (DIBCO 2009)," in ICDAR, pp. 1375-1382, 2009, doi: https://doi.org/10.1109/ICDAR.2009.246.
28 I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2011 document image binarization contest (DIBCO 2011)," in ICDAR, pp. 1506-1510, 2011, doi: https://doi.org/10.1109/ICDAR.2011.299.
29 I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2013 document image binarization contest (DIBCO 2013)," in ICDAR, pp. 1471-1476, 2013, doi: https://doi.org/10.1109/ICDAR.2013.219.
30 I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICFHR 2012 competition on handwritten document image binarization (H-DIBCO 2012)," in ICFHR, pp. 817-822, 2012, doi: https://doi.org/10.1109/ICFHR.2012.216.
31 I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICFHR 2014 competition on handwritten document image binarization (H-DIBCO 2014)," in ICFHR, pp. 809-813, 2014, doi: https://doi.org/10.1109/ICFHR.2014.141.
32 L. Gatys, A. S. Ecker, and M. Bethge, "Texture synthesis using convolutional neural networks," in NIPS, pp. 262-270, 2015.
33 F. Deng, Z. Wu, Z. Lu, and M.S. Brown, "BinarizationShop: a user-assisted software suite for converting old documents to black-and-white," in Proc. Annu. Joint Conf. Digit. Libraries, pp. 255-258, 2010, doi: https://doi.org/10.1145/1816123.1816161.
34 H. Z. Nafchi, S. M. Ayatollahi, R. F. Moghaddam, and M. Cheriet, "An efficient ground-truthing tool for binarization of historical manuscripts," in ICDAR, pp. 807-811, 2013, doi: https://doi.org/10.1109/ICDAR.2013.165.
35 R. Hedjam, H. Z. Nafchi, R. F. Moghaddam, M. Kalacska, and M. Cheriet, "ICDAR 2015 multispectral text extraction contest (MS-TEx 2015)," in ICDAR, pp. 1181-1185, 2015, doi: https://doi.org/10.1109/ICDAR.2015.7333947.
36 B. Gatos, K. Ntirogiannis, and I. Pratikakis, "ICDAR 2009 document image binarization contest (DIBCO 2009)," in 10th International Conference on Document Analysis and Recognition (ICDAR), IEEE, pp. 1375-1382, 2009.
37 K. Ntirogiannis, B. Gatos, and I. Pratikakis, "Performance Evaluation Methodology for Historical Document Image Binarization," IEEE Transactions on Image Processing, vol. 22, no. 2, pp. 595-609, 2013.