[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.33851/JMIS.2019.6.4.225

Training Data Sets Construction from Large Data Set for PCB Character Recognition

Ndayishimiye, Fabrice (Department of Computer Engineering, Keimyung University)
Gang, Sumyung (Department of Computer Engineering, Keimyung University)
Lee, Joon Jae (Department of Computer Engineering, Keimyung University)

Publication Information

Journal of Multimedia Information System, v.6, no.4, 2019, pp. 225-234

Abstract

Deep learning has become increasingly popular in both academia and industry. Various domains, including pattern recognition and computer vision, have witnessed the power of deep neural networks. However, current studies on deep learning mainly focus on high-quality data sets with balanced class labels, while training on poor-quality and imbalanced data sets remains a major challenge for classification tasks. In this paper, we propose a data reduction method based on data analysis for selecting good and diverse samples from a large data set for training a deep learning model. Data sampling techniques can reduce the size of the raw data by retaining representative samples that carry its useful knowledge. Therefore, instead of dealing with the full raw data, we can apply data reduction techniques to sample the data without losing important information. We group PCB characters into classes and train ResNet56 v2 and SENet models in order to improve the classification performance of an optical character recognition (OCR) character classifier.

Keywords

PCB inspection; Optical character recognition; Deep learning; Data reduction; Sampling
