http://dx.doi.org/10.13088/jiis.2018.24.1.001

The way to make training data for deep learning model to recognize keywords in product catalog image at E-commerce  

Kim, Kitae (R&D Center, Mindgroup)
Oh, Wonseok (School of Business, Hanyang University)
Lim, Geunwon (School of Business, Hanyang University)
Cha, Eunwoo (School of Business, Hanyang University)
Shin, Minyoung (Department of Chinese Language & Literature, Hanyang University)
Kim, Jongwoo (School of Business, Hanyang University)
Publication Information
Journal of Intelligence and Information Systems / v.24, no.1, 2018, pp. 1-23
Abstract
Since the beginning of the 21st century, the growth of the internet and of information and communication technologies (ICT) has given rise to a variety of high-quality services. In particular, the e-commerce industry, led by companies such as Amazon and eBay, has grown explosively. As e-commerce has grown and more products have been registered at online shopping malls, customers have been able to compare many products and easily buy what they want. However, this growth has also created a problem: with so many products registered, it has become difficult for customers to find what they really need. A search with a general keyword returns too many products, whereas a search on specific product details returns few, because concrete product attributes are rarely registered. Automatically recognizing text in images can be a solution. Because the bulk of product details is written in catalogs in image format, most product information cannot be found by the current text-based search system. If the information in these images can be converted to text, customers can search by product details, which makes shopping more convenient. Various existing OCR (Optical Character Recognition) programs can recognize text in images, but they are hard to apply to catalogs because they fail under certain conditions, such as when the text is small or the fonts are inconsistent. Therefore, this research proposes a way to recognize keywords in catalogs with deep learning, which has been the state of the art in image recognition since the 2010s.
The Single Shot MultiBox Detector (SSD), a model well regarded for its object-detection performance, can be used with its structure redesigned to account for the differences between text and objects. However, because the SSD model is trained by supervised learning, as is characteristic of deep learning algorithms, it needs a large amount of labeled training data. One way to collect such data is to label the location and class of the text in catalogs manually, but manual collection raises many problems. Some keywords would be missed, because humans make mistakes while labeling. Given the scale of data needed, collection is either too time-consuming or, if many workers are hired to shorten the time, too costly. Furthermore, when specific keywords need to be trained, finding images that contain those words is also difficult. To solve this data issue, this research developed a program that creates training data automatically. The program generates catalog-like images containing various keywords and pictures, and saves the location information of the keywords at the same time. With this program, data can be collected efficiently and the performance of the SSD model also improves: the model recorded a recognition rate of 81.99% with 20,000 training samples created by the program. In addition, this research tested the SSD model under different data conditions to analyze which features of the data influence the recognition of text in images. The results show that the number of labeled keywords, the addition of overlapped keyword labels, the existence of unlabeled keywords, the spacing between keywords, and the differences in background images are all related to the performance of the SSD model. These findings can guide the construction of high-quality data to improve the performance of the SSD model or of other text recognizers based on deep learning.
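The idea of generating labeled samples automatically can be illustrated with a minimal sketch. This is not the program developed in this research; it only shows the general pattern of placing keywords on a canvas and recording their locations as labels at the same moment. Every name and default here (`Label`, `generate_sample`, the 300x300 canvas, the fixed character width used to estimate text size, and the `min_gap` spacing parameter) is a hypothetical assumption, and actual rendering of text and background pictures is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class Label:
    """Location and class of one keyword (hypothetical label format)."""
    keyword: str
    x: int
    y: int
    width: int
    height: int


def boxes_too_close(a, b, min_gap):
    """Return True if boxes a and b overlap or lie within min_gap pixels."""
    separated = (
        a.x + a.width + min_gap <= b.x
        or b.x + b.width + min_gap <= a.x
        or a.y + a.height + min_gap <= b.y
        or b.y + b.height + min_gap <= a.y
    )
    return not separated


def generate_sample(keywords, canvas_w=300, canvas_h=300, char_w=12,
                    line_h=24, min_gap=8, max_tries=100, rng=None):
    """Pick a random position for each keyword and return the labels.

    Text size is estimated as char_w pixels per character (an assumption;
    a real generator would measure the rendered font instead).
    """
    rng = rng or random.Random()
    labels = []
    for keyword in keywords:
        width, height = char_w * len(keyword), line_h
        if width > canvas_w or height > canvas_h:
            continue  # keyword cannot fit on the canvas at this size
        for _ in range(max_tries):
            candidate = Label(keyword,
                              rng.randint(0, canvas_w - width),
                              rng.randint(0, canvas_h - height),
                              width, height)
            # Accept the position only if it keeps min_gap to every
            # keyword that has already been placed.
            if not any(boxes_too_close(candidate, other, min_gap)
                       for other in labels):
                labels.append(candidate)
                break
    return labels


if __name__ == "__main__":
    sample = generate_sample(["cotton", "waterproof", "size"],
                             rng=random.Random(0))
    for label in sample:
        print(label)
```

Because the generator chooses each position itself, the label is known exactly and no keyword is missed by a human annotator. Parameters such as `min_gap` correspond loosely to the data features examined in the performance test, for example the spacing between keywords.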
The SSD model redesigned to recognize text in images and the program developed for creating training data are expected to contribute to improving search systems in e-commerce. Suppliers can spend less time registering keywords for products, and customers can search for products using the details written in the catalog.
Keywords
Deep learning; train data generation; OCR; attribute-based search; Single Shot MultiBox Detector