Deep Learning Model for Incomplete Data

  • Lee, Jong Chan (Department of Computer Engineering, ChungWoon University)
  • Received : 2018.12.18
  • Accepted : 2019.02.20
  • Published : 2019.02.28

Abstract

The proposed model is designed to minimize the loss of information in incomplete data that contains missing values. The first step transforms the training data with a data extension technique so that the lost information is compensated for. In this conversion, each attribute value is encoded as a one-hot block filled with binary or probability values. The converted data is then fed into the deep learning model, where the number of entries per attribute is not fixed but depends on that attribute's cardinality. The entry values of each attribute are assigned to their own input nodes, and learning proceeds from there. This differs from existing learning models: the structure is unusual in that a single attribute value is distributed over multiple nodes in the input layer. To evaluate the learning performance of the proposed model, various experiments are performed on data with missing values, and the results show that the model is superior in terms of performance. The proposed model is expected to be useful as an algorithm for minimizing information loss in ubiquitous environments.
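As a concrete illustration of the data extension step described above, the minimal sketch below converts a record with a missing attribute value into the extended one-hot representation. The attribute names and domains are hypothetical (the paper's actual training data is in Table 1), and spreading a missing value as a uniform probability over its domain is an assumed filling rule; the abstract only states that entries are filled with binary or probability values.

```python
import numpy as np

# Hypothetical attribute domains; the paper's real attributes are not listed here.
ATTRIBUTES = {
    "outlook":  ["sunny", "overcast", "rain"],
    "humidity": ["high", "normal"],
    "wind":     ["weak", "strong"],
}

def extend_record(record):
    """Convert one record into the extended (one-hot / probability) representation.

    A known value becomes a binary one-hot block; a missing value (None) is
    replaced by a probability block -- here a uniform distribution over the
    attribute's domain, which is only an assumed compensation rule.
    """
    blocks = []
    for name, domain in ATTRIBUTES.items():
        block = np.zeros(len(domain))
        value = record.get(name)          # None marks a missing value
        if value is None:
            block[:] = 1.0 / len(domain)  # spread the lost information uniformly
        else:
            block[domain.index(value)] = 1.0
        blocks.append(block)
    # Each attribute contributes as many entries as its cardinality, so the
    # number of entries per attribute is not constant.
    return np.concatenate(blocks)

complete = {"outlook": "sunny", "humidity": "high", "wind": "weak"}
missing  = {"outlook": "rain",  "humidity": None,   "wind": "strong"}
print(extend_record(complete))  # -> 1,0,0 | 1,0   | 1,0
print(extend_record(missing))   # -> 0,0,1 | .5,.5 | 0,1
```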

Fig. 1. Learning model for extended data representation
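Fig. 1 itself is not reproduced here; the sketch below only illustrates how the extended entries of each attribute occupy their own input nodes of a deep learning model, using the 7-entry vector from the encoding sketch above. The single hidden layer, the layer sizes, and the binary class label are assumptions for illustration, not the architecture of Fig. 1.

```python
import torch
import torch.nn as nn

# Input width = sum of attribute cardinalities (3 + 2 + 2 for the
# hypothetical attributes used in the encoding sketch above).
INPUT_NODES = 7
NUM_CLASSES = 2  # assumed binary class label

# Assumed architecture: one hidden layer; the paper's model may differ.
model = nn.Sequential(
    nn.Linear(INPUT_NODES, 16),
    nn.ReLU(),
    nn.Linear(16, NUM_CLASSES),
)

# One extended record (missing "humidity" spread as 0.5 / 0.5) and its label.
x = torch.tensor([[0.0, 0.0, 1.0, 0.5, 0.5, 0.0, 1.0]])
y = torch.tensor([1])

# Ordinary supervised training step; the probability entries of the missing
# value flow through the distributed input nodes like any other feature.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
optimizer.step()
```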

Table 1. An example of training data

Table 2. Quantified attribute values from Table 1

Table 3. An example of additional training data

Table 4. The extended data representation

Table 5. Experimental results
