http://dx.doi.org/10.15207/JKCS.2020.11.3.001

Algorithms for Handling Incomplete Data in SVM and Deep Learning  

Lee, Jong-Chan (Dept. of Computer Engineering, Chungwoon University)
Publication Information
Journal of the Korea Convergence Society / v.11, no.3, 2020, pp. 1-7
Abstract
This paper introduces two techniques for handling incomplete data, together with algorithms for learning from such data. The first method fills each missing value by assigning equal probability to every value the missing variable can take, and then trains an SVM on the resulting data. Under this scheme, the more frequently a variable is missing, the higher its entropy, so the decision tree avoids selecting it. Its characteristic is that it ignores whatever information remains about the missing variable and simply assigns a new value. The second, newer method instead computes an entropy-based probability from the information that remains once the missing value is excluded, and uses it as an estimate of the missing variable. In other words, it exploits the large amount of information that is not lost in the incomplete training data to recover part of the missing information, and then learns with deep learning. To measure performance, each variable in the training data is selected in turn, the proportion of values lost in that variable is varied, and the resulting measurements are compared iteratively.
Keywords
SVM; Entropy; UChoo; Extended data expression; Incomplete data; Deep learning;
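
A minimal Python sketch of the two imputation strategies described in the abstract. This is our own illustration, not the paper's implementation: the function to_distributions, the -1 missing-value marker, and the empirical-frequency estimate (standing in for the paper's entropy-based estimate and UChoo's extended data expression) are all assumptions made for the example. The resulting vectors could then be fed to an SVM (method 1) or a deep network (method 2).

    import numpy as np

    def to_distributions(X, n_values, method="uniform"):
        """Encode a categorical matrix X (integer codes, -1 = missing) as
        concatenated per-feature probability vectors.

        method="uniform":   a missing entry receives equal probability over
                            every value the variable can take (method 1).
        method="empirical": a missing entry receives the value frequencies
                            observed in the non-missing rows of the same
                            variable, a frequency-based stand-in for the
                            paper's entropy-based estimate (method 2).
        """
        n, d = X.shape
        blocks = []
        for j in range(d):
            col = X[:, j]
            k = n_values[j]
            probs = np.zeros((n, k))
            observed = col >= 0
            probs[observed, col[observed]] = 1.0          # one-hot for known values
            if method == "uniform":
                probs[~observed] = 1.0 / k                # equal probability over all values
            else:
                counts = np.bincount(col[observed], minlength=k).astype(float)
                probs[~observed] = counts / counts.sum()  # estimate from remaining data
            blocks.append(probs)
        return np.hstack(blocks)                          # one flat feature vector per sample

    # Toy usage: 3 samples, 2 categorical variables; -1 marks a missing value.
    X = np.array([[0, 1],
                  [1, -1],
                  [-1, 1]])
    print(to_distributions(X, n_values=[2, 3], method="uniform"))
    print(to_distributions(X, n_values=[2, 3], method="empirical"))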