http://dx.doi.org/10.15207/JKCS.2021.12.8.039

Incomplete data handling technique using decision trees  

Lee, Jong Chan (Dept. of Computer Engineering, Chungwoon University)
Publication Information
Journal of the Korea Convergence Society, v.12, no.8, 2021, pp. 39-45
Abstract
This paper discusses how to handle incomplete data, that is, data containing missing values. Handling a missing value optimally means deriving, from the information in the training data, an estimate as close as possible to the original value and substituting it for the missing value. To this end, we use the decision tree that a classifier builds while learning to classify the data. Specifically, the tree is obtained by training the C4.5 classifier on only the complete events in the training data, those with no missing values. Each node of the tree holds information about a classification variable, with nodes closer to the root carrying more information, and each leaf node defines a classification region through its path from the root. The average of the data events classified into each region is recorded with that region. An event containing a missing value is then fed into the tree, and the region closest to the event is found by traversing the tree according to the information at each node. The average recorded for that region is taken as the estimate of the missing value, which completes the compensation process.
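The procedure in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: a small information-gain tree with threshold splits stands in for C4.5 (which uses gain ratio and pruning), and the names `build`, `candidate_leaves`, and `impute`, as well as the toy data, are hypothetical. How the "closest region" is found when the split variable itself is missing is also an assumption here: both subtrees stay candidates, and the leaf whose recorded averages are nearest to the observed values is chosen.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree on complete rows; each leaf records per-feature means."""
    best = None
    if depth < max_depth and len(set(labels)) > 1:
        base = entropy(labels)
        for f in range(len(rows[0])):
            values = sorted({r[f] for r in rows})
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2  # candidate threshold between neighbouring values
                left = [y for r, y in zip(rows, labels) if r[f] <= t]
                right = [y for r, y in zip(rows, labels) if r[f] > t]
                if not left or not right:
                    continue
                gain = base - (len(left) * entropy(left)
                               + len(right) * entropy(right)) / len(labels)
                if best is None or gain > best[0]:
                    best = (gain, f, t)
    if best is None or best[0] <= 0:
        # Leaf: store the average of the events that fell into this region.
        return {"means": [sum(col) / len(rows) for col in zip(*rows)]}
    _, f, t = best
    lpairs = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    rpairs = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {"feature": f, "threshold": t,
            "left": build([r for r, _ in lpairs], [y for _, y in lpairs],
                          depth + 1, max_depth),
            "right": build([r for r, _ in rpairs], [y for _, y in rpairs],
                           depth + 1, max_depth)}

def candidate_leaves(node, event):
    """Collect the leaf means reachable by an event (None marks a missing value)."""
    if "means" in node:
        return [node["means"]]
    v = event[node["feature"]]
    if v is None:  # assumption: a missing split variable keeps both subtrees open
        return (candidate_leaves(node["left"], event)
                + candidate_leaves(node["right"], event))
    branch = "left" if v <= node["threshold"] else "right"
    return candidate_leaves(node[branch], event)

def impute(tree, event):
    """Fill missing entries with the means of the nearest candidate region."""
    def dist(means):
        return sum((v - m) ** 2 for v, m in zip(event, means) if v is not None)
    nearest = min(candidate_leaves(tree, event), key=dist)
    return [m if v is None else v for v, m in zip(event, nearest)]

# Toy training data: only complete events are used to build the tree.
complete = [[1.0, 10.0], [2.0, 12.0], [1.5, 11.0],
            [8.0, 30.0], [9.0, 32.0], [8.5, 31.0]]
classes = ["A", "A", "A", "B", "B", "B"]
tree = build(complete, classes)

print(impute(tree, [None, 31.5]))  # first variable missing
print(impute(tree, [1.2, None]))   # second variable missing
```

In this toy run the event `[None, 31.5]` cannot be routed at the root, so it is matched to the region whose recorded average is nearest on the observed variable, while `[1.2, None]` follows the ordinary path to a single leaf.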
Keywords
Missing value; FLDF; Incomplete data; Imputation mean; Decision tree; Classifier