[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.15207/JKCS.2021.12.8.039

Incomplete data handling technique using decision trees

Lee, Jong Chan (Dept. of Computer Engineering, Chungwoon University)

Publication Information

Journal of the Korea Convergence Society / v.12, no.8, 2021, pp. 39-45

Abstract

This paper discusses how to handle incomplete data, that is, data containing missing values. Handling a missing value optimally means obtaining, from the information contained in the training data, the estimate closest to the original value and replacing the missing value with that estimate. To achieve this, the method uses the decision tree that a classifier builds while it learns to classify the data. Specifically, the decision tree is obtained by training the C4.5 classifier on only the complete events, that is, the training events that contain no missing values. Each node of this tree holds information about a classification variable: nodes higher up, closer to the root, carry more information, and each leaf node defines a classification region through its path from the root. The average of the data events classified into each region is also recorded for that region. An event containing missing values is then input to the tree, and the region closest to the event is found by traversing the tree according to the information at each node. The average recorded for that region is taken as the estimate of the missing value, which completes the imputation process.
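The abstract describes the procedure only in prose. The Python sketch below illustrates it under assumptions that are ours, not the paper's: the tree is a simplified C4.5-style tree (gain-ratio splits, binary thresholds on numeric attributes, no pruning) rather than Quinlan's C4.5 program; `MISSING`, `build`, `candidate_leaves` and `impute` are hypothetical names; and "the region closest to the event" is interpreted as exploring both branches wherever the tested attribute is missing, then choosing the reachable leaf whose recorded mean is nearest (squared Euclidean distance) on the observed attributes. The paper's actual traversal rule may differ.

```python
import math
from collections import Counter

MISSING = None  # hypothetical marker for a missing attribute value


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (gain_ratio, attribute, threshold) with the highest gain ratio, or None."""
    base, n = entropy(labels), len(labels)
    best = None
    for a in range(len(rows[0])):
        values = sorted(set(r[a] for r in rows))
        for lo, hi in zip(values, values[1:]):  # candidate thresholds: midpoints
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[a] <= t]
            right = [y for r, y in zip(rows, labels) if r[a] > t]
            p = len(left) / n  # 0 < p < 1 because t lies between two observed values
            gain = base - p * entropy(left) - (1 - p) * entropy(right)
            split_info = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
            if gain > 0 and (best is None or gain / split_info > best[0]):
                best = (gain / split_info, a, t)
    return best


def build(rows, labels, min_size=2):
    """Grow an unpruned binary tree on COMPLETE events only."""
    split = None
    if len(rows) >= min_size and len(set(labels)) > 1:
        split = best_split(rows, labels)
    if split is None:
        # Leaf = classification region; record the mean of the events it holds.
        return {"mean": [sum(col) / len(rows) for col in zip(*rows)]}
    _, a, t = split
    left = [i for i, r in enumerate(rows) if r[a] <= t]
    right = [i for i, r in enumerate(rows) if r[a] > t]
    return {
        "attr": a,
        "thr": t,
        "left": build([rows[i] for i in left], [labels[i] for i in left], min_size),
        "right": build([rows[i] for i in right], [labels[i] for i in right], min_size),
    }


def candidate_leaves(node, event):
    """Follow the node tests; where the tested value is missing, explore both branches."""
    if "mean" in node:
        return [node]
    v = event[node["attr"]]
    if v is MISSING:
        return candidate_leaves(node["left"], event) + candidate_leaves(node["right"], event)
    return candidate_leaves(node["left"] if v <= node["thr"] else node["right"], event)


def impute(tree, event):
    """Fill each missing value with the mean stored in the closest reachable region."""
    observed = [i for i, v in enumerate(event) if v is not MISSING]

    def distance(leaf):  # squared Euclidean distance on the observed attributes only
        return sum((event[i] - leaf["mean"][i]) ** 2 for i in observed)

    leaf = min(candidate_leaves(tree, event), key=distance)
    return [leaf["mean"][i] if v is MISSING else v for i, v in enumerate(event)]


# Toy training data: (attribute vector, class label); None marks a missing value.
train = [
    ([1.0, 10.0], "a"), ([1.2, 11.0], "a"), ([0.8, 9.0], "a"),
    ([5.0, 50.0], "b"), ([5.5, 52.0], "b"), ([4.5, 48.0], "b"),
    ([3.0, None], "a"),
]
complete = [(x, y) for x, y in train if MISSING not in x]
tree = build([x for x, _ in complete], [y for _, y in complete])
print(impute(tree, [5.2, None]))  # second value taken from the right-hand region's mean
print(impute(tree, [None, 10.5]))  # first value taken from the left-hand region's mean
```

Only the complete events train the tree, so every stored region average is fully defined; the incomplete event `[3.0, None]` is excluded from training and can be passed to `impute` afterward. This sketch ignores the class label of the incomplete event during traversal, which is another simplifying assumption.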

Keywords

Missing value; FLDF; Incomplete data; Imputation mean; Decision tree; Classifier;

Citations & Related Records

Times cited by KSCI: 1

References

1. T. F. Johnson, N. J. B. Isaac, A. Paviolo & M. Gonzalez-Suarez. (2020). Handling missing values in trait data. Global Ecology & Biogeography, 1-12. DOI: 10.1111/geb.13185
2. S. Huang & C. Cheng. (2020). A Safe-Region Imputation Method for Handling Medical Data with Missing Values. Symmetry, 12(11), 1792. DOI: 10.3390/sym12111792
3. J. R. Quinlan. (1993). C4.5: Programs for Machine Learning. San Mateo: Morgan Kaufmann.
4. J. C. Lee, D. H. Seo, C. H. Song & W. D. Lee. (2007). FLDF based Decision Tree using Extended Data Expression. The 6th Conference on Machine Learning & Cybernetics. (pp. 3478-3483).
5. D. Kim, D. Lee & W. D. Lee. (2006). Classifier using Extended Data Expression. In 2006 IEEE Mountain Workshop on Adaptive and Learning Systems. (pp. 154-159). DOI: 10.1109/SMCALS.2006.250708
6. J. C. Lee. (2018). Application Examples Applying Extended Data Expression Technique to Classification Problems. Journal of the Korea Convergence Society, 9(12), 9-15. DOI: 10.15207/JKCS.2018.9.12.009
7. J. C. Lee. (2019). Deep Learning Model for Incomplete Data. Journal of the Korea Convergence Society, 10(2), 1-6. DOI: 10.15207/JKCS.2019.10.2.001
8. J. C. Lee & W. D. Lee. (2010). Classifier handling incomplete data. Journal of the Korea Institute of Information and Communication Engineering, 14(1), 53-62.
9. J. You, X. Ma, D. Y. Ding, M. Kochenderfer & J. Leskovec. (2020). Handling Missing Data with Graph Representation Learning. arXiv preprint arXiv:2010.16418.
10. J. Han, J. Pei & M. Kamber. (2011). Data Mining: Concepts and Techniques. Waltham: Elsevier.
11. T. Delavallade & T. H. Dang. (2007). Using Entropy to Impute Missing Data in a Classification Task. IEEE International Fuzzy Systems Conference. (pp. 1-6). DOI: 10.1109/FUZZY.2007.4295430
12. A. Sportisse, C. Boyer, A. Dieuleveut & J. Josse. (2020). Debiasing Averaged Stochastic Gradient Descent to handle missing values. 34th Conference on Neural Information Processing Systems. (pp. 1-11). Vancouver.
13. J. C. Lee. (2021). A data extension technique to handle incomplete data. Journal of the Korea Convergence Society, 12(2), 7-13. DOI: 10.15207/JKCS.2021.12.2.007
14. Center for Machine Learning and Intelligent Systems, University of California, Irvine. (2020). UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets.php
15. R. Kohavi & J. R. Quinlan. (2002). Data mining tasks and methods: Classification: Decision-tree discovery. Handbook of data mining and knowledge discovery. New York: Oxford University Press, 267-276.
