[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.6109/jkiice.2010.14.1.053

A Classifier Capable of Handling Incomplete Data Set

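Lee, Jong-Chan (Department of Internet, Chungwoon University)
Lee, Won-Don (School of Electrical, Information and Communication Engineering (Computer), Chungnam National University)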

Publication Information

Journal of the Korea Institute of Information and Communication Engineering / v.14, no.1, 2010, pp. 53-62

Abstract

This paper introduces a classification algorithm that can be applied to learning problems with incomplete data sets, that is, data with missing variable values or a missing class value. The algorithm uses a data expansion method based on weighted values and probability techniques. It works by extending a classifier that is considered to lie in the optimal projection plane given by Fisher's formula. To this end, the equations for applying this procedure to the expanded data are derived. To evaluate the performance of the proposed algorithm, one variable in the data set is selected, the ratio of missing to non-missing values in that variable is varied, and the results of the different measurements are compared iteratively. In addition, an objective evaluation is obtained by comparing the result on the data set without missing values against that of C4.5, a well-known knowledge acquisition tool in machine learning.
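The abstract describes the method only at a high level: incomplete records are expanded into weighted copies using probabilities, and a Fisher linear discriminant is fitted to the result. The sketch below illustrates that idea under stated assumptions and does not reproduce the paper's derived equations. It assumes a missing attribute value is replaced by every observed value of that attribute, weighted by the value's relative frequency; a missing class label is spread over the classes by class frequency; and a two-class weighted Fisher direction w = S_W^{-1}(m1 - m2) is computed with a midpoint threshold (an equal-prior assumption). All function names (`expand`, `solve`, `fisher_direction`, `classify`, `inject_missing`) are hypothetical; `inject_missing` mimics the evaluation step of varying the missing rate in one selected variable.

```python
import random
from collections import Counter


def expand(rows, labels, classes):
    """Expand incomplete records into weighted complete records.

    Hypothetical scheme (not the paper's exact equations): a missing
    attribute (None) becomes one copy per observed value of that
    attribute, weighted by the value's relative frequency; a missing
    label (None) is spread over `classes` by class frequency.
    The copies of each original record have weights summing to 1.
    """
    d = len(rows[0])
    value_freq = []  # per attribute: {value: relative frequency}
    for j in range(d):
        counts = Counter(r[j] for r in rows if r[j] is not None)
        total = sum(counts.values())
        value_freq.append({v: c / total for v, c in counts.items()})
    label_counts = Counter(y for y in labels if y is not None)
    n_labeled = sum(label_counts.values())
    label_freq = {c: label_counts[c] / n_labeled for c in classes}

    expanded = []  # list of (features, label, weight)
    for row, y in zip(rows, labels):
        partial = [([], 1.0)]
        for j, v in enumerate(row):
            options = list(value_freq[j].items()) if v is None else [(v, 1.0)]
            partial = [(xs + [val], w * p) for xs, w in partial for val, p in options]
        label_options = list(label_freq.items()) if y is None else [(y, 1.0)]
        for xs, w in partial:
            for c, p in label_options:
                if w * p > 0:  # drop zero-weight copies
                    expanded.append((xs, c, w * p))
    return expanded


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fisher_direction(data, c1, c2, ridge=1e-6):
    """Weighted two-class Fisher discriminant: w = S_W^{-1} (m1 - m2).

    Returns (w, threshold), where threshold is the projection of the midpoint
    of the two weighted class means (an equal-prior assumption).
    """
    d = len(data[0][0])

    def class_stats(c):
        pts = [(x, wt) for x, y, wt in data if y == c]
        total = sum(wt for _, wt in pts)
        mean = [sum(wt * x[j] for x, wt in pts) / total for j in range(d)]
        return pts, mean

    p1, m1 = class_stats(c1)
    p2, m2 = class_stats(c2)
    # Weighted within-class scatter; the small ridge keeps the system solvable.
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for pts, mean in ((p1, m1), (p2, m2)):
        for x, wt in pts:
            dx = [x[j] - mean[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += wt * dx[i] * dx[j]
    w = solve(sw, [a - b for a, b in zip(m1, m2)])
    threshold = sum(wj * (a + b) / 2 for wj, a, b in zip(w, m1, m2))
    return w, threshold


def classify(x, w, threshold, c1, c2):
    """Project x onto w; the side of the threshold picks the class."""
    return c1 if sum(wj * xj for wj, xj in zip(w, x)) > threshold else c2


def inject_missing(rows, col, rate, seed=0):
    """Blank out column `col` in roughly `rate` of the rows (evaluation aid)."""
    rng = random.Random(seed)
    return [[None if j == col and rng.random() < rate else v
             for j, v in enumerate(r)] for r in rows]
```

Because the copies of each original record carry weights that sum to 1, the expanded set keeps the total weight of the original data. The number of copies, however, grows multiplicatively with the number of missing attributes in a record, so a practical implementation would need some way to control that growth.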

Keywords

FLDF; Missing value; Extended data expression; Optimized projection plane; Entropy function

Citations & Related Records

Reference

1	J. M. Robins, A. Rotnitzky, L. P. Zhao, "Analysis of Semiparametric Regression Models for Repeated Outcomes in the Presence of Missing Data", J. Am. Statist. Assoc., Vol. 90, pp. 106-121, 1995.
2	A. P. Dempster, N. M. Laird, D. B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm", Journal of the Royal Statistical Society, Series B, Vol. 39, pp. 1-38, 1977.
3	J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.
4	I. Kononenko, I. Bratko, E. Roskar, "Experiments in Automatic Learning of Medical Diagnostic Rules", Technical Report, Jozef Stefan Institute, Ljubljana, 1984.
5	R. Slowinski, J. Stefanowski, "Handling Various Types of Uncertainty in the Rough Set Approach", International Workshop on Rough Sets and Knowledge Discovery, pp. 366-376, 1993.
6	J. C. Lee, Y. R. Kim, W. D. Lee, S. H. Lee, "Pattern Classifying Neural Network Based on Fisher's Linear Discriminant", International Joint Conference on Neural Networks (IJCNN), Vol. 1, pp. 743-748, July 1992.
7	J. C. Lee, Y. H. Kim, W. D. Lee, S. H. Lee, "A Method to Find the Structure and Weights of Layered Neural Networks", World Congress on Neural Networks, Vol. III, July 1993.
8	T. P. Hong, L. H. Tseng, B. C. Chien, "Learning Fuzzy Rules from Incomplete Numerical Data by Rough Sets", IEEE International Conference on Fuzzy Systems, pp. 1438-1443, 2002.
9	D. Kim, D. Lee, W. D. Lee, "Classifier Using Extended Data Expression", IEEE Mountain Workshop on Adaptive and Learning Systems, July 2006.
10	N. H. Nie, C. H. Hull, J. G. Jenkins, K. Steinbrenner, D. H. Bent, SPSS, 2nd ed., New York: McGraw-Hill, 1975.
11	J. H. Friedman, "A Recursive Partitioning Decision Rule for Nonparametric Classification", IEEE Transactions on Computers, pp. 404-408, 1977.
12	R. Kohavi, J. R. Quinlan, "Data Mining Tasks and Methods: Classification; Decision-Tree Discovery", Handbook of Data Mining and Knowledge Discovery, Oxford University Press, pp. 267-276, 2002.
13	R. J. Hathaway, J. C. Bezdek, "Fuzzy c-Means Clustering of Incomplete Data", IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 31, No. 5, 2001.
14	M. Kryszkiewicz, "Rough Set Approach to Incomplete Information Systems", Information Sciences, Vol. 112, pp. 39-49, 1998.
15	J. W. Grzymala-Busse, "On the Unknown Attribute Values in Learning from Examples", ISMIS-91, 6th International Symposium on Methodologies for Intelligent Systems, pp. 368-377, Oct. 1991.
16	R. J. A. Little, D. B. Rubin, Statistical Analysis with Missing Data, 2nd ed., John Wiley & Sons, 2002.
17	J. R. Quinlan, C4.5: Programs for Machine Learning, San Mateo, CA: Morgan Kaufmann, 1993.
18	M. Weiser, "Some Computer Science Issues in Ubiquitous Computing", Commun. ACM, Vol. 36, No. 7, pp. 75-84, July 1993.
19	T. G. Dietterich, "An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization", Machine Learning, Vol. 40, No. 2, pp. 139-157, August 2000.
20	J. W. Grzymala-Busse, "Rough Set Strategies to Data with Missing Attribute Values", Workshop on Foundations and New Directions in Data Mining, pp. 19-22, Nov. 2003.
21	M. Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms, Wiley-IEEE Press, pp. 139-161, 2002.

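3	(2010) Journal of the Korea Convergence Society, "An Algorithm for Handling Incomplete Data in SVM and Deep Learning" / 11 (3), 1
2	(2010) Journal of the Korea Convergence Society, "A Data Expansion Technique for Handling Incomplete Data" / 12 (2), 7
8	(2021) Journal of the Korea Convergence Society, "A Technique for Handling Incomplete Data Using Decision Trees" / 12 (8), 39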
