http://dx.doi.org/10.6109/jkiice.2010.14.1.053

A Classifier Capable of Handling Incomplete Data Set  

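Lee, Jong-Chan (Department of Internet, Chungwoon University)
Lee, Won-Don (Computer Science, Division of Electrical, Information and Communication Engineering, Chungnam National University)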
Abstract
This paper introduces a classification algorithm for learning problems with incomplete data sets, that is, data in which variable values or the class value are missing. The algorithm uses a data expansion method based on weighted values and probability techniques. It extends a classifier that seeks the optimal projection plane according to Fisher's formula; to do so, the equations needed to apply the classifier to the expanded data are derived. To evaluate the proposed algorithm, one variable in the data set is selected, the ratio of missing to non-missing values in that variable is varied, and the resulting measurements are compared over repeated runs. For an objective evaluation, the results on data with no missing values are also compared with those of C4.5, a well-known knowledge acquisition tool in machine learning.
Keywords
FLDF; Missing value; Extended data expression; Optimized projection plane; Entropy function;
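
The idea summarized in the abstract (expand incomplete records into weighted records, then fit a Fisher-style projection on the weighted data) can be illustrated with a minimal sketch. This is not the authors' derivation: the function names `expand_missing` and `weighted_fisher_direction`, the toy data, and the choice of observed relative frequencies as weights are all assumptions made for illustration, and only a missing variable value (not a missing class value) is handled.

```python
import numpy as np

def expand_missing(X, y, col):
    """Hypothetical data expansion: a row whose value in `col` is NaN is
    replaced by one copy per observed value of that column, weighted by the
    observed relative frequency. Complete rows keep weight 1."""
    observed = X[~np.isnan(X[:, col]), col]
    values, counts = np.unique(observed, return_counts=True)
    probs = counts / counts.sum()
    rows, labels, weights = [], [], []
    for x, label in zip(X, y):
        if np.isnan(x[col]):
            for v, p in zip(values, probs):
                filled = x.copy()
                filled[col] = v
                rows.append(filled)
                labels.append(label)
                weights.append(p)
        else:
            rows.append(x)
            labels.append(label)
            weights.append(1.0)
    return np.array(rows), np.array(labels), np.array(weights)

def weighted_fisher_direction(X, y, w):
    """Two-class Fisher direction, proportional to S_W^{-1} (m1 - m0),
    with class means and within-class scatter computed from weighted rows."""
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))
    means = []
    for c in (0, 1):
        Xc, wc = X[y == c], w[y == c]
        m = np.average(Xc, axis=0, weights=wc)
        d = Xc - m
        S_w += (d * wc[:, None]).T @ d  # weighted within-class scatter
        means.append(m)
    direction = np.linalg.solve(S_w, means[1] - means[0])
    return direction / np.linalg.norm(direction)

# Toy two-class data (assumed); the third record is missing its second variable.
X = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, np.nan],
              [5.0, 6.0], [6.0, 5.0], [5.5, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

Xe, ye, we = expand_missing(X, y, col=1)
direction = weighted_fisher_direction(Xe, ye, we)
print(direction)
```

Each expanded copy of an incomplete record carries only a fraction of a record's weight, so the record still contributes a total weight of 1 to the class statistics instead of being discarded or filled with a single guessed value.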