http://dx.doi.org/10.5391/JKIIS.2003.13.4.391

Discretization of Continuous-Valued Attributes considering Data Distribution  

Lee, Sang-Hoon (Dept. of Computer Science, Sogang University)
Park, Jung-Eun (Dept. of Computer Science, Sogang University)
Oh, Kyung-Whan (Dept. of Computer Science, Sogang University)
Publication Information
Journal of the Korean Institute of Intelligent Systems / v.13, no.4, 2003, pp. 391-396
Abstract
This paper proposes a new approach that converts continuous-valued attributes into categorical-valued ones by considering the distribution of the target attribute (class). The approach obtains optimal interval boundaries from the distribution of the data itself, without requiring any user-supplied parameters. For each attribute, the distribution of the target classes is projected onto a one-dimensional space, and this space is clustered according to criteria such as the density value of each class and the amount of overlap among the density curves of the classes. The resulting clusters are based on the probabilities with which the target class of an instance can be predicted, so the interval boundaries minimize the loss of information from the original data. The improved performance of the proposed discretization method is validated using the C4.5 algorithm on data sets from the UCI Machine Learning Repository.
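The idea of projecting each class onto the attribute axis and cutting where class densities trade places can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes Gaussian kernel density estimates with Silverman's rule-of-thumb bandwidth (so nothing has to be tuned by hand), uses only the "dominant class density" criterion, and ignores the paper's overlap criterion. The function names `density_cut_points` and `discretize` are hypothetical.

```python
import bisect
import math
from collections import defaultdict


def density_cut_points(values, labels):
    """Hypothetical helper: cut points for one continuous attribute.

    Simplified sketch (not the paper's algorithm): each class gets a
    prior-weighted Gaussian kernel density on the 1-D axis, and a cut is
    placed midway between consecutive distinct values whose dominant
    class differs. The paper's overlap criterion is not modelled here.
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    n = len(values)
    if n < 2:
        return []
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if std == 0:
        return []  # constant attribute: nothing to split
    # Assumption: Silverman's rule of thumb, so no bandwidth is user-supplied.
    h = 1.06 * std * n ** (-1 / 5)

    by_class = defaultdict(list)
    for v, y in zip(values, labels):
        by_class[y].append(v)

    def dominant_class(x):
        # Summing kernels over a class's own points gives a value proportional
        # to prior(c) * density(x | c); the shared constant factor is dropped.
        scores = {
            c: sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in pts)
            for c, pts in by_class.items()
        }
        # Ties go to the smallest label, so the result is deterministic.
        return max(sorted(scores), key=lambda c: scores[c])

    grid = sorted(set(values))
    cuts = []
    prev = dominant_class(grid[0])
    for lo, hi in zip(grid, grid[1:]):
        cur = dominant_class(hi)
        if cur != prev:
            cuts.append((lo + hi) / 2)
        prev = cur
    return cuts


def discretize(values, cuts):
    """Map each value to the index of the interval it falls into."""
    return [bisect.bisect_right(cuts, v) for v in values]


if __name__ == "__main__":
    vals = [0, 1, 2, 10, 11, 12, 20, 21, 22]
    labs = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
    cuts = density_cut_points(vals, labs)
    print(cuts)                          # [6.0, 16.0]
    print(discretize([5, 7, 30], cuts))  # [0, 1, 2]
```

Because the kernel sums are weighted by how many instances each class has, the dominant class at a point approximates the class with the highest posterior probability there, which loosely mirrors the abstract's claim that the clusters are based on the probability of predicting the target class. A fuller version would also merge or shift intervals where the class densities overlap heavily.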
Keywords
Discretization; Data Distribution; Density-based Clustering; Decision Tree
References
1 T. Elomaa and J. Rousu, "General and Efficient Multisplitting of Numerical Attributes", Kluwer Academic Publishers, 1999.
2 H. Liu and R. Setiono, "Feature Selection via Discretization", IEEE Transactions on Knowledge and Data Engineering, vol. 9, pp. 642-645, 1997.
3 J. Han and M. Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, pp. 363-369, 2001.
4 UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn
5 I. H. Witten and E. Frank, "Data Mining", Morgan Kaufmann Publishers, pp. 238-246, 2000.
6 R. Kerber, "ChiMerge: Discretization of Numeric Attributes", Proc. Tenth National Conference on Artificial Intelligence (AAAI-92), San Jose, CA, pp. 123-127, 1992.
7 R.-P. Li and Z.-O. Wang, "An Entropy-Based Discretization Method for Classification Rules with Inconsistency Checking", Proc. 2002 International Conference on Machine Learning and Cybernetics, pp. 243-246, 2002.