http://dx.doi.org/10.5351/CKSS.2011.18.1.089

A Comparative Study on Discretization Algorithms for Data Mining  

Choi, Byong-Su (Department of Multimedia Engineering, Hansung University)
Kim, Hyun-Ji (Department of Statistics, Sungkyunkwan University)
Cha, Woon-Ock (Department of Multimedia Engineering, Hansung University)
Publication Information
Communications for Statistical Applications and Methods / v.18, no.1, 2011, pp. 89-102
Abstract
The discretization process, which converts continuous attributes into discrete ones, is a preprocessing step in data mining tasks such as classification. Some classification algorithms can handle only discrete attributes. The purpose of discretization is to obtain discretized data without losing the information in the original data and to achieve high predictive accuracy when the discretized data are used in classification. Many discretization algorithms have been developed. This paper presents the results of our comparative study of recently proposed representative discretization algorithms from the viewpoints of splitting versus merging and supervised versus unsupervised. We implemented the discretization algorithms in R and made the code publicly available.
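As an illustration of the supervised versus unsupervised distinction mentioned in the abstract, the following is a minimal sketch, written in Python rather than the authors' R and not taken from their package. The function names `equal_width_bins` and `best_entropy_cut` are hypothetical. The first ignores class labels (unsupervised). The second performs a single splitting step by choosing the cut point with the lowest weighted class entropy (supervised), which is only one simple criterion among those the compared algorithms may use.

```python
import math
from collections import Counter


def equal_width_bins(values, k):
    """Unsupervised: assign each value to one of k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so a single bin suffices.
        return [0] * len(values)
    # min(..., k - 1) places the maximum value in the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Supervised, one splitting step: return the midpoint cut that
    minimizes the weighted class entropy of the two resulting intervals."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        # A cut is only possible between two distinct attribute values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut


if __name__ == "__main__":
    # Toy data (made up for illustration): one outlier stretches the range.
    values = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
    labels = ["a", "a", "b", "b", "b", "b"]
    print(equal_width_bins(values, 2))       # bins driven by the range only
    print(best_entropy_cut(values, labels))  # cut driven by the class labels
```

On this toy data the equal-width bins isolate only the outlier, whereas the entropy-based cut falls between the two classes, which is the kind of information loss that supervised methods aim to avoid.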
Keywords
Discretization; classification efficiency; R