
An Incremental Method Using Sample Split Points for Global Discretization  

Kyoung-Sik Han (Inwoo Technology Co., Ltd.)
Soowon Lee (School of Computing, Soongsil University)
Abstract
Most supervised learning algorithms require continuous variables to be transformed into categorical ones at the preprocessing stage, in order to avoid the difficulty of processing continuous values directly. This preprocessing step, called global discretization, uses a class distribution list called a bin. However, when the data are large and the range of the variable to be discretized is very wide, extensive sorting and merging must be performed, because most global discretization methods require all values to be collected into a single bin. Moreover, because the existing methods operate in batch mode, whenever new data are added they must repeat the discretization from scratch to construct categories that reflect the added data. This paper proposes a method that extracts sample points and performs discretization from these sample points in order to solve these problems. Because the proposed approach does not require merging to produce a single bin, it is efficient when large datasets must be discretized. Experiments on real and synthetic datasets compare the proposed method with an existing one.
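A minimal Python sketch of the general idea follows, assuming reservoir sampling as the source of sample points and equal-frequency cuts as the split criterion. The class name SampleSplitDiscretizer, its parameters, and the equal-frequency rule are illustrative stand-ins, not the authors' actual algorithm, which selects split points from class distribution lists (bins).

```python
import bisect
import random

class SampleSplitDiscretizer:
    """Hypothetical sketch of sample-point-based discretization.

    Split points are drawn from a bounded reservoir sample of the data,
    so no global sort/merge into a single bin is required, and new data
    can be absorbed incrementally. Equal-frequency cuts stand in for the
    paper's class-distribution (bin) based criterion.
    """

    def __init__(self, num_bins=5, sample_size=200, seed=0):
        self.num_bins = num_bins
        self.sample_size = sample_size
        self.sample = []            # reservoir sample of observed values
        self.seen = 0               # total number of values observed
        self.rng = random.Random(seed)

    def update(self, values):
        """Absorb new data incrementally via reservoir sampling."""
        for v in values:
            self.seen += 1
            if len(self.sample) < self.sample_size:
                self.sample.append(v)
            else:
                j = self.rng.randrange(self.seen)
                if j < self.sample_size:
                    self.sample[j] = v

    def split_points(self):
        """Equal-frequency split points computed over the sample only."""
        s = sorted(self.sample)
        return [s[(i * len(s)) // self.num_bins]
                for i in range(1, self.num_bins)]

    def transform(self, value):
        """Map a continuous value to a categorical bin index."""
        return bisect.bisect_right(self.split_points(), value)

# Usage: discretize an initial batch, then refine with newly added data.
rng = random.Random(1)
disc = SampleSplitDiscretizer(num_bins=5, sample_size=200)
disc.update(rng.gauss(0.0, 1.0) for _ in range(10000))
old_cuts = disc.split_points()
disc.update(rng.gauss(3.0, 1.0) for _ in range(10000))  # new data arrive
print(old_cuts)
print(disc.split_points())  # cuts shift without re-discretizing from scratch
print(disc.transform(0.3))  # bin index for a single value
```

Because the reservoir sample is bounded, each new value is absorbed in constant time and the split points never need to be rebuilt over the full dataset, mirroring the incremental property the abstract claims for the sample-point approach.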
Keywords
Global Discretization; Machine Learning; Incremental Learning; Large Dataset; Data Mining