Note on classification and regression tree analysis  

임용빈 (Department of Statistics, Ewha Womans University)
오만숙 (Department of Statistics, Ewha Womans University)
Abstract
The analysis of large data sets with hundreds of thousands of observations and thousands of independent variables is a formidable computational task. The tree-structured approach to regression and classification is a less parametric method capable of identifying important independent variables and their interactions. It also gives a graphical and often illuminating view of the data in classification and regression problems. In this paper, we review and summarize the methodology used to construct a single tree and multiple trees, as well as the sequential strategy for identifying active compounds in large chemical databases.
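To make the tree-construction step concrete, here is a minimal sketch of CART-style recursive partitioning for a numeric response, written in plain Python. It is an illustration under stated assumptions, not the paper's procedure: the split criterion (reduction in within-node sum of squared errors), the stopping rules (`max_depth`, `min_size`), and every function name are hypothetical choices, and pruning is omitted. For multiple trees it uses bagging (bootstrap aggregation), which is one common way to combine trees and not necessarily the scheme the paper reviews.

```python
import random
from statistics import mean


def sse(ys):
    """Sum of squared errors of ys around their mean (regression node impurity)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(X, y):
    """Try every feature and midpoint threshold; return (feature, threshold, gain)
    for the split that most reduces SSE, or None if no split reduces it."""
    parent = sse(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both child nodes are always non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[2]:
                best = (j, t, gain)
    if best is None or best[2] <= 0:
        return None
    return best


def grow_tree(X, y, depth=0, max_depth=3, min_size=2):
    """Recursively partition the data; each leaf predicts its mean response."""
    split = best_split(X, y) if depth < max_depth and len(y) >= min_size else None
    if split is None:
        return {"leaf": mean(y)}
    j, t, _ = split
    left = [(row, yi) for row, yi in zip(X, y) if row[j] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow_tree([r for r, _ in left], [v for _, v in left],
                          depth + 1, max_depth, min_size),
        "right": grow_tree([r for r, _ in right], [v for _, v in right],
                           depth + 1, max_depth, min_size),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]


def bagged_trees(X, y, n_trees=10, seed=0, **kw):
    """Grow multiple trees, each on a bootstrap resample of the data (bagging)."""
    rng = random.Random(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], **kw))
    return trees


def predict_bagged(trees, row):
    """Combine the trees by averaging their predictions."""
    return mean(predict(t, row) for t in trees)
```

For example, with one predictor `X = [[1], [2], [3], [4], [5], [6]]` and response `y = [0, 0, 0, 10, 10, 10]`, `grow_tree(X, y)` splits at threshold 3.5 and `predict` returns 0 for `[2]` and 10 for `[5]`. Averaging over bootstrap trees smooths the prediction at the cost of the single tree's easy interpretability.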
Keywords
Classification and regression tree; Multiple trees; Sequential strategy