http://dx.doi.org/10.3745/KIPSTD.2003.10D.1.023

Decision Tree Classifier for Multiple Abstraction Levels of Data  

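Jeong, Min-A (Department of Information and Communications, Gwangju Institute of Science and Technology)
Lee, Do-Heon (Department of BioSystems, Korea Advanced Institute of Science and Technology)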
Abstract
Because data in many real-world data mining environments is collected from disparate sources, data values often appear at different abstraction levels. This paper shows that such multiple abstraction levels can have undesirable effects on decision tree classification. After explaining that forcibly equalizing the abstraction levels does not solve the problem satisfactorily, it presents a method that uses the data as is. The proposed method accommodates the generalization/specialization relationships between data values in both the construction phase and the class assignment phase of decision tree classification. Experimental results show that the proposed method significantly reduces classification error rates when multiple abstraction levels of data are involved.
Keywords
Data Mining; Decision Tree; Abstraction Level; Data Quality; Classification;
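The abstract's central point, that values recorded at different abstraction levels confuse an ordinary categorical split, can be illustrated with a small sketch. This is not the authors' algorithm: the taxonomy, the sample records, and the helper names (`ancestors`, `related`, `naive_partition`, `hierarchy_aware_partition`) are all hypothetical, and the routing rule is only one simple way to respect generalization/specialization relationships.

```python
from collections import defaultdict

# Hypothetical taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "Seoul": "Korea",
    "Busan": "Korea",
    "Tokyo": "Japan",
    "Korea": "Asia",
    "Japan": "Asia",
}


def ancestors(value):
    """Return every generalization of `value`, nearest first."""
    chain = []
    while value in PARENT:
        value = PARENT[value]
        chain.append(value)
    return chain


def related(a, b):
    """True if a and b are equal or one is a generalization of the other."""
    return a == b or a in ancestors(b) or b in ancestors(a)


def naive_partition(records):
    """Group records by exact value match, as a plain categorical split would."""
    groups = defaultdict(list)
    for value, label in records:
        groups[value].append(label)
    return dict(groups)


def hierarchy_aware_partition(records, branch_values):
    """Route each record to every branch whose value it equals,
    specializes, or generalizes (an assumed, simplified rule)."""
    groups = {branch: [] for branch in branch_values}
    for value, label in records:
        for branch in branch_values:
            if related(value, branch):
                groups[branch].append(label)
    return groups


# Hypothetical records: (attribute value, class label).
# "Korea" is recorded at a coarser abstraction level than the city names.
records = [("Seoul", "yes"), ("Korea", "yes"), ("Busan", "no"), ("Tokyo", "no")]

naive = naive_partition(records)
aware = hierarchy_aware_partition(records, ["Seoul", "Busan", "Tokyo"])
```

In the naive partition, "Korea" forms its own branch even though it overlaps with "Seoul" and "Busan". In the hierarchy-aware partition, the "Korea" record contributes to both of those city branches and not to "Tokyo". A real method would also have to decide how much weight such a record carries in each branch, which this sketch leaves out.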