http://dx.doi.org/10.3745/KTSDE.2013.2.12.855

A Study on the Implementation of SQL Primitives for Decision Tree Classification  

An, Hyoung Geun (LINC Project Group, University of Ulsan)
Koh, Jae Jin (School of Electrical Engineering, University of Ulsan)
Publication Information
KIPS Transactions on Software and Data Engineering, v.2, no.12, 2013, pp. 855-864
Abstract
Decision tree classification is one of the important problems in data mining, and data mining has become an important task in large database technology. Efforts to couple data mining systems with database systems have therefore led to the development of database primitives that support data mining functions such as decision tree classification. These primitives are special database operations that support SQL implementations of decision tree classification algorithms, and they become component modules of the database system on which specific algorithms are built. There are two aspects to developing database primitives that support data mining functions. The first is identifying, through analysis, the common database primitives that support data mining functions. The second is providing an extension mechanism so that these primitives can be implemented as an interface of the database system. Deciding which primitives should be stored in the DBMS is one of the difficult problems in data mining. To address this problem, this paper describes database primitives for constructing and applying optimized decision tree classifiers. We then identify operations that are useful across various classification algorithms and discuss how to implement these primitives on a commercial DBMS. Finally, we implement the primitives on a commercial DBMS and present experimental results comparing their performance.
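The abstract does not list the individual primitives, so the following is only a minimal sketch of the kind of operation involved, not the paper's implementation. It assumes that one representative primitive is a class-count aggregation: a single `GROUP BY` query that returns, for each value of a candidate attribute, the number of rows in each class. A tree-building algorithm can then score a split from those counts alone. The `weather` table, its columns, and the helper names `class_counts`, `entropy`, and `information_gain` are hypothetical, and SQLite stands in for the commercial DBMS.

```python
import math
import sqlite3


def class_counts(conn, table, attribute, label):
    """Return {attribute_value: {class_label: count}} from one GROUP BY query.

    Hypothetical helper. Identifiers are interpolated into the SQL text,
    so they must come from trusted code, never from user input.
    """
    query = (
        f"SELECT {attribute}, {label}, COUNT(*) "
        f"FROM {table} GROUP BY {attribute}, {label}"
    )
    counts = {}
    for value, cls, n in conn.execute(query):
        counts.setdefault(value, {})[cls] = n
    return counts


def entropy(class_to_count):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(class_to_count.values())
    return -sum(
        (n / total) * math.log2(n / total)
        for n in class_to_count.values()
        if n > 0
    )


def information_gain(counts):
    """Information gain of splitting on the attribute behind `counts`."""
    # Class distribution of the node before the split.
    overall = {}
    for dist in counts.values():
        for cls, n in dist.items():
            overall[cls] = overall.get(cls, 0) + n
    total = sum(overall.values())
    # Weighted entropy of the partitions after the split.
    remainder = sum(
        sum(dist.values()) / total * entropy(dist) for dist in counts.values()
    )
    return entropy(overall) - remainder


# Toy training data (illustrative only, not taken from the paper).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (outlook TEXT, windy TEXT, play TEXT)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [
        ("sunny", "no", "no"), ("sunny", "yes", "no"),
        ("overcast", "no", "yes"), ("overcast", "yes", "yes"),
        ("rain", "no", "yes"), ("rain", "yes", "no"),
    ],
)

outlook_counts = class_counts(conn, "weather", "outlook", "play")
windy_counts = class_counts(conn, "weather", "windy", "play")
gain_outlook = information_gain(outlook_counts)
gain_windy = information_gain(windy_counts)
```

In this sketch only the aggregated counts leave the database, so the amount of data transferred depends on the number of distinct attribute values and classes rather than on the number of training rows.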
Keywords
Data Mining; Primitive; Decision Tree; Classification