Classification and Analysis of Data Mining Algorithms

Lee, Jung-Won;Kim, Ho-Sook;Choi, Ji-Young;Kim, Hyon-Hee;Yong, Hwan-Seung;Lee, Sang-Ho;Park, Seung-Soo;

Journal of KIISE:Databases (한국정보과학회논문지:데이타베이스)

Volume 28 Issue 3 / Pages 279-300 / 2001 / 1229-7739 (pISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)


데이터마이닝 알고리즘의 분류 및 분석

Lee, Jung-Won (Dept. of Computer Science, Ewha Womans University) ;
Kim, Ho-Sook (Dept. of Computer Science, Ewha Womans University) ;
Choi, Ji-Young (Dept. of Computer Science, Ewha Womans University) ;
Kim, Hyon-Hee (Dept. of Computer Science, Ewha Womans University) ;
Yong, Hwan-Seung (Dept. of Computer Science, Ewha Womans University) ;
Lee, Sang-Ho (Dept. of Computer Science, Ewha Womans University) ;
Park, Seung-Soo (Dept. of Computer Science, Ewha Womans University)

이정원 (이화여자대학교 컴퓨터학과) ;
김호숙 (이화여자대학교 컴퓨터학과) ;
최지영 (이화여자대학교 컴퓨터학과) ;
김현희 (이화여자대학교 컴퓨터학과) ;
용환승 (이화여자대학교 컴퓨터학과) ;
이상호 (이화여자대학교 컴퓨터학과) ;
박승수 (이화여자대학교 컴퓨터학과)

Published : 2001.09.01


Abstract

Data mining plays a central role in the knowledge discovery process, and the algorithm used is usually selected from existing ones according to the specific purpose of the mining. Data mining techniques are now actively applied in statistics, business, electronic commerce, biology, and medicine, and numerous algorithms continue to be researched and developed for these applications. In the long run, however, only a few algorithms will survive: those that are well suited to specific applications or that perform well on large databases. It is therefore reasonable to focus future effort on those selected algorithms. This paper classifies and analyzes about 30 existing algorithms that are widely used and actively researched, grouped into 7 categories: association rules, clustering, neural networks, decision trees, genetic algorithms, memory-based reasoning, and Bayesian networks. We first analyze the lineage and characteristics of the algorithms and, on that basis, present 14 criteria for classifying them, together with the classification results based on these criteria. Finally, for algorithms that can be compared directly, we identify the best algorithm in each group in terms of features and performance. By classifying and analyzing the many intermixed algorithms in the field, the results can serve as a guideline for algorithm selection in data mining research and in field applications.

