http://dx.doi.org/10.9723/jksiis.2012.17.2.057

The Document Clustering using Multi-Objective Genetic Algorithms  

Lee, Jung-Song (Division of Computer Engineering, Chonbuk National University)
Park, Soon-Cheol (Division of Computer Engineering, Chonbuk National University)
Publication Information
Journal of Korea Society of Industrial Information Systems / v.17, no.2, 2012, pp. 57-64
Abstract
This paper proposes a multi-objective genetic algorithm for document clustering, an important task in text mining. The central function of a document clustering algorithm is to group similar documents in a corpus. K-means clustering and genetic algorithms have been widely studied for this task. However, k-means clustering depends heavily on the initial centroids, and a genetic algorithm can easily become trapped in a local optimum, depending on its fitness function. To compensate for these weaknesses, we apply a multi-objective genetic algorithm to document clustering and compare its accuracy with that of the existing algorithms. In our experiments, the multi-objective genetic algorithm improves clustering accuracy by about 20% over k-means clustering and by about 17% over the general genetic algorithm.
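As a rough illustration of how a multi-objective approach differs from optimizing a single fitness function, the following minimal sketch shows Pareto dominance, the comparison rule that multi-objective genetic algorithms typically use to rank candidate solutions. The two objectives (cohesion and separation), the score values, and the function names are hypothetical; the abstract does not state which objectives the paper uses.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a Pareto-dominates b (all objectives maximized).

    a dominates b when a is at least as good on every objective
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of the score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (cohesion, separation) scores for five candidate clusterings.
candidates = [
    (0.62, 0.30),
    (0.55, 0.45),
    (0.50, 0.40),
    (0.70, 0.20),
    (0.62, 0.25),
]

front = pareto_front(candidates)
print(front)
```

A single-objective genetic algorithm would collapse these scores into one number and keep only the top candidate. A Pareto-based selection instead retains every non-dominated trade-off, which is one plausible reason such a search is less prone to settling on a local optimum of a single fitness function.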
Keywords
Document Clustering; K-means Clustering; Multi-Objective Genetic Algorithm; Genetic Algorithm
Citations & Related Records
Times Cited By KSCI: 4