http://dx.doi.org/10.3745/KIPSTB.2010.17B.2.177

Document Clustering Method using PCA and Fuzzy Association  

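Park, Sun (Electrical, Electronic and Information Human Resources Development Project Group, Chonbuk National University)
An, Dong-Un (Division of Electrical, Electronic and Computer Engineering, Chonbuk National University)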
Abstract
This paper proposes a new document clustering method using principal component analysis (PCA) and fuzzy association. The proposed method represents the inherent structure of document clusters better because it selects each cluster's label and representative terms from semantic features obtained by PCA. It also improves clustering quality because assigning documents to clusters by their fuzzy association values separates dissimilar documents well. The experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.
Keywords
Document Clustering; Principal Component Analysis; Semantic Features; Fuzzy Association
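
The abstract gives no formulas, so the following Python sketch is only one plausible reading of the pipeline, not the authors' algorithm. It assumes a toy document-by-term count matrix, takes the first principal component of the term covariance matrix as the "semantic feature", splits terms into two cluster term sets by the sign of their loadings, and assigns each document to the set with the highest fuzzy association. The association measure (sum of minima over sum of maxima of fuzzy memberships), the two-cluster restriction, and all function names and data are assumptions made for illustration.

```python
# Toy sketch: PCA-derived cluster terms plus fuzzy association.
# Every formula and name here is an illustrative assumption, not the paper's method.
import math


def covariance(rows):
    """Covariance matrix of the columns (terms) of a documents x terms matrix."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
            for i in range(m)]


def first_component(cov, iters=200):
    """Leading eigenvector of cov via power iteration (its sign is arbitrary)."""
    m = len(cov)
    v = [1.0 / (i + 1) for i in range(m)]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def term_sets(component):
    """Split terms by loading sign into two fuzzy term sets, weights in [0, 1]."""
    sets = []
    for sign in (1.0, -1.0):
        side = [max(sign * x, 0.0) for x in component]
        top = max(side) or 1.0  # guard against an empty side
        sets.append([x / top for x in side])
    return sets


def fuzzy_association(doc, term_set):
    """Assumed measure: sum of minima over sum of maxima of memberships."""
    peak = max(doc)
    mu = [x / peak for x in doc]  # document's fuzzy membership for each term
    num = sum(min(a, b) for a, b in zip(mu, term_set))
    den = sum(max(a, b) for a, b in zip(mu, term_set))
    return num / den


def cluster(rows):
    """Assign each document to the term set with the highest association."""
    sets = term_sets(first_component(covariance(rows)))
    return [max(range(len(sets)), key=lambda k: fuzzy_association(d, sets[k]))
            for d in rows]


# Hypothetical term counts; columns: pca, matrix, eigen, goal, team, match
docs = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 1, 0, 2, 3, 1],
]
labels = cluster(docs)
print(labels)
```

Because the sign of a principal component is arbitrary, the cluster indices carry no meaning by themselves; only the grouping of documents does. A fuller treatment would use several components and more than two clusters, which the abstract does not specify.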