http://dx.doi.org/10.6109/JKIICE.2009.13.12.2603

Document Clustering Method using Coherence of Cluster and Non-negative Matrix Factorization  

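Kim, Chul-Won (Department of Computer Engineering, Honam University)
Park, Sun (BK21-Jeonbuk Electronics and Information Advanced Workforce Training Project Group, Chonbuk National University)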
Abstract
Document clustering is an important method for document analysis and is used in many information retrieval applications. This paper proposes a new document clustering model that combines clustering based on NMF (non-negative matrix factorization) with a refinement step that re-assigns documents within clusters according to the coherence of each cluster. The proposed method can improve the quality of document clustering for two reasons: documents are re-assigned using cluster coherence based on the similarity between documents, and the semantic feature matrix and semantic variable matrix used for clustering represent the inherent structure of the document set more faithfully. The experimental results demonstrate that applying the proposed method to document clustering methods achieves better performance than those methods alone.
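The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define coherence precisely, so the sketch assumes it is the average cosine similarity between a document and the other members of a cluster. The factorization uses the standard Lee-Seung multiplicative updates, with `w` standing in for the semantic feature matrix and `h` for the semantic variable matrix. The function names and the toy term-document matrix are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(a, k, iters=300, seed=0, eps=1e-9):
    """Factor a (terms x docs) into non-negative w (terms x k) and h (k x docs)
    with multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    n_terms, n_docs = len(a), len(a[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_terms)]
    h = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_docs)]
             for i in range(k)]
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_terms)]
    return w, h

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def refine_by_coherence(doc_vectors, labels, k, max_rounds=10):
    """Assumed refinement rule: move a document to the cluster whose other
    members it is, on average, most similar to. Stops when nothing moves."""
    labels = list(labels)
    for _ in range(max_rounds):
        moved = False
        for d, vec in enumerate(doc_vectors):
            scores = []
            for c in range(k):
                peers = [doc_vectors[j] for j, lab in enumerate(labels)
                         if lab == c and j != d]
                scores.append(sum(cosine(vec, p) for p in peers) / len(peers)
                              if peers else 0.0)
            best = max(range(k), key=lambda c: scores[c])
            if scores[best] > scores[labels[d]]:
                labels[d] = best
                moved = True
        if not moved:
            break
    return labels

# Hypothetical toy term-document matrix: rows are terms, columns are documents.
# Documents 0-2 mostly use terms 0-2; documents 3-5 mostly use terms 3-5.
a = [
    [3, 2, 2, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [1, 2, 3, 0, 0, 0],
    [0, 0, 1, 3, 2, 2],
    [0, 0, 0, 2, 3, 1],
    [0, 0, 0, 1, 2, 3],
]
k = 2
w, h = nmf(a, k)

# Initial assignment: each document goes to the factor with the largest weight in h.
n_docs = len(a[0])
labels = [max(range(k), key=lambda c: h[c][d]) for d in range(n_docs)]

# Refinement: re-assign documents using the assumed coherence measure.
refined = refine_by_coherence(transpose(a), labels, k)
print(labels, refined)
```

Here the refinement operates on raw term-count vectors for simplicity; a weighted representation such as TF-IDF could be substituted without changing the structure of the sketch.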
Keywords
document clustering; NMF (non-negative matrix factorization); coherence of cluster; document classification