[KSCI] Korea Science Citation Index Service

Enhancing Document Clustering Method using Synonym of Cluster Topic and Similarity

Park, Sun (Institute of Information Science and Engineering Research, Mokpo National University)
Kim, Kyung-Jun (Department of Computer Science, KAIST)
Lee, Jin-Seok (NIPA)
Lee, Seong-Ro (Department of Information and Electronics)

Publication Information

Journal of the Institute of Electronics Engineers of Korea SP / v.48, no.5, 2011, pp. 30-38

Abstract

This paper proposes a document clustering method that uses synonyms of cluster topic terms together with similarity. The method represents the inherent structure of a document cluster set well by selecting cluster topic terms based on semantic features obtained by non-negative matrix factorization (NMF). It addresses the "bag of words" problem by expanding the cluster topic terms with their WordNet synonyms. It also improves clustering quality by using the cosine similarity between the expanded cluster topic terms and the document set to assign each document to the appropriate cluster. The experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.
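
The expansion and assignment steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the NMF step is omitted and the cluster topic terms are assumed to be already selected, `TOY_SYNONYMS` is a hypothetical hand-written table standing in for WordNet, and the function names `expand_topic`, `cosine`, and `assign` are made up for this example.

```python
import math
from collections import Counter

# Hypothetical toy synonym table standing in for WordNet (not real WordNet output).
TOY_SYNONYMS = {
    "car": ["automobile", "auto"],
    "doctor": ["physician"],
}

def expand_topic(terms, synonyms):
    """Return the topic terms plus their synonyms, without duplicates."""
    expanded = []
    for term in terms:
        for word in [term] + synonyms.get(term, []):
            if word not in expanded:
                expanded.append(word)
    return expanded

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def assign(documents, topics, synonyms):
    """Assign each document to the cluster whose expanded topic is most similar."""
    topic_vectors = [Counter(expand_topic(t, synonyms)) for t in topics]
    labels = []
    for doc in documents:
        doc_vector = Counter(doc.lower().split())
        scores = [cosine(doc_vector, tv) for tv in topic_vectors]
        # On ties (for example, all scores zero) the first cluster wins.
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels

# Assumed topic terms; in the paper these would come from NMF semantic features.
topics = [["car", "engine"], ["doctor", "hospital"]]
documents = [
    "the automobile has a new engine",
    "a physician works at the clinic",
    "the auto dealer sold it",
]
labels = assign(documents, topics, TOY_SYNONYMS)
print(labels)
```

In this toy data the second document shares no word with the unexpanded topic terms, so it can only be matched to the second cluster once "physician" is added as a synonym of "doctor". That is the kind of vocabulary mismatch the synonym expansion is meant to reduce.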

Keywords

document clustering; NMF (non-negative matrix factorization); semantic features; synonym; cosine similarity

Citations & Related Records

Times Cited By KSCI: 3

Reference
Cited By KSCI

1	G. Miller, "WordNet: A Lexical Database for English", Communications of the ACM, vol. 38, no. 11, pp. 39-41, 1995.
2	The 20 newsgroups data set. http://people.csail.mit.edu/jrennie/20Newsgroups/, 2011.
3	W. Xu, X. Liu, Y. Gong, "Document Clustering Based on Non-negative Matrix Factorization", In Proceedings of SIGIR'03, pp. 267-274, 2003.
4	S. Park, D. U. An, B. R. Cha, C. W. Kim, "Document Clustering with Cluster Refinement and Non-negative Matrix Factorization", In Proceedings of ICONIP'09, pp. 281-288, 2009.
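5	S. Park, C. W. Kim, "Document Clustering Using Non-negative Matrix Factorization and Cluster Cohesion" (in Korean), Journal of the Korea Institute of Maritime Information and Communication Sciences, vol. 13, no. 12, pp. 2603-2608, 2009.
6	S. Park, K. J. Kim, "Document Clustering Using Non-negative Matrix Factorization and Fuzzy Relationship" (in Korean), Journal of the Korea Navigation Institute, vol. 14, no. 2, pp. 239-246, 2010.
7	S. Basu, A. Banerjee, R. Mooney, "Semi-supervised Clustering by Seeding", In Proceedings of the International Conference on Machine Learning (ICML), pp. 19-26, 2002.
8	S. Park, D. U. An, "Document Clustering Method Using Principal Component Analysis and Fuzzy Association" (in Korean), Journal of the Korea Information Processing Society, vol. 17-B, no. 2, pp. 177-182, 2010.
9	K. H. Han, K. W. Nam, "Introduction to Korean Information Processing: What It Takes for Computers to Understand Korean" (in Korean), CommunicationBooks, 2007.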
10	W. B. Frakes, R. Baeza-Yates, "Information Retrieval: Data Structures & Algorithms", Prentice-Hall, 1992.
11	R. Baeza-Yates, B. Ribeiro-Neto, "Modern Information Retrieval", ACM Press, 1999.
12	X. Hu, X. Zhang, C. Lu, E. K. Park, X. Zhou, "Exploiting Wikipedia as External Knowledge for Document Clustering", In Proceedings of the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'09), Paris, France, pp. 389-396, Jun. 2009.
13	S. Chakrabarti, "Mining the Web: Discovering Knowledge from Hypertext Data", Morgan Kaufmann Publishers, 2003.
14	J. Han, M. Kamber, "Data Mining: Concepts and Techniques", 2nd ed., Morgan Kaufmann, 2006.
15	D. D. Lee, H. S. Seung, "Learning the Parts of Objects by Non-negative Matrix Factorization", Nature, vol. 401, pp. 788-791, Oct. 1999.
16	T. Li, S. Ma, M. Ogihara, "Document Clustering via Adaptive Subspace Iteration", In Proceedings of SIGIR'04, pp. 218-225, 2004.
17	F. Wang, C. Zhang, "Regularized Clustering for Documents", In Proceedings of ACM SIGIR'07, pp. 95-102, 2007.

1	Enhancing Document Clustering using Important Term of Cluster and Wikipedia / [Park, Sun; Lee, Yeon-Woo; Jeong, Min-A; Lee, Seong-Ro] / Journal of the Institute of Electronics Engineers of Korea SP
2	User-based Document Summarization using Non-negative Matrix Factorization and Wikipedia / [Park, Sun; Jeong, Min-A; Lee, Seong-Ro] / Journal of the Institute of Electronics Engineers of Korea SP
3	Personalized Document Snippet Extraction Method using Fuzzy Association and Pseudo Relevance Feedback / [Park, Seon; Jo, Gwang-Mun; Yang, Hu-Yeol; Lee, Seong-Ro] / Journal of the Institute of Electronics Engineers of Korea SP
4	Enhancing Document Clustering Using Term Re-weighting Based on Semantic Features / [Park, Sun; Kim, Kyungjun; Kim, Kyung Ho; Lee, Seong Ro] / Journal of the Korea Institute of Information and Communication Engineering
