A Text-based Similarity Measure for Scientific Literature

Yoon, Seok-Ho; Kim, Sang-Wook

doi:10.3745/KIPSTD.2011.18D.5.317

The KIPS Transactions:PartD (정보처리학회논문지D)

Volume 18D, Issue 5 / Pages 317-322 / 2011 / 1598-2866 (pISSN)

Korea Information Processing Society (한국정보처리학회)

Korean title: A Text-based Similarity Computation Method for Paper Databases

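Seok-Ho Yoon (Department of Electronics, Computer, and Communication Engineering, Hanyang University);
Sang-Wook Kim (Division of Information and Communications, College of Information and Communications, Hanyang University)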

Received : 2011.06.15
Accepted : 2011.08.09
Published : 2011.10.31

https://doi.org/10.3745/KIPSTD.2011.18D.5.317

Abstract

This paper addresses how to compute the similarity between papers using text-based measures. First, we analyze the accuracy of the similarities computed from different parts of a paper. We then propose Keyword-Extension, a method intended for cases where a paper's text information is incomplete. A series of experiments verifies the effectiveness of Keyword-Extension.

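This paper discusses how to compute the similarity between papers using existing text-based similarity measures. First, we determine experimentally which part of a paper (the title, the abstract, or the body) is more useful for computing similarity, and assign appropriate weights accordingly. Second, we propose a keyword extension method that computes the similarity between papers more accurately when their text information is incomplete. We verify the superiority of the proposed method using a real paper database.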
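The idea of weighting the parts of a paper can be illustrated with a minimal sketch. It assumes cosine similarity over raw term-frequency vectors as the text-based measure, and the weights shown are hypothetical placeholders, since the abstract gives neither the measure nor the experimentally chosen weights. The function names (`term_vector`, `cosine`, `paper_similarity`) are illustrative only. Parts missing from either paper are simply skipped and the remaining weights renormalized; this is a naive fallback and is not the Keyword-Extension method, whose details are not described here.

```python
import math
from collections import Counter

# Hypothetical weights for illustration; the paper derives its own experimentally.
WEIGHTS = {"title": 0.5, "abstract": 0.3, "body": 0.2}


def term_vector(text):
    """Build a lowercased bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def paper_similarity(p, q, weights=WEIGHTS):
    """Weighted average of per-part cosine similarities.

    p and q are dicts that map a part name ("title", "abstract", "body")
    to its text. A part that is missing or empty in either paper is
    skipped, and the weights of the remaining parts are renormalized.
    """
    total = 0.0
    weight_sum = 0.0
    for part, w in weights.items():
        if p.get(part) and q.get(part):
            total += w * cosine(term_vector(p[part]), term_vector(q[part]))
            weight_sum += w
    return total / weight_sum if weight_sum else 0.0


if __name__ == "__main__":
    a = {"title": "text similarity measure", "abstract": "papers compared by text"}
    b = {"title": "text similarity"}  # abstract and body unavailable
    print(paper_similarity(a, b))
```

With only the title available for the second paper, the score falls back to the title similarity alone, which shows why incomplete text information limits a purely weighted scheme.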

Cited by

A study on Similarity analysis of National R&D Programs using R&D Project's technical classification vol.13, pp.3, 2012, https://doi.org/10.9728/dcs.2012.13.3.317
