Measurement of Document Similarity using Word and Word-Pair Frequencies


Proceedings of the IEEK Conference (대한전자공학회:학술대회논문집)

2003.07d / pp. 1311-1314 / 2003

The Institute of Electronics and Information Engineers (대한전자공학회)


단어 및 단어쌍 별 빈도수를 이용한 문서간 유사도 측정

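김혜숙 (Department of Computer Science, Chonnam National University);
박상철 (Department of Computer Science, Chonnam National University);
김수형 (Department of Computer Science, Chonnam National University)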

Published : 2003.07.01


Abstract

In this paper, we propose a method for measuring document similarity. We first used a single-term method, in which a lexical analyzer extracts nouns as a preprocessing step so that each index corresponds to exactly one noun. With this method, however, the measured similarity can be high even between unrelated documents. To address this, a term-phrase method has been reported, which uses the co-occurrence of two words as an index for measuring document similarity. In this paper, we combine the two methods so that each compensates for the weaknesses of the other. Six types of features are extracted from the two input documents and fed into a neural network, which computes the final document similarity value. The reliability of our method was demonstrated in a document retrieval experiment.

