Word Sense Disambiguation based on Concept Learning with a focus on the Lowest Frequency Words

Kim Dong-Sung; Choe Jae-Woong

Language and Information (한국언어정보학회지:언어와정보)

Volume 10, Issue 1, pp. 21-46, 2006
ISSN 1226-7430 (print)

Korean Society for Language and Information (한국언어정보학회)


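Korean title: 저빈도어를 고려한 개념학습 기반 의미 중의성 해소 ("Word sense disambiguation based on concept learning that takes low-frequency words into account")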

Kim Dong-Sung (김동성), Korea University;
Choe Jae-Woong (최재웅), Department of Linguistics, Korea University

Published: 2006.06.01


Abstract

This study proposes a Word Sense Disambiguation (WSD) algorithm based on concept learning, with special emphasis on statistically meaningful lowest-frequency words. Previous work on WSD typically relies on collocation frequencies and the probabilities estimated from them. Such probability-based approaches tend to ignore the lowest-frequency words, even though these words can be meaningful in context. In this paper, we present an algorithm that extracts meaningful lowest-frequency words and uses them for WSD. The learning method is adapted from the Find-Specific algorithm of Mitchell (1997), in which the search proceeds from the most specific hypotheses in a predefined hypothesis space toward more general ones. In our model, the algorithm first looks for contexts that match the most specific classifiers and then moves on to more general ones. We build a small set of seed data and apply it to a relatively large test set. Following Yarowsky (1995), all of the newly classified test data are added to the seed data, thereby expanding it. Because this can introduce considerable noise into the seed data, we introduce the maximum a posteriori (MAP) hypothesis, based on Bayes' theorem, to check whether the new seed data are noisy. Using a Naive Bayes classifier, we show that applying the Find-Specific algorithm improves WSD accuracy.
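The specific-to-general search that the abstract borrows from Mitchell (1997), where it is presented as Find-S, can be sketched in a few lines. This is a minimal textbook version, not the authors' implementation: each context is assumed to be a fixed tuple of attributes, and the attribute choice in the usage example (word before the target, word after it, a domain cue word) is hypothetical. `None` stands for Mitchell's empty constraint (matches nothing) and `"?"` for "any value".

```python
from typing import Optional, Sequence

WILDCARD = "?"  # most general constraint: accepts any attribute value

# None plays the role of Mitchell's empty constraint (accepts no value)
Hypothesis = list[Optional[str]]


def find_s(examples: Sequence[tuple[Sequence[str], bool]], n_attrs: int) -> Hypothesis:
    """Find-S: start from the most specific hypothesis and minimally
    generalise it so that it covers every positive example."""
    h: Hypothesis = [None] * n_attrs  # most specific: rejects every instance
    for features, is_positive in examples:
        if not is_positive:
            continue  # Find-S ignores negative examples
        for i, value in enumerate(features):
            if h[i] is None:
                h[i] = value  # first positive example: copy its value
            elif h[i] != WILDCARD and h[i] != value:
                h[i] = WILDCARD  # conflicting values: generalise to "any"
    return h


def matches(h: Hypothesis, features: Sequence[str]) -> bool:
    """True if the instance satisfies every constraint in h."""
    return all(c == WILDCARD or c == v for c, v in zip(h, features))


# Hypothetical contexts of an ambiguous "bank": (left word, right word, cue word);
# True marks the (hypothetical) financial sense.
examples = [
    (("savings", "account", "money"), True),
    (("savings", "loan", "money"), True),
    (("river", "erosion", "water"), False),
]
h = find_s(examples, n_attrs=3)
print(h)  # -> ['savings', '?', 'money']
print(matches(h, ("savings", "deposit", "money")))  # -> True
```

The first positive example is copied verbatim and the second forces the conflicting middle attribute up to `"?"`; the negative example is skipped. Find-S only gives a sensible result when the target concept is expressible as a conjunction of attribute constraints and the positive examples are noise-free.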
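The seed-expansion loop (a small seed set, a Naive Bayes classifier that picks the MAP sense, and Yarowsky-style growth of the seed data) can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure. The abstract does not say how the MAP hypothesis is used to validate noise, so the sketch substitutes a plainly hypothetical rule: a newly labeled context is admitted only if the posterior of its MAP sense is at least `threshold`. The bag-of-words features, add-one smoothing, the `threshold` and `max_rounds` parameters, and the sense labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Sequence

Example = tuple[list[str], str]  # (context words, sense label)


class NaiveBayesWSD:
    """Multinomial Naive Bayes over context words, with add-one smoothing."""

    def fit(self, data: Sequence[Example]) -> "NaiveBayesWSD":
        self.sense_counts = Counter(sense for _, sense in data)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for words, sense in data:
            self.word_counts[sense].update(words)
        self.vocab = {w for words, _ in data for w in words}
        self.total = len(data)
        return self

    def log_scores(self, words: Sequence[str]) -> dict[str, float]:
        """log P(s) + sum over w of log P(w | s), for every sense s."""
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        scores = {}
        for sense, n in self.sense_counts.items():
            counts = self.word_counts[sense]
            denom = sum(counts.values()) + v
            scores[sense] = math.log(n / self.total) + sum(
                math.log((counts[w] + 1) / denom) for w in words)
        return scores

    def predict(self, words: Sequence[str]) -> tuple[str, float]:
        """Return the MAP sense and its normalised posterior probability."""
        scores = self.log_scores(words)
        best = max(scores, key=scores.get)
        log_max = scores[best]  # subtract the max before exp for stability
        z = sum(math.exp(s - log_max) for s in scores.values())
        return best, math.exp(scores[best] - log_max) / z


def bootstrap(seed: Sequence[Example], unlabeled: Sequence[Sequence[str]],
              threshold: float = 0.9, max_rounds: int = 5
              ) -> tuple[NaiveBayesWSD, list[Example]]:
    """Self-training: label the pool with the current model and move into the
    seed set only contexts whose MAP posterior reaches the (hypothetical)
    threshold; stop when a round admits nothing."""
    labeled = list(seed)
    pool = [list(c) for c in unlabeled]
    for _ in range(max_rounds):
        model = NaiveBayesWSD().fit(labeled)
        accepted, rest = [], []
        for words in pool:
            sense, posterior = model.predict(words)
            (accepted if posterior >= threshold else rest).append((words, sense))
        if not accepted:
            break
        labeled.extend(accepted)
        pool = [words for words, _ in rest]
    return NaiveBayesWSD().fit(labeled), labeled


seed = [(["money", "deposit", "account"], "FINANCE"),
        (["loan", "interest", "money"], "FINANCE"),
        (["river", "water", "shore"], "RIVER"),
        (["fishing", "river", "mud"], "RIVER")]
model, labeled = bootstrap(seed, [["money", "loan"], ["water", "river"], ["the", "a"]],
                           threshold=0.8)
print(len(labeled))  # -> 6
```

With `threshold=0.8`, both informative contexts are admitted in the first round (each has a MAP posterior of 6/7), while the context made only of unseen words stays at 0.5 and is never admitted, so the loop stops. Raising `threshold` trades seed-set growth for cleaner additions; setting it to 0 reproduces the "add everything" expansion the abstract attributes to Yarowsky (1995).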
