Selection of Korean General Vocabulary for Machine Readable Dictionaries


Language and Information (한국언어정보학회지:언어와정보)

Volume 7, Issue 1, pp. 41-54 (2003). ISSN 1226-7430 (print)

Korean Society for Language and Information (한국언어정보학회)


Korean title (translated): Selection of Basic Korean Vocabulary for Electronic Dictionaries for Natural Language Processing

배희숙, 이주호, 시정곤, 최기선 (all: Research Center for Terminology and Language Engineering, Korea Advanced Institute of Science and Technology (KAIST))

Published: 2003-06-01


Abstract

According to Jeong Ho-seong (1999), Koreans use on average only 20% of the 508,771 entries in the Korean standard unabridged dictionary. To build a machine-readable dictionary (MRD) for natural language processing, it is therefore necessary to select the Korean lexical units that are used frequently and can be regarded as basic words. In this study, the selection is performed semi-automatically using the KAIST large corpus. From about 220,000 morphemes extracted from a corpus of 40,000,000 eojeols (space-delimited word units), 50,637 morphemes (54,797 senses) were selected. In addition, the coverage of these morphemes is examined on two sub-corpora of different styles. The total coverage is 91.21% for the formal style and 93.24% for the informal style. The 6,130 first-degree morphemes alone achieve a coverage of 73.64% and 81.45%, respectively.
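The abstract does not spell out how coverage is computed. A common reading is token coverage: the percentage of morpheme tokens in a morphologically analyzed test text that belong to the selected vocabulary. The sketch below assumes exactly that definition. The function name `token_coverage` and the toy morpheme lists are hypothetical illustrations, not the authors' tool or data, and sense-level distinctions are ignored.

```python
from collections import Counter
from typing import Iterable, Set


def token_coverage(tokens: Iterable[str], vocabulary: Set[str]) -> float:
    """Hypothetical helper: percentage of morpheme tokens found in `vocabulary`.

    Assumes `tokens` is the output of a morphological analyzer (one string per
    morpheme occurrence) and that coverage is counted per token, not per type.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty text has nothing to cover
    covered = sum(n for morpheme, n in counts.items() if morpheme in vocabulary)
    return 100.0 * covered / total


if __name__ == "__main__":
    # Toy analyzed text (hypothetical): 8 morpheme tokens.
    text = ["나", "는", "학교", "에", "가", "ㄴ다", "책", "을"]
    selected = {"나", "는", "학교", "에", "가", "ㄴ다"}  # full selected list
    first_degree = {"나", "는", "에"}                  # smaller core subset
    print(token_coverage(text, selected))      # 75.0
    print(token_coverage(text, first_degree))  # 37.5
```

Counting tokens rather than types is what lets a small core list (like the 6,130 first-degree morphemes) cover most of a running text, since the most frequent morphemes recur many times.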
