• Title/Summary/Keyword: Lexical Dictionary

Construction of Vietnamese SentiWordNet by using Vietnamese Dictionary (베트남어 사전을 사용한 베트남어 SentiWordNet 구축)

  • Vu, Xuan-Son;Park, Seong-Bae
    • Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.745-748 / 2014
  • SentiWordNet is an important lexical resource supporting sentiment analysis in opinion mining applications. In this paper, we propose a novel approach to constructing a Vietnamese SentiWordNet (VSWN). SentiWordNet is typically generated from WordNet, with each synset assigned numerical scores that indicate its opinion polarity. Many previous studies obtained these scores by applying a machine learning method to WordNet. However, no Vietnamese WordNet was available at the time of writing. We therefore propose a method that constructs VSWN from a Vietnamese dictionary rather than from WordNet. We show the effectiveness of the proposed method by automatically generating a VSWN with 39,561 synsets. The method is evaluated on 266 synsets with respect to positivity and negativity. It attains results competitive with English SentiWordNet, differing by 0.066 and 0.052 on the positivity and negativity sets, respectively.

YDK : A Thesaurus Developing System for Korean Language (한국어 통합정보사전 시스템)

  • Hwang, Do-Sam;Choi, Key-Sun
    • The Transactions of the Korea Information Processing Society / v.7 no.9 / pp.2885-2893 / 2000
  • Dictionaries are indispensable for natural language processing (NLP) systems. Sophisticated NLP algorithms can be fully exploited only with matching dictionaries that are built systematically on the basis of computational linguistics. Few dictionaries have been developed for natural language processing, and those available are far from complete enough for practical use. It is therefore necessary to develop an integrated information dictionary that includes the lexical information needed to process and understand natural language, such as morphological, syntactic, and semantic information. In this paper, we propose a method for building an integrated dictionary and introduce a dictionary development system.

English-Korean Transfer Dictionary Extension Tool in English-Korean Machine Translation System (영한 기계번역 시스템의 영한 변환사전 확장 도구)

  • Kim, Sung-Dong
    • KIPS Transactions on Software and Data Engineering / v.2 no.1 / pp.35-42 / 2013
  • Developing an English-Korean machine translation system requires building up information about both languages, and the amount of information in the English-Korean transfer dictionary is especially critical to translation quality. Newly coined words are out-of-vocabulary words, so they appear untranslated in the output sentence, which lowers translation quality. Compound nouns also complicate lexical and syntactic analysis, and they are difficult to translate accurately when the transfer dictionary lacks information about them. To improve the quality of English-Korean machine translation, the transfer dictionary must be expanded continuously by collecting out-of-vocabulary words and frequently used compound nouns. This paper proposes a method for expanding the transfer dictionary that consists of constructing a corpus from internet newspapers, extracting words that are not in the existing dictionary along with frequently used compound nouns, attaching meanings to the extracted words, and integrating them into the transfer dictionary. We also develop a tool that supports this expansion. Expanding the dictionary is critical to improving the machine translation system but requires much human effort. The developed tool can be useful for continuously expanding the transfer dictionary and is therefore expected to contribute to better translation quality.
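
The extraction step described in this abstract can be illustrated with a minimal sketch. It is not the paper's tool: the function name `find_expansion_candidates`, the `min_count` threshold, and the sample data are hypothetical, and repeated adjacent word pairs are used only as a crude stand-in for compound-noun detection, since the abstract does not say how compound nouns are identified.

```python
from collections import Counter


def find_expansion_candidates(tokens, dictionary, min_count=2):
    """Return (oov_counts, frequent_bigrams) for a tokenized corpus.

    oov_counts: words absent from `dictionary`, with their frequencies.
    frequent_bigrams: adjacent word pairs seen at least `min_count` times
    (an assumed proxy for frequently used compound nouns).
    """
    words = [t.lower() for t in tokens]
    oov_counts = Counter(w for w in words if w not in dictionary)
    bigram_counts = Counter(zip(words, words[1:]))
    frequent_bigrams = {
        pair: n for pair, n in bigram_counts.items() if n >= min_count
    }
    return oov_counts, frequent_bigrams


# Toy corpus and dictionary, invented for illustration only.
tokens = "the chatbot uses a neural network and the neural network learns".split()
dictionary = {"the", "uses", "a", "neural", "network", "and", "learns"}

oov, frequent = find_expansion_candidates(tokens, dictionary)
print(oov)       # words that would need a dictionary entry
print(frequent)  # candidate multi-word units
```

In a real pipeline the candidates would still need meanings attached by a person before being merged into the transfer dictionary, which is the labor the abstract says the tool is meant to reduce.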

Korean Semantic Role Labeling Using Case Frame Dictionary and Subcategorization (격틀 사전과 하위 범주 정보를 이용한 한국어 의미역 결정)

  • Kim, Wan-Su;Ock, Cheol-Young
    • Journal of KIISE / v.43 no.12 / pp.1376-1384 / 2016
  • To process sentences as human beings do, computers need the capability to analyze and process every possible form of human expression. Linguistic information processing thus forms the initial basis. When analyzing a sentence syntactically, it is necessary to divide the sentence into components, find the obligatory arguments of each predicate, identify the sentence core, and understand the semantic relations between arguments and predicates. In this study, we applied a case frame dictionary based on The Korean Standard Dictionary of The National Institute of the Korean Language; in addition, for semantic role labeling we used a CRF model with predicate subcategorization information constructed from the Korean Lexical Semantic Network (UWordMap). Semantic roles were tagged automatically by the CRF model, which used words, predicates, the case frame dictionary, and hypernyms of words as features. This method outperformed the existing method, with an accuracy of 83.13% compared to 81.2%.

Learning Rules for Identifying Hypernyms in Machine Readable Dictionaries (기계가독형사전에서 상위어 판별을 위한 규칙 학습)

  • Choi Seon-Hwa;Park Hyuk-Ro
    • The KIPS Transactions:PartB / v.13B no.2 s.105 / pp.171-178 / 2006
  • Most approaches for extracting hypernyms of a noun from its definitions in a machine-readable dictionary (MRD) rely on lexical patterns compiled by human experts. These approaches not only incur a high cost for compiling lexical patterns; it is also very difficult for human experts to compile a set of lexical patterns with broad coverage, because natural languages have many different expressions for the same concept. To alleviate these problems, this paper proposes a new method for extracting hypernyms of a noun from its definitions in an MRD. The proposed approach identifies hypernyms using only syntactic (part-of-speech) patterns instead of lexical patterns, which reduces the number of patterns while keeping their coverage broad. Our experiment shows that the classification accuracy of the proposed method is 92.37%, which is significantly better than that of previous approaches.

A Web-Based Multimedia Dictionary System Supporting Media Synchronization (미디어 동기화를 지원하는 웹기반 멀티미디어 전자사전 시스템)

  • Choi, Yong-Jun;Hwang, Do-Sam
    • Journal of Korea Multimedia Society / v.7 no.8 / pp.1145-1161 / 2004
  • The purpose of this research is to establish a method for constructing a multimedia electronic dictionary system by integrating the media data available from linguistic resources on the Internet. As a result of this study, existing text-oriented electronic dictionary systems can be developed into multimedia lexical systems with greater efficiency and effectiveness. A method is proposed for integrating the media data of linguistic resources on the Internet through a web browser. In the proposed method, the web browser carries out all the work related to integrating media data, so no dedicated server system is needed. The system built in our web browser environment integrates text, image, and voice sources and can also produce moving pictures. Each medium is associated with the meaning of the data, so that data integration and movement can be specified through these associations. SMIL documents are generated by analyzing the meaning of each data unit and are executed in a web browser. In addition, the system saves storage space by sharing media data distributed across the Internet and makes data easier to update.

A Method of Function-word Recognition by Relative Frequency (상대빈도를 이용한 문법형태소의 인식 방법)

  • 강승식
    • Korean Journal of Cognitive Science / v.10 no.2 / pp.11-16 / 1999
  • It is expected that some Josa (particles) and Eomi (endings) are used frequently in Korean documents while others are not. In this paper, we confirm this experimentally and show that such information is very useful for Korean language processing. For Josa, the 9 most frequent Josa account for 70% of all Josa occurrences, and 20, 32, and 69 Josa account for 90%, 95%, and 99%, respectively. Similarly, the 10 most frequent Eomi account for 70% of all Eomi occurrences, and 33, 54, and 117 Eomi account for 90%, 95%, and 99%, respectively. We propose a method for constructing a Josa/Eomi dictionary classified by frequency information. Furthermore, the Josa/Eomi frequency results are very useful for identifying unregistered morphemes and resolving lexical ambiguities.

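
The coverage figures in this abstract (how many of the most frequent items account for 70%, 90%, 95%, or 99% of all occurrences) correspond to a cumulative-frequency calculation. The sketch below shows one way such a figure could be computed; the function name `types_needed_for_coverage` and the toy counts are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def types_needed_for_coverage(counts, percent):
    """Smallest number of most-frequent types whose occurrences together
    reach at least `percent` (an integer, 0-100) of all occurrences."""
    total = sum(counts.values())
    running = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= percent * total:
            return rank
    return len(counts)


# Toy frequency table (invented numbers, 100 occurrences in total).
counts = Counter({"a": 50, "b": 20, "c": 15, "d": 10, "e": 5})

for percent in (70, 90, 99):
    print(percent, types_needed_for_coverage(counts, percent))
```

A frequency-classified dictionary of the kind the abstract proposes could then be split at these ranks, for example keeping the items needed for 90% coverage in a fast first-tier lookup.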

Morphological Analysis with Adjacency Attributes and Phrase Dictionary (접속 특성과 말마디 사전을 이용한 형태소 분석)

  • Im, Gwon-Muk;Song, Man-Seok
    • The Transactions of the Korea Information Processing Society / v.1 no.1 / pp.129-139 / 1994
  • This paper presents a morphological analysis method for the Korean language. The characteristics and adjacency information of words can be obtained from sentences in a large corpus. In general, a word can be analyzed by applying adjacency attributes and rules; for ambiguous words, however, one analysis must be chosen from several candidates. The collected adjacency attributes of morphemes and their relations with neighboring words are recorded in well-designed dictionaries. With this information, most abbreviated and ambiguous words can be analyzed successfully. The efficiency of a morphological analyzer depends on the information in its dictionaries. A morpheme dictionary and a phrase dictionary have been designed using a lexical database, and the necessary information extracted from the corpus is stored in them.

Selection of Korean General Vocabulary for Machine Readable Dictionaries (자연언어처리용 전자사전을 위한 한국어 기본어휘 선정)

  • 배희숙;이주호;시정곤;최기선
    • Language and Information / v.7 no.1 / pp.41-54 / 2003
  • According to Jeong Ho-seong (1999), Koreans use on average only 20% of the 508,771 entries in the Korean standard unabridged dictionary. To build a machine-readable dictionary (MRD) for natural language processing, it is necessary to select Korean lexical units that are used frequently and are regarded as basic words. In this study, the selection is carried out semi-automatically using the KAIST large corpus. Of about 220,000 morphemes extracted from the corpus of 40,000,000 eojeols, 50,637 morphemes (54,797 senses) are selected. In addition, the coverage of these morphemes in various texts is examined with two sub-corpora of different styles. The total coverage is 91.21% in formal style and 93.24% in informal style. The coverage of the 6,130 first-degree morphemes is 73.64% and 81.45%, respectively.

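
The coverage percentages reported in this abstract can be read as the share of running tokens in a text that belong to the selected vocabulary. The following is a minimal sketch under that assumption; the abstract does not give the exact formula, and the function name `coverage_percent` and the sample tokens are hypothetical.

```python
def coverage_percent(tokens, vocabulary):
    """Percentage of tokens in `tokens` that appear in `vocabulary`."""
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if t in vocabulary)
    return 100.0 * covered / len(tokens)


# Toy example: 6 of the 8 tokens are in the selected vocabulary.
selected = {"나", "는", "학교", "에", "가"}
text = ["나", "는", "학교", "에", "가", "ㄴ다", "나", "!"]

print(coverage_percent(text, selected))
```

Running the same function over sub-corpora of different styles would give per-style figures comparable in form to the formal and informal percentages quoted above.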

Korean Nominal Bank, Using Language Resources of Sejong Project (세종계획 언어자원 기반 한국어 명사은행)

  • Kim, Dong-Sung
    • Language and Information / v.17 no.2 / pp.67-91 / 2013
  • This paper describes the Korean Nominal Bank, a project that provides argument structure for instances of predicative nouns in the Sejong parsed corpus. We use the language resources of the Sejong project so that the same set of data is annotated with more and more levels of annotation, since a new type of language resource building project could otherwise bring in information that is processed separately and in isolation. Our annotation scheme is based on the Sejong electronic dictionary, the semantically tagged corpus, and the syntactically analyzed corpus. Our work also involves deep linguistic knowledge of the syntax-semantics interface in general. We consider semantic theories including the Frame Semantics of Fillmore (1976), the argument structure of Grimshaw (1990), and the argument alternation of Levin (1993) and Levin and Rappaport Hovav (2005). Various syntactic theories are needed to explain the various sentence types, including those involving empty categories, raising, and left (or right) dislocation. We also need an account of idiosyncratic lexical features such as collocation.
