• Title/Summary/Keyword: Lexical Dictionary

Semantic Clustering of Predicates using Word Definition in Dictionary (사전 뜻풀이를 이용한 용언 의미 군집화)

  • Bae, Young-Jun;Choe, Ho-Seop;Song, Yoo-Hwa;Ock, Cheol-Young
    • Korean Journal of Cognitive Science / v.22 no.3 / pp.271-298 / 2011
  • A lexical semantic system is needed to capture lexical semantic information more clearly. This paper studies the semantic clustering of predicates, one of the steps in building such a system. Unlike previous studies, which used subcategorization arguments (subject and object), selectional restrictions, and adverb interaction information, we use sense-tagged dictionary definitions to cluster predicates semantically, and we also attempt hierarchical clustering based on the relationship between generic and specific concepts. Most of the predicates in the dictionary were used: a total of 106,501 predicates (85,754 verbs and 20,747 adjectives). Clustering produced 2,748 predicate clusters, 130 recursive-definition clusters, and 261 sub-clusters, with a maximum cluster depth of 16. Evaluated against the Sejong semantic classes, the clusters showed a cohesion of 70.14%.
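The hierarchical idea in the abstract above (linking each predicate to the more generic predicate that heads its definition) can be illustrated with a minimal sketch. This is not the authors' implementation: the `genus_of` mapping, the English toy entries, and the function names `find_root` and `cluster_predicates` are all hypothetical, and the sketch assumes the generic head of each definition has already been extracted.

```python
from collections import defaultdict

def find_root(pred, genus_of):
    """Follow generic-concept links upward; return (root, depth, is_recursive)."""
    path = [pred]
    while True:
        nxt = genus_of.get(path[-1])
        if nxt is None:
            # No further definition head: the last predicate is the cluster root.
            return path[-1], len(path) - 1, False
        if nxt in path:
            # Definitions refer back to each other: a recursive-definition cluster.
            start = path.index(nxt)
            # Use the smallest member of the cycle as a stable representative.
            return min(path[start:]), start, True
        path.append(nxt)

def cluster_predicates(genus_of):
    """Group predicates by root; also report recursive roots and maximum depth."""
    clusters = defaultdict(set)
    recursive_roots = set()
    max_depth = 0
    for pred in set(genus_of) | set(genus_of.values()):
        root, depth, is_recursive = find_root(pred, genus_of)
        clusters[root].add(pred)
        if is_recursive:
            recursive_roots.add(root)
        max_depth = max(max_depth, depth)
    return dict(clusters), recursive_roots, max_depth

# Hypothetical toy data: predicate -> generic predicate heading its definition.
genus_of = {
    "stroll": "walk", "walk": "move", "run": "move", "sprint": "run",
    "begin": "start", "start": "begin",  # mutually recursive definitions
}
clusters, recursive_roots, max_depth = cluster_predicates(genus_of)
```

With this toy data, `stroll`, `walk`, `run`, and `sprint` all fall under the root `move`, while `begin` and `start` form one recursive-definition cluster.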

Disambiguation of Homograph Suffixes using Lexical Semantic Network(U-WIN) (어휘의미망(U-WIN)을 이용한 동형이의어 접미사의 의미 중의성 해소)

  • Bae, Young-Jun;Ock, Cheol-Young
    • KIPS Transactions on Software and Data Engineering / v.1 no.1 / pp.31-42 / 2012
  • To process Korean suffix-derived nouns, most Korean language processing systems register them in a dictionary. This approach is limited, however, because suffixes are highly productive, so unregistered suffix-derived nouns must be analyzed semantically. In this paper, we propose a method for disambiguating homograph suffixes using the Korean lexical semantic network U-WIN, for the purpose of semantic analysis of suffix-derived nouns. The experiments used 33,104 suffix-derived nouns containing homograph suffixes, taken from the morphologically and semantically tagged Sejong Corpus. We first semantically tagged the homograph suffixes, extracted the roots of the suffix-derived nouns, and mapped the roots to nodes in U-WIN. We then assigned distance weights to the U-WIN nodes that can combine with each homograph suffix and used these weights to disambiguate the suffixes. Experiments on the 35 homograph suffixes that occur in the Sejong Corpus, out of the 49 homograph suffixes listed in a Korean dictionary, yielded 91.01% accuracy.

Korean Compound Noun Decomposition and Semantic Tagging System using User-Word Intelligent Network (U-WIN을 이용한 한국어 복합명사 분해 및 의미태깅 시스템)

  • Lee, Yong-Hoon;Ock, Cheol-Young;Lee, Eung-Bong
    • The KIPS Transactions:PartB / v.19B no.1 / pp.63-76 / 2012
  • We propose a Korean compound noun semantic tagging system that uses statistical compound noun decomposition together with semantic relation information extracted from a lexical semantic network (U-WIN) and dictionary definitions. The system consists of three phases: compound noun decomposition, semantic constraint, and semantic tagging. The decomposition phase selects the best candidates using noun location frequencies extracted from the Sejong corpus, re-decomposes nouns for the semantic constraint phase, and restores foreign nouns. The semantic constraint phase finds possible semantic combinations using origin information in the dictionary and a Naive Bayes classifier, in order to reduce computation time and increase the accuracy of semantic tagging. The semantic tagging phase calculates the semantic similarity between the decomposed nouns and decides the semantic tags. We constructed an experimental data set of 40,717 semantically tagged compound nouns of more than 3 characters from the Standard Korean Language Dictionary. In the experiments, the accuracy of compound noun decomposition was 99.26% and the accuracy of semantic tagging was 95.38%.
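The decomposition phase described above selects candidates by how often a noun occurs at a given location in compounds. A minimal sketch of that idea follows, restricted to two-way splits. It is an illustration under stated assumptions, not the system from the paper: the function `best_split`, the tables `front_freq` and `back_freq`, and every count in them are hypothetical.

```python
def best_split(compound, front_freq, back_freq):
    """Return the two-way split with the highest location-frequency score.

    front_freq: hypothetical counts of a noun appearing as the first element.
    back_freq:  hypothetical counts of a noun appearing as the last element.
    Returns None when no split has a positive score.
    """
    best, best_score = None, 0
    for i in range(1, len(compound)):
        left, right = compound[:i], compound[i:]
        # Score a candidate by the product of the two location frequencies.
        score = front_freq.get(left, 0) * back_freq.get(right, 0)
        if score > best_score:
            best, best_score = (left, right), score
    return best

# Hypothetical counts, invented for illustration only.
front_freq = {"정보": 120, "정": 5}
back_freq = {"검색": 80, "색": 3}

split = best_split("정보검색", front_freq, back_freq)
```

Here `split` is `("정보", "검색")`, since no other cut point pairs a known front noun with a known back noun. A fuller system would also handle compounds of three or more nouns and the re-decomposition and foreign-noun restoration steps the abstract mentions.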

A Development of the Automatic Predicate-Argument Analyzer for Construction of Semantically Tagged Korean Corpus (한국어 의미 표지 부착 말뭉치 구축을 위한 자동 술어-논항 분석기 개발)

  • Cho, Jung-Hyun;Jung, Hyun-Ki;Kim, Yu-Seop
    • The KIPS Transactions:PartB / v.19B no.1 / pp.43-52 / 2012
  • Semantic role labeling is the research area that analyzes the semantic relationships between elements in a sentence. Along with word sense disambiguation, it is considered one of the most important areas of semantic analysis in natural language processing. However, because the relevant linguistic resources are lacking, Korean semantic role labeling research has not been sufficiently developed. In this paper, we propose an automatic predicate-argument analyzer as a first step toward constructing a Korean PropBank, a resource that has been widely used in semantic role labeling. The analyzer has two main components: a semantic lexical dictionary and an automatic predicate-argument extractor. The dictionary holds the case frame information of verbs, and the extractor is a module that decides the semantic class of each argument of a specific predicate in a syntactically annotated corpus. The analyzer developed in this research will support the construction of the Korean PropBank and, ultimately, play a major role in Korean semantic role labeling.

A Study on the Utilization Plan of Lexical Resources for Disaster and Safety Information Management Based on Current Status Analysis (재난안전정보 관리를 위한 어휘자원 현황분석 및 활용방안)

  • Jeong, Him-Chan;Kim, Tae-Young;Kim, Yong;Oh, Hyo-Jung
    • Journal of the Korean Society for Information Management / v.34 no.2 / pp.137-158 / 2017
  • Disasters directly affect people's lives, bodies, and property. For effective and rapid disaster response, a coordination process based on sharing and using disaster information is an essential requirement. Disaster and safety control agencies produce and manage heterogeneous information, and each develops and uses its own word dictionaries. This is a major obstacle for practitioners who need to retrieve and access disaster and safety information. To solve this problem, the lexical resources related to disaster and safety must be standardized. In this paper, we analyze the current status of lexical resources in the disaster and safety domain and identify the characteristics of each lexical group. We then propose a plan for using these lexical resources in disaster and safety information management.

A minimal pair searching tool based on dictionary (사전 기반 최소대립쌍 검색 도구)

  • Kim, Tae-Hoon;Lee, Jae-Ho;Chang, Moon-Soo
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.2 / pp.117-122 / 2014
  • A minimal pair is a pair of words whose sound sequences are identical except for a single sound, and that single difference distinguishes two lexical items. This paper proposes a minimal pair search tool to make phonological research with minimal pairs more efficient. By comparing it with other programs, we also suggest guidelines for developing Korean minimal pair search programs. The proposed tool has a user-friendly interface that minimizes key input, aimed at linguists who are not fluent with computer programs, and it provides a function that classifies the words in the dictionary for detailed research. For efficiency, it speeds up dictionary loading by separating syllables through Unicode analysis and optimizes the dictionary structure for searching. The search algorithm gains speed from a hashing scheme based on syllable counts. Compared with the earlier version, the tool is about 5 times faster at converting the dictionary and about 3 times faster at searching.
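The two techniques named in the abstract above, separating syllables through Unicode analysis and hashing by syllable count, can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names and the toy word list are hypothetical. The decomposition relies on the standard arithmetic layout of precomposed Hangul syllables (U+AC00 to U+D7A3, with 21 medial vowels and 28 final slots per initial consonant). The sketch also assumes that an empty final consonant counts as a comparable slot, which is a simplification.

```python
from collections import defaultdict

HANGUL_BASE = 0xAC00  # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3  # last precomposed Hangul syllable
N_MEDIAL = 21         # number of medial vowels
N_FINAL = 28          # number of final slots, including "no final consonant"

def decompose(word):
    """Split each Hangul syllable into (initial, medial, final) index units."""
    units = []
    for ch in word:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            offset = code - HANGUL_BASE
            units.append(("I", offset // (N_MEDIAL * N_FINAL)))
            units.append(("M", (offset % (N_MEDIAL * N_FINAL)) // N_FINAL))
            units.append(("F", offset % N_FINAL))
        else:
            units.append(("X", code))  # non-Hangul character kept as one unit
    return tuple(units)

def build_index(words):
    """Bucket words by syllable count so a search only scans one bucket."""
    index = defaultdict(list)
    for word in words:
        index[len(word)].append((word, decompose(word)))
    return index

def minimal_pairs(query, index):
    """Return words that differ from the query in exactly one sound unit."""
    q = decompose(query)
    found = []
    for word, units in index.get(len(query), []):
        if len(units) == len(q):
            if sum(1 for a, b in zip(q, units) if a != b) == 1:
                found.append(word)
    return found

# Hypothetical toy dictionary.
index = build_index(["달", "말", "발", "물", "다리"])
pairs = minimal_pairs("달", index)
```

With this toy dictionary, `pairs` contains 말 and 발, which differ from 달 only in the initial consonant. 물 is excluded because both its initial consonant and its vowel differ, and 다리 is never compared because it sits in a different syllable-count bucket.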

Alignment of Hypernym-Hyponym Noun Pairs between Korean and English, Based on the EuroWordNet Approach (유로워드넷 방식에 기반한 한국어와 영어의 명사 상하위어 정렬)

  • Kim, Dong-Sung
    • Language and Information / v.12 no.1 / pp.27-65 / 2008
  • This paper presents a set of methodologies for aligning hypernym-hyponym noun pairs between Korean and English, based on the EuroWordNet approach. Following the methods used in EuroWordNet, our approach makes extensive use of WordNet across four steps of the building process: 1) monolingual dictionaries are used to extract proper hypernym-hyponym noun pairs, 2) a bilingual dictionary is used to convert the extracted pairs, 3) WordNet is used as the backbone of the alignment criteria, and 4) WordNet is used to select the most similar pair among the candidates. The importance of this study lies not only in enriching the semantic links between the two languages, but also in integrating lexical resources based on a language-specific and language-dependent structure. Our approach aims at building an accurate and detailed lexical resource with proper measures, rather than at the fast development of a generic one using NLP techniques.

Cross-Enrichment of the Heterogenous Ontologies Through Mapping Their Conceptual Structures: the Case of Sejong Semantic Classes and KorLexNoun 1.5 (이종 개념체계의 상호보완방안 연구 - 세종의미부류와 KorLexNoun 1.5 의 사상을 중심으로)

  • Bae, Sun-Mee;Yoon, Ae-Sun
    • Language and Information / v.14 no.1 / pp.165-196 / 2010
  • The primary goal of this paper is to propose methods for enriching two heterogeneous ontologies: Sejong Semantic Classes (SJSC) and KorLexNoun 1.5 (KLN). To this end, the study describes the pros and cons of the two ontologies and analyzes the error patterns found during fine-grained manual mapping between them. The error patterns fall into four types: (1) structural defects in node branching, (2) errors in assigning semantic classes, (3) deficiencies in the linguistic information provided, and (4) a lack of lexical units representing specific concepts. For each error pattern we propose a corresponding solution: correcting the node branching defects and the semantic class assignments, supplementing the missing linguistic information, and increasing the number of lexical units suitably allotted to their corresponding concepts. Using the results of this study, we can obtain richer ontologies by correcting the defects and errors in each one, which will make them more practical for syntactic and semantic analysis.

Extending the MARTIF and TEI for Korean Lexical Entities (한국어사전 인코딩체계의 확장에 관한 연구: MARTIF와 TEI를 중심으로)

  • 백지원;최석두
    • Journal of the Korean Society for Information Management / v.18 no.2 / pp.295-322 / 2001
  • The purpose of this study is to present a scheme for encoding all possible lexical entities in dictionaries, glossaries, encyclopedias, thesauri, and similar resources. First, it discusses the nature and structure of dictionaries. Second, it analyzes two major current schemes for encoding terminological data, MARTIF and TEI, in terms of how flexibly they can be extended to cover all lexical entities. Third, it presents an integrated microstructure of dictionaries and compares it with the MARTIF and TEI formats for print dictionaries. Finally, it addresses the need for extended MARTIF and TEI formats and offers 17 suggestions with specific cases, combined with the suggestions from two studies on modifying the MARTIF and TEI DTDs for the markup of Korean dictionary entries.

Ranking Translation Word Selection Using a Bilingual Dictionary and WordNet

  • Kim, Kweon-Yang;Park, Se-Young
    • Journal of the Korean Institute of Intelligent Systems / v.16 no.1 / pp.124-129 / 2006
  • This paper presents a method of ranking translation word selection for Korean verbs, based on lexical knowledge contained in a bilingual Korean-English dictionary and WordNet, both of which are easily obtainable knowledge resources. We focus on deciding which translation of the target word is the most appropriate, using a measure of semantic relatedness computed through 45 extended relations between the possible translations of the target word and indicative clue words that act as predicate-arguments in the source language text. To reduce the weight given to possibly unwanted senses, we rank the possible word senses of each translation word by measuring the semantic similarity between the translation word and its near synonyms. We report an average accuracy of 51% on ten ambiguous Korean verbs. The evaluation suggests that our approach outperforms both the default baseline and previous work.