Automatic Construction of Foreign Word Transliteration Dictionary from English-Korean Parallel Corpus (영-한 병렬 코퍼스로부터 외래어 표기 사전의 자동 구축)

  • Lee, Jae Sung
    • The Journal of Korean Association of Computer Education / v.6 no.2 / pp.9-21 / 2003
  • This paper proposes a system that automatically constructs a transliteration dictionary from an English-Korean parallel corpus. The system works in three steps: it first extracts all nouns from the Korean documents, then selects the transliterated foreign-word nouns among them using a language identification method, and finally extracts the corresponding English words using a probabilistic alignment method. In particular, the fact that a corresponding English word exists in most cases is used to extract the purely transliterated part from a Korean word phrase, which usually appears combined with Korean endings (Eomi) or particles (Josa). Moreover, words in the two alphabet systems are compared phonetically in a direct manner, without converting them to a common alphabet. Experiments showed that performance was influenced by the first and second preprocessing steps: the most efficient model among the manually preprocessed ones achieved 85.4% recall and 91.0% precision, while the most efficient among the fully automated ones achieved 68.3% recall and 89.2% precision.

Korean Word Learning System Using Automatic Question Generation Technique (자동 문제 생성 기술을 이용한 한국어 어휘학습시스템)

  • Choe, Su-Il;Im, Ji-Hui;Choe, Ho-Seop;Ock, Cheol-Young
    • Korean Journal of Cognitive Science / v.17 no.4 / pp.271-286 / 2006
  • In this paper, we introduce an automatic question generation technique that uses language resources such as the User-Word Intelligent Network (U-WIN) and a Korean dictionary, which contain a large amount of information, and we present a Korean word learning system built on this technique. The item pool method used by most learning systems causes several problems. As a solution, we classified questions into 8 types and implemented a Korean word learning system that generates Korean questions automatically, using morphological and semantic information according to the automatic question generation pattern of each type.

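A Study of Errors in Chinese-Korean Dictionary Headwords Using a Chinese Corpus and the Internet: Focusing on F2-1 (중국 코퍼스와 인터넷을 이용한 중한사전 표제어의 오류 연구 - F2-1을 중심으로)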

  • Baek, Jong-In
    • 중국학논총 / no.63 / pp.47-64 / 2019
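  • The Chinese-Korean dictionaries currently circulating in Korea contain a large number of entries, but on almost any page one can easily find baffling words, some of which cannot be found even in the Hanyu Da Cidian (汉语大词典). We found that these words are often defined incorrectly, and this paper mainly examines such incorrectly defined words. To this end, we first selected from a Chinese-Korean dictionary the words that occur fewer than ten times in a modern Chinese corpus. We consider that the words selected in this way are likely to be nonstandard or rarely used today. To verify this conjecture more accurately, the author also searched Baidu for usage examples of these words, and found that they do have various problems and that the incorrect definitions need to be corrected. The paper examines the appropriateness of 1,500 entries in the F2-1 section. The study found 348 low-frequency entries in the F2-1 section, 45 of which have various problems. Notably, the dictionary's explanations of these low-frequency entries contain many errors, and many of the words are not suitable for inclusion in a dictionary at all. We discuss the erroneous words in three groups: (1) incorrect definitions, (2) missing senses, and (3) other errors. We will continue to study the entries in other sections, and we hope this research will help in the editing of Chinese-Korean dictionaries.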
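
The frequency filter described in this abstract (keep headwords that occur fewer than ten times in the corpus) can be sketched roughly as below. This is only an illustration, not the author's tooling: the function name, the toy headwords, and the toy corpus are all hypothetical, and the corpus is assumed to be already segmented into word tokens.

```python
from collections import Counter


def low_frequency_headwords(headwords, corpus_tokens, threshold=10):
    """Return the headwords that occur fewer than `threshold` times.

    `corpus_tokens` is assumed to be a pre-segmented list of word tokens;
    the threshold of 10 follows the cutoff stated in the abstract.
    """
    counts = Counter(corpus_tokens)
    # Counter returns 0 for words that never appear in the corpus.
    return [word for word in headwords if counts[word] < threshold]


# Hypothetical toy data, for illustration only.
corpus_tokens = ["发现"] * 12 + ["词典"] * 10 + ["发凡"] * 2
headwords = ["发现", "词典", "发凡", "发棵"]

rare = low_frequency_headwords(headwords, corpus_tokens)
print(rare)
```

The words flagged this way are only candidates; the abstract describes a further manual check against web usage examples before an entry is judged problematic.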

An Empirical Study on Quality Improvement by Data Standardization for Distributed Goods (유통 상품의 데이터 품질 관리를 위한 데이터 표준화에 대한 연구)

  • Song, Jang-Seop;Rhew, Sung-Yul
    • Journal of the Korea Society of Computer and Information / v.18 no.9 / pp.101-109 / 2013
  • Data quality management is extremely important. In this study, we propose data standardization for effective quality management of enterprise-owned data about distributed goods and validate its effectiveness through a case study. For the standardization, we designed a data category and a data dictionary. For the data category design, we categorized the data and identified its attributes; for the data dictionary design, we developed a design process and built dictionaries of words, terms, domains, and codes. We then proposed the output documents that have to be written for data standardization. The efficiency of the proposed approach was validated by quantitative and qualitative measurement. As a result, data quality with respect to data standardization improved by 24%, and data quality with respect to the consistency of the data dictionary improved by 7%.

Automatic Mapping Between Large-Scale Heterogeneous Language Resources for NLP Applications: A Case of Sejong Semantic Classes and KorLexNoun for Korean

  • Park, Heum;Yoon, Ae-Sun
    • Language and Information / v.15 no.2 / pp.23-45 / 2011
  • This paper proposes a statistics-based linguistic methodology for automatic mapping between large-scale heterogeneous language resources for NLP applications in general. As a particular case, it treats automatic mapping between two large-scale heterogeneous Korean language resources: Sejong Semantic Classes (SJSC) in the Sejong Electronic Dictionary (SJD) and nouns in KorLex. KorLex is a large-scale Korean WordNet, but it lacks syntactic information. SJD contains refined semantic-syntactic information, with semantic labels depending on SJSC, but its list of entry words is much smaller than that of KorLex. The goal of our study is to build a rich language resource by integrating useful information from SJD into KorLex. In this paper, we use both linguistic and statistical methods to construct an automatic mapping methodology. The linguistic aspect of the methodology focuses on three linguistic clues: monosemy/polysemy of word forms, instances (example words), and semantically related words. The statistical aspect uses three statistical formulae, $\chi^2$, Mutual Information, and Information Gain, to obtain candidate synsets. Compared with manual mapping, the automatic mapping based on the proposed statistical-linguistic methods performs well in terms of correctness, giving recall 0.838, precision 0.718, and F1 0.774.
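
Two of the statistics named in this abstract can be illustrated with the standard textbook formulas for a 2x2 co-occurrence table. The sketch below is not the paper's implementation: the abstract does not say how the counts are gathered or combined, the function names and counts are hypothetical, and the pointwise form of mutual information is assumed here.

```python
import math


def chi_square(a, b, c, d):
    """Pearson chi-square for a 2x2 table.

    a = items with both properties, b = first property only,
    c = second property only, d = neither.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no association evidence
    return n * (a * d - b * c) ** 2 / denom


def pointwise_mutual_information(a, b, c, d):
    """log2 of P(x, y) / (P(x) * P(y)), using the same cell layout."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")
    return math.log2(a * n / ((a + b) * (a + c)))


# Hypothetical counts: 30 words belong to both the semantic class and the synset.
chi = chi_square(30, 10, 20, 40)
pmi = pointwise_mutual_information(30, 10, 20, 40)
print(chi, pmi)
```

Under this reading, a synset would become a stronger mapping candidate for a semantic class as these scores rise; Information Gain is omitted to keep the sketch short.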

Constructing Ontology based on Korean Parts of Speech and Applying to Vehicle Services (한국어 품사 기반 온톨로지 구축 방법 및 차량 서비스 적용 방안)

  • Cha, Si-Ho;Ryu, Minwoo
    • Journal of Korea Society of Digital Industry and Information Management / v.17 no.4 / pp.103-108 / 2021
  • A knowledge graph is a technology that improves search results by using semantic information drawn from various resources. Because of this advantage, the knowledge graph has recently come to be regarded as one of the core research technologies for providing AI-based services. However, since the knowledge collected from various service domains takes the form of plain text, it is very important to be able to analyze the text and understand its meaning. Various lexical dictionaries have recently been proposed together with the knowledge graph, but most of them are defined in languages other than Korean, so they cannot be used when providing a Korean knowledge service. To solve this problem, this paper proposes an ontology based on the parts of speech of Korean. The proposed ontology uses the 9 parts of speech of Korean to enable the interpretation of words and their meaning through semantic connections between word classes. We also studied various scenarios for applying the proposed ontology to vehicle services.

Open Korean WordNet (KWN): Dictionary-based Semi-Automatic Development (한국어 오픈 워드넷 (KWN) : 사전 기반의 반자동 구축)

  • Lee, In Keun;Hwang, Dosam;Hahm, Younggyun;Choi, Key-Sun
    • Annual Conference on Human and Language Technology / 2014.10a / pp.193-196 / 2014
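  • This paper proposes a semi-automatic method for constructing a Korean WordNet (Open Korean WordNet: KWN) based on dictionary resources. The proposed method translates the vocabulary of the English WordNet (Princeton WordNet 3.0) and the Japanese WordNet (Japanese WordNet 1.1) using English-Korean and Japanese-Korean bilingual dictionaries classified by specialized domain. To resolve ambiguity in the translation results, we consider (1) whether the Korean equivalents obtained from English and from Japanese overlap and (2) the domain information of the dictionaries together with the hierarchical structure of WordNet. With the proposed method, we automatically translated 63,221 (about 54%) of the 117,659 WordNet synsets to build the Korean WordNet. In addition, synset definitions can be rendered in Korean by referring to the definitions in a Korean dictionary, and we propose a definition recommendation algorithm to support this process. Based on the proposed method, we develop a system in which experts can collaborate to build the Korean WordNet.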
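
The first disambiguation clue in this abstract, overlap between the Korean equivalents reached from English and from Japanese, can be sketched as a set intersection. This is a simplified illustration under assumptions: the function name and the tiny bilingual dictionaries are hypothetical, and the second clue (dictionary domain information and the WordNet hierarchy) is not modeled.

```python
def korean_candidates(english_lemmas, japanese_lemmas, en_ko, ja_ko):
    """Split Korean translation candidates for one synset into two groups.

    `en_ko` and `ja_ko` are assumed to map a source lemma to a list of
    Korean equivalents. Candidates reached from both languages are treated
    as confident; the remainder are left as ambiguous.
    """
    from_english = set()
    for lemma in english_lemmas:
        from_english.update(en_ko.get(lemma, []))

    from_japanese = set()
    for lemma in japanese_lemmas:
        from_japanese.update(ja_ko.get(lemma, []))

    overlap = from_english & from_japanese
    return {
        "confident": overlap,
        "ambiguous": (from_english | from_japanese) - overlap,
    }


# Hypothetical toy dictionaries for the financial sense of "bank".
en_ko = {"bank": ["은행", "둑"]}
ja_ko = {"銀行": ["은행"]}

result = korean_candidates(["bank"], ["銀行"], en_ko, ja_ko)
print(result)
```

In this toy case the riverbank reading survives only on the English side, so it falls into the ambiguous group and would need the second clue or a human editor.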

A Framework for WordNet-based Word Sense Disambiguation (워드넷 기반의 단어 중의성 해소 프레임워크)

  • Ren, Chulan;Cho, Sehyeong
    • Journal of the Korean Institute of Intelligent Systems / v.23 no.4 / pp.325-331 / 2013
  • This paper presents a framework and method for word sense disambiguation, together with the results. In this work, WordNet is used for two different purposes: as a dictionary, and as an ontology whose hierarchical structure represents hypernym-hyponym relations. The advantage of this approach is twofold. First, it provides a very simple method that is easily implemented. Second, it does not suffer from the lack of large corpus data that a statistical method would require. In the future, the framework can be extended to incorporate other relations, such as synonyms, meronyms, and antonyms.
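
One simple way to use a hypernym hierarchy for sense selection is to pick the sense that lies closest, by path length, to the senses of the context words. The abstract does not spell out the paper's scoring rule, so the sketch below is an assumption rather than the authors' method, and the hierarchy and sense inventory are hypothetical toy data, not real WordNet content.

```python
# Hypothetical toy hypernym table (child -> parent); not real WordNet data.
HYPERNYM = {
    "bank.finance": "organization",
    "credit_union": "organization",
    "bank.river": "landform",
    "shore": "landform",
    "organization": "entity",
    "landform": "entity",
}

# Hypothetical sense inventory (word -> candidate senses).
SENSES = {
    "bank": ["bank.finance", "bank.river"],
    "shore": ["shore"],
    "credit_union": ["credit_union"],
}


def path_to_root(node):
    """Return the chain of hypernyms from `node` up to the root."""
    path = [node]
    while node in HYPERNYM:
        node = HYPERNYM[node]
        path.append(node)
    return path


def distance(a, b):
    """Number of edges between two nodes through their lowest common ancestor."""
    steps_b = {n: i for i, n in enumerate(path_to_root(b))}
    for i, n in enumerate(path_to_root(a)):
        if n in steps_b:
            return i + steps_b[n]
    return float("inf")


def disambiguate(word, context_words):
    """Choose the sense of `word` with the smallest total distance to the context."""
    def score(sense):
        return sum(
            min(distance(sense, s) for s in SENSES[c])
            for c in context_words
            if c in SENSES
        )
    return min(SENSES[word], key=score)


print(disambiguate("bank", ["shore"]))
print(disambiguate("bank", ["credit_union"]))
```

When no context word is known, every sense scores zero and the first listed sense is returned, which mirrors a first-sense fallback.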

Product Evaluation Summarization Through Linguistic Analysis of Product Reviews (상품평의 언어적 분석을 통한 상품 평가 요약 시스템)

  • Lee, Woo-Chul;Lee, Hyun-Ah;Lee, Kong-Joo
    • The KIPS Transactions:PartB / v.17B no.1 / pp.93-98 / 2010
  • In this paper, we introduce a system that summarizes product evaluations through linguistic analysis, in order to make effective use of the explosively increasing number of product reviews. Our system analyzes the polarity of product reviews by product feature, that is, by the aspects on which customers evaluate a product, such as 'design' and 'material' for the skirt category. The system shows customers a graph as a review summary that represents the percentages of positive and negative reviews. We build an opinion word dictionary for each product feature through context-based automatic expansion from a small set of seed words, and judge the polarity of reviews by product feature with the extracted dictionary. In experiments using product reviews from online shopping malls, our system shows an average accuracy of 69.8% in extracting the judgemental word dictionary and 81.8% in polarity resolution for each sentence.
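
The summary step described in this abstract (per-feature percentages of positive and negative reviews) can be sketched as follows. The sketch assumes a hand-written opinion dictionary and whitespace tokenization, both hypothetical; the paper instead builds its dictionary by context-based expansion from seed words, which is not reproduced here.

```python
from collections import defaultdict

# Hypothetical per-feature opinion dictionary: +1 positive, -1 negative.
OPINION_WORDS = {
    "design": {"pretty": 1, "ugly": -1},
    "material": {"soft": 1, "thin": -1},
}


def summarize(sentences):
    """Return {feature: {"positive": pct, "negative": pct}} over the sentences."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for sentence in sentences:
        tokens = sentence.lower().split()
        for feature, lexicon in OPINION_WORDS.items():
            if feature not in tokens:
                continue  # the sentence does not mention this feature
            score = sum(lexicon.get(token, 0) for token in tokens)
            if score > 0:
                counts[feature]["positive"] += 1
            elif score < 0:
                counts[feature]["negative"] += 1
    summary = {}
    for feature, c in counts.items():
        total = c["positive"] + c["negative"]
        summary[feature] = {k: 100.0 * v / total for k, v in c.items()}
    return summary


# Hypothetical review sentences.
reviews = [
    "the design is pretty",
    "design looks ugly",
    "pretty design overall",
    "the material feels soft",
]
summary = summarize(reviews)
print(summary)
```

Sentences whose opinion words cancel out or are missing from the dictionary are left out of the percentages rather than counted as neutral.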