• 제목/요약/키워드: Dictionary Construction

Search Result 113, Processing Time 0.025 seconds

Construction of Local Terminology Dictionary in NM Imaging Report Forms

  • Hwang, Kyung-Hoon;Jeong, Ji-Young;Park, Kuk-Yang
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2010.04a
    • /
    • pp.352-352
    • /
    • 2010
  • It is difficult to settle the well-designed local terminology for imaging report in the hospital information system (HIS). One of the major reasons is the local terminology with poor contents have been used in the hospital. Thus, we mapped the locally used terms in nuclear medicine imaging report to the SNOMED-CT, which had been widely used in the electronic medical record system, for implementation of hospital information system. Preliminary construction of terminology dictionary was done by mapping of local terms to SNOMED-CT and LexCare Suite. Further study may be warranted.

Automatic Error Correction System for Erroneous SMS Strings (SMS 변형된 문자열의 자동 오류 교정 시스템)

  • Kang, Seung-Shik;Chang, Du-Seong
    • Journal of KIISE:Software and Applications
    • /
    • v.35 no.6
    • /
    • pp.386-391
    • /
    • 2008
  • Some spoken word errors that violate grammatical or writing rules occurs frequently in communication environments like mobile phone and messenger. These unexpected errors cause a problem in a language processing system for many applications like speech recognition, text-to-speech translation, and so on. In this paper, we proposed and implemented an automatic correction system of ill-formed words and word spacing errors in SMS sentences that has been the major errors of poor accuracy. We experimented three methods of constructing the word correction dictionary and evaluated the results of those methods. They are (1) manual construction of error words from the vocabulary list of ill-formed communication languages, (2) automatic construction of error dictionary from the manually constructed corpus, and (3) context-dependent method of automatic construction of error dictionary.

Construction and Evaluation of a Sentiment Dictionary Using a Web Corpus Collected from Game Domain (게임 도메인 웹 코퍼스를 이용한 감성사전 구축 및 평가)

  • Jeong, Woo-Young;Bae, Byung-Chull;Cho, Sung Hyun;Kang, Shin-Jin
    • Journal of Korea Game Society
    • /
    • v.18 no.5
    • /
    • pp.113-122
    • /
    • 2018
  • This paper describes an approach to building and evaluating a sentiment dictionary using a Web corpus in the game domain. To build a sentiment dictionary, we collected vocabulary based on game-related web documents from a domestic portal site, using the Twitter Korean Processor. From the collected vocabulary, we selected the words whose POS are tagged as either verbs or adjectives, and assigned sentiment score for each selected word. To evaluate the constructed sentiment dictionary, we calculated F1 score with precision and recall, using Korean-SWN that is based on English Senti-word Net(SWN). The evaluation results show that average F1 scores are 0.85 for adjectives and 0.77 for verbs, respectively.

Analysis of Signboard Characteristics and Dictionary Construction for Text Recognition in Signboard Images (간판영상의 텍스트 인식을 위한 영상데이터 특성 분석 및 사전 구축)

  • Lee, Myung-Hun;Yang, Hyung-Jeong;Kim, Soo-Hyung;Lee, Guee-Sang;Oh, Sang-Wook;Kim, Sun-Hee
    • The Journal of the Korea Contents Association
    • /
    • v.8 no.11
    • /
    • pp.10-17
    • /
    • 2008
  • The sign recognition and translation offer information and support decision making for foreigners or city tourist. Collecting sign images and building words in signs are essential to train machine recognizers and to evaluate systems. In this paper, we analyze the characteristics of sign images. The collected sign images are about 1000 captured from difference conditions and locations. We also build a dictionary of words in 100,000 sign names.

Sentiment Dictionary Construction Based on Reason-Sentiment Pattern Using Korean Syntax Analysis (한국어 구문분석을 활용한 이유-감성 패턴 기반의 감성사전 구축)

  • Woo Hyun Kim;Heejung Lee
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.46 no.4
    • /
    • pp.142-151
    • /
    • 2023
  • Sentiment analysis is a method used to comprehend feelings, opinions, and attitudes in text, and it is essential for evaluating consumer feedback and social media posts. However, creating sentiment dictionaries, which are necessary for this analysis, is complex and time-consuming because people express their emotions differently depending on the context and domain. In this study, we propose a new method for simplifying this procedure. We utilize syntax analysis of the Korean language to identify and extract sentiment words based on the Reason-Sentiment Pattern, which distinguishes between words expressing feelings and words explaining why those feelings are expressed, making it applicable in various contexts and domains. We also define sentiment words as those with clear polarity, even when used independently and exclude words whose polarity varies with context and domain. This approach enables the extraction of explicit sentiment expressions, enhancing the accuracy of sentiment analysis at the attribute level. Our methodology, validated using Korean cosmetics review datasets from Korean online shopping malls, demonstrates how a sentiment dictionary focused solely on clear polarity words can provide valuable insights for product planners. Understanding the polarity and reasons behind specific attributes enables improvement of product weaknesses and emphasis on strengths. This approach not only reduces dependency on extensive sentiment dictionaries but also offers high accuracy and applicability across various domains.

A Development of the Automatic Predicate-Argument Analyzer for Construction of Semantically Tagged Korean Corpus (한국어 의미 표지 부착 말뭉치 구축을 위한 자동 술어-논항 분석기 개발)

  • Cho, Jung-Hyun;Jung, Hyun-Ki;Kim, Yu-Seop
    • The KIPS Transactions:PartB
    • /
    • v.19B no.1
    • /
    • pp.43-52
    • /
    • 2012
  • Semantic role labeling is the research area analyzing the semantic relationship between elements in a sentence and it is considered as one of the most important semantic analysis research areas in natural language processing, such as word sense disambiguation. However, due to the lack of the relative linguistic resources, Korean semantic role labeling research has not been sufficiently developed. We, in this paper, propose an automatic predicate-argument analyzer to begin constructing the Korean PropBank which has been widely utilized in the semantic role labeling. The analyzer has mainly two components: the semantic lexical dictionary and the automatic predicate-argument extractor. The dictionary has the case frame information of verbs and the extractor is a module to decide the semantic class of the argument for a specific predicate existing in the syntactically annotated corpus. The analyzer developed in this research will help the construction of Korean PropBank and will finally play a big role in Korean semantic role labeling.

Named Entity Recognition and Dictionary Construction for Korean Title: Books, Movies, Music and TV Programs (한국어 제목 개체명 인식 및 사전 구축: 도서, 영화, 음악, TV프로그램)

  • Park, Yongmin;Lee, Jae Sung
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.3 no.7
    • /
    • pp.285-292
    • /
    • 2014
  • A named entity recognition method is used to improve the performance of information retrieval systems, question answering systems, machine translation systems and so on. The targets of the named entity recognition are usually PLOs (persons, locations and organizations). They are usually proper nouns or unregistered words, and traditional named entity recognizers use these characteristics to find out named entity candidates. The titles of books, movies and TV programs have different characteristics than PLO entities. They are sometimes multiple phrases, one sentence, or special characters. This makes it difficult to find the named entity candidates. In this paper we propose a method to quickly extract title named entities from news articles and automatically build a named entity dictionary for the titles. For the candidates identification, the word phrases enclosed with special symbols in a sentence are firstly extracted, and then verified by the SVM with using feature words and their distances. For the classification of the extracted title candidates, SVM is used with the mutual information of word contexts.

A Postprocessing Method of Korean Character Recognition by Mis-recognized Morphology Presumption (오인식 형태소 추정에 의한 한국어 문자 인식 후처리 기법)

  • Kim, Young-Hun;Lee, Young-Hwa;Lee, Sang-Jo
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.36C no.7
    • /
    • pp.46-55
    • /
    • 1999
  • We proposed the new method of postprocessing which not only reduces the frequency of dictionary access using morphological analysis but improve the recognition rate of character recognizer. In this paper, after estimating morphological construction of mis-recognized word using the part of speech that is analyzed, correct presumed mis-recognized morphology. The postprocessing using a morphology unit reduce candidate because of short than word and frequency of dictionary access because there is no need to morphological analysis for candidate. To select right candidate is only necessary to dictionary access. The proposed results show that reduced the frequency of dictionary access to 60% than postprocessing method using a word unit and recognition rate improved from 94% to 97%.

  • PDF