• Title/Summary/Keyword: Dictionary Construction

Construction of an Efficient Pre-analyzed Dictionary for Korean Morphological Analysis (한국어 형태소 분석을 위한 효율적 기분석 사전의 구성 방법)

  • Kwak, Sujeong;Kim, Bogyum;Lee, Jae Sung
    • KIPS Transactions on Software and Data Engineering / v.2 no.12 / pp.881-888 / 2013
  • A pre-analyzed dictionary is used to increase the speed and accuracy of morphological analyzers and to reduce over-generation. However, if the dictionary includes 'insufficiently analyzed word-phrases', whose entries do not cover all possible analyses of the word-phrase, analysis accuracy may decrease. In this paper, we use the Sejong corpus to measure how accuracy changes with word-phrase frequency and corpus size. The integrated system (SMA with the pre-analyzed dictionary) performs best when the sufficient-analysis rate of the pre-analyzed dictionary is more than 99.82%. Also, the pre-analyzed dictionary is constructed from word-phrases with a frequency of more than 32 (64) when the corpus size is 1,600,000 (6,300,000) word-phrases.
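
The frequency-threshold idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_pre_analyzed_dictionary`, the `(word_phrase, analysis)` input format, the toy tags, and the strict `>` comparison are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_pre_analyzed_dictionary(tagged_corpus, min_frequency):
    """Collect every analysis seen for each word-phrase and keep only
    word-phrases that occur more than `min_frequency` times.

    tagged_corpus: iterable of (word_phrase, analysis) pairs (assumed format).
    """
    frequency = Counter()
    analyses = defaultdict(set)
    for word_phrase, analysis in tagged_corpus:
        frequency[word_phrase] += 1
        analyses[word_phrase].add(analysis)
    # Frequent word-phrases are more likely to have all of their analyses
    # observed, which is the motivation for thresholding on frequency.
    return {
        word_phrase: sorted(analyses[word_phrase])
        for word_phrase, count in frequency.items()
        if count > min_frequency
    }


# Toy corpus with hypothetical analyses.
toy_corpus = [
    ("나는", "나/NP+는/JX"),
    ("나는", "날/VV+는/ETM"),
    ("나는", "나/NP+는/JX"),
    ("학교에", "학교/NNG+에/JKB"),
]
toy_dictionary = build_pre_analyzed_dictionary(toy_corpus, min_frequency=1)
```

With a threshold of 1, only "나는" (seen three times) is kept, together with both of its observed analyses; "학교에" (seen once) is left to the regular analyzer.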

Development of Semi-automatic Construction Tool for Named Entity Dictionary based on Active Learning (능동 학습 기법을 활용한 개체명 사전 반자동 구축 도구 개발)

  • Yun, Bo-Hyun;Oh, Hyo-Jung
    • The Journal of Korean Association of Computer Education / v.18 no.6 / pp.81-88 / 2015
  • Along with the advent of the Web 3.0 era and advances in IoT (Internet of Things) technologies, massive amounts of information are being generated. Reflecting this trend, this paper develops a semi-automatic construction tool for named entity dictionaries based on active learning. The proposed method uses an initially trained model to choose error candidates for verification from the preliminary results, and then re-trains the model on the data correctly labeled by the user. We adopt an active learning approach to minimize human effort and utilize metadata features of Wikipedia. Experimental results using our tool show that 68.6% of errors were automatically corrected.
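
The select-verify-retrain loop described in this abstract can be sketched in Python as follows. The abstract does not say how error candidates are chosen, so least-confidence selection is an assumption here, and the names `select_for_verification`, `verification_round`, and `ask_user` are hypothetical.

```python
def select_for_verification(predictions, budget):
    """Pick the `budget` predictions the model is least confident about.

    predictions: list of (entity, label, confidence) tuples (assumed format).
    Least-confidence selection is an assumption, not the paper's stated rule.
    """
    ranked = sorted(predictions, key=lambda prediction: prediction[2])
    return ranked[:budget]


def verification_round(predictions, budget, ask_user):
    """Send the selected candidates to the user and return the verified
    (entity, label) pairs, which would then be used for re-training."""
    candidates = select_for_verification(predictions, budget)
    return [(entity, ask_user(entity, label)) for entity, label, _ in candidates]


# Toy predictions and a stand-in for the human annotator.
toy_predictions = [
    ("Seoul", "LOC", 0.95),
    ("Apple", "LOC", 0.40),
    ("Kim", "PER", 0.60),
]


def toy_user(entity, label):
    # Hypothetical user who corrects one wrong label and confirms the rest.
    return "ORG" if entity == "Apple" else label


verified = verification_round(toy_predictions, budget=2, ask_user=toy_user)
```

Only the two least-confident entries reach the user, which is how this kind of loop limits manual effort.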

Construction of the Terminology Dictionary for National R&D Information Utilization (국가R&D정보활용을 위한 전문용어사전 구축)

  • Kim, Tae-Hyun;Yang, Myung-Seok;Choi, Kwang-Nam
    • The Journal of the Korea Contents Association / v.19 no.10 / pp.217-225 / 2019
  • National research and development(R&D) information is information generated in the process of performing R&D based on programs and projects issued by national government departments, and includes information from various research fields as ordered by various departments. Therefore, for efficient R&D information retrieval, it is necessary to build a national R&D terminology dictionary that can reflect the characteristics of such national R&D information. In this study, we propose a method for constructing a national R&D terminology dictionary by applying the classification of science and technology standards used to specify the research field in national R&D information. We will discuss the structural characteristics of national R&D project information and the usefulness of the project keyword, and explain the status of national R&D information by the National Standard Science and Technology Classification(NSSTC) Codes and the characteristics of the national R&D terminologies. Based on this, a method for building a national R&D terminology dictionary is defined in terms of the type and structure of the terminology dictionary, preliminary construction procedures, and refining rules. The national R&D terminology dictionary built on the basis of this study can be used in various ways such as expansion of search terms using Korean-English equivalent words and synonyms when searching national R&D information, clarifying the scope of search using NSSTC, and providing user convenience functions using term explanation information.

English-Korean Transfer Dictionary Extension Tool in English-Korean Machine Translation System (영한 기계번역 시스템의 영한 변환사전 확장 도구)

  • Kim, Sung-Dong
    • KIPS Transactions on Software and Data Engineering / v.2 no.1 / pp.35-42 / 2013
  • Developing an English-Korean machine translation system requires the construction of information about the languages, and the amount of information in the English-Korean transfer dictionary is especially critical to translation quality. Newly created words are out-of-vocabulary words and appear untranslated in the translated sentence, which decreases translation quality. Also, compound nouns make lexical and syntactic analysis complex, and they are difficult to translate accurately because the transfer dictionary lacks information about them. To improve the quality of English-Korean machine translation, we must continuously expand the English-Korean transfer dictionary by collecting out-of-vocabulary words and frequently used compound nouns. This paper proposes a method for expanding the transfer dictionary, which consists of constructing a corpus from internet newspapers, extracting the words that are not in the existing dictionary and the frequently used compound nouns, attaching meanings to the extracted words, and integrating them into the transfer dictionary. We also develop a tool that supports the expansion of the transfer dictionary. Expanding the dictionary information is critical to improving the machine translation system but requires much human effort. The developed tool can be useful for continuously expanding the transfer dictionary, and so it is expected to contribute to enhancing translation quality.
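
The extraction step of this method (finding words missing from the existing dictionary and frequently used compound nouns) can be illustrated with a rough Python sketch. This is not the authors' tool: `extract_candidates` and `min_count` are hypothetical names, and treating every frequent adjacent word pair as a compound-noun candidate is a simplifying assumption, since the abstract implies a real system would check that the words are nouns.

```python
import re
from collections import Counter


def extract_candidates(sentences, dictionary, min_count=2):
    """Return (out-of-vocabulary words, compound candidates) that occur
    at least `min_count` times in the corpus sentences."""
    oov_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Words absent from the existing transfer dictionary.
        oov_counts.update(t for t in tokens if t not in dictionary)
        # Adjacent word pairs stand in for compound-noun candidates.
        pair_counts.update(zip(tokens, tokens[1:]))
    oov_words = sorted(w for w, c in oov_counts.items() if c >= min_count)
    compounds = sorted(" ".join(p) for p, c in pair_counts.items() if c >= min_count)
    return oov_words, compounds


# Toy corpus and dictionary.
toy_sentences = ["The metaverse economy grows", "A metaverse economy report"]
toy_dictionary = {"the", "a", "economy", "grows", "report"}
oov_words, compounds = extract_candidates(toy_sentences, toy_dictionary)
```

The extracted candidates would still need meanings attached before being merged into the transfer dictionary, which is the manual step the described tool is meant to support.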

A Study on Applying Novel Reverse N-Gram for Construction of Natural Language Processing Dictionary for Healthcare Big Data Analysis (헬스케어 분야 빅데이터 분석을 위한 개체명 사전구축에 새로운 역 N-Gram 적용 연구)

  • KyungHyun Lee;RackJune Baek;WooSu Kim
    • The Journal of the Convergence on Culture Technology / v.10 no.3 / pp.391-396 / 2024
  • This study proposes a novel reverse N-Gram approach to overcome the limitations of traditional N-Gram methods and enhance performance in building an entity dictionary specialized for the healthcare sector. The proposed reverse N-Gram technique allows for more precise analysis and processing of the complex linguistic features of healthcare-related big data. To verify the efficiency of the proposed method, big data on healthcare and digital health announced during the Consumer Electronics Show (CES) held each January was collected. Using the Python programming language, 2,185 news titles and summaries mentioned from January 1 to 31 in 2010 and from January 1 to 31 in 2024 were preprocessed with the new reverse N-Gram method. This resulted in the stable construction of a dictionary for natural language processing in the healthcare field.

The Construction of Korean-to-English Verb Dictionary for Phrase-to-Phrase Translations (구절 변환을 위한 한영 동사 사전 구성)

  • Ok, Cheol-Young;Kim, Yung-Taek
    • Annual Conference on Human and Language Technology / 1991.10a / pp.44-57 / 1991
  • In transfer-based machine translation, the transfer dictionary determines the complexity of the transfer phase and the quality of translation, depending on the types and precision of the information it supplies. Using the phrase-level translation information in a human-readable dictionary, a human translates a source sentence correctly and naturally. In this paper, we propose a verb transfer dictionary in which this information is organized in a machine-readable format that a Korean-to-English machine translation system can use. In the proposed dictionary, we first provide the criteria by which an appropriate target verb is selected in phrase-to-phrase translation without additional semantic analysis in the transfer phase. Second, we provide the concrete sentence structure of each target verb so that we can resolve the expressive gaps between the two languages and reduce the complexity of the various structure transfers in word-to-word translation.

Development of Japanese to Korean Machine Translation System ATOM Using Personal Computer I - Dictionary Construction and Morphological Analysis - (PC를 이용한 일·한 번역 시스템 ATOM의 개발에 관한 연구 ( I ) - 구문해석과 생성과 사전 구성과 형태소 해석을 중심으로 -)

  • Kim, Young-Sum;Kim, Han-Woo;Choi, Byung-Uk
    • Journal of the Korean Institute of Telematics and Electronics / v.25 no.10 / pp.1183-1192 / 1988
  • In this paper, we describe a morphological dictionary and connection table augmented with heuristic information, and an automatic MUNJEUL separation process based on the least-cost method, for efficient morphological analysis. The composition of connection and inflected-word information is simplified by interconnecting the conjugation table with the connection tables. As a result, the applicability of the system is increased. The translation dictionary consists of an analysis part and a generation part. It increases applicability by describing frequently used termination phrases, which are extracted statistically, as idioms, with their procedures written directly in the dictionary, for a more efficient analysis process and more natural generation of translated sentences.

Semi-Automatic Construction of Morphological Pattern Dictionary using the Method of Morphological Synthesis (형태소 합성 기법을 이용한 형태소 패턴 사전의 반자동 구축)

  • Park, In-Cheol
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.11 / pp.5278-5283 / 2011
  • One approach to very high-speed Korean morphological analysis is to store pre-built morphological analysis results in a dictionary. Building this morphological pattern dictionary manually is costly, and the dictionary may contain errors. This paper proposes a method to generate morphological patterns automatically using Korean morphological synthesis. The experiment shows that we automatically generate 86% of the morphological patterns for analyzing Korean sentences. The morphological system using the patterns takes 52.68 seconds to analyze a 403MB Korean corpus on a 2.8GHz Windows system.

Joint FrFT-FFT basis compressed sensing and adaptive iterative optimization for countering suppressive jamming

  • Zhao, Yang;Shang, Chaoxuan;Han, Zhuangzhi;Yin, Yuanwei;Han, Ning;Xie, Hui
    • ETRI Journal / v.41 no.3 / pp.316-325 / 2019
  • Accurate suppressive jamming is a prominent problem faced by radar equipment. It is difficult to solve signal detection problems at extremely low signal-to-noise ratios using traditional signal processing methods. In this study, a compressed sensing algorithm based on a joint sensing dictionary, combined with adaptive iterative optimization, is proposed to counter suppressive jamming in the information domain. Prior information about the linear frequency modulation (LFM) and suppressive jamming signals is fully used by constructing a joint sensing dictionary. The jamming sensing dictionary is further adaptively optimized to match actual jamming signals. Finally, through the precise reconstruction of the jamming signal, high detection precision for the original LFM signal is achieved. The construction of the sensing dictionary adopts the Pei-type fast fractional Fourier decomposition method, which serves as an efficient basis for the LFM signal. The proposed adaptive iterative optimization algorithm can solve grid mismatch problems brought on by undetermined signals and quickly achieve higher detection precision. The simulation results show the effectiveness of the method.

Construction of Local Terminology Dictionary in NM Imaging Report Forms

  • Hwang, Kyung-Hoon;Jeong, Ji-Young;Park, Kuk-Yang
    • Proceedings of the Korea Information Processing Society Conference / 2010.04a / pp.352-352 / 2010
  • It is difficult to establish a well-designed local terminology for imaging reports in the hospital information system (HIS). One of the major reasons is that local terminologies with poor content have been used in hospitals. Thus, we mapped the locally used terms in nuclear medicine imaging reports to SNOMED-CT, which has been widely used in electronic medical record systems, for implementation of the hospital information system. Preliminary construction of the terminology dictionary was done by mapping local terms to SNOMED-CT and LexCare Suite. Further study may be warranted.