Development of A System for Registration of Korean Terminology on The Electropedia

  • Moon, Bonghee
    • Journal of the Korea Society of Computer and Information / v.24 no.8 / pp.105-111 / 2019
  • This paper introduces the development of a system for registering Korean standard technical terms that correspond to the English electrotechnical terminology on the Electropedia of the International Electrotechnical Commission (IEC). The project started in 2016, with permission for registration granted at Technical Committee 1 of the 80th IEC General Meeting in Frankfurt, Germany. The work consisted of three steps. The first step was to gather Korean vocabularies and build a database for translating the English terms of the International Electrotechnical Vocabulary (IEV) into Korean. The second step was to find the correct or most appropriate Korean term for each English IEV term on the Electropedia. In this step, members of the Korean TC 1 searched for suitable Korean terms using purpose-built computer programs and databases compiled from Korean electrotechnical dictionaries. After selecting the terms, they cross-checked each other's choices. The last step was to register all of these Korean terms on the Electropedia. As a result, 20,766 Korean electrotechnical terms were registered on the Electropedia in 2017. As future work, the definitions of the English technical terms still need to be translated into Korean.
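The candidate search in the second step can be pictured with a minimal sketch. It assumes several Korean term dictionaries keyed by English term and ranks each Korean candidate by how many dictionaries propose it. The dictionary names, the sample entries, and the `candidate_terms` function are hypothetical illustrations, not the system described in the abstract.

```python
from collections import Counter

# Hypothetical sample data: each dictionary maps a lowercase English term
# to the Korean candidates it lists. None of this comes from the paper.
DICTIONARIES = {
    "dict_a": {"electric current": ["전류"], "voltage": ["전압"]},
    "dict_b": {"electric current": ["전류", "전기 전류"], "voltage": ["전압"]},
    "dict_c": {"voltage": ["전압", "볼티지"]},
}


def candidate_terms(english_term, dictionaries=DICTIONARIES):
    """Return (korean_term, votes) pairs, most widely attested first."""
    # Normalize case and whitespace so lookups are consistent.
    key = " ".join(english_term.lower().split())
    votes = Counter()
    for entries in dictionaries.values():
        # Count each candidate at most once per dictionary.
        for korean in set(entries.get(key, [])):
            votes[korean] += 1
    # Sort by vote count (descending), then by term for a stable order.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


print(candidate_terms("Electric  Current"))
```

A ranked list like this would only narrow the choice. In the workflow the abstract describes, committee members still select the term and cross-check each other's selections.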

Efficient Subword Segmentation for Korean Language Classification (한국어 분류를 위한 효율적인 서브 워드 분절)

  • Hyunjin Seo;Jeongjae Nam;Minseok Kim
    • Annual Conference on Human and Language Technology / 2022.10a / pp.535-540 / 2022
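  • The out-of-vocabulary (OOV) problem has frequently been raised in neural machine translation (NMT). To address it, Byte Pair Encoding (BPE) [1], which compresses words efficiently, has been the representative approach. However, because BPE tokenizes deterministically on the basis of frequency, it is difficult for a model to acquire a generalized segmentation ability across diverse sentences. To overcome this, subword regularization has recently been proposed. Subword regularization is designed to take into account the different ways a single word can be segmented, and it has shown strong performance in many experiments. However, no case has yet been reported of applying subword regularization to classification tasks, particularly classification of Korean text. This paper therefore uses two representative subword regularization methods, unigram-based subword regularization [2] and BPE-Dropout [3], to examine the effectiveness of subword regularization on Korean classification problems. In classification, as in NMT, grasping the compositionality of words and their meaning contributes meaningfully to determining the class to which each sentence belongs. In addition, subword regularization can help a model develop a broad awareness of the constituents of Korean sentences. In the Korean classification experiments conducted in this paper, the method achieved up to 4.7% higher performance than conventional BPE.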
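The contrast between deterministic BPE and BPE-Dropout can be illustrated with a small sketch. It assumes a toy merge table (`MERGES`) and a simplified segmenter that applies one merge at a time; both are hypothetical and are not the tokenizers or settings used in the paper. With `dropout=0.0` the function behaves like ordinary BPE, and with a positive value each applicable merge is skipped at random, so the same word can be split in different ways.

```python
import random

# Hypothetical merge table, highest priority first. A real table would be
# learned from corpus pair frequencies.
MERGES = [("서", "브"), ("워", "드"), ("서브", "워드")]


def bpe_segment(word, merges, dropout=0.0, rng=None):
    """Segment a word with BPE merges, dropping each merge with prob. `dropout`."""
    rng = rng or random.Random()
    rank = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)  # start from single characters
    while len(tokens) > 1:
        # Collect the merges applicable at this step; each one is dropped
        # independently with probability `dropout`.
        candidates = [
            (rank[(a, b)], i)
            for i, (a, b) in enumerate(zip(tokens, tokens[1:]))
            if (a, b) in rank and rng.random() >= dropout
        ]
        if not candidates:
            break  # nothing left to merge (or every merge was dropped)
        _, i = min(candidates)  # apply the highest-priority surviving merge
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens


print(bpe_segment("서브워드", MERGES))               # ['서브워드']
print(bpe_segment("서브워드", MERGES, dropout=1.0))  # ['서', '브', '워', '드']
print(bpe_segment("서브워드", MERGES, dropout=0.3, rng=random.Random(0)))
```

Sampling a fresh segmentation for each training example is what exposes a classifier to several decompositions of the same word. At inference time, `dropout=0.0` gives the usual deterministic split.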


Automatic Error Correction System for Erroneous SMS Strings (SMS 변형된 문자열의 자동 오류 교정 시스템)

  • Kang, Seung-Shik;Chang, Du-Seong
    • Journal of KIISE:Software and Applications / v.35 no.6 / pp.386-391 / 2008
  • Spoken-style word errors that violate grammatical or writing rules occur frequently in communication environments such as mobile phones and messengers. These unexpected errors cause problems for language processing systems in many applications, such as speech recognition and text-to-speech translation. In this paper, we propose and implement a system that automatically corrects ill-formed words and word spacing errors in SMS sentences, which have been the major causes of poor accuracy. We experimented with three methods of constructing the word correction dictionary and evaluated their results: (1) manual construction of error words from the vocabulary list of ill-formed communication language, (2) automatic construction of an error dictionary from a manually constructed corpus, and (3) a context-dependent method of automatically constructing an error dictionary.
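Method (2) can be pictured with a minimal sketch. It assumes the manually constructed corpus is a list of (ill-formed, corrected) sentence pairs and aligns them word by word. The English sample pairs, `build_error_dictionary`, and `correct` are hypothetical illustrations and not the authors' implementation. Pairs whose word counts differ are skipped, so word spacing errors are outside this sketch.

```python
from collections import Counter, defaultdict


def build_error_dictionary(pairs):
    """Learn ill-formed -> corrected word mappings from aligned sentence pairs."""
    counts = defaultdict(Counter)
    for noisy, clean in pairs:
        noisy_words, clean_words = noisy.split(), clean.split()
        if len(noisy_words) != len(clean_words):
            continue  # spacing differs, so word-by-word alignment is unclear
        for n, c in zip(noisy_words, clean_words):
            if n != c:
                counts[n][c] += 1
    # Keep the most frequent correction observed for each ill-formed word.
    return {n: c.most_common(1)[0][0] for n, c in counts.items()}


def correct(sentence, error_dict):
    """Replace every word found in the error dictionary; leave the rest as-is."""
    return " ".join(error_dict.get(w, w) for w in sentence.split())


# Hypothetical corpus of (ill-formed, corrected) pairs.
CORPUS = [
    ("c u 2morrow", "see you tomorrow"),
    ("thx 4 ur help", "thanks for your help"),
    ("c u l8r", "see you later"),
]

error_dict = build_error_dictionary(CORPUS)
print(correct("thx c u 2morrow", error_dict))  # thanks see you tomorrow
```

A context-independent lookup like this replaces a word the same way everywhere. The context-dependent method (3) in the abstract presumably addresses cases where the right correction depends on neighboring words.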

Terminology of Developmental Abnormalities in Common Laboratory Animals (실험동물의 발생이상 용어집)

  • Kim, Jong-Choon;Yang, Young-Su;Ahn, Tai-Hwan;Kim, Sung-Ho;Chung, Soo-Youn;Rhee, Gyu-Seek;Chung, Na-Young;Chung, Moon-Koo
    • Toxicological Research / v.22 no.3 / pp.157-220 / 2006
  • This paper presents the first version of a Korean glossary of terms for structural developmental abnormalities in common laboratory animals, mainly rats, mice, and rabbits. It is a translation of the glossary entitled Terminology of Developmental Abnormalities in Common Laboratory Mammals, which was edited by the International Federation of Teratology Societies (IFTS) Committee on International Harmonization of Nomenclature in Developmental Toxicology. The purpose of the Korean glossary is to provide a common vocabulary that will reduce confusion and ambiguity in the description of developmental effects, particularly in submissions to regulatory agencies worldwide. Each entry contains a primary term or phrase, a definition of the abnormality, and notes where appropriate. Selected synonyms or related terms, which reflect a similar or closely related concept, are noted. Non-preferred terms are indicated where their usage may be incorrect. Modifying terms used repeatedly in the glossary (e.g., absent, branched) are listed in Appendix A. Syndrome names are generally excluded from the glossary but are listed separately in Appendix B. The glossary is organized into broad sections for external, visceral, and skeletal observations, which are then subdivided into regions, structures, or organs in a general head-to-tail sequence. Numbering is sequential and does not follow any regional or hierarchical order. Uses and misuses of the glossary are discussed. Updates of the Korean glossary are planned based on the comments received.