• Title/Summary/Keyword: Korean Machine Translation Data


A Study on the Performance Improvement of Machine Translation Using Public Korean-English Parallel Corpus (공공 한영 병렬 말뭉치를 이용한 기계번역 성능 향상 연구)

  • Park, Chanjun;Lim, Heuiseok
    • Journal of Digital Convergence / v.18 no.6 / pp.271-277 / 2020
  • Machine translation refers to software that translates a source language into a target language; research has progressed from rule-based and statistical machine translation to Neural Machine Translation, which is now actively studied. One of the most important factors in Neural Machine Translation is obtaining a high-quality parallel corpus, and such corpora have been hard to find for Korean language pairs. Recently, AI Hub of the National Information Society Agency (NIA) released a high-quality Korean-English parallel corpus of 1.6 million sentences. This paper attempts to verify the quality of each dataset by comparing the performance obtained with the AI Hub data against that obtained with OpenSubtitles, the most popular Korean-English parallel corpus. For objectivity, the test data is the test set published by IWSLT, an official test set for Korean-English machine translation. The experimental results show better performance than existing papers evaluated on the same test set, which demonstrates the importance of high-quality data.

Deep Learning-based Korean Dialect Machine Translation Research Considering Linguistics Features and Service (언어적 특성과 서비스를 고려한 딥러닝 기반 한국어 방언 기계번역 연구)

  • Lim, Sangbeom;Park, Chanjun;Yang, Yeongwook
    • Journal of the Korea Convergence Society / v.13 no.2 / pp.21-29 / 2022
  • Recognizing the importance of dialect research, preservation, and communication, this paper studies machine translation of Korean dialects for dialect speakers who may be marginalized. The dialect data is the AIHUB dialect data, which is distributed by highest-level administrative district. We propose a many-to-one dialect machine translation model that makes model distribution more efficient, and we study modeling that improves dialect machine translation performance by applying a copy mechanism. The paper evaluates the one-to-one and many-to-one models using BLEU scores and analyzes the performance of the many-to-one model on Korean dialects from a linguistic perspective. The proposed methodology improved the performance of the one-to-one model, and the many-to-one model achieved significantly high performance.
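
The BLEU comparison mentioned in this abstract can be illustrated with a simplified scorer. This is a minimal sketch, not the evaluation script used in the paper: it assumes one reference per hypothesis, pre-tokenized input, uniform weights over 1- to 4-grams, and no smoothing. The function names `ngrams` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (assumption: single reference, no smoothing)."""
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Counter intersection clips each n-gram count at the reference count.
            matches[n - 1] += sum((h & r).values())
            totals[n - 1] += sum(h.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


reference = "the cat sat on the mat".split()
print(corpus_bleu([reference], [reference]))
print(corpus_bleu(["the cat sat on a mat".split()], [reference]))
```

Scoring the outputs of a one-to-one model and a many-to-one model against the same references with such a function gives the kind of side-by-side comparison the abstract describes; published results normally rely on a standard implementation rather than a hand-written one.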

FromTo-Web/EK™: English-to-Korean Machine Translation System for HTML Documents (에서로-웹/EK™: 영한 웹 문서 번역 시스템)

  • Sim, Chul-Min;Yuh, Sang-Wha;Jung, Han-Min;Kim, Tae-Wan;Park, Dong-In;Kwon, Hyuk-Chul
    • Annual Conference on Human and Language Technology / 1997.10a / pp.277-282 / 1997
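  • Translation systems that translate documents on the web have recently been commercialized. Unlike ordinary documents, web documents contain HTML tags, which makes it difficult for a translation system to split them into sentences. In addition, because the target domain is unrestricted, the system needs to cope with unregistered words and parsing failures. As a result, the translation quality of web documents is markedly lower than that of ordinary documents. This paper describes FromTo-Web/EK, a translation system for English web documents containing HTML tags. FromTo-Web/EK has a separate tag manager that separates and restores tags, taking the characteristics of HTML documents into account. It also handles the various transfer phenomena that arise when English is converted to Korean while tags are preserved, such as lexical splitting, lexical merging, and word-order change. The system is a transfer-based translation system that goes through the stages of English analysis, English-Korean transfer, and Korean generation. The implemented system translates HTML documents by interworking with Netscape via DDE (Dynamic Data Exchange).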

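
The idea of a tag manager that separates tags before translation and restores them afterwards can be sketched as follows. This is an illustrative assumption, not the FromTo-Web/EK implementation: the placeholder scheme, the regular expression, and the names `separate_tags` and `restore_tags` are all hypothetical, and the sketch does not handle the word-order changes that the abstract says the real system deals with.

```python
import re

# Assumption: anything between "<" and ">" is treated as a tag.
TAG_RE = re.compile(r"<[^>]+>")
PLACEHOLDER_RE = re.compile(r"__TAG(\d+)__")


def separate_tags(html):
    """Replace each tag with a numbered placeholder; return (text, tags)."""
    tags = []

    def stash(match):
        tags.append(match.group(0))
        return f"__TAG{len(tags) - 1}__"

    return TAG_RE.sub(stash, html), tags


def restore_tags(text, tags):
    """Put the original tags back in place of their placeholders."""
    return PLACEHOLDER_RE.sub(lambda m: tags[int(m.group(1))], text)


html = "<p>Hello <b>world</b>!</p>"
text, tags = separate_tags(html)
# Stand-in for a translator: upper-casing leaves the placeholders unchanged.
translated = text.upper()
print(restore_tags(translated, tags))
```

Keeping the tags out of the text passed to the translator means sentence splitting and parsing operate on plain text, while the placeholders mark where each tag belongs in the output.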

Machine Translation of Korean-to-English spoken language Based on Semantic Patterns (의미패턴에 기반한 대화체 한영 기계 번역)

  • Jung, Cheon-Young;Seo, Young-Hoon
    • The Transactions of the Korea Information Processing Society / v.5 no.9 / pp.2361-2368 / 1998
  • This paper analyzes spoken Korean and describes Korean-to-English machine translation of spoken language based on semantic patterns. In Korean-to-English machine translation, the ambiguity that arises when Korean sentences are analyzed using syntactic information can be resolved by semantic patterns. Therefore, for machine translation of spoken language, we establish a system based on semantic patterns extracted from the Korean scheduling domain. The system gains robustness from its ability to skip syllables when analyzing Korean sentences, and we add options to the semantic patterns to reduce the number of patterns. The experimental data come from the scheduling domain, and the performance of Korean-to-English translation is 88%.


Target Word Selection for English-Korean Machine Translation System using Multiple Knowledge (다양한 지식을 사용한 영한 기계번역에서의 대역어 선택)

  • Lee, Ki-Young;Kim, Han-Woo
    • Journal of the Korea Society of Computer and Information / v.11 no.5 s.43 / pp.75-86 / 2006
  • Target word selection is one of the most important and difficult tasks in English-Korean machine translation, and it directly affects the translation accuracy of machine translation systems. In this paper, we present a new approach to selecting the Korean target word for an English noun with translation ambiguity, using multiple kinds of knowledge: verb frame patterns, sense vectors based on collocations, statistical Korean local context information, and co-occurring POS information. Verb frame patterns, constructed from a dictionary and a corpus, play an important role in resolving the sparseness problem of collocation data. A sense vector is the set of collocation data observed when an English word with target selection ambiguity is translated into a specific Korean target word. Statistical Korean local context information is N-gram information generated from a Korean corpus. Co-occurring POS information is a statistically significant POS clue that appears with the ambiguous word. The experiment showed promising results for diverse sentences from web documents.

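
One of the knowledge sources named in this abstract, N-gram local context from a Korean corpus, can be illustrated with a toy bigram scorer. This is only a sketch under strong assumptions: the three-sentence corpus is invented for illustration, tokens are split on whitespace, and the names `bigram_counts` and `select_target_word` are hypothetical. The paper combines this signal with verb frame patterns, sense vectors, and POS information, none of which is modeled here.

```python
from collections import Counter


def bigram_counts(corpus):
    """Count adjacent token pairs in a whitespace-tokenized corpus."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def select_target_word(candidates, next_word, counts):
    """Pick the candidate seen most often before next_word (ties: first wins)."""
    return max(candidates, key=lambda cand: counts[(cand, next_word)])


# Toy Korean corpus (invented for illustration).
corpus = [
    "은행 계좌 를 열었다",
    "은행 계좌 가 있다",
    "강 둑 을 걸었다",
]
counts = bigram_counts(corpus)

# English "bank" is ambiguous between 은행 (financial) and 둑 (embankment).
# With "account" already translated as 계좌, the bigram evidence favors 은행.
print(select_target_word(["둑", "은행"], "계좌", counts))
```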

A Linguistic Evaluation of English-to-Korean Translation - Centered on Machine Translation - (영한 번역의 언어학적 평가 모델 연구 - 기계번역을 중심으로 -)

  • 김덕봉;조병은;김명철;권용현
    • Korean Journal of Cognitive Science / v.12 no.4 / pp.11-27 / 2001
  • Machine translation (MT) quality assessment is an outstanding problem. In the present situation, in which the quality of machine-translated output falls far short of user satisfaction, objective evaluation of MT systems is a prerequisite for building mutual trust between users and vendors, stimulating constructive competition among developers, and ultimately improving the quality of MT systems. In particular, there is a need for intensive study of how to evaluate the quality of MT systems from both linguistic and data-processing perspectives and how to secure steady improvement in translation quality. With these points in mind, this paper presents a linguistic evaluation of English-to-Korean machine translation based on a test suite of 3,373 sentences classified by linguistic phenomenon and complexity level, and reports experimental results for several commercial MT systems.


Korean-English Non-Autoregressive Neural Machine Translation using Word Alignment (단어 정렬을 이용한 한국어-영어 비자기회귀 신경망 기계 번역)

  • Jung, Young-Jun;Lee, Chang-Ki
    • Annual Conference on Human and Language Technology / 2021.10a / pp.629-632 / 2021
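  • Machine translation is a technology that automatically translates text in one natural language into another, and recent research has focused mainly on Neural Machine Translation models. Neural machine translation generally uses autoregressive models, which perform well but decode slowly because decoding cannot be parallelized. Non-autoregressive models generate words independently and allow parallel computation, so they decode considerably faster than autoregressive models, but they can suffer from the multimodality problem. This paper proposes a non-autoregressive neural machine translation model that uses word alignment, applies it to Korean-English machine translation, and shows that word alignment information helps improve translation performance between languages with different word orders and mitigates the multimodality problem.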


An Alignment based technique for Text Translation between Traditional Chinese and Simplified Chinese

  • Sue J. Ker;Lin, Chun-Hsien
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.147-156 / 2002
  • Aligned parallel corpora have proved very useful in many natural language processing tasks, including statistical machine translation and word sense disambiguation. In this paper, we describe an alignment technique for extracting transfer mappings from a parallel corpus. While building our system and collecting data, we observed that three types of translation approaches can be used. We focus in particular on lexical translation between Traditional Chinese and Simplified Chinese text and on a method for extracting transfer mappings for machine translation.


Analyzing the Types and Causes of Korean-to-English Machine Translation Errors: Focused on Morphological and Syntactical Errors (한-영 기계번역 결과물의 오류 유형 및 원인 분석: 형태적·구문적 오류를 중심으로)

  • Baek, Ji-Yeon;Goo, Hye-Kyoung
    • The Journal of the Convergence on Culture Technology / v.8 no.4 / pp.199-204 / 2022
  • This study was carried out in an L2 writing class that used machine translation. Its aim was to explore which types of errors occur most frequently in Korean-to-English machine translation output and what causes them. The participants were seven EFL university students who completed three writing tasks over the semester. The data analysis indicated that the most common errors involved sentence structure and mechanics, and that these errors in the translated texts were caused by errors in the Korean source texts.

Korean Text to Gloss: Self-Supervised Learning approach

  • Thanh-Vu Dang;Gwang-hyun Yu;Ji-yong Kim;Young-hwan Park;Chil-woo Lee;Jin-Young Kim
    • Smart Media Journal / v.12 no.1 / pp.32-46 / 2023
  • Natural Language Processing (NLP) has grown tremendously in recent years. Bilingual and multilingual translation models, in particular, have been widely deployed in machine translation and have attracted considerable attention from the research community. In contrast, few studies have focused on translating between spoken and sign languages, especially for non-English languages. Prior work on Sign Language Translation (SLT) has shown that a mid-level sign gloss representation enhances translation performance. Therefore, this study presents a new large-scale Korean sign language dataset, the Museum-Commentary Korean Sign Gloss (MCKSG) dataset, comprising 3828 pairs of Korean sentences and their corresponding sign glosses used in museum-commentary contexts. In addition, we propose a translation framework based on self-supervised learning, in which the pretext task is text-to-text translation from a Korean sentence to its back-translated versions, and the pre-trained network is then fine-tuned on the MCKSG dataset. Using self-supervised learning helps overcome the shortage of sign language data. In experiments, our proposed model outperforms a baseline BERT model by 6.22%.