• Title/Summary/Keyword: 병렬 코퍼스

Search Result 26, Processing Time 0.017 seconds

A Bidirectional Korean-Japanese Statistical Machine Translation System by Using MOSES (MOSES를 이용한 한/일 양방향 통계기반 자동 번역 시스템)

  • Lee, Kong-Joo;Lee, Song-Wook;Kim, Jee-Eun
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.36 no.5
    • /
    • pp.683-693
    • /
    • 2012
  • Recently, statistical machine translation (SMT) has received many attention with ease of its implementation and maintenance. The goal of our works is to build bidirectional Korean-Japanese SMT system by using MOSES [1] system. We use Korean-Japanese bilingual corpus which is aligned per sentence to train the translation model and use a large raw corpus in each language to train each language model. The proposed system shows results comparable to those of a rule-based machine translation system. Most of errors are caused by noises occurred in each processing stage.

A Hybrid Sentence Alignment Method for Building a Korean-English Parallel Corpus (한영 병렬 코퍼스 구축을 위한 하이브리드 기반 문장 자동 정렬 방법)

  • Park, Jung-Yeul;Cha, Jeong-Won
    • MALSORI
    • /
    • v.68
    • /
    • pp.95-114
    • /
    • 2008
  • The recent growing popularity of statistical methods in machine translation requires much more large parallel corpora. A Korean-English parallel corpus, however, is not yet enoughly available, little research on this subject is being conducted. In this paper we present a hybrid method of aligning sentences for Korean-English parallel corpora. We use bilingual news wire web pages, reading comprehension materials for English learners, computer-related technical documents and help files of localized software for building a Korean-English parallel corpus. Our hybrid method combines sentence-length based and word-correspondence based methods. We show the results of experimentation and evaluate them. Alignment results from using a full translation model are very encouraging, especially when we apply alignment results to an SMT system: 0.66% for BLEU score and 9.94% for NIST score improvement compared to the previous method.

  • PDF

Discovery of Coordinate Terms and Context using the Title and Snippet in Web Search (Web 검색 엔진의 제목과 문서요약을 이용한 동위어와 문맥의 발견)

  • Han, Sang-Yong;Lee, Sang-Hoon
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2007.10c
    • /
    • pp.210-215
    • /
    • 2007
  • 웹상에서의 정보량이 증가함에 따라, 사용자가 알고 싶어 하는 단어에 대해서 연관된 단어를 통해서 이해하게 된다. 동위어란 공통의 상위어를 가지는 단어이다. 이를 위한 기존의 연구로서 동위어와 상위어, 하위어 등을 찾는 연구는 많이 있었지만, 웹상의 문서를 이용하여 거대한 코퍼스를 해석해서 결과를 구하는 데 많은 시간이 소요되었다. 이에 본 논문에서는 사용자의 질의어에 대해서 웹 검색엔진이 가지는 제목과 문서요악으로부터 동위어와 문맥을 빠른 시간 안에 발견하는 방법에 대해 제안한다. 어떤 단어에 대한 동위어가 병렬조사 #와#로 접속되는 것을 이용하여 웹 검색 엔진에 대한 질의어를 작성하고, 그 검색 결과로부터 동위어를 얻는다. 이와 동시에 발견된 동위어와 질의어의 배후에 있는 문맥도 얻는다. 이를 통해, 웹 검색에 있어서 질의어의 확장과 비교 대상의 발견 등 폭넓은 분야에서도 적용가능하다고 할 수 있다.

  • PDF

Conceptual Interlingua Construction for Korean-English Query Translation (한영 질의어 변환을 위한 공통 중간개념 구축)

  • Choi, Yong-Seok;Seo, Chung-Won;Shin, Sa-Im;Kim, Jae-Ho;Choi, Key-Sun
    • Annual Conference on Human and Language Technology
    • /
    • 2001.10d
    • /
    • pp.422-427
    • /
    • 2001
  • 질의어 변환 방법은 다국어 정보검색을 위한 방법중에 효율적인 방법이다. 양질의 질의어 변환을 위해서, 사전, 온톨로지, 병렬 코퍼스 통과 같은 자연언어 자원이 필요하다. 이러한 자연언어 자원은 양질로 대량으로 구축하려면 많은 비용이 튼다는 단점이 있다. 본 논문에서는 한영 질의어 변환에 적용할 수 있는 공통 중간개념 구축방법을 제안한다. 공통 중간개념은 동사들의 축으로 이루어지며, 통사들은 기본동사들의 조합으로 표현한수 있다고 가정한다. 공통 중간개념은 적은 자연언어 자원을 효율적으로 이용할 수 있도록 한다. 본 논문에서는 기본 동사 축을 특이값 분해(singular value decomposition) 방법으로 구하고, 그 기본 동사 축을 이용해서 질의어 변환하는 방법을 보여준다.

  • PDF

An Use of the Patterns for an Efficient Example-Based Machine Translation (효율적인 예제 기반 기계번역을 위한 패턴의 사용)

  • Lee, Gi-Yeong;Kim, Han-U
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.37 no.3
    • /
    • pp.1-11
    • /
    • 2000
  • An example-based machine translation approach is a new paradigm for resolving various problems caused by the rules of conventional rule-based machine translation. But, in pure example-based machine translation, it is very hard to find similar examples matched with input sentences by using reasonable parallel corpus. This problem causes large overheads in the process of sentence generation. This paper proposes new method of English-Korean transfer using both patterns and examples. The patterns are composed of sentence patterns and phrase patterns. Meta parts of the patterns make the example-based machine translation more practical by raising the probability to find similar examples. The use of patterns and examples can reduce the ambiguities in source language analysis and give us a high quality of MT. And experimental results with a test corpus are discussed.

  • PDF

Chunking Korean and an Application (한국어 낱말 묶기와 그 응용)

  • Un Koaunghi;Hong Jungha;You Seok-Hoon;Lee Kiyong;Choe Jae-Woong
    • Language and Information
    • /
    • v.9 no.2
    • /
    • pp.49-68
    • /
    • 2005
  • Application of chunking to English and some other European languages has shown that it is a viable parsing mechanism for natural languages. Although a small number of attempts have been made to apply chunking to the analysis of the Korean language, it still is not clear enough what criteria there are to identify appropriate units of chunking, and how efficient and valid the chunking algorithms would be when applied to some authentic Korean texts. The purpose of this research is to provide an alternative set of algorithms for chunking Korean, and to implement them, and to test them against some English-Korean parallel corpora, which is English and Korean bibles matched sentence by sentence. It is shown in the paper that aligning related texts and identifying matched phrases between the two languages can be achieved through appropriate chunking and matching algorithms defined on the morphologically-tagged parallel corpus. Chunking and matching processes are based on the content words rather than the function words, and the matching itself is done in terms of the transfer dictionary. The implementation is done in C and XML, and can be accessed through the Internet.

  • PDF