MALSORI (대한음성학회지:말소리), Volume 68, Pages 95-114, 2008, pISSN 1226-1173
A Hybrid Sentence Alignment Method for Building a Korean-English Parallel Corpus
Korean title: 한영 병렬 코퍼스 구축을 위한 하이브리드 기반 문장 자동 정렬 방법
Abstract
The growing popularity of statistical methods in machine translation calls for much larger parallel corpora. Korean-English parallel corpora, however, are not yet sufficiently available, and little research on this subject is being conducted. In this paper we present a hybrid method for aligning sentences in Korean-English parallel corpora. To build a Korean-English parallel corpus, we use bilingual newswire web pages, reading comprehension materials for English learners, computer-related technical documents, and help files of localized software. Our hybrid method combines sentence-length-based and word-correspondence-based alignment methods. We present and evaluate the experimental results. Alignment results obtained with a full translation model are very encouraging, especially when applied to a statistical machine translation (SMT) system: compared with the previous method, they improve the BLEU score by 0.66% and the NIST score by 9.94%.
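To make the idea of combining a sentence-length model with word correspondences concrete, here is a minimal sketch. It is not the authors' method. It assumes a Gale-Church style character-length cost, a toy bilingual lexicon, and a simple combination: the length cost of each alignment "bead" minus a weighted fraction of Korean tokens whose dictionary translation appears on the English side. The bead priors, the length parameters `c` and `s2`, the weight `lam`, and the functions `length_cost`, `lexical_score`, and `align` are all hypothetical illustrations. The Korean input is assumed to be pre-tokenized with particles split off, since raw eojeol would rarely match a dictionary entry.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

# Prior probabilities for bead types (Korean sentences, English sentences),
# in the style of Gale and Church's length-based aligner.
# These values are assumptions, not figures from this paper.
BEAD_PRIORS: Dict[Tuple[int, int], float] = {
    (1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099, (2, 1): 0.089, (1, 2): 0.089,
}


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def length_cost(len_src: int, len_tgt: int, c: float = 1.0, s2: float = 6.8) -> float:
    """Gale-Church style length cost in characters.

    c (target chars per source char) and s2 (variance) are assumed defaults;
    for Korean-English they would have to be estimated from data.
    """
    mean = (len_src + len_tgt / c) / 2.0
    if mean == 0:
        return 0.0
    delta = (len_tgt - c * len_src) / math.sqrt(mean * s2)
    prob = 2.0 * (1.0 - norm_cdf(abs(delta)))
    return -math.log(max(prob, 1e-12))


def lexical_score(src: List[str], tgt: List[str], lexicon: Dict[str, Set[str]]) -> float:
    """Fraction of source tokens that have a dictionary translation in tgt."""
    if not src:
        return 0.0
    tgt_set = set(tgt)
    hits = sum(1 for w in src if lexicon.get(w, set()) & tgt_set)
    return hits / len(src)


def align(src_sents: List[str], tgt_sents: List[str],
          lexicon: Dict[str, Set[str]], lam: float = 5.0) -> List[Tuple[List[int], List[int]]]:
    """Dynamic-programming alignment minimizing length cost minus lam * lexical score."""
    n, m = len(src_sents), len(tgt_sents)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Beads only move forward, so row-major order finalizes each cell before it is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in BEAD_PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                src_text = " ".join(src_sents[i:ni])
                tgt_text = " ".join(tgt_sents[j:nj])
                step = length_cost(len(src_text), len(tgt_text)) - math.log(prior)
                step -= lam * lexical_score(src_text.split(), tgt_text.split(), lexicon)
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Follow back-pointers from (n, m) to recover the sequence of beads.
    beads: List[Tuple[List[int], List[int]]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        prev = back[i][j]
        assert prev is not None
        pi, pj = prev
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


# Toy example: pre-tokenized Korean (particles split off) and a hypothetical lexicon.
src = ["나는 학생 이다", "고양이 가 잔다"]
tgt = ["I am a student", "The cat sleeps"]
lexicon = {"나는": {"I"}, "학생": {"student"}, "고양이": {"cat"}, "잔다": {"sleeps"}}
print(align(src, tgt, lexicon))  # prints [([0], [0]), ([1], [1])]
```

The lexical term rewards beads whose words translate each other, which can help when Korean and English character counts correlate only loosely. Note that this score is directional (Korean to English) and that `lam` trades off the two evidence sources; both choices are illustrative assumptions rather than details reported here.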