English-Korean Transfer Dictionary Extension Tool in English-Korean Machine Translation System

A Tool for Extending the English-Korean Transfer Dictionary of an English-Korean Machine Translation System

  • Sung-Dong Kim (Department of Computer Engineering, Hansung University)
  • Received : 2012.07.06
  • Accepted : 2012.09.18
  • Published : 2013.01.31

Abstract

Developing an English-Korean machine translation system requires constructing various kinds of linguistic information, and the amount of information in the English-Korean transfer dictionary is especially critical to translation quality. Newly coined words are out-of-vocabulary words, so they appear untranslated in the output sentence, which lowers translation quality. Compound nouns also complicate lexical and syntactic analysis, and they are difficult to translate accurately because the transfer dictionary often lacks entries for them. To improve the quality of English-Korean machine translation, the transfer dictionary must therefore be expanded continuously by collecting out-of-vocabulary words and frequently used compound nouns. This paper proposes a method for expanding the transfer dictionary that consists of four steps: constructing a corpus from internet newspapers, extracting words that are not in the existing dictionary along with frequently used compound nouns, attaching meanings to the extracted words, and integrating them into the transfer dictionary. We also develop a tool that supports this expansion process. Expanding the dictionary is critical to improving a machine translation system but requires much human effort. The developed tool is useful for continuously expanding the transfer dictionary and is therefore expected to contribute to improved translation quality.

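Developing an English-Korean machine translation system requires many kinds of information about the languages; in particular, the amount of information in the English-Korean transfer dictionary, which holds the meanings of English words, is an important factor in translation quality. Newly coined words keep appearing and are not registered in the dictionary, so the English word is output as-is in the translated sentence, which lowers translation quality. Compound nouns also complicate lexical and syntactic analysis, and they are hard to translate correctly because their meanings are often not registered in the dictionary. To improve the quality of English-Korean machine translation, it is therefore necessary to keep expanding the transfer dictionary by collecting unregistered words and frequently used compound nouns and adding meaning information for them. This paper proposes a method for expanding the English-Korean transfer dictionary that consists of a sequence of steps: extracting a corpus from internet newspaper articles, finding unregistered words and frequently occurring compound nouns, attaching meanings to them, and adding them to the transfer dictionary. We also developed a tool that supports this process. Expanding dictionary information takes a great deal of human effort, but it is essential for improving an English-Korean machine translation system. The tool developed in this paper is expected to minimize that effort, to be useful for continuously expanding the transfer dictionary, and thereby to contribute to better translation quality.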
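
The four steps named in the abstract (corpus construction, extraction of out-of-vocabulary words and frequent compound nouns, meaning attachment, and integration) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool described in the paper: the function names, the stop-word list, the sample sentences, and the frequent-adjacent-pair heuristic for compound nouns are all hypothetical, since the abstract does not specify how extraction is performed.

```python
import re
from collections import Counter

# Hypothetical stop-word list, used only to skip obvious non-noun pairs.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "as", "is", "are", "for"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens (internal hyphens allowed)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def find_oov_words(corpus, dictionary):
    """Count tokens that have no entry in the transfer dictionary."""
    counts = Counter()
    for sentence in corpus:
        for token in tokenize(sentence):
            if token not in dictionary and token not in STOP_WORDS:
                counts[token] += 1
    return counts


def find_compound_candidates(corpus, dictionary, min_count=2):
    """Return adjacent word pairs seen at least min_count times and not yet in the dictionary.

    Assumption: frequent adjacent pairs stand in for compound nouns;
    a real system would likely use part-of-speech information.
    """
    counts = Counter()
    for sentence in corpus:
        tokens = tokenize(sentence)
        for first, second in zip(tokens, tokens[1:]):
            if first in STOP_WORDS or second in STOP_WORDS:
                continue
            counts[f"{first} {second}"] += 1
    return {pair: n for pair, n in counts.items()
            if n >= min_count and pair not in dictionary}


def integrate(dictionary, meanings):
    """Merge human-supplied meanings into a copy of the dictionary, keeping existing entries."""
    merged = dict(dictionary)
    for word, meaning in meanings.items():
        merged.setdefault(word, meaning)
    return merged


# Made-up sample data standing in for newspaper text and an existing dictionary.
corpus = [
    "The smartphone market grew as cloud computing spread.",
    "Cloud computing changed the smartphone market.",
]
dictionary = {
    "market": "시장", "cloud": "구름", "computing": "계산",
    "grew": "성장하다", "spread": "퍼지다", "changed": "바꾸다",
}

oov = find_oov_words(corpus, dictionary)
compounds = find_compound_candidates(corpus, dictionary)
# Meaning attachment is the manual step; these translations are supplied by a person.
expanded = integrate(dictionary, {"smartphone": "스마트폰", "cloud computing": "클라우드 컴퓨팅"})
print(oov, compounds, len(expanded))
```

Keeping meaning attachment as a separate, human-supplied input mirrors the abstract's point that dictionary expansion needs human effort, while the extraction steps are the part a tool can automate.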
