http://dx.doi.org/10.3745/KIPSTB.2007.14-B.3.231

Automatically Extracting Unknown Translations Using Phrase Alignment  

Kim, Jae-Hoon (Department of Computer Engineering, 한국한양대학교)
Yang, Sung-Il (Language Processing Research Team, Electronics and Telecommunications Research Institute)
Abstract
In this paper, we propose a model for automatically extracting unknown translations and implement an extraction system based on it. The proposed model is a phrase-alignment model that combines three component models: a phrase-boundary model, a language model, and a translation model. The system built on this model consists of three parts: construction of parallel corpora, alignment of Korean and English words, and extraction of unknown translations. To evaluate the system, we built a reference corpus for unknown-translation extraction comprising 2,220 parallel sentences that contain about 1,500 unknown translations. Several experiments show that the proposed model is useful for extracting unknown translations. Future work should address objective evaluation, the construction of high-quality parallel corpora, and further improvement of extraction performance.
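The abstract names the three component models but does not say how they are combined. The following is a minimal sketch, assuming each component returns a log-score and the scores are simply added for every candidate English span. All function names, the toy lexicon, and the numeric penalties are hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Assumption: each component model is a function returning a log-score.
BoundaryModel = Callable[[Sequence[str], int, int], float]
LanguageModel = Callable[[Sequence[str]], float]
TranslationModel = Callable[[Sequence[str], Sequence[str]], float]


def best_translation(
    source_phrase: Sequence[str],
    target_tokens: Sequence[str],
    boundary_logp: BoundaryModel,
    language_logp: LanguageModel,
    translation_logp: TranslationModel,
    max_len: int = 4,
) -> Tuple[List[str], float]:
    """Return the target span with the highest summed log-score."""
    best_span, best_score = None, -math.inf
    n = len(target_tokens)
    for start in range(n):
        # Enumerate half-open spans [start, end) of at most max_len tokens.
        for end in range(start + 1, min(start + max_len, n) + 1):
            phrase = target_tokens[start:end]
            score = (
                boundary_logp(target_tokens, start, end)
                + language_logp(phrase)
                + translation_logp(source_phrase, phrase)
            )
            if score > best_score:
                best_span, best_score = (start, end), score
    if best_span is None:
        return [], best_score
    return list(target_tokens[best_span[0]:best_span[1]]), best_score


# Toy stand-ins for the three models (hypothetical, for illustration only).
FUNCTION_WORDS = {"the", "of", "a", "is"}
LEXICON = {"기계": {"machine"}, "번역": {"translation"}}


def toy_boundary_logp(tokens: Sequence[str], start: int, end: int) -> float:
    # Penalize spans that begin or end on a function word.
    penalty = 0.0
    if tokens[start] in FUNCTION_WORDS:
        penalty -= 2.0
    if tokens[end - 1] in FUNCTION_WORDS:
        penalty -= 2.0
    return penalty


def toy_language_logp(phrase: Sequence[str]) -> float:
    # A flat per-token cost standing in for an n-gram language model.
    return -0.5 * len(phrase)


def toy_translation_logp(source: Sequence[str], target: Sequence[str]) -> float:
    # Reward source words that have a lexicon match in the span,
    # and penalize target words that no source word explains.
    target_set = set(target)
    explained = set()
    score = 0.0
    for word in source:
        hits = LEXICON.get(word, set()) & target_set
        score += math.log(0.9) if hits else math.log(0.01)
        explained |= hits
    score += math.log(0.1) * len(target_set - explained)
    return score


phrase, score = best_translation(
    ["기계", "번역"],
    "the machine translation system is fast".split(),
    toy_boundary_logp,
    toy_language_logp,
    toy_translation_logp,
)
```

With these toy scores, the span "machine translation" is selected for the Korean phrase. In a real system each stand-in would be replaced by a trained model, and the components might be weighted rather than summed with equal weight.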
Keywords
Unknown Translation; Alignment Model; Dictionary Construction; Parallel Corpus