http://dx.doi.org/10.5916/jkosme.2013.37.3.300

Bilingual lexicon induction through a pivot language  

Kim, Jae-Hoon (Division of Information Technology, Korea Maritime University)
Seo, Hyeong-Won (Department of Computer Engineering, Korea Maritime University)
Kwon, Hong-Seok (Department of Computer Engineering, Korea Maritime University)
Abstract
This paper presents a new method for constructing bilingual lexicons through a pivot language. The proposed method is adapted from the context-based approach, known as the standard approach, which is widely used for building bilingual lexicons from comparable corpora. The main difference between the two lies in how context vectors are represented: the standard approach represents them in the target language, whereas the proposed method represents them in a pivot language. As a result, the proposed method is considerably simpler than the standard approach. Furthermore, the proposed method is more accurate than the standard approach because it uses parallel corpora instead of comparable corpora. Experiments are conducted on the Korean-Spanish language pair. The experimental results show that the proposed method is attractive when a parallel corpus directly between the source and target languages is unavailable, but both source-pivot and pivot-target parallel corpora are available.
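As a rough illustration of the idea of representing context vectors in a pivot language, the following Python sketch may help. It is not the authors' implementation: the abstract does not specify the association measure or the similarity function, so raw co-occurrence counts and cosine similarity are assumed here, and the toy sentences and the function names (`pivot_vectors`, `cosine`, `rank_translations`) are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def pivot_vectors(parallel_pairs):
    """Map each word to counts of pivot words seen in its aligned sentences.

    parallel_pairs: iterable of (sentence, pivot_sentence) strings,
    already tokenized by whitespace (an assumption of this sketch).
    """
    vectors = defaultdict(Counter)
    for sentence, pivot_sentence in parallel_pairs:
        pivot_words = pivot_sentence.split()
        for word in set(sentence.split()):
            vectors[word].update(pivot_words)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank_translations(source_word, source_vectors, target_vectors, top_n=3):
    """Rank target words by similarity of their pivot-language vectors."""
    src = source_vectors.get(source_word, Counter())
    scored = [(cosine(src, vec), word) for word, vec in target_vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(word, score) for score, word in scored[:top_n]]


# Toy source-pivot (Korean-English) and target-pivot (Spanish-English) corpora.
korean_english = [
    ("고양이 잔다", "cat sleeps"),
    ("고양이 먹는다", "cat eats"),
    ("개 잔다", "dog sleeps"),
]
spanish_english = [
    ("gato duerme", "cat sleeps"),
    ("gato come", "cat eats"),
    ("perro duerme", "dog sleeps"),
]

source_vectors = pivot_vectors(korean_english)
target_vectors = pivot_vectors(spanish_english)

for word in ("고양이", "개"):
    print(word, rank_translations(word, source_vectors, target_vectors))
```

Because both the source and target words are described over the same pivot vocabulary, their vectors can be compared directly, without a seed dictionary to translate context words. That is one plausible reading of why the abstract calls the method simpler than the standard approach.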
Keywords
Bilingual lexicon induction; Parallel corpus; Comparable corpora; Pivot language