Bilingual lexicon induction through a pivot language

  • Kim, Jae-Hoon (Division of Information Technology, Korea Maritime University) ;
  • Seo, Hyeong-Won (Department of Computer Engineering, Korea Maritime University) ;
  • Kwon, Hong-Seok (Department of Computer Engineering, Korea Maritime University)
  • Received : 2013.05.01
  • Accepted : 2013.05.13
  • Published : 2013.05.31

Abstract

This paper presents a new method for constructing bilingual lexicons through a pivot language. The proposed method is adapted from the context-based approach, known as the standard approach, which is widely used for building bilingual lexicons from comparable corpora. The main difference between the two lies in how context vectors are represented: the standard approach represents them in the target language, whereas the proposed method represents them in a pivot language. As a result, the proposed method is considerably simpler than the standard approach. Furthermore, it is more accurate than the standard approach because it uses parallel corpora instead of comparable corpora. Experiments are conducted on one language pair, Korean and Spanish. The experimental results show that the proposed method is attractive when a parallel corpus directly between the source and target languages is unavailable but both source-pivot and pivot-target parallel corpora are available.
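The core idea, representing both source and target words as context vectors over pivot-language words and then comparing them, can be illustrated with a minimal sketch. The abstract does not state the association measure, the similarity measure, or the pivot language, so everything below is an assumption made for illustration: raw sentence-level co-occurrence counts, cosine similarity, English as the pivot, and hypothetical helper names (`context_vectors`, `cosine`, `rank_translations`) with toy corpora. It is not the authors' implementation.

```python
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(parallel_pairs):
    """For each word on the non-pivot side, count the pivot words that
    co-occur with it in aligned sentence pairs (assumed weighting: raw counts)."""
    vectors = defaultdict(Counter)
    for sentence, pivot_sentence in parallel_pairs:
        pivot_words = pivot_sentence.split()
        for word in set(sentence.split()):
            vectors[word].update(pivot_words)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_translations(source_word, source_vectors, target_vectors, top_n=3):
    """Rank target words by similarity of their pivot-language context vectors."""
    src = source_vectors[source_word]
    scored = [(cosine(src, vec), tgt) for tgt, vec in target_vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tgt for _, tgt in scored[:top_n]]

# Toy source-pivot (Korean-English) and target-pivot (Spanish-English) corpora.
ko_en = [("고양이 잔다", "cat sleeps"),
         ("개 잔다", "dog sleeps"),
         ("고양이 먹는다", "cat eats")]
es_en = [("gato duerme", "cat sleeps"),
         ("perro duerme", "dog sleeps"),
         ("gato come", "cat eats")]

source_vectors = context_vectors(ko_en)
target_vectors = context_vectors(es_en)
candidates = rank_translations("고양이", source_vectors, target_vectors)
```

Because both sets of vectors live in the same pivot vocabulary, they can be compared directly, with no step that translates context vectors from one language into the other. In this toy setup the highest-ranked candidate for 고양이 is gato.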
