Analyzing Errors in Bilingual Multi-word Lexicons Automatically Constructed through a Pivot Language

Seo, Hyeong-Won; Kim, Jae-Hoon

doi:10.5916/jkosme.2015.39.2.172

Journal of Advanced Marine Engineering and Technology

Volume 39, Issue 2, Pages 172-178, 2015
pISSN 2234-7925, eISSN 2234-8352

The Korean Society of Marine Engineering (한국마린엔지니어링학회)



Seo, Hyeong-Won (Department of Computer Engineering, Korea Maritime and Ocean University)
Kim, Jae-Hoon (Department of Computer Engineering, Korea Maritime and Ocean University)

Received : 2014.11.04
Accepted : 2014.12.17
Published : 2015.02.28


Abstract

Constructing a bilingual multi-word lexicon faces many difficulties, such as the absence of a commonly accepted gold-standard dataset. Moreover, there is no universally agreed definition of what a multi-word unit is. With these problems in mind, this paper evaluates and analyzes the context vector approach, a novel alignment method for constructing bilingual lexicons from parallel corpora, by comparing it with one of the general methods. The approach builds context vectors for both source and target single-word units from two parallel corpora. To adapt the approach to multi-word units, we first identify all multi-word candidates (noun phrases in this work) and then concatenate each of them into a single-word unit. As a result, the context vector approach can be applied to multi-word units as well. In our experiments, the context vector approach performed better than the other approach. The contribution of this paper is an analysis of the various types of errors found in the experimental results. In future work, we will study a similarity measure that covers not only a multi-word unit itself but also its constituents.
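The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the weighting scheme, the similarity measure, or the language pair, so the sketch assumes raw co-occurrence counts over pivot-language words, cosine similarity, and a toy Korean-English and French-English pair of corpora. All function names (`concatenate_units`, `context_vectors`, `cosine`) and all data are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def concatenate_units(tokens, multiword_units):
    """Join each listed multi-word unit into one token, longest match first."""
    units = sorted((tuple(u.split()) for u in multiword_units), key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for unit in units:
            if tuple(tokens[i:i + len(unit)]) == unit:
                out.append("_".join(unit))
                i += len(unit)
                break
        else:  # no multi-word unit starts at position i
            out.append(tokens[i])
            i += 1
    return out


def context_vectors(parallel_pairs):
    """Map each word to counts of the pivot words in its aligned sentences."""
    vectors = defaultdict(Counter)
    for sentence, pivot_sentence in parallel_pairs:
        for word in set(sentence):
            vectors[word].update(pivot_sentence)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Toy corpora (hypothetical): each side is aligned with English as the pivot.
source_units = {"기계 번역"}               # assumed noun-phrase candidate
target_units = {"traduction automatique"}  # assumed noun-phrase candidate

source_pairs = [
    (concatenate_units("기계 번역 시스템".split(), source_units),
     "machine translation system".split()),
    (concatenate_units("오늘 날씨".split(), source_units),
     "weather today".split()),
]
target_pairs = [
    (concatenate_units("système de traduction automatique".split(), target_units),
     "machine translation system".split()),
    (concatenate_units("météo du jour".split(), target_units),
     "weather of the day".split()),
]

source_vectors = context_vectors(source_pairs)
target_vectors = context_vectors(target_pairs)

# Score every target unit against the concatenated source multi-word unit.
query = source_vectors["기계_번역"]
scores = {word: cosine(query, vec) for word, vec in target_vectors.items()}
```

Because each multi-word unit is collapsed into a single token before the vectors are built, the rest of the pipeline treats it exactly like a single-word unit. In this toy data, `traduction_automatique` scores higher than `météo`, but it ties with the other words of its sentence, which hints at why larger corpora and a careful error analysis are needed.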


