[KSCI] Korea Science Citation Index Service

Discriminative Models for Automatic Acquisition of Translation Equivalences

Zhang, Chun-Xiang (School of Computer Science and Technology, Harbin Institute of Technology)
Li, Sheng (School of Computer Science and Technology, Harbin Institute of Technology)
Zhao, Tie-Jun (School of Computer Science and Technology, Harbin Institute of Technology)

Publication Information

International Journal of Control, Automation, and Systems / v.5, no.1, 2007, pp. 99-103

Abstract

Translation equivalences are important for bilingual lexicography, machine translation systems, and cross-lingual information retrieval. Extracting equivalences from bilingual sentence pairs is a data mining problem. In this paper, discriminative learning methods are employed to filter translation equivalences. Discriminative features, including translation literality, phrase alignment probability, and phrase length ratio, are used to evaluate equivalences. A random sample of 1000 equivalences is filtered and then evaluated. Experimental results indicate that the support vector machine achieves a precision of 87.8% and a recall of 89.8%.
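The filtering setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a plain perceptron stands in for the support vector machine, and the `Candidate` fields, the feature scaling, and all function names are hypothetical choices made for illustration. The sketch shows how three features per candidate equivalence could feed a linear classifier, and how precision and recall are computed over the filtered output.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    """A candidate translation equivalence (all fields are illustrative)."""
    literality: float    # assumed translation literality score in [0, 1]
    align_prob: float    # assumed phrase alignment probability in [0, 1]
    length_ratio: float  # assumed source/target phrase length ratio
    is_correct: bool     # gold label used for training and evaluation


def features(c: Candidate) -> List[float]:
    # Assumption: a length ratio close to 1.0 is treated as more plausible,
    # so the ratio is mapped to a closeness score in [0, 1].
    closeness = 1.0 - min(1.0, abs(1.0 - c.length_ratio))
    return [c.literality, c.align_prob, closeness, 1.0]  # last term is the bias


def score(w: Sequence[float], c: Candidate) -> float:
    return sum(wi * xi for wi, xi in zip(w, features(c)))


def train_perceptron(data: Sequence[Candidate], epochs: int = 20,
                     lr: float = 0.1) -> List[float]:
    """Standard perceptron updates; a stand-in for the SVM in the abstract."""
    w = [0.0, 0.0, 0.0, 0.0]
    for _ in range(epochs):
        for c in data:
            y = 1.0 if c.is_correct else -1.0
            if y * score(w, c) <= 0.0:  # misclassified or on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, features(c))]
    return w


def filter_candidates(w: Sequence[float],
                      data: Sequence[Candidate]) -> List[bool]:
    """True means the candidate is kept as a translation equivalence."""
    return [score(w, c) > 0.0 for c in data]


def precision_recall(predicted: Sequence[bool],
                     gold: Sequence[bool]) -> Tuple[float, float]:
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Precision here is the fraction of kept candidates that are correct, and recall is the fraction of correct candidates that are kept, which matches the usual reading of the figures reported above.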

Keywords

Data mining; discriminative features; discriminative learning; translation equivalence

Citations & Related Records

Times Cited By Web of Science: 1
Times Cited By SCOPUS: 0

Reference

1	K. Imamura and E. Sumita, 'Bilingual corpus cleaning focusing on translation literality,' Proc. of the 7th International Conference on Spoken Language Processing, pp. 1713-1716, 2002
2	Y. Li, H. Zaragoza, R. Herbrich, J. Shawe-Taylor, and J. Kandola, 'The perceptron algorithm with uneven margins,' Proc. of the 19th International Conference on Machine Learning, pp. 379-386, 2002
3	W. A. Gale and K. W. Church, 'Identifying word correspondences in parallel texts,' Proc. of the 4th DARPA Workshop on Speech and Natural Language, pp. 152-157, 1991
4	C. Cortes and V. Vapnik, 'Support-vector networks,' Machine Learning, vol. 20, no. 3, pp. 273-297, 1995
5	K. W. Church, 'Char_align: A program for aligning parallel texts at the character level,' Proc. of Meeting of the Association for Computational Linguistics, pp. 1-8, 1993
6	H. Kaji, Y. Kida, and Y. Morimoto, 'Learning translation templates from bilingual texts,' Proc. of the 14th International Conference on Computational Linguistics, pp. 672-678, 1992
7	D. W. Oard and B. J. Dorr, A Survey of Multilingual Text Retrieval, Technical Report, University of Maryland, 1996
8	Y. Zhang, S. Vogel, and A. Waibel, 'Integrated phrase segmentation and alignment model for statistical machine translation,' Proc. of International Conference on Natural Language Processing and Knowledge Engineering, 2003
9	P. F. Brown, S. A. Della Pietra, V. J. Della Pietra, and R. L. Mercer, 'The mathematics of statistical machine translation: Parameter estimation,' Computational Linguistics, vol. 19, no. 2, pp. 263-311, 1993
10	F. Wong, D. C. Hu, Y. H. Mao, and M. C. Dong, 'A flexible example annotation schema: Translation corresponding tree representation,' Proc. of the 20th International Conference on Computational Linguistics, pp. 1079-1085, 2004