Discriminative Models for Automatic Acquisition of Translation Equivalences

  • Zhang, Chun-Xiang (School of Computer Science and Technology, Harbin Institute of Technology)
  • Li, Sheng (School of Computer Science and Technology, Harbin Institute of Technology)
  • Zhao, Tie-Jun (School of Computer Science and Technology, Harbin Institute of Technology)
  • Published : 2007.02.28

Abstract

Translation equivalence is very important for bilingual lexicography, machine translation systems, and cross-lingual information retrieval. Extracting equivalences from bilingual sentence pairs is a data mining problem. In this paper, discriminative learning methods are employed to filter translation equivalences. Discriminative features, including translation literality, phrase alignment probability, and phrase length ratio, are used to evaluate candidate equivalences. A random sample of 1,000 equivalences is filtered and then evaluated. Experimental results show that the support vector machine achieves a precision of 87.8% and a recall of 89.8%.
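The abstract names three features and an SVM-based filter but does not define them, so the following is only a minimal sketch under stated assumptions, not the authors' method. Literality is assumed to be the share of words on both sides that have a counterpart in a bilingual lexicon; phrase alignment probability is assumed to be a length-normalised IBM Model 1 style log-probability; and length ratio is assumed to be shorter length over longer length. The lexicon `LEXICON`, the translation table `T_TABLE`, the floor `FLOOR`, and every function name are hypothetical. The linear SVM is trained with the Pegasos sub-gradient method, swapped in to keep the sketch dependency-free; the bias input is regularised along with the other weights, which is a simplification.

```python
import math
import random

# Hypothetical toy resources (assumptions, not from the paper): a bilingual
# lexicon for the literality feature and a lexical translation table
# t(source | target) for the alignment feature. Source words are pinyin.
LEXICON = {("hong", "red"), ("fangzi", "house"), ("da", "big")}
T_TABLE = {("hong", "red"): 0.7, ("fangzi", "house"): 0.8, ("da", "big"): 0.6}
FLOOR = 1e-6  # assumed probability floor for unseen word pairs


def literality(src, tgt, lexicon=LEXICON):
    """Share of words on both sides that have a lexicon counterpart on the other side."""
    covered = sum(any((s, t) in lexicon for t in tgt) for s in src)
    covered += sum(any((s, t) in lexicon for s in src) for t in tgt)
    return covered / (len(src) + len(tgt))


def alignment_score(src, tgt, table=T_TABLE):
    """Length-normalised IBM Model 1 style log P(src | tgt), NULL word omitted."""
    total = 0.0
    for s in src:
        p = sum(table.get((s, t), 0.0) for t in tgt) / len(tgt)
        total += math.log(max(p, FLOOR))
    return total / len(src)


def length_ratio(src, tgt):
    """Shorter phrase length over the longer one, so the value lies in (0, 1]."""
    return min(len(src), len(tgt)) / max(len(src), len(tgt))


def features(src, tgt):
    """Feature vector for one non-empty candidate pair; the trailing 1.0 is a bias input."""
    return [literality(src, tgt), alignment_score(src, tgt), length_ratio(src, tgt), 1.0]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    Labels are +1 (keep the equivalence) or -1 (filter it out).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the regulariser
            if margin < 1.0:  # hinge loss active: step towards the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 (keep) or -1 (filter out) from the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def precision_recall(gold, pred):
    """Precision and recall with +1 ('correct equivalence') as the positive class."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    fp = sum(g == -1 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == -1 for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Under the toy lexicon, `features(["hong", "fangzi"], ["red", "house"])` yields a literality of 1.0 and a length ratio of 1.0. In practice, feature vectors of hand-labelled candidate pairs would be passed to `train_linear_svm`, and `precision_recall` would be applied to predictions on a held-out sample.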
