http://dx.doi.org/10.3745/KIPSTB.2004.11B.6.749

Target Word Selection Disambiguation using Untagged Text Data in English-Korean Machine Translation  

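Kim Yu-Seop (Division of Information and Communication Engineering, Hallym University)
Chang Jeong-Ho (School of Computer Science and Engineering, Seoul National University)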
Abstract
In this paper, we propose a new method for disambiguating target word selection in English-Korean machine translation that uses only a raw (untagged) corpus and requires no additional human effort. We use two data-driven techniques: Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA). Both can represent complex semantic structures in given contexts such as text passages. We construct linguistic semantic knowledge with these two techniques and apply it to target word selection in English-Korean machine translation. For target word selection, we use grammatical relationships stored in a dictionary. To resolve the data sparseness problem in target word selection, we use the k-nearest neighbor learning algorithm and estimate the distance between instances based on these models. In the experiments, we use TREC AP news data to construct the latent semantic space and the Wall Street Journal corpus to evaluate target word selection. With the latent semantic analysis methods, the accuracy of target word selection improved by over 10%, and PLSA showed better accuracy than LSA. Finally, using correlation calculation, we show how accuracy relates to two important factors: the dimensionality of the latent space and the k value of k-NN learning.
Keywords
Target Word Selection; LSA (Latent Semantic Analysis); PLSA (Probabilistic Latent Semantic Analysis); k-Nearest Neighbor Learning; Correlation
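
The pipeline outlined in the abstract (a latent semantic space built from raw text, then k-nearest neighbor voting over dictionary examples) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the `examples` collocation list, the romanized Korean target words, and all function names are hypothetical, and `numpy.linalg.svd` stands in for whatever SVD package the paper used. Only the LSA variant is shown; PLSA would replace the SVD step with an EM-trained latent class model.

```python
import numpy as np
from collections import Counter

def lsa_word_vectors(docs, dim):
    """Build a term-by-document count matrix and keep the top `dim` SVD dimensions."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            counts[index[w], j] += 1.0
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    # Each row is one word's coordinates in the latent space.
    return index, u[:, :dim] * s[:dim]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def select_target_word(word, examples, index, vectors, k):
    """Vote among the k dictionary examples whose argument word is closest to `word`."""
    query = vectors[index[word]]
    ranked = sorted(examples,
                    key=lambda e: cosine(query, vectors[index[e[0]]]),
                    reverse=True)
    votes = Counter(target for _, target in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical raw (untagged) corpus: one string per document.
docs = [
    "doctor medicine pill patient",
    "pill medicine aspirin patient",
    "doctor patient medicine",
    "bus taxi station driver",
    "taxi train station driver",
]

# Hypothetical collocation dictionary for the verb "take":
# (object noun, Korean target word), romanized: meokda = 먹다, tada = 타다.
examples = [
    ("medicine", "meokda"),
    ("pill", "meokda"),
    ("bus", "tada"),
    ("train", "tada"),
]

index, vectors = lsa_word_vectors(docs, dim=2)
# "aspirin" and "taxi" are absent from the dictionary, so the nearest examples decide.
aspirin_choice = select_target_word("aspirin", examples, index, vectors, k=3)
taxi_choice = select_target_word("taxi", examples, index, vectors, k=3)
print(aspirin_choice, taxi_choice)
```

The two parameters exposed here, `dim` and `k`, correspond to the two factors whose relation to accuracy the abstract reports: the dimensionality of the latent space and the k value of k-nearest neighbor learning.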