
Intra-Sentence Segmentation using Maximum Entropy Model for Efficient Parsing of English Sentences  

Kim Sung-Dong (School of Computer Engineering, Hansung University)
Abstract
Analyzing long sentences has been a critical problem in machine translation because of its high complexity. Intra-sentence segmentation methods have been proposed to reduce parsing complexity. This paper presents an intra-sentence segmentation method based on a maximum entropy probability model to increase the coverage and accuracy of the segmentation. We construct the rules for choosing candidate segmentation positions by a learning method that uses the lexical context of the words tagged as segmentation positions. We also build a model that assigns a probability value to each candidate segmentation position. The lexical contexts are extracted from a corpus tagged with segmentation positions and are incorporated into the probability model. We construct the training data from Wall Street Journal sentences and evaluate the intra-sentence segmentation on sentences from four different domains. The experiments show about $88\%$ accuracy and about $98\%$ coverage of the segmentation. The proposed method also improves parsing efficiency by a factor of 4.8 in speed and 3.6 in space.
Keywords
intra-sentence segmentation; maximum entropy model; parsing; machine translation
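As a rough illustration of the idea in the abstract, the sketch below scores candidate segmentation positions with a conditional maximum entropy model, $p(y \mid x) = \exp\big(\sum_i \lambda_i f_i(x, y)\big) / Z(x)$, using the words on either side of each position as lexical context. It is not the paper's implementation: the feature templates, the label names `SEG`/`NOSEG`, the weights in `WEIGHTS`, and the example sentence are all assumptions made up for this example, and a real model would estimate the weights from a corpus tagged with segmentation positions.

```python
import math

# Hypothetical feature weights (the lambda values of the model).
# These numbers are invented for illustration, not trained.
WEIGHTS = {
    ("next=that", "SEG"): 1.2,
    ("prev=,", "SEG"): 0.8,
    ("next=and", "SEG"): 0.5,
    ("prev=the", "NOSEG"): 1.5,
}
LABELS = ("SEG", "NOSEG")


def context_features(words, i):
    """Lexical context of the candidate position just before words[i]."""
    prev_word = words[i - 1].lower() if i > 0 else "<s>"
    next_word = words[i].lower() if i < len(words) else "</s>"
    return [f"prev={prev_word}", f"next={next_word}"]


def segmentation_probability(words, i, weights=WEIGHTS):
    """Return p(SEG | context) under the exponential (maximum entropy) form."""
    feats = context_features(words, i)
    # Unnormalized score for each label: exp of the summed active weights.
    scores = {
        label: math.exp(sum(weights.get((f, label), 0.0) for f in feats))
        for label in LABELS
    }
    z = sum(scores.values())  # normalization constant Z(x)
    return scores["SEG"] / z


# Score every position between two words of a toy sentence.
words = "He said , that the market fell".split()
probs = {i: segmentation_probability(words, i) for i in range(1, len(words))}
for i, p in probs.items():
    print(f"before {words[i]!r}: p(SEG) = {p:.3f}")
```

In this toy setup the position before "that" (preceded by a comma) receives a high probability, the position after "the" receives a low one, and positions with no active features fall back to 0.5. A segmenter could then keep only the candidates whose probability exceeds a chosen threshold.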