[KSCI] Korea Science Citation Index Service

Intra-Sentence Segmentation using Maximum Entropy Model for Efficient Parsing of English Sentences

Kim Sung-Dong (School of Computer Engineering, Hansung University)

Publication Information

Journal of KIISE: Software and Applications / v.32, no.5, 2005, pp. 385-395

Abstract

Long sentence analysis has been a critical problem in machine translation because of its high parsing complexity. Intra-sentence segmentation methods have been proposed to reduce this complexity. This paper presents an intra-sentence segmentation method based on a maximum entropy probability model that increases both the coverage and the accuracy of segmentation. We construct rules for choosing candidate segmentation positions with a learning method that uses the lexical context of the words tagged as segmentation positions. We also build a model that assigns a probability value to each candidate segmentation position. The lexical contexts are extracted from a corpus tagged with segmentation positions and are incorporated into the probability model. We construct training data from Wall Street Journal sentences and evaluate intra-sentence segmentation on sentences from four different domains. The experiments show about 88% segmentation accuracy and about 98% coverage. The proposed method also improves parsing efficiency by a factor of 4.8 in speed and 3.6 in space.
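The abstract describes a maximum entropy model that assigns a probability to each candidate segmentation position from its lexical context. As a rough illustration only, the sketch below uses the standard conditional maximum entropy form described by Berger, Della Pietra and Della Pietra (1996), p(y|x) = exp(Σ_i λ_i f_i(x, y)) / Z(x), where x is the lexical context of a gap between two words, y is "segment" or "no segment", each f_i is a binary indicator feature, and Z(x) normalizes over y. Everything else is an assumption and not the paper's method: the feature templates (a bias plus the words on either side of the gap and their bigram), the two toy training sentences, treating every inter-word gap as a candidate (the paper instead selects candidates with learned rules), and training by batch gradient ascent with an L2 penalty in place of iterative scaling. The names context_features, make_examples, MaxEntModel and segment_probabilities are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical label set: 1 = segment at this gap, 0 = do not segment.
LABELS = (0, 1)


def context_features(words, i):
    """Hypothetical lexical-context features for the gap between words[i] and words[i + 1]."""
    left, right = words[i].lower(), words[i + 1].lower()
    # BIAS is always on, so the model can learn that most gaps are not segmentation positions.
    return ["BIAS", f"L={left}", f"R={right}", f"LR={left}_{right}"]


def make_examples(sentences):
    """Turn (words, set of tagged gap indices) pairs into (features, label) training examples."""
    data = []
    for words, gold_gaps in sentences:
        for i in range(len(words) - 1):
            data.append((context_features(words, i), 1 if i in gold_gaps else 0))
    return data


class MaxEntModel:
    """Conditional maximum entropy model p(y | x) over binary indicator features."""

    def __init__(self):
        # One weight (lambda) per (feature, label) pair; unseen pairs default to 0.
        self.weights = defaultdict(float)

    def prob(self, feats):
        """Return [p(0 | x), p(1 | x)] = exp(sum of active weights) / Z(x)."""
        scores = [sum(self.weights[(f, y)] for f in feats) for y in LABELS]
        top = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, data, epochs=200, lr=0.5, l2=0.01):
        """Maximise the L2-penalised conditional log-likelihood by batch gradient ascent.

        The gradient for weight (f, y) is the observed count of (f, y) minus its
        count expected under the current model. (Berger et al. use iterative
        scaling instead; plain gradient ascent is a simpler stand-in.)
        """
        n = len(data)
        for _ in range(epochs):
            grad = defaultdict(float)
            for feats, gold in data:
                p = self.prob(feats)
                for f in feats:
                    grad[(f, gold)] += 1.0  # observed feature count
                    for y in LABELS:
                        grad[(f, y)] -= p[y]  # expected feature count under the model
            for key in set(grad) | set(self.weights):
                self.weights[key] += lr * (grad[key] / n - l2 * self.weights[key])


def segment_probabilities(model, words):
    """P(segment) for every gap between adjacent words (every gap is a candidate here)."""
    return [model.prob(context_features(words, i))[1] for i in range(len(words) - 1)]


# Toy training data (hypothetical): gap 2, between "," and the next word, is tagged.
TRAIN = [
    ("profits rose , which surprised analysts".split(), {2}),
    ("sales fell , and the stock dropped".split(), {2}),
]

model = MaxEntModel()
model.train(make_examples(TRAIN))

test_words = "revenue grew , which pleased investors".split()
probs = segment_probabilities(model, test_words)
for i, p in enumerate(probs):
    print(f"{test_words[i]} | {test_words[i + 1]}: P(segment) = {p:.2f}")
```

In this toy run the gap between "," and "which" should receive the highest probability, because all of its context features occurred only at tagged positions in the training data, while gaps with unseen contexts fall back on the negative bias weight.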

Keywords

intra-sentence segmentation; maximum entropy model; parsing; machine translation

Citations & Related Records

Times Cited By KSCI: 1


1	Syntactic Category Prediction for Improving Parsing Accuracy in English-Korean Machine Translation / Kim Sung-Dong / The KIPS Transactions: Part B
2	Intra-sentence Segmentation using Finite Automata for Efficient English Syntactic Analysis / Kim Sung-Dong / Journal of KIISE: Computing Practices and Letters
