Segmentation of Long Chinese Sentences using Comma Classification  

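Jin Me-Ixun (Pohang University of Science and Technology, Department of Computer Science and Engineering)
Kim Mi-Young (Sungshin Women's University, School of Computer and Information)
Lee Jong-Hyeok (Pohang University of Science and Technology, Department of Computer Science and Engineering)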
Abstract
Parsing accuracy degrades as input sentences grow longer. To improve parsing performance, many methods for long-sentence segmentation have been researched. As an isolating language, Chinese offers fewer cues for sentence segmentation. However, commas are used more frequently on average in Chinese than in other languages, and the syntactic information that a comma conveys can play an important role in segmenting long Chinese sentences. This paper proposes a method for classifying commas in Chinese sentences according to the context in which each comma occurs; sentences are then segmented using the classification result. The experimental results show that the accuracy of comma classification reaches 87.1%, and that with our segmentation model the dependency parsing accuracy of our parser improves by 5.6%.
Keywords
Long sentence segmentation; Syntactic analysis; SVM; Comma