
Identification of Maximal-Length Noun Phrases Based on Expanded Chunks and Classified Punctuations in Chinese  

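Bai, Xue-Mei (4U Applications Co., Ltd.)
Li, Jin-Ji (Department of Computer Science and Engineering, Pohang University of Science and Technology)
Kim, Dong-Il (Division of Computer, Electronics, and Communications, Yanbian University of Science and Technology)
Lee, Jong-Hyeok (Department of Computer Science and Engineering, Pohang University of Science and Technology)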
Abstract
In general, there are two types of noun phrases (NP): the Base Noun Phrase (BNP) and the Maximal-Length Noun Phrase (MNP). MNP identification can greatly reduce the complexity of full parsing, help analyze the overall structure of complex sentences, and provide important clues for detecting main predicates in Chinese sentences. In this paper, we propose a two-phase hybrid approach to MNP identification that uses salient features such as expanded chunks and classified punctuation to improve performance. Experimental results show an $F_1$-measure of 89.66%.
Keywords
Maximal-Length Noun Phrase (MNP); Expanded Chunk; Classified Punctuation