Identification of Maximal-Length Noun Phrases Based on Expanded Chunks and Classified Punctuations in Chinese

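  • 백설매 (4U Applications Co., Ltd.)
  • 이금희 (Department of Computer Engineering, Pohang University of Science and Technology)
  • 김동일 (School of Computer, Electronics and Communication, Yanbian University of Science and Technology)
  • 이종혁 (Department of Computer Engineering, Pohang University of Science and Technology)
  • Published: 2009.04.15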

Abstract

In general, there are two types of noun phrases (NPs): Base Noun Phrases (BNPs) and Maximal-Length Noun Phrases (MNPs). MNP identification can greatly reduce the complexity of full parsing, helps analyze the overall structure of complex sentences, and provides important clues for detecting the main predicates of Chinese sentences. In this paper, we propose a two-phase hybrid approach to MNP identification that uses salient features such as expanded chunks and classified punctuation to improve performance. Experimental results show an $F_1$-measure of 89.66%.
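The abstract describes the method only as two phases that work over expanded chunks. One plausible way to connect such phases, and purely an assumption here, is for phase 1 to group words into chunks and for phase 2 to assign each chunk a B/I/O maximal-NP label, which must then be mapped back to word offsets. The minimal sketch below shows only that decoding step. The tag set, the function name `chunk_tags_to_word_spans`, and the example chunk lengths are hypothetical. The classifier that would predict the tags, and the punctuation features it would use, are not shown.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) word offsets, end exclusive


def chunk_tags_to_word_spans(chunk_lengths: List[int], tags: List[str]) -> List[Span]:
    """Turn hypothetical chunk-level B/I/O MNP tags into word-level spans.

    chunk_lengths[k] is the number of words in chunk k (assumed phase-1 output);
    tags[k] is the assumed phase-2 label for chunk k: "B" starts an MNP,
    "I" continues the open one, "O" lies outside any MNP.
    """
    if len(chunk_lengths) != len(tags):
        raise ValueError("one tag per chunk is required")
    spans: List[Span] = []
    start: Optional[int] = None  # word offset where the open MNP began
    offset = 0                   # word offset of the current chunk
    for length, tag in zip(chunk_lengths, tags):
        if tag not in ("B", "I", "O"):
            raise ValueError(f"unexpected tag: {tag}")
        if tag == "B" or (tag == "I" and start is None):
            # Close any open phrase, then open a new one at this chunk.
            # A stray "I" with no open phrase is treated like "B".
            if start is not None:
                spans.append((start, offset))
            start = offset
        elif tag == "O" and start is not None:
            spans.append((start, offset))
            start = None
        offset += length
    if start is not None:  # close a phrase that runs to the sentence end
        spans.append((start, offset))
    return spans


# Five chunks covering nine words; the first three chunks form one MNP.
print(chunk_tags_to_word_spans([2, 1, 3, 1, 2], ["B", "I", "I", "O", "B"]))
# → [(0, 6), (7, 9)]
```

Decoding at the chunk level means an MNP boundary can only fall on a chunk boundary. Under this assumed design, chunking errors in phase 1 therefore cap the accuracy of phase 2.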

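In general, noun phrases are divided into base noun phrases and maximal-length noun phrases. Accurately identifying maximal-length noun phrases plays an important role in grasping the overall syntactic structure of a sentence and in finding the correct governing predicate. This paper proposes a two-phase maximal-length noun phrase identification method whose features are an expanded notion of chunks and punctuation information subdivided into five classes. The proposed method achieves an average $F_1$-measure of 89.66%, a 2.65% improvement over the baseline model.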
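The reported figures are $F_1$-measures, but the abstract does not state the matching criterion. A minimal sketch follows, assuming the common exact-match convention: an MNP counts as correct only if both its start and end boundaries agree with the gold standard. It also assumes word-offset spans. The function name `prf1` and the example spans are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[int, int]  # (start, end) word offsets, end exclusive


def prf1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over phrase spans."""
    correct = len(gold & predicted)  # spans whose boundaries match exactly
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {(0, 3), (5, 9), (10, 12)}
predicted = {(0, 3), (5, 8), (10, 12)}  # (5, 8) misses the gold end boundary
p, r, f = prf1(gold, predicted)
print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")
# → P=0.6667 R=0.6667 F1=0.6667
```

Under exact matching, a phrase with one wrong boundary counts as both a false positive and a false negative. This strictness is why MNP scores, which involve long spans, tend to be harder to raise than base-NP scores.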
