[KSCI] Korea Science Citation Index Service

Improving the Performance of Korean Text Chunking by Machine learning Approaches based on Feature Set Selection

Hwang, Young-Sook (Dept. of Computer Science, Korea University)
Chung, Hoo-jung (Dept. of Computer Science, Korea University)
Park, So-Young (Dept. of Computer Science, Korea University)
Kwak, Young-Jae (Dept. of Computer Science, Korea University)
Rim, Hae-Chang (Dept. of Computer Science, Korea University)

Publication Information

Journal of KIISE: Software and Applications, v.29, no.9, 2002, pp. 654-668

Abstract

In this paper, we present an empirical study on improving Korean text chunking through machine learning and feature set selection. We focus on two issues: selecting a feature set for Korean chunking, and alleviating data sparseness. To select a proper feature set, we use a heuristic method that searches the space of feature sets, using the performance estimated by a machine learning algorithm as a measure of the "incremental usefulness" of a particular feature set. In addition, to smooth over data sparseness, we propose using a general part-of-speech tag set together with selective lexical information, taking the characteristics of the Korean language into account. Experimental results show that chunk tags and lexical information within a given context window are important features, whereas spacing unit information is less important than the others; these findings are independent of the machine learning technique used. Furthermore, compared with using all lexical information, using selective lexical information provides not only a smoothing effect but also a reduction of the feature space. Korean text chunking based on memory-based learning and on decision tree learning with the selected feature space achieved precision/recall of 90.99%/92.52% and 93.39%/93.41%, respectively.
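
The feature set search described in the abstract can be illustrated with a minimal sketch of greedy forward selection, in which a learner's held-out performance scores each candidate feature set. This is not the authors' implementation: the function names (`knn_accuracy`, `forward_select`), the feature names (`pos`, `prev_chunk`, `spacing`), the stopping rule, and the toy data are all assumptions made for illustration, and a 1-nearest-neighbour classifier with an overlap metric stands in for a memory-based learner.

```python
from typing import Callable, Dict, List, Tuple

# One instance: (feature name -> symbolic value, chunk tag). Toy data only.
Instance = Tuple[Dict[str, str], str]


def knn_accuracy(train: List[Instance], held_out: List[Instance],
                 features: List[str]) -> float:
    """Held-out accuracy of a 1-NN classifier that compares instances
    only on `features`, using the overlap (exact match count) metric."""
    correct = 0
    for x, gold in held_out:
        # Nearest neighbour = training instance sharing the most values;
        # ties go to the earliest training instance.
        nearest = max(train,
                      key=lambda inst: sum(inst[0][f] == x[f] for f in features))
        if nearest[1] == gold:
            correct += 1
    return correct / len(held_out)


def forward_select(candidates: List[str],
                   evaluate: Callable[[List[str]], float],
                   min_gain: float = 1e-9) -> Tuple[List[str], float]:
    """Greedily add the feature with the largest performance gain and
    stop when no remaining feature improves the estimate."""
    selected: List[str] = []
    best_score = evaluate(selected)
    remaining = list(candidates)
    while remaining:
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score - best_score <= min_gain:  # no "incremental usefulness"
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical toy data: POS tag, previous chunk tag, spacing-unit position.
train: List[Instance] = [
    ({"pos": "N", "prev_chunk": "O", "spacing": "start"}, "B-NP"),
    ({"pos": "N", "prev_chunk": "B-NP", "spacing": "mid"}, "I-NP"),
    ({"pos": "V", "prev_chunk": "I-NP", "spacing": "start"}, "B-VP"),
    ({"pos": "V", "prev_chunk": "B-VP", "spacing": "mid"}, "I-VP"),
    ({"pos": "N", "prev_chunk": "I-VP", "spacing": "mid"}, "B-NP"),
    ({"pos": "N", "prev_chunk": "B-NP", "spacing": "start"}, "I-NP"),
]
held_out: List[Instance] = [
    ({"pos": "N", "prev_chunk": "O", "spacing": "mid"}, "B-NP"),
    ({"pos": "N", "prev_chunk": "B-NP", "spacing": "start"}, "I-NP"),
    ({"pos": "V", "prev_chunk": "I-NP", "spacing": "mid"}, "B-VP"),
    ({"pos": "V", "prev_chunk": "B-VP", "spacing": "start"}, "I-VP"),
]

selected, score = forward_select(
    ["pos", "prev_chunk", "spacing"],
    lambda feats: knn_accuracy(train, held_out, feats),
)
print(selected, score)
```

Because the evaluation function is passed in as a parameter, the same search loop could in principle score feature sets with a decision tree learner instead. The toy data was constructed by hand so that the search has something to find, and it says nothing about the results reported in the paper.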

Keywords

Korean Base Phrase Recognition; Machine Learning; Decision Tree; Feature Set Selection; Memory-based Learning

Citations & Related Records

Times Cited By KSCI: 2

Reference
Cited By KSCI

1	S. Abney, 'Partial Parsing via Finite-State Cascades,' In Proc. of the ESSLLI '96 Robust Parsing Workshop, 1996
2	Young-Sook Hwang, Hoo-jung Chung, Yong-Jae Kwak, So-Young Park, 'Shallow Parsing by Weighted Probabilistic Sum,' In Proc. of the 19th International Conference on Computer Processing of Oriental Languages (ICCPOL2001), 2001
3	Avrim L. Blum, 'Selection of Relevant Features and Examples in Machine Learning,' Artificial Intelligence, pp. 245-271, 1997
4	S. Abney, 'Parsing by Chunks,' In R.C. Berwick, S.P. Abney and C. Tenny, editors, Principle-Based Parsing: Computation and Psycholinguistics, Kluwer, pp. 257-278, 1991
5	Walter Daelemans and Antal van den Bosch, 'Generalisation performance of backpropagation learning on a syllabification task,' In M. F. J. Drossaers and A. Nijholt, editors, Proc. of TWLT3: Connectionism and Natural Language Processing, pp. 27-37, Enschede, Twente University, 1992
6	J. R. Quinlan, 'C4.5: Programs for Machine Learning,' San Mateo: Morgan Kaufmann, 1993
7	Gregory Grefenstette, 'Light parsing as Finite State Filtering,' In Proc. of the Workshop on Extended Finite State Models of Language, ECAI'96, 1996
8	D. W. Aha, D. Kibler, M. Albert, 'Instance-based learning algorithms,' Machine Learning, 6:37-66, 1991
9	K. W. Church, 'A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text,' In Proc. of the 2nd Conf. on Applied NLP, 1988
10	GuoDong Zhou and Jian Su, 'Error-Driven HMM-based Chunk Tagger with Context-Dependent Lexicon,' In Proc. of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 2000
11	한국과학기술원 [KAIST], 국어정보베이스(v.1.0) (CD 배포판) [Korean Language Information Base (v.1.0), CD distribution], 1997
12	Erik F. Tjong Kim Sang, W. Daelemans, H. Dejean, R. Koeling, Y. Krymolowski, V. Punyakanok, and D. Roth, 'Applying system combination to base noun phrase identification,' In Proc. of COLING, 2000
13	W. Skut and T. Brants, 'A Maximum-Entropy Partial Parser for Unrestricted Text,' In Proc. of the 6th Workshop on Very Large Corpora, 1998
14	Rob Koeling, 'Chunking with Maximum Entropy Models,' In Proc. of CoNLL-2000 and LLL-2000, pp. 139-141, 2000
15	L.A. Ramshaw and M.P. Marcus, 'Text Chunking using Transformation-Based Learning,' In Proc. of the 3rd ACL Workshop on Very Large Corpora, 1995
16	Claire Cardie and David Pierce, 'Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification,' In Proc. of COLING-ACL'98, pp. 218-224, 1998
17	Claire Cardie and David Pierce, 'The Role of Lexicalization and Pruning for Base Noun Phrase Grammars,' In Proc. of the 6th National Conference on Artificial Intelligence, 1999
18	Shlomo Argamon, Ido Dagan, and Yuval Krymolowski, 'A Memory-Based Approach to Learning Shallow Natural Language Patterns,' In Proc. of COLING-ACL'98, pp. 67-73, 1998
19	Taku Kudo and Yuji Matsumoto, 'Chunking with Support Vector Machines,' In Proc. of NAACL-2001, 2001
20	W. Daelemans, S. Buchholz, J. Veenstra, 'Memory-Based Shallow Parsing,' In Proc. of CoNLL, Bergen, Norway, 1999
21	박성배, 장병탁, '최대 엔트로피 모델을 이용한 텍스트 단위화 학습' [Learning text chunking using a maximum entropy model], In Proc. of the 13th Conference on Hangul and Korean Information Processing, pp. 130-137, 2001
22	Hans van Halteren, 'A Default First Order Family Weight Determination Procedure for WPDV Models,' In Proc. of CoNLL-2000 and LLL-2000, pp. 119-212, 2000
23	J. Veenstra, 'Fast NP Chunking Using Memory-Based Learning Techniques,' In Proc. of the 8th Belgian-Dutch Conference on Machine Learning, 1998
24	양재형, '규칙기반 학습에 의한 한국어의 기반 명사구 인식' [Korean base noun phrase recognition by rule-based learning], Journal of KIISE: Software and Applications, vol. 27, no. 10, pp. 1062-1071, 2000
25	이신목, 강인호, 김길창, '방향성을 이용한 한국어 비재귀명사구 인식 모델' [A Korean non-recursive noun phrase recognition model using directionality], In Proc. of the 13th Conference on Hangul and Korean Information Processing, pp. 439-444, 2001
26	Juntae Yoon, et al., 'Three Types of Chunking in Korean and Dependency Analysis based on Lexical Association,' In Proc. of the 18th International Conference on Computer Processing of Oriental Languages (ICCPOL'99), pp. 59-65, 1999
27	신효필, '최소자원 최대효과의 구문분석' [Syntactic analysis with minimal resources and maximal effect], In Proc. of the 11th Conference on Hangul and Korean Information Processing, pp. 242-248, 1999

1	Improving Parsing Efficiency Using Chunking in Chinese-Korean Machine Translation / Journal of KIISE: Software and Applications
2	Eojeol Syntactic Tag Prediction of Korean Text using Entropy Guided CRF / [Oh, Jin-Young; Cha, Jeong-Won] / Journal of KIISE: Computing Practices and Letters
3	Chunking of Contiguous Nouns using Noun Semantic Classes / [Ahn, Kwang-Mo; Seo, Young-Hoon] / The Journal of the Korea Contents Association
4	Analysis of Korean Language Parsing System and Speed Improvement of Machine Learning using Feature Module / [Kim, Seong-Jin; Ock, Cheol-Young] / Journal of the Institute of Electronics and Information Engineers
