http://dx.doi.org/10.3745/KIPSTB.2002.9B.6.791

Korean Probabilistic Dependency Grammar Induction by Morpheme

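Choi, Seon-Hwa (Department of Computer Science, Graduate School, Chonnam National University)
Park, Hyuk-Ro (Department of Computer Science, Chonnam National University)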
Abstract
In this paper, we present a new method for inducing a probabilistic dependency grammar (PDG) from a text corpus. Because Korean words are composed of more basic morphemes, various dependency relations exist within a single word. If the induction process does not take these in-word dependency relations into account, the accuracy of the resulting grammar may be poor. The main difference between the proposed method and previous PDG induction methods is that it takes into account in-word dependency relations as well as inter-word dependency relations. To assess the performance of the proposed method, we conducted an experiment using a manually tagged corpus of 25,000 sentences compiled by the Korea Advanced Institute of Science and Technology (KAIST). The grammar induction produced 2,349 dependency rules. The parser using these dependency rules showed 69.77% accuracy, measured as the number of correct dependency relations relative to the total number of dependency relations in the best-1 parse trees of sample sentences. The result shows that taking in-word dependency relations into account during grammar induction yields a more accurate dependency grammar.
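The evaluation measure described in the abstract is a simple ratio: correct dependency relations over total dependency relations. The Python sketch below computes that ratio over a morpheme-level analysis and also separates in-word arcs (both morphemes in the same word) from inter-word arcs. It is only an illustration under stated assumptions: the `Morpheme` record, the function names `split_relations` and `dependency_accuracy`, the romanized example sentence, and its head assignments are all hypothetical and do not reproduce the paper's data format or the KAIST corpus annotation. Excluding the sentence root from the relation count is also an assumption.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Morpheme:
    form: str  # romanized surface form (hypothetical illustration)
    word: int  # index of the space-delimited word (eojeol) containing it
    head: int  # index of the head morpheme; -1 marks the sentence root

def split_relations(sent):
    """Partition dependency arcs (dependent, head) into in-word and inter-word arcs."""
    in_word, inter_word = [], []
    for i, m in enumerate(sent):
        if m.head == -1:
            continue  # assumption: the root has no head, so it adds no relation
        arc = (i, m.head)
        if sent[m.head].word == m.word:
            in_word.append(arc)
        else:
            inter_word.append(arc)
    return in_word, inter_word

def dependency_accuracy(gold, pred):
    """Correct dependency relations divided by total gold dependency relations."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same morphemes")
    relations = [i for i, m in enumerate(gold) if m.head != -1]
    if not relations:
        return 0.0
    correct = sum(1 for i in relations if pred[i].head == gold[i].head)
    return correct / len(relations)

# "na-neun hakgyo-e ga-ass-da" ("I went to school") with a hypothetical
# analysis: each non-final morpheme depends on the next one in its word,
# word-final particles attach to the verb stem "ga", and "da" is the root.
gold = [
    Morpheme("na", 0, 1), Morpheme("neun", 0, 4),
    Morpheme("hakgyo", 1, 3), Morpheme("e", 1, 4),
    Morpheme("ga", 2, 5), Morpheme("ass", 2, 6), Morpheme("da", 2, -1),
]
# Hypothetical parser output that attaches "neun" to "da" instead of "ga".
pred = list(gold)
pred[1] = replace(gold[1], head=6)

in_word, inter_word = split_relations(gold)
print(len(in_word), len(inter_word))              # 4 2
print(round(dependency_accuracy(gold, pred), 4))  # 0.8333
```

In this toy sentence four of the six relations are in-word, which illustrates why an induction method that models only inter-word dependencies would leave most of a morpheme-level analysis unaccounted for.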
Keywords
dependency grammar; grammar induction