Generalized LR Parser with Conditional Action Model (CAM) using Surface Phrasal Types

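  • Yong-Jae Kwak (Joint Research Institute of Information and Communication, Korea University)
  • So-Young Park (Department of Computer Science, Korea University)
  • Young-Sook Hwang (Department of Computer Science, Korea University)
  • Hoojung Chung (Joint Research Institute of Information and Communication, Korea University)
  • Sang-Zoo Lee (NLP Solution Co., Ltd.)
  • Hae-Chang Rim (Department of Computer Science, Korea University)
  • Published: 2003.02.01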

Abstract

Generalized LR (GLR) parsing is an enhanced LR parsing method that overcomes the limitation of the single linear stack used by traditional LR parsers by replacing it with a graph-structured stack. It has served as a solid foundation for natural language parsers that integrate various additional mechanisms into the LR framework. In this paper, we propose a Conditional Action Model (CAM) that addresses the problems of conventional probabilistic GLR methods. Previous probabilistic GLR parsers have used relatively limited contextual information for disambiguation because of the high complexity of the graph-structured stack. The proposed model uses Surface Phrasal Types, which describe the syntactic structure of the partial parses held in the graph-structured stack, as additional contextual information, so that more fine-grained structural preferences can be reflected in the parser. Experimental results show that our GLR parser with the proposed Conditional Action Model, trained without any lexical information, improves accuracy by about 6-7% over previous methods, and that the model can make effective use of rich stack information for the structural disambiguation of a probabilistic LR parser.
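The abstract does not state the exact form of the Conditional Action Model, so the following Python sketch is only one plausible reading of the idea: each parser action is scored by its probability conditioned on the LR state, the lookahead tag, and a Surface Phrasal Type label summarizing the partial parse on the stack. The class name, the choice of conditioning variables, the relative-frequency estimate, and the toy labels are all assumptions made for illustration, not the authors' formulation.

```python
import math
from collections import Counter, defaultdict


class ConditionalActionModel:
    """Toy estimate of P(action | state, lookahead, spt) by relative frequency.

    `spt` stands for a hypothetical Surface Phrasal Type label; the real
    model's context and smoothing are not described in the abstract.
    """

    def __init__(self):
        # Maps a context tuple to a Counter of the actions observed in it.
        self.counts = defaultdict(Counter)

    def observe(self, state, lookahead, spt, action):
        """Record one training event: `action` was taken in this context."""
        self.counts[(state, lookahead, spt)][action] += 1

    def prob(self, state, lookahead, spt, action):
        """Return the relative frequency of `action` in this context."""
        ctx = self.counts.get((state, lookahead, spt))
        if not ctx:
            return 0.0  # unseen context; a real model would smooth here
        return ctx[action] / sum(ctx.values())

    def log_score(self, derivation):
        """Sum log-probabilities over a sequence of (context..., action)."""
        total = 0.0
        for state, lookahead, spt, action in derivation:
            p = self.prob(state, lookahead, spt, action)
            if p == 0.0:
                return float("-inf")
            total += math.log(p)
        return total


# Hypothetical training events: (LR state, lookahead tag, SPT label, action).
model = ConditionalActionModel()
for event in [
    (7, "p", "NP", "shift"),
    (7, "p", "NP", "shift"),
    (7, "p", "NP", "reduce"),
    (7, "p", "VP NP", "reduce"),
]:
    model.observe(*event)

# Same state and lookahead, but different stack shapes give different preferences.
print(model.prob(7, "p", "NP", "shift"))
print(model.prob(7, "p", "VP NP", "shift"))
```

In this toy data, the same LR state and lookahead yield different action preferences depending on the SPT label, which is the kind of finer structural preference the abstract attributes to the richer stack context.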
