Shallow Parsing on Grammatical Relations in Korean Sentences

한국어 문법관계에 대한 부분구문 분석

  • Published : 2005.10.01

Abstract

This study aims to identify grammatical relations (GRs) in Korean sentences. The key task is to find the GRs in a sentence in terms of GR categories such as subject, object, and adverbial. Solving this problem requires dealing with many ambiguities. We propose a statistical model that first resolves the ambiguity among grammatical relations and then finds the correct noun phrase (NP) arguments of a given verb phrase (VP) by using the probabilities of the GRs given the NPs and VPs in the sentence. The proposed model exploits characteristics of the Korean language such as distance, no-crossing, and the case property. We estimate the probability of a GR given an NP and a VP with Support Vector Machine (SVM) classifiers. In an experiment in which the model was trained on a tree- and GR-tagged corpus, we achieved accuracies of 84.8%, 94.1%, and 84.8% in identifying subject, object, and adverbial relations in sentences, respectively.

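The goal of this study is to analyze grammatical relations in Korean sentences. The main problem is to find the subject, object, and adverbial of a sentence. To solve this problem, the various ambiguities that arise in Korean syntactic analysis must be taken into account. We propose a statistical method that first resolves the ambiguity among grammatical relations and then resolves the predicate-argument ambiguity of verb phrases by using the grammatical relation probabilities of the given noun phrases and verb phrases. The proposed method reflects characteristics of the Korean language such as the distance between eojeols (word phrases), the prohibition of crossing structures, and the one-case-per-sentence principle. The probabilities of the grammatical relations between verb phrases and noun phrases were estimated with support vector classifiers. The proposed method learned grammatical relations automatically from a corpus annotated with grammatical relations and syntactic structure, and achieved 84.8%, 94.1%, and 84.8% on subject, object, and adverbial relations, respectively.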
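The second stage described in the abstract, choosing NP arguments for each VP from GR probabilities under the no-crossing and case constraints, can be illustrated with a small sketch. The abstract does not specify the search procedure, so the exhaustive search below is only an assumption made for illustration, not the authors' algorithm. The function names, the probability table, and the token positions are all hypothetical. The sketch also assumes that each NP attaches to a VP on its right, that a VP takes at most one subject and one object, and that the distance characteristic is already folded into the probabilities.

```python
from itertools import product
from math import log

# GRs that a single VP may carry at most once (assumed reading of the case property).
UNIQUE_GRS = {"subject", "object"}


def arcs_cross(a, b):
    """Return True if two (dependent, head) arcs cross each other."""
    (i, j), (k, l) = sorted([tuple(sorted(a)), tuple(sorted(b))])
    return i < k < j < l


def best_assignment(np_positions, prob):
    """Pick one (VP, GR) per NP, maximizing the product of probabilities.

    prob maps (np_pos, vp_pos, gr) to an estimated P(gr | NP, VP).
    In the paper these estimates come from SVM classifiers; here they
    are made-up numbers. Returns {np_pos: (vp_pos, gr)} or None.
    """
    # Candidate (VP, GR) pairs for each NP; the head must lie to the right.
    options = [
        [(v, g) for (n2, v, g) in prob if n2 == n and v > n]
        for n in np_positions
    ]
    best, best_score = None, float("-inf")
    for choice in product(*options):
        arcs = [(n, v) for n, (v, _) in zip(np_positions, choice)]
        # No-crossing constraint: reject assignments with crossing arcs.
        if any(arcs_cross(a, b)
               for idx, a in enumerate(arcs) for b in arcs[idx + 1:]):
            continue
        # Case constraint: at most one subject and one object per VP.
        unique = [(v, g) for v, g in choice if g in UNIQUE_GRS]
        if len(unique) != len(set(unique)):
            continue
        score = sum(log(prob[(n, v, g)])
                    for n, (v, g) in zip(np_positions, choice))
        if score > best_score:
            best, best_score = dict(zip(np_positions, choice)), score
    return best


# Hypothetical sentence layout: NP at 0, NP at 1, VP at 2, VP at 3.
example_prob = {
    (0, 2, "subject"): 0.6,
    (0, 3, "subject"): 0.4,
    (1, 2, "subject"): 0.5,
    (1, 2, "object"): 0.3,
    (1, 3, "object"): 0.6,
}
result = best_assignment([0, 1], example_prob)
print(result)
```

In this toy input, taking the most probable relation for each NP independently would attach NP 0 to VP 2 and NP 1 to VP 3, which produces crossing arcs. The constrained search instead attaches both NPs to VP 3, as subject and object. Exhaustive enumeration grows exponentially with the number of NPs, so a practical system would need a more efficient search.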
