Korean Semantic Role Labeling Based on Suffix Structure Analysis and Machine Learning

  • 석미란 (IT Business Division, 필아이티 Co., Ltd.)
  • 김유섭 (Department of Convergence Software, Hallym University)
  • Received : 2016.10.04
  • Accepted : 2016.10.12
  • Published : 2016.11.30

Abstract

Semantic Role Labeling (SRL) is the task of determining the semantic relations between a predicate and its arguments in a sentence. Korean SRL, however, has proven difficult because the structure of Korean differs from that of English, which makes it hard to apply the approaches developed so far directly. As a result, existing methods have performed worse on Korean than on English or Chinese. To address these problems, we focus on analyzing suffix information, namely josa (case suffixes) and eomi (verbal endings). Korean, like Japanese, is an agglutinative language whose words carry a well-defined suffix structure, and it is this developed suffix structure that allows agglutinative languages a relatively free word order. In addition, arguments consisting of a single morpheme are labeled using basic statistics. Finally, machine learning algorithms such as Support Vector Machines (SVM) and Conditional Random Fields (CRF) are used to label the arguments left unlabeled by the preceding phases. The proposed method thus reduces the set of argument instances that must be handled by machine learning, whose role labeling is relatively uncertain and inaccurate. In experiments on 15,224 arguments, we obtained an F1-score of approximately 83.24%, about 4.85 percentage points higher than the best previously reported result for Korean SRL.
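The abstract describes a cascade in which cheap, reliable steps run first and machine learning only sees what is left. The Python sketch below illustrates that control flow under stated assumptions: the suffix table `SUFFIX_RULES`, the `Argument` record, the helper names, and the PropBank-style role labels (`ARG0`, `ARGM-LOC`, ...) are hypothetical illustrations, not the authors' rules or code, and `stub_classifier` merely stands in where a trained SVM or CRF would be called.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# HYPOTHETICAL suffix -> role table: a toy stand-in for the paper's josa/eomi
# analysis, not the authors' actual rules.
SUFFIX_RULES: Dict[str, str] = {
    "에서": "ARGM-LOC",  # locative particle
    "에게": "ARG2",      # dative particle (recipient)
}


@dataclass
class Argument:
    morphemes: List[str]   # morphemes of the argument, e.g. ["학교", "에서"]
    suffix: Optional[str]  # final josa/eomi of the argument, if any


def build_single_morpheme_stats(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each single-morpheme argument to its most frequent training role."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for morpheme, role in pairs:
        counts[morpheme][role] += 1
    return {m: c.most_common(1)[0][0] for m, c in counts.items()}


def label_argument(
    arg: Argument,
    stats: Dict[str, str],
    classifier: Callable[[Argument], str],
) -> Tuple[str, str]:
    """Return (role, stage) using the three-stage cascade."""
    # Stage 1: deterministic suffix rules.
    if arg.suffix in SUFFIX_RULES:
        return SUFFIX_RULES[arg.suffix], "suffix"
    # Stage 2: corpus statistics for single-morpheme arguments.
    if len(arg.morphemes) == 1 and arg.morphemes[0] in stats:
        return stats[arg.morphemes[0]], "statistics"
    # Stage 3: fall back to a learned model (SVM or CRF in the paper).
    return classifier(arg), "ml"


def stub_classifier(arg: Argument) -> str:
    """HYPOTHETICAL stand-in for a trained SVM/CRF; always predicts ARG0."""
    return "ARG0"


if __name__ == "__main__":
    stats = build_single_morpheme_stats(
        [("어제", "ARGM-TMP"), ("어제", "ARGM-TMP"), ("어제", "ARG1")]
    )
    # Arguments of "철수가 어제 학교에서 공부했다" (Cheolsu studied at school yesterday).
    for arg in [
        Argument(["철수", "가"], "가"),
        Argument(["어제"], None),
        Argument(["학교", "에서"], "에서"),
    ]:
        print("".join(arg.morphemes), label_argument(arg, stats, stub_classifier))
```

Run as a script, it labels 학교에서 by the suffix rule, 어제 by the single-morpheme statistics, and 철수가 via the stub classifier. Ordering the stages this way means the classifier's relatively less reliable predictions are used only for arguments that neither a suffix rule nor single-morpheme statistics could resolve, which is the reduction in machine-learning scope the abstract describes.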
