Predicate Recognition Method using BiLSTM Model and Morpheme Features

  • Nam, Chung-Hyeon (Department of Computer Engineering, Korea University of Technology and Education)
  • Jang, Kyung-Sik (Department of Computer Engineering, Korea University of Technology and Education)
  • Received : 2021.10.13
  • Accepted : 2021.10.28
  • Published : 2022.01.31

Abstract

Semantic role labeling, which is used in natural language processing applications such as information extraction and question answering, is the task of identifying the arguments associated with a given predicate in a given sentence. The predicates used as input to semantic role labeling are typically extracted from lexical analysis results such as POS tagging. However, Korean predicates take various forms depending on the meaning of the sentence, so it is not feasible to define linguistic patterns that cover every case. In this paper, we propose a Korean predicate recognition method that does not rely on predefined linguistic patterns and instead uses a neural network model with pre-trained embedding models and morpheme features. The experiments compare performance across the model's hyperparameters and with and without the embedding models and morpheme features. The proposed neural network model achieved a performance of 92.63%.
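To make the idea of combining embeddings with morpheme features concrete, the following is a minimal sketch of how one per-token input vector might be built and how per-token labels might be turned into predicate spans. It is not the authors' implementation. The toy embedding table, the reduced POS tag list, the BIO label names (`B-PRED`, `I-PRED`, `O`), the morphological analysis of the example sentence, and the helper names are all assumptions made for illustration. The BiLSTM itself is omitted, and the `predicted` labels stand in for its output.

```python
from typing import Dict, List, Tuple

# Hypothetical 3-dimensional "pre-trained" vectors; a real system would load
# word2vec or fastText vectors with hundreds of dimensions.
EMBEDDINGS: Dict[str, List[float]] = {
    "수학": [0.1, 0.5, -0.3],
    "공부": [0.2, -0.1, 0.4],
    "하": [0.0, 0.3, -0.2],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback for out-of-vocabulary morphemes

# Assumed subset of a Sejong-style POS tagset used as the morpheme feature.
POS_TAGS = ["NP", "JX", "NNG", "JKO", "XSV", "EP", "EF"]


def one_hot(tag: str) -> List[float]:
    """Encode a POS tag as a one-hot vector (all zeros if the tag is unknown)."""
    return [1.0 if tag == known else 0.0 for known in POS_TAGS]


def token_features(morpheme: str, tag: str) -> List[float]:
    """Concatenate the embedding and the POS one-hot vector for one morpheme."""
    return EMBEDDINGS.get(morpheme, UNKNOWN) + one_hot(tag)


def decode_spans(labels: List[str]) -> List[Tuple[int, int]]:
    """Turn BIO labels into (start, end) index pairs, with end exclusive."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, label in enumerate(labels):
        if label == "B-PRED":
            if start is not None:  # close the span that was still open
                spans.append((start, i))
            start = i
        elif label == "I-PRED" and start is not None:
            continue  # the current span keeps growing
        else:
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:  # span that runs to the end of the sentence
        spans.append((start, len(labels)))
    return spans


# Illustrative morpheme/POS analysis of a short sentence (assumed, not from the paper).
sentence = [("나", "NP"), ("는", "JX"), ("수학", "NNG"), ("을", "JKO"),
            ("공부", "NNG"), ("하", "XSV"), ("았", "EP"), ("다", "EF")]

# One input vector per morpheme; these would be fed to the sequence model.
features = [token_features(m, t) for m, t in sentence]

# Stand-in for the labels a trained model might emit for this sentence.
predicted = ["O", "O", "O", "O", "B-PRED", "I-PRED", "O", "O"]
spans = decode_spans(predicted)
predicates = ["".join(m for m, _ in sentence[s:e]) for s, e in spans]

print(len(features[0]), spans, predicates)
```

In this toy example the predicate is a noun followed by a verb-deriving suffix, a form that a rule matching only verb tags would miss. A one-hot vector is used for the POS feature only to keep the sketch short; a learned tag embedding is a common alternative.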

Acknowledgement

This paper was supported by the Education and Research Promotion Program of KoreaTech.
