Exploiting Chunking for Dependency Parsing in Korean

한국어에서 의존 구문분석을 위한 구묶음의 활용

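  • 남궁영 (Department of Computer Engineering, Korea Maritime and Ocean University);
  • 김재훈 (Department of Computer Engineering and Interdisciplinary Major of Maritime AI Convergence, Korea Maritime and Ocean University)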
  • Received : 2021.08.04
  • Accepted : 2021.12.31
  • Published : 2022.07.31

Abstract

In this paper, we present a method for Korean dependency parsing that applies chunking first. Dependency parsing is the task of determining the governor of every word in a sentence. Korean parsers generally determine the syntactic governor, so the resulting syntactic structure must be transformed into a semantic structure before further processing such as semantic analysis. Whether to choose the syntactic or the semantic governor is a long-standing problem. For example, the syntactic governor of the word "먹고 (eat)" in the sentence "밥을 먹고 싶다 (would like to eat a meal)" is "싶다 (would like to)", which is an auxiliary verb and therefore cannot be a semantic governor. To mitigate this problem, we propose performing Korean dependency parsing after chunking, a process that segments a sentence into constituents. A constituent is a word or a group of words that functions as a single unit within a dependency structure, and it is called a chunk in this paper. Compared to traditional dependency parsing, the proposed method has two advantages: (1) the number of input units is reduced, so parsing can be faster; (2) parsing effectiveness can be improved because only the relations between the head words of chunks need to be considered. In experiments on the Sejong dependency corpus, the proposed method achieved a UAS of 86.48% and an LAS of 84.56%, and reduced the number of input units by about 22%.
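The effect of chunking on the parser input can be illustrated with a minimal sketch. The `Chunk` class, the `chunk_sentence` function, and the one-rule auxiliary list below are hypothetical and exist only for illustration; they are not the chunker used in the paper.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Chunk:
    words: List[str]  # words (eojeols) grouped into one parsing unit
    head: int         # index within `words` of the content head word


def chunk_sentence(words: List[str], aux_words: Set[str]) -> List[Chunk]:
    """Toy rule (an assumption): attach an auxiliary verb to the preceding word."""
    chunks: List[Chunk] = []
    for w in words:
        if chunks and w in aux_words:
            chunks[-1].words.append(w)  # function word joins the previous chunk
        else:
            chunks.append(Chunk(words=[w], head=0))
    return chunks


words = ["밥을", "먹고", "싶다"]
chunks = chunk_sentence(words, aux_words={"싶다"})

# Fraction of input units removed by chunking for this one sentence.
reduction = 1 - len(chunks) / len(words)

for c in chunks:
    print(c.words, "head word:", c.words[c.head])
print("input units:", len(words), "->", len(chunks))
```

In this sketch the parser sees two units instead of three, and "밥을" can attach to the chunk whose head word is "먹고", so the auxiliary "싶다" never becomes a governor.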

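This paper proposes a method that analyzes dependency structure in Korean after performing chunking. Dependency parsing is the process of determining the governor of each word. When choosing a governor, whether to choose the syntactic governor or the semantic governor is a chronic problem. In general, the syntactic governor is chosen. For example, in the sentence "밥을 먹고 싶다", "싶다" is chosen as the governor of the word "먹고". However, "싶다" is an auxiliary predicate and cannot be a governor semantically. Parsing in this way requires a further transformation before semantic analysis. To mitigate this problem somewhat, this paper proposes parsing after chunking. Chunking is the process of segmenting a sentence into constituents, and a constituent consists of a content-word chunk and a function-word chunk. Chunking reduces the number of sentence elements given as input to the parser, so parsing speed can improve. Because words are grouped into a single chunk around a head word, dependency relations need to be identified only between chunks, which can make parsing more efficient. Performance was evaluated on the Sejong dependency corpus: UAS and LAS were 86.48% and 84.56%, respectively, and the number of input nodes was reduced by about 22%.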
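UAS (unlabeled attachment score) is the fraction of input units whose predicted governor is correct, and LAS (labeled attachment score) additionally requires the dependency label to be correct. A minimal sketch of both metrics follows; the function name `attachment_scores`, the two-unit example, and the labels are illustrative assumptions, not data from the paper.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (index of the governor, dependency label); 0 = root


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) over one sequence of input units."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    # UAS counts units whose governor matches; LAS also requires the label.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return uas_hits / n, las_hits / n


# Hypothetical two-chunk sentence: unit 1 depends on unit 2, unit 2 is the root.
gold = [(2, "NP_OBJ"), (0, "VP")]
pred = [(2, "NP_SBJ"), (0, "VP")]  # correct governor, wrong label on unit 1

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

Here both governors are correct but one label is wrong, so UAS is higher than LAS, the same ordering as in the reported results.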

Acknowledgement

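This research was supported in 2017 by the National Research Foundation of Korea, funded by the Korean government (Ministry of Science and ICT) (NRF-2017M3C4A7068187, Research and Development of Core Technologies for Korean Information Processing).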
