Error Correction in Korean Morpheme Recovery using Deep Learning

Korean title: 딥 러닝을 이용한 한국어 형태소의 원형 복원 오류 수정 (Correcting Errors in Korean Morpheme Base-Form Restoration Using Deep Learning)

  • Received : 2015.05.27
  • Accepted : 2015.09.01
  • Published : 2015.11.15

Abstract

Korean morphological analysis is difficult because Korean is an agglutinative language, and one of its most important steps is morpheme recovery, the restoration of each morpheme to its base form. Previous work has mainly relied on heuristic rules or on dictionaries of pre-analyzed partial words. Because these methods do not use lexical-level contextual information, their recovery accuracy is limited. In this study, we built a Korean morpheme recovery system using deep learning, with word embeddings supplying the contextual information. On the task of recovering '들/VV' versus '듣/VV', the system achieved 97.97% accuracy, 1.75 percentage points higher than the 96.22% obtained with an SVM (Support Vector Machine), a machine learning method commonly used in natural language processing.
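To make the role of context concrete: the surface form "들었다" can come from either '들/VV' (to lift) or '듣/VV' (to listen), so only the surrounding words can decide between them. The following is a minimal sketch of how a window of word embeddings could feed a small feed-forward classifier over the two candidates. It is not the authors' architecture: the window size, embedding dimension, hidden width, ReLU activation, and all function names (`build_window`, `featurize`, `predict`) are assumptions made for illustration, and the weights are random and untrained, so the printed probabilities carry no meaning.

```python
import math
import random

CANDIDATES = ["들/VV", "듣/VV"]  # the two base forms named in the abstract
DIM = 4       # embedding size (assumed, kept tiny for illustration)
HIDDEN = 3    # hidden layer width (assumed)
PAD = "<PAD>"  # hypothetical padding token for sentence edges


def build_window(tokens, index, size=2):
    """Return the token at `index` plus `size` neighbours on each side."""
    padded = [PAD] * size + list(tokens) + [PAD] * size
    center = index + size
    return padded[center - size:center + size + 1]


def featurize(window, embeddings):
    """Concatenate the embeddings of the window tokens (zeros if unknown)."""
    features = []
    for token in window:
        features.extend(embeddings.get(token, [0.0] * DIM))
    return features


def predict(features, w_hidden, w_out):
    """One ReLU hidden layer followed by a softmax over the candidates."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)))
              for row in w_hidden]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)


def rand_vec(n):
    """Random vector standing in for trained parameters."""
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


# "음악을 들었다" (listened to music): the ambiguous word is at index 1.
tokens = ["음악을", "들었다"]
embeddings = {t: rand_vec(DIM) for t in tokens}  # untrained, random
window = build_window(tokens, index=1)
features = featurize(window, embeddings)
w_hidden = [rand_vec(len(features)) for _ in range(HIDDEN)]
w_out = [rand_vec(HIDDEN) for _ in CANDIDATES]
probs = predict(features, w_hidden, w_out)

for candidate, p in zip(CANDIDATES, probs):
    print(candidate, round(p, 3))
```

In a trained system the embeddings and weights would be learned from a tagged corpus, so that a context word such as "음악을" (music) would push probability toward '듣/VV', while a word such as "가방을" (bag) would favor '들/VV'.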

Acknowledgement

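Grant : Development of Knowledge Evolutionary WiseQA Platform Technology for Human Knowledge Augmented Services

Supported by : Institute for Information & communications Technology Promotion (IITP)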
