High-Speed Korean Dependency Analysis Using Cascaded Chunking

  • Received : 2009.11.04
  • Reviewed : 2010.01.12
  • Published : 2010.03.31

Abstract

Although demand for syntactic parsers in Korean language processing is high, in practice they are rarely adopted because of limited performance and a lack of robustness. This paper describes a system that achieves accuracy, speed, and robustness together by recasting parsing as a labeling problem. We parse Korean with cascaded chunking: at each step, Conditional Random Fields (CRFs) assign a syntactic label to each Eojeol, using the Eojeol's part-of-speech tags and Eojeol syntactic tags as features. In 10-fold cross-validation on the Sejong syntactic corpus of 58,175 sentences (10.97 Eojeols per sentence on average), the system reached an average parsing accuracy of 86.01%. This is comparable or superior to previously proposed parsers, and the system can also process long sentences that existing parsers fail to handle.
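The cascade of labeling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes head-final dependencies (each Eojeol depends on some Eojeol to its right), uses a simplified cascade in which every Eojeol labeled `D` attaches to its right neighbor in the same pass, and replaces the trained CRF with a hypothetical rule-based `toy_labeler`. The function names and the tag strings are illustrative only.

```python
from typing import Callable, List, Optional, Sequence

# A labeler maps the tags of the still-unattached Eojeols to "D"/"O" labels:
# "D" = depends on the Eojeol immediately to its right, "O" = undecided.
Labeler = Callable[[Sequence[str]], List[str]]


def cascaded_chunking(tags: Sequence[str], labeler: Labeler) -> List[int]:
    """Return the head index of every Eojeol (-1 marks the root).

    Assumes head-final dependencies; this is a simplified cascade,
    not the exact procedure of the paper.
    """
    heads = [-1] * len(tags)
    active = list(range(len(tags)))  # Eojeols that have no head yet
    while len(active) > 1:
        labels = labeler([tags[i] for i in active])
        survivors: List[int] = []
        attached = False
        for pos, idx in enumerate(active):
            is_last = pos == len(active) - 1  # last one is the root candidate
            if not is_last and labels[pos] == "D":
                heads[idx] = active[pos + 1]
                attached = True
            else:
                survivors.append(idx)
        if not attached:
            # Fallback so the loop always terminates: attach the
            # second-to-last active Eojeol to the last one.
            heads[active[-2]] = active[-1]
            survivors = active[:-2] + [active[-1]]
        active = survivors
    return heads


def toy_labeler(tags: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for a trained CRF (illustrative rules only)."""
    labels = []
    for i, tag in enumerate(tags):
        nxt: Optional[str] = tags[i + 1] if i + 1 < len(tags) else None
        if nxt is None:
            labels.append("O")
        elif tag.endswith("_MOD"):
            labels.append("D")  # modifiers attach to the next Eojeol
        elif tag in ("NP_SBJ", "NP_OBJ") and nxt == "VP":
            labels.append("D")  # arguments attach to an adjacent predicate
        else:
            labels.append("O")
    return labels


example = ["NP_SBJ", "VP_MOD", "NP_OBJ", "VP"]
print(cascaded_chunking(example, toy_labeler))  # prints [3, 2, 3, -1]
```

In the first pass the modifier and the object attach to their right neighbors; the subject stays undecided until the second pass, when the predicate has become its neighbor. Because each pass is a single sequence-labeling call and at least one Eojeol is removed per pass, the cascade always yields a complete analysis, which is one way to read the robustness claim for long sentences.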
