High Speed Korean Dependency Analysis Using Cascaded Chunking

Oh, Jin-Young; Cha, Jeong-Won

doi:10.9709/JKSS.2010.19.1.103

Journal of the Korea Society for Simulation (한국시뮬레이션학회논문지)

Volume 19, Issue 1, pp. 103-111, 2010
pISSN: 1225-5904

The Korea Society for Simulation (한국시뮬레이션학회)

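Oh, Jin-Young (Department of Computer Engineering, Changwon National University);
Cha, Jeong-Won (Department of Computer Engineering, Changwon National University)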

Received : 2009.11.04
Accepted : 2010.01.12
Published : 2010.03.31

Abstract

Syntactic analysis is an important step in natural language processing. However, Korean syntactic analyzers are rarely used in practice because of their limited accuracy and lack of robustness. We propose a new robust, high-speed, high-accuracy Korean syntactic analyzer based on conditional random fields (CRFs). We treat parsing as a labeling problem and parse Korean by cascaded chunking: at each step, CRFs assign syntactic labels to each eojeol (the space-delimited Korean word unit), using part-of-speech tag and eojeol syntactic tag features. Experimental results with 10-fold cross-validation show significant improvements in robustness, speed, and accuracy, particularly on long Korean sentences.
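The abstract describes parsing as a repeated labeling pass over eojeols. Below is a minimal sketch of such a cascaded-chunking loop. It assumes the D/O tagging scheme of Kudo and Matsumoto's cascaded chunking for Japanese (D = the eojeol depends on the next remaining eojeol, O = undecided), head-final projective dependencies, and a conservative deletion rule. The paper's own label set and deletion rule may differ. The `labeler` argument stands in for the trained CRF. The names `cascaded_chunking_parse` and `oracle_labeler`, and the hand-made gold heads in the example, are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical labeler interface (a trained CRF would sit here). Given the
# sentence and the indices of eojeols still "alive" at this stage, it returns
# one tag per alive eojeol:
#   "D" = depends on the next alive eojeol to its right
#   "O" = head not decided at this stage
Labeler = Callable[[Sequence[str], List[int]], List[str]]


def cascaded_chunking_parse(eojeols: Sequence[str], labeler: Labeler) -> List[int]:
    """Return heads, where heads[i] is the index of eojeol i's head (-1 = root).

    Assumes head-final, projective dependencies, so the last eojeol is the root.
    """
    heads = [-1] * len(eojeols)
    alive = list(range(len(eojeols)))
    while len(alive) > 1:
        tags = list(labeler(eojeols, alive))
        tags[-1] = "O"  # the rightmost alive eojeol has no right neighbour
        if "D" not in tags:
            # Fallback that guarantees progress: attach the second-to-last
            # alive eojeol to the last one.
            tags[-2] = "D"
        survivors = []
        for pos, idx in enumerate(alive):
            left_is_d = pos > 0 and tags[pos - 1] == "D"
            if tags[pos] == "D" and not left_is_d:
                # Assuming correct tags and projectivity, no alive eojeol can
                # still depend on this one, so fix its head and drop it.
                heads[idx] = alive[pos + 1]
            else:
                # Undecided, or its left neighbour is also tagged D, so more
                # modifiers may still attach to it at a later stage.
                survivors.append(idx)
        alive = survivors
    return heads


def oracle_labeler(gold_heads: List[int]) -> Labeler:
    """Toy stand-in for the CRF that reads tags off gold heads (illustration only)."""
    def label(eojeols: Sequence[str], alive: List[int]) -> List[str]:
        return ["D" if pos + 1 < len(alive) and gold_heads[idx] == alive[pos + 1] else "O"
                for pos, idx in enumerate(alive)]
    return label


# "철수가 어제 산 책을 읽었다" (Cheolsu read the book he bought yesterday)
sentence = ["철수가", "어제", "산", "책을", "읽었다"]
gold = [4, 2, 3, 4, -1]
print(cascaded_chunking_parse(sentence, oracle_labeler(gold)))  # [4, 2, 3, 4, -1]
```

In each stage, the leftmost D tag always has an O-tagged left neighbour or none at all, so every stage removes at least one eojeol. A sentence of n eojeols therefore needs at most n - 1 stages, and each stage is a single labeling pass over the remaining eojeols.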

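Although there is strong demand for syntactic analyzers in Korean language processing, in practice they are not adopted because of their limited accuracy and lack of robustness. This study describes a system that achieves accuracy, speed, and robustness together by recasting syntactic analysis as a labeling problem. We approach Korean syntactic analysis through cascaded chunking. At each stage, the part-of-speech tags of each eojeol and the eojeol syntactic tags are used as features, and CRFs (Conditional Random Fields) are used to obtain the best labeling. In 10-fold cross-validation on the 58,175-sentence Sejong treebank (average sentence length 10.97 eojeols), the system achieved an average syntactic accuracy of 86.01%. This result is comparable or superior to previously proposed parsers, and the system can also process long sentences that existing parsers cannot handle.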
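The abstract states only that each stage gives the CRF eojeol part-of-speech tags and eojeol syntactic tags as features. It does not give the actual feature templates. The sketch below is one hypothetical template. Each eojeol still alive at a stage is described by three things: the POS tag of its first morpheme, the POS tag of its last morpheme, and its syntactic tag. The same three features are added for its nearest alive neighbours. Because the window is taken over alive eojeols, the same eojeol can receive different context features at different stages. The names `stage_features` and `eojeol_feats`, the ±1 window, and the Sejong-style tags in the example are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical eojeol representation: its morphemes as (surface, POS) pairs
# with Sejong-style POS tags, plus an eojeol-level syntactic tag.
Eojeol = Tuple[List[Tuple[str, str]], str]


def eojeol_feats(e: Eojeol, prefix: str) -> Dict[str, str]:
    """POS and syntactic-tag features of one eojeol, keyed with a position prefix."""
    morphs, syn_tag = e
    return {
        prefix + "first_pos": morphs[0][1],  # usually the content morpheme's tag
        prefix + "last_pos": morphs[-1][1],  # usually a particle or ending tag
        prefix + "syn": syn_tag,
    }


def stage_features(sent: Sequence[Eojeol], alive: List[int]) -> List[Dict[str, str]]:
    """One feature dict per alive eojeol, with a +/-1 window over alive eojeols."""
    feats = []
    for pos, idx in enumerate(alive):
        f = eojeol_feats(sent[idx], "0:")
        if pos > 0:
            f.update(eojeol_feats(sent[alive[pos - 1]], "-1:"))
        else:
            f["-1:BOS"] = "1"  # no alive eojeol to the left
        if pos + 1 < len(alive):
            f.update(eojeol_feats(sent[alive[pos + 1]], "+1:"))
        else:
            f["+1:EOS"] = "1"  # no alive eojeol to the right
        feats.append(f)
    return feats


# "철수가 어제 산 책을 읽었다" with illustrative Sejong-style tags.
sent = [
    ([("철수", "NNP"), ("가", "JKS")], "NP_SBJ"),
    ([("어제", "MAG")], "AP"),
    ([("사", "VV"), ("ㄴ", "ETM")], "VP_MOD"),
    ([("책", "NNG"), ("을", "JKO")], "NP_OBJ"),
    ([("읽", "VV"), ("었", "EP"), ("다", "EF")], "VP"),
]
# A later stage in which eojeol 1 (어제) has already been attached and removed.
feats = stage_features(sent, [0, 2, 3, 4])
print(feats[1]["-1:syn"])  # NP_SBJ: 산's left context is now 철수가, not 어제
```

The function returns one dict per eojeol. This matches the list-of-dicts sequence input that CRF toolkits such as sklearn-crfsuite accept. The toolkit used in the paper is not stated here.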
