Eojeol Syntactic Tag Prediction of Korean Text using Entropy Guided CRF

Oh, Jin-Young; Cha, Jeong-Won

Journal of KIISE:Computing Practices and Letters (한국정보과학회논문지:컴퓨팅의 실제 및 레터)

Volume 15, Issue 5 / Pages 395-399 / 2009 / 1229-7712 (pISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

엔트로피 지도 CRF를 이용한 한국어 어절 구문태그 예측

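Oh, Jin-Young (Department of Computer Engineering, Changwon National University);
Cha, Jeong-Won (Department of Computer Engineering, Changwon National University)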

Published: 2009.05.15


Abstract

In this work, we describe a system that predicts the syntactic tag of each Korean eojeol using a decision tree and Conditional Random Fields (CRFs). In machine learning, features are usually selected by the designer's intuition, so the result depends on the designer's prior knowledge. Here, we instead combine features systematically using a decision tree. We also analyze errors to identify which features matter most and tune the feature set for the best performance. Experiments confirm that the proposed method improves syntactic tag prediction, and we expect it to be helpful for syntactic analysis.
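To make the idea of combining features with a decision tree more concrete, the following is a minimal sketch, not the authors' implementation. It assumes an ID3-style tree built by information gain, and treats the feature names on each root-to-node path as one combined feature template that could be passed to a CRF toolkit. The feature names (`first_pos`, `last_pos`, `next_first_pos`), the toy eojeol rows, the tag labels, and the helper names are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of tag labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the data on one feature."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def feature_templates(rows, labels, features, max_depth=3, path=()):
    """Grow an ID3-style tree and collect the feature names on each path.

    Each returned tuple is a candidate feature combination (template).
    """
    if not features or len(path) >= max_depth or len(set(labels)) <= 1:
        return set()
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    if information_gain(rows, labels, best) <= 1e-12:
        return set()  # no feature reduces uncertainty any further
    new_path = path + (best,)
    templates = {new_path}
    remaining = [f for f in features if f != best]
    branches = defaultdict(lambda: ([], []))
    for row, label in zip(rows, labels):
        branches[row[best]][0].append(row)
        branches[row[best]][1].append(label)
    for sub_rows, sub_labels in branches.values():
        templates |= feature_templates(sub_rows, sub_labels, remaining, max_depth, new_path)
    return templates


def render(row, template):
    """Turn one template into a feature string for a single eojeol."""
    return "|".join(f"{name}={row[name]}" for name in template)


# Hypothetical toy data: one row per eojeol, with illustrative POS-based features.
FEATURES = ["first_pos", "last_pos", "next_first_pos"]
ROWS = [
    {"first_pos": "NNG", "last_pos": "JKS", "next_first_pos": "VV"},
    {"first_pos": "NNG", "last_pos": "JKO", "next_first_pos": "VV"},
    {"first_pos": "VV", "last_pos": "EF", "next_first_pos": "EOS"},
    {"first_pos": "NNG", "last_pos": "NNG", "next_first_pos": "NNG"},
    {"first_pos": "NNG", "last_pos": "NNG", "next_first_pos": "VV"},
    {"first_pos": "NP", "last_pos": "JKS", "next_first_pos": "NNG"},
]
LABELS = ["NP_SBJ", "NP_OBJ", "VP", "NP", "NP_OBJ", "NP_SBJ"]  # illustrative tags

templates = feature_templates(ROWS, LABELS, FEATURES)
crf_features = [[render(row, t) for t in sorted(templates)] for row in ROWS]
```

In this toy setting the tree splits first on `last_pos` and then, only where that is ambiguous, on `next_first_pos`, so the combined template is derived from the data rather than chosen by intuition. The strings in `crf_features` are just one possible encoding; how templates are actually written depends on the CRF toolkit in use.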

