Automatic Correction of Errors in Annotated Corpus Using Kernel Ripple-Down Rules

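  • Tae-Ho Park (Department of Computer Engineering, Changwon National University)
  • Jeong-Won Cha (Department of Computer Engineering, Changwon National University)
  • Received: 2015.10.06
  • Reviewed: 2016.03.30
  • Published: 2016.06.15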

Abstract

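Training corpora for machine learning are very important in natural language processing. A large, well-refined corpus directly affects a natural language processing system. In this paper, we propose a new method for automatically correcting errors in large corpora. From an erroneous corpus and a gold-standard corpus, we automatically generate correction rules that reflect the characteristics of human-tagged documents. The correction rules are represented as RDR (Ripple-Down Rules). To show the value of the correction method, we ran experiments on a part-of-speech-tagged corpus and a named-entity-tagged corpus, and obtained meaningful results in both areas. The method can be used to minimize errors when building large corpora.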

Annotated corpora are important for understanding natural language with machine learning methods. In this paper, we propose a new method for automatically reducing errors in annotated corpora. We use Ripple-Down Rules (RDR) to reduce errors and a kernel to extend RDR for NLP. We applied our system to errors in Korean Wikipedia and blog corpora to identify the types of annotation errors. We report experimental results from several perspectives on the Korean Wikipedia and blog data to evaluate the effectiveness and efficiency of the proposed approach. The proposed approach can be used to reduce errors in large corpora.
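
The abstract does not give the rule format, so the following is only a minimal sketch of how a generic single-classification ripple-down rule tree could correct a tag, not the authors' implementation. The `Rule` class, the case fields (`word`, `tag`, `prev_tag`, `next_tag`), the English tag names, and the three hand-written rules are all illustrative assumptions. In the paper the rules are generated automatically from an erroneous corpus and a gold-standard corpus, and the kernel extension is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# A "case" is a hypothetical token context; the field names are assumptions.
Case = Dict[str, str]

@dataclass
class Rule:
    condition: Callable[[Case], bool]      # when this rule fires
    conclusion: Optional[str]              # corrected tag; None means "keep the tag"
    except_rule: Optional["Rule"] = None   # refinement, tried if this rule fires
    else_rule: Optional["Rule"] = None     # alternative, tried if it does not fire

def evaluate(root: Rule, case: Case) -> Optional[str]:
    """Walk the tree and return the conclusion of the last rule that fired."""
    result = None
    node = root
    while node is not None:
        if node.condition(case):
            result = node.conclusion
            node = node.except_rule
        else:
            node = node.else_rule
    return result

def correct_tag(root: Rule, case: Case) -> str:
    """Return the corrected tag, or the original tag if no correction applies."""
    corrected = evaluate(root, case)
    return corrected if corrected is not None else case["tag"]

# Made-up toy rules, written by hand purely for illustration.
# Exception to rule A: a verb-tagged "-ed" word after a determiner becomes JJ.
rule_a_exception = Rule(lambda c: c["word"].endswith("ed"), "JJ")
# Rule B: "to" tagged IN directly before a verb becomes TO.
rule_b = Rule(
    lambda c: c["word"] == "to" and c["tag"] == "IN" and c.get("next_tag") == "VB",
    "TO",
)
# Rule A: a verb tag directly after a determiner becomes NN.
rule_a = Rule(
    lambda c: c["tag"] == "VB" and c.get("prev_tag") == "DT",
    "NN",
    except_rule=rule_a_exception,
    else_rule=rule_b,
)
# The root always fires and concludes "keep the current tag".
ROOT = Rule(lambda c: True, None, except_rule=rule_a)
```

Each new exception is attached only to the rule it refines, so a correction added for one context does not change the outcome for cases that never reach that rule. Calling `correct_tag(ROOT, case)` on a token context returns either a replacement tag or the tag the token already had.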

Project Information

Research project host institution: Changwon National University
