http://dx.doi.org/10.3745/KIPSTB.2006.13B.1.063

PPEditor: Semi-Automatic Annotation Tool for Korean Dependency Structure  

Kim Jae-Hoon (Department of Computer Engineering, Korea Maritime University)
Park Eun-Jin (Department of Computer Engineering, Korea Maritime University)
Abstract
In general, a corpus contains a great deal of linguistic information and is widely used in natural language processing and computational linguistics. Creating such a corpus, however, is expensive, labor-intensive, and time-consuming. To alleviate this problem, annotation tools for building linguistically rich corpora are indispensable. In this paper, we design and implement an annotation tool for building a Korean dependency tree-tagged corpus. Ideally, the corpus would be created fully automatically, without any intervention by annotators, but in practice this is impossible. Like most other annotation tools, the proposed tool is semi-automatic: it is designed for correcting the errors produced by basic analyzers such as a part-of-speech tagger and a (partial) parser. We also design it to avoid repetitive work while correcting errors and to be easy and friendly to use. Using the proposed tool, 10,000 Korean sentences containing more than 20 words were annotated with dependency structures. Eight annotators worked four hours a day for two months. We are confident that the tool makes annotations accurate and consistent while reducing labor and time.
Keywords
Korean Language Processing; Syntactic Analysis; POS tagging; Partial Parsing; Dependency structure; Corpus annotation tool