Proceedings of the Korean Society for Language and Information Conference (한국언어정보학회 학술대회논문집)
- Korean Society for Language and Information, 2007 Annual Conference
- Pages 385-393
- 2007
Semi-Automatic Annotation Tool to Build Large Dependency Tree-Tagged Corpus
Abstract
Corpora annotated with rich linguistic information are required to develop robust statistical natural language processing systems. Building such corpora, however, is expensive, labor-intensive, and time-consuming. To support this work, we design and implement an annotation tool for building a Korean dependency tree-tagged corpus. Compared with other annotation tools, ours is characterized by the following features: application independence, localization of errors, powerful error checking, instant sharing of annotated information, and user-friendliness. Using our tool, we have annotated 100,904 Korean sentences with dependency structures. The annotation involved 33 annotators, took about 4 minutes per sentence on average, and lasted 5 months in total. We are confident that the tool yields accurate and consistent annotations while reducing labor and time.
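
The abstract does not say which checks the tool's error checking performs. As a rough illustration only, the sketch below shows the kind of well-formedness test that a dependency annotation can be put through: exactly one root, head indices within range, and no cycles. The head-index encoding and the function name `check_dependency_tree` are assumptions made for this example and are not taken from the paper.

```python
from typing import List


def check_dependency_tree(heads: List[int]) -> List[str]:
    """Return the problems found in one annotated sentence.

    Assumed encoding (hypothetical, not from the paper):
    heads[i] is the 1-based index of the head of word i+1,
    and 0 marks the root of the sentence.
    """
    errors = []
    n = len(heads)

    # Every head must be an existing word (or 0), and not the word itself.
    for i, h in enumerate(heads, start=1):
        if h < 0 or h > n:
            errors.append(f"word {i}: head {h} out of range")
        elif h == i:
            errors.append(f"word {i}: word is its own head")

    # A dependency tree has exactly one root.
    roots = [i for i, h in enumerate(heads, start=1) if h == 0]
    if len(roots) != 1:
        errors.append(f"expected 1 root, found {len(roots)}")

    # Following head links from any word must not revisit a word.
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0 and 1 <= node <= n:
            if node in seen:
                errors.append(f"word {start}: cycle in head chain")
                break
            seen.add(node)
            node = heads[node - 1]

    return errors
```

For a three-word sentence whose second word is the root, `check_dependency_tree([2, 0, 2])` returns an empty list, while `check_dependency_tree([2, 1, 0])` reports a cycle because words 1 and 2 point at each other. Reporting messages per word, rather than a single pass/fail flag, is one way to keep an error localized to the part of the sentence an annotator needs to revisit.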