Automatic Word Spacing for Korean Using CRFs with Korean Features

한국어 특성과 CRFs를 이용한 자동 띄어쓰기 시스템

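  • Hyun-Woo Lee (이현우) (Natural Language Processing Laboratory, Department of Computer Engineering, Changwon National University)
  • Jeong-Won Cha (차정원) (Department of Computer Engineering, Changwon National University)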
  • Published : 2008.03.30

Abstract

In this work, we propose an automatic word spacing system for Korean that uses conditional random fields (CRFs) with Korean-specific features. We cast word spacing as a classification problem. We first build a basic system that uses CRFs with Eumjeol (syllable) bigrams, and then analyze the results of an inner test. Based on this analysis, we extend the basic system with Korean features: Josa (postpositional particles), Eomi (verbal endings), and the first two Eumjeols of words extracted from a lexicon. The experimental results show that the proposed method outperforms previous methods. In addition, because the model is very small, the proposed method can be used in mobile and speech applications.
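
As an illustration of how word spacing can be cast as a classification problem, the following minimal Python sketch labels each Eumjeol of a correctly spaced sentence with 1 (a space follows) or 0 (no space follows), and builds simple Eumjeol bigram features for one position. The function names, the label convention, the boundary markers `<s>` and `</s>`, and the feature template are assumptions made for this example and are not taken from the paper; CRF training itself is not shown.

```python
def to_labeled_syllables(sentence):
    """Convert a correctly spaced sentence into (syllable, label) pairs.

    Label 1: a space follows this syllable; label 0: no space follows.
    Assumes precomposed (NFC) Hangul, so one character is one Eumjeol.
    """
    pairs = []
    for word in sentence.split():
        last = len(word) - 1
        for i, syllable in enumerate(word):
            pairs.append((syllable, 1 if i == last else 0))
    return pairs


def bigram_features(syllables, i):
    """Build hypothetical Eumjeol bigram features for position i."""
    prev_syl = syllables[i - 1] if i > 0 else "<s>"
    next_syl = syllables[i + 1] if i + 1 < len(syllables) else "</s>"
    return {
        "cur": syllables[i],
        "prev_bigram": prev_syl + syllables[i],
        "next_bigram": syllables[i] + next_syl,
    }


if __name__ == "__main__":
    labeled = to_labeled_syllables("나는 학교에 간다")
    syllables = [s for s, _ in labeled]
    for i, (syllable, label) in enumerate(labeled):
        print(syllable, label, bigram_features(syllables, i))
```

Per-position feature dictionaries of this kind, paired with the 0/1 labels, are the usual input format for a sequence labeler; a CRF toolkit would then learn which feature and label combinations predict a space.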

Keywords