Chunking Korean and an Application

한국어 낱말 묶기와 그 응용

  • Un Koaunghi (은광희; Research Institute of Language and Information, Korea University)
  • Hong Jungha (홍정하; Research Institute of Language and Information, Korea University)
  • You Seok-Hoon (유석훈; Research Institute of Language and Information, Korea University)
  • Lee Kiyong (이기용; Research Institute of Language and Information, Korea University)
  • Choe Jae-Woong (최재웅; Research Institute of Language and Information, Korea University)
  • Published: 2005.12.01

Abstract

Applications of chunking to English and several other European languages have shown that it is a viable parsing mechanism for natural languages. Although a small number of attempts have been made to apply chunking to Korean, it remains unclear which criteria should identify the appropriate units of chunking, and how efficient and valid chunking algorithms are when applied to authentic Korean texts. The purpose of this research is to provide an alternative set of algorithms for chunking Korean, to implement them, and to test them against an English-Korean parallel corpus consisting of English and Korean Bibles aligned sentence by sentence. The paper shows that aligning related texts and identifying matched phrases between the two languages can be achieved through appropriate chunking and matching algorithms defined on the morphologically tagged parallel corpus. The chunking and matching processes are based on content words rather than function words, and the matching itself is done by means of a transfer dictionary. The system is implemented in C and XML and is accessible over the Internet.
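The abstract does not spell out the algorithms, so the following is only a minimal Python sketch of the general idea, under stated assumptions. Each Korean chunk is opened by a content word and absorbs the function morphemes that follow it, and chunks are paired across languages by looking up content-word lemmas in a transfer dictionary. The tag set (`N`, `V`, `J`, `E`, ...), the function names, the toy dictionary, and the hand-built English chunks are all hypothetical and are not taken from the paper, whose system is written in C and XML.

```python
# Illustrative sketch only. The tags, the toy transfer dictionary, and every
# function name below are assumptions made for this example, not the paper's.

CONTENT_TAGS = {"N", "V", "ADJ", "ADV"}  # hypothetical content-word tags


def chunk_korean(tagged):
    """Group (lemma, tag) pairs into chunks opened by a content word.

    A function morpheme (any tag outside CONTENT_TAGS) is attached to the
    chunk opened by the nearest preceding content word.
    """
    chunks = []
    for lemma, tag in tagged:
        if tag in CONTENT_TAGS or not chunks:
            chunks.append([(lemma, tag)])
        else:
            chunks[-1].append((lemma, tag))
    return chunks


def content_lemmas(chunk):
    """Return the lemmas of the content words in one chunk."""
    return [lemma for lemma, tag in chunk if tag in CONTENT_TAGS]


def match_chunks(ko_chunks, en_chunks, transfer):
    """Pair each Korean chunk with the first English chunk that contains a
    translation of one of its content words, per the transfer dictionary."""
    pairs = []
    for i, ko in enumerate(ko_chunks):
        targets = {t for lemma in content_lemmas(ko)
                   for t in transfer.get(lemma, ())}
        for j, en in enumerate(en_chunks):
            if targets & set(content_lemmas(en)):
                pairs.append((i, j))
                break
    return pairs


# Toy data: a hand-tagged verse pair (Genesis 1:1), for illustration only.
KO_TAGGED = [("태초", "N"), ("에", "J"), ("하나님", "N"), ("이", "J"),
             ("천지", "N"), ("를", "J"), ("창조하", "V"), ("시", "E"),
             ("니라", "E")]
EN_CHUNKS = [  # English chunks are written by hand here, not computed
    [("in", "P"), ("the", "D"), ("beginning", "N")],
    [("God", "N")],
    [("create", "V")],
    [("the", "D"), ("heaven", "N"), ("and", "C"), ("the", "D"), ("earth", "N")],
]
TRANSFER = {"태초": ["beginning"], "하나님": ["God"],
            "천지": ["heaven", "earth"], "창조하": ["create"]}

KO_CHUNKS = chunk_korean(KO_TAGGED)
PAIRS = match_chunks(KO_CHUNKS, EN_CHUNKS, TRANSFER)
print(PAIRS)
```

Keying the match on content words means that particles and verbal endings, which rarely have one-to-one English counterparts, never have to be translated; they only mark where a chunk ends.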

Keywords