Morpheme-based Korean broadcast news transcription

형태소 기반의 한국어 방송뉴스 인식

  • Published : 2002.11.01

Abstract

In this paper, we describe our LVCSR system for Korean broadcast news transcription. The main focus is to find the most proper morpheme-based lexical model for Korean broadcast news recognition to deal with the inflectional flexibilities in Korean. There are trade-offs between lexicon size and lexical coverage, and between the length of lexical unit and WER. In our system, we analyzed the training corpus to obtain a small 24k-morpheme-based lexicon with 98.8% coverage. Then, the lexicon is optimized by combining morphemes using statistics of training corpus under monosyllable constraint or maximum length constraint. In experiments, our system reduced the number of monosyllable morphemes from 52% to 29% of the lexicon and obtained 13.24% WER for anchor and 24.97% for reporter.

Keywords