Acoustic Modeling and Energy-Based Postprocessing for Automatic Speech Segmentation

자동 음성 분할을 위한 음향 모델링 및 에너지 기반 후처리

  • Published: 2002.06.01

Abstract

Speech segmentation at the phoneme level is important for corpus-based text-to-speech synthesis. In this paper, we examine acoustic modeling methods for improving the performance of an automatic speech segmentation system based on hidden Markov models (HMMs). We compare monophone and triphone models and evaluate several model training approaches. In addition, we employ an energy-based postprocessing scheme to correct frequent boundary location errors between silence and speech sounds. Experimental results show that our system places 71.3% and 84.2% of boundaries correctly within tolerances of 10 ms and 20 ms, respectively.
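
As an illustration of how a tolerance-based figure of this kind is typically computed, the following minimal Python sketch scores automatically placed boundaries against reference boundaries. It is not the authors' evaluation code: the function name `boundary_accuracy`, the one-to-one alignment of boundaries, the inclusive comparison (a deviation exactly equal to the tolerance counts as correct), and the sample boundary times are all assumptions made for this example.

```python
def boundary_accuracy(auto_ms, ref_ms, tolerance_ms):
    """Return the percentage of automatic boundaries that lie within
    tolerance_ms of their reference boundary.

    Assumes both lists hold boundary times in milliseconds and are
    aligned one-to-one (same phoneme sequence, same boundary order).
    """
    if len(auto_ms) != len(ref_ms):
        raise ValueError("boundary lists must align one-to-one")
    if not ref_ms:
        raise ValueError("no boundaries to score")
    # Assumption: the comparison is inclusive (<=), not strict (<).
    hits = sum(1 for a, r in zip(auto_ms, ref_ms) if abs(a - r) <= tolerance_ms)
    return 100.0 * hits / len(ref_ms)


# Made-up boundary times in milliseconds, for illustration only.
reference = [120, 250, 410, 600]
automatic = [125, 262, 409, 635]

acc_10 = boundary_accuracy(automatic, reference, 10)
acc_20 = boundary_accuracy(automatic, reference, 20)
print(f"within 10 ms: {acc_10:.1f}%  within 20 ms: {acc_20:.1f}%")
```

With these made-up values the deviations are 5, 12, 1, and 35 ms, so two of the four boundaries fall within 10 ms and three within 20 ms. Widening the tolerance can only raise the score, which is why the 20 ms figure reported above exceeds the 10 ms figure.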

Keywords