Segmentation of Continuous Korean Speech Based on Boundaries of Voiced and Unvoiced Sounds

  • Yu, Gang-Ju (Dept. of Control and Instrumentation Engineering, Graduate School, Korea Maritime University) ;
  • Sin, Uk-Geun (Division of Automation and Information Engineering, Korea Maritime University)
  • Published : 2000.07.01

Abstract

In this paper, we show that the performance of blind segmentation of phoneme boundaries can be improved by using knowledge of Korean syllabic structure and of the regions of voiced and unvoiced sounds. The proposed method consists of three processes: extracting candidate phoneme boundaries, detecting the boundaries between voiced and unvoiced sounds, and selecting the final phoneme boundaries. The candidate phoneme boundaries are extracted by a clustering method based on the similarity between two adjacent clusters. The similarity measure used in this process is the ratio of the probability densities of the adjacent clusters. To detect the boundaries between voiced and unvoiced sounds, we first compute the power density spectrum of the speech signal in the 0 to 400 Hz frequency band. The points where the variation of this power density spectrum exceeds a threshold are then chosen as the voiced/unvoiced boundaries. The final phoneme boundaries consist of all the candidate phoneme boundaries in voiced regions and a limited number of candidate phoneme boundaries in unvoiced regions. The experimental results showed a decrease of about 40% in the insertion rate compared with the blind segmentation method we adopted.
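The voiced/unvoiced boundary detection and the final selection step described above can be illustrated with a minimal Python sketch. The abstract does not give the frame length, the threshold value, the exact definition of "variation", or the rule for which candidates survive in unvoiced regions, so the following are all assumptions made for illustration only: the function names `band_power`, `voicing_boundaries`, and `select_final`, the use of non-overlapping frames, the frame-to-frame absolute power difference as the variation, and keeping the first `max_unvoiced` candidates in each unvoiced region.

```python
import cmath
import math


def band_power(frame, sample_rate, f_lo=0.0, f_hi=400.0):
    """Power of one frame in [f_lo, f_hi] Hz, computed with a direct DFT."""
    n = len(frame)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if f_lo <= freq <= f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(frame))
            power += abs(coeff) ** 2 / n
    return power


def voicing_boundaries(signal, sample_rate, frame_len, threshold):
    """Return (boundaries, powers).

    boundaries holds the frame indices where the low-band power differs
    from the previous frame by more than threshold (assumed definition
    of "variation"); powers holds the low-band power of every frame.
    """
    powers = [band_power(signal[i:i + frame_len], sample_rate)
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    boundaries = [i for i in range(1, len(powers))
                  if abs(powers[i] - powers[i - 1]) > threshold]
    return boundaries, powers


def select_final(candidates, regions, max_unvoiced=1):
    """Keep every candidate in voiced regions and at most max_unvoiced
    candidates in each unvoiced region (assumed rule: keep the earliest).

    regions is a list of (start, end, is_voiced) tuples with end exclusive.
    """
    final = []
    for start, end, is_voiced in regions:
        inside = [c for c in candidates if start <= c < end]
        final.extend(inside if is_voiced else inside[:max_unvoiced])
    return final


# Synthetic check: a 200 Hz tone (voiced-like) between two 3000 Hz tones.
sample_rate = 8000
signal = [math.sin(2 * math.pi * (200 if 320 <= t < 640 else 3000) * t / sample_rate)
          for t in range(960)]
boundaries, powers = voicing_boundaries(signal, sample_rate, frame_len=80, threshold=5.0)
regions = [(0, 4, False), (4, 8, True), (8, 12, False)]
final = select_final([1, 2, 3, 5, 6, 9, 10], regions, max_unvoiced=1)
```

On this synthetic signal the low-band power jumps only where the 200 Hz tone starts and stops, so those two frame indices are reported as boundaries. The candidate list passed to `select_final` is made up; in the method described here it would come from the clustering-based candidate extraction.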
