• Title/Summary/Keyword: harmonic-weighted energy

Search Result 2, Processing Time 0.015 seconds

Robust Feature Extraction for Voice Activity Detection in Nonstationary Noisy Environments (음성구간검출을 위한 비정상성 잡음에 강인한 특징 추출)

  • Hong, Jungpyo;Park, Sangjun;Jeong, Sangbae;Hahn, Minsoo
    • Phonetics and Speech Sciences
    • /
    • v.5 no.1
    • /
    • pp.11-16
    • /
    • 2013
  • This paper proposes robust feature extraction for accurate voice activity detection (VAD). VAD is one of the principal modules for speech signal processing such as speech codec, speech enhancement, and speech recognition. Noisy environments contain nonstationary noises causing the accuracy of the VAD to drastically decline because the fluctuation of features in the noise intervals results in increased false alarm rates. In this paper, in order to improve the VAD performance, harmonic-weighted energy is proposed. This feature extraction method focuses on voiced speech intervals and weighted harmonic-to-noise ratios to determine the amount of the harmonicity to frame energy. For performance evaluation, the receiver operating characteristic curves and equal error rate are measured.

Automatic Vowel Onset Point Detection Based on Auditory Frequency Response (청각 주파수 응답에 기반한 자동 모음 개시 지점 탐지)

  • Zang, Xian;Kim, Hag-Tae;Chong, Kil-To
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.13 no.1
    • /
    • pp.333-342
    • /
    • 2012
  • This paper presents a vowel onset point (VOP) detection method based on the human auditory system. This method maps the "perceptual" frequency scale, i.e. Mel scale onto a linear acoustic frequency, and then establishes a series of Triangular Mel-weighted Filter Bank simulate the function of band pass filtering in human ear. This nonlinear critical-band filter bank helps greatly reduce the data dimensionality, and eliminate the effect of harmonic waves to make the formants more prominent in the nonlinear spaced Mel spectrum. The sum of mel spectrum peaks energy is extracted as feature for each frame, and the instinct at which the energy amplitude starts rising sharply is detected as VOP, by convolving with Gabor window. For the single-word database which contains 12 vowels articulated with different kinds of consonants, the experimental results showed a good average detection rate of 72.73%, higher than other vowel detection methods based on short-time energy and zero-crossing rate.