http://dx.doi.org/10.13067/JKIECS.2011.6.2.237

Voiced-Unvoiced-Silence Detection Algorithm using Perceptron Neural Network  

Choi, Jae-Seung (Department of Electronic Engineering, Silla University)
Publication Information
The Journal of the Korea institute of electronic communication sciences / v.6, no.2, 2011, pp. 237-242
Abstract
This paper proposes an algorithm that classifies each frame of a speech signal as voiced, unvoiced, or silence using a multi-layer perceptron neural network. For each frame, the power spectrum and the FFT (fast Fourier transform) coefficients are computed and used as the input to the neural network, which is trained on these features. The algorithm was evaluated by its detection rates for the voiced, unvoiced, and silence sections, using various speech signals degraded by white noise as the network input. The detection rates were 92% or more for such speech with white noise when the training data and the evaluation data were different.
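The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: the frame length, window, sample rate, hidden-layer size, activation functions, and all function names are assumptions, since the abstract does not state them. The network weights are random and untrained, so the sketch shows only the per-frame feature extraction and the forward pass, not the reported detection rates.

```python
import numpy as np

FRAME_LEN = 256                       # assumed frame length (not given in the abstract)
LABELS = ("voiced", "unvoiced", "silence")


def add_white_noise(x, snr_db, rng):
    """Degrade a signal with white Gaussian noise at the requested SNR in dB."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)


def frame_signal(x, frame_len=FRAME_LEN):
    """Split a 1-D signal into non-overlapping frames, dropping the tail."""
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)


def frame_features(frames):
    """Per-frame input vector: power spectrum plus real and imaginary FFT coefficients."""
    windowed = frames * np.hamming(frames.shape[1])   # window choice is an assumption
    spec = np.fft.rfft(windowed, axis=1)
    power = np.abs(spec) ** 2
    return np.concatenate([power, spec.real, spec.imag], axis=1)


def init_mlp(n_in, n_hidden, n_out, rng):
    """Random weights for a one-hidden-layer perceptron (hypothetical sizes)."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }


def mlp_forward(params, feats):
    """Forward pass: tanh hidden layer, softmax over the three classes."""
    hidden = np.tanh(feats @ params["W1"] + params["b1"])
    logits = hidden @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0                      # 1 s at an assumed 8 kHz rate
clean = np.sin(2 * np.pi * 200.0 * t)             # stand-in for a speech signal
noisy = add_white_noise(clean, snr_db=10.0, rng=rng)

feats = frame_features(frame_signal(noisy))
params = init_mlp(feats.shape[1], 32, len(LABELS), rng)
probs = mlp_forward(params, feats)
pred = [LABELS[i] for i in probs.argmax(axis=1)]  # one label per frame
```

In practice the weights would be fitted on labelled frames, for example with back-propagation, and the detection rate would be measured on frames from data not used for training, as the abstract describes.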
Keywords
Detection algorithm; perceptron neural network; fast Fourier transform; detection rate