Voiced-Unvoiced-Silence Detection Algorithm using Perceptron Neural Network

  • Received : 2011.02.08
  • Accepted : 2011.04.12
  • Published : 2011.04.30

Abstract

This paper proposes an algorithm that classifies each frame of a speech signal as voiced, unvoiced, or silence using a multi-layer perceptron neural network. For each frame, the power spectrum and the FFT (fast Fourier transform) coefficients obtained by the FFT are used as the input to the neural network, and the network is trained on these features. The algorithm was evaluated by its detection rates for the voiced, unvoiced, and silence sections on various speech signals degraded by white noise, which served as the input data of the neural network. The detection rates were 92% or more for speech degraded by white noise, even when the training data and the evaluation data were different.
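
The per-frame pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the frame length, hop size, Hann window, hidden-layer size, tanh and softmax activations, and the class-index mapping are all assumptions, and the function names (`frame_signal`, `frame_features`, `init_mlp`, `mlp_forward`) are hypothetical. Training (for example by back-propagation) is omitted, so the weights here are random and the predicted labels are not meaningful.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (sizes are assumed)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def frame_features(frames):
    """Per-frame features: power spectrum plus real and imaginary FFT coefficients."""
    window = np.hanning(frames.shape[1])       # assumed window choice
    spec = np.fft.rfft(frames * window, axis=1)
    power = np.abs(spec) ** 2
    return np.concatenate([power, spec.real, spec.imag], axis=1)

def init_mlp(n_in, n_hidden, n_out, rng):
    """Small random weights for a one-hidden-layer perceptron."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def mlp_forward(params, X):
    """Forward pass: tanh hidden layer, softmax over the three classes."""
    h = np.tanh(X @ params["W1"] + params["b1"])
    logits = h @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
signal = rng.normal(size=1024)                 # stand-in for speech plus white noise
frames = frame_signal(signal)
X = frame_features(frames)
params = init_mlp(X.shape[1], 16, 3, rng)      # untrained network
probs = mlp_forward(params, X)
labels = probs.argmax(axis=1)                  # assumed mapping: 0=voiced, 1=unvoiced, 2=silence
```

In practice the features would likely need normalization before training, since raw power values can be much larger in magnitude than the FFT coefficients.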
