Nonlinear Speech Enhancement Method for Reducing the Amount of Speech Distortion According to Speech Statistics Model


  • Choi, Jae-Seung (Division of Smart Electrical and Electronic Engineering, Silla University)
  • Received : 2021.04.28
  • Accepted : 2021.06.17
  • Published : 2021.06.30

Abstract

Robust speech recognition technology is required so that neither the recognition performance nor the quality of the speech degrades when recognition is performed in a real environment where the speech is mixed with noise. Alongside the development of such technology, applications are needed that achieve a stable and high speech recognition rate even in noise whose spectrum is similar to that of human speech. Therefore, this paper proposes a speech enhancement algorithm that performs noise suppression based on the MMSA-STSA estimation algorithm, a short-time spectral amplitude method based on the minimum mean-square error. The algorithm is an effective nonlinear speech enhancement method that operates on a single-channel input, provides high noise suppression performance, and reduces the amount of speech distortion by relying on a statistical model of speech. In the experiments, the effectiveness of the MMSA-STSA estimation algorithm is verified by comparing the input speech waveform with the output speech waveform.

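The paper gives no implementation details beyond the abstract. As an illustration only, the sketch below implements the classical minimum mean-square error short-time spectral amplitude (MMSE-STSA) gain function of Ephraim and Malah, on which the MMSA-STSA estimator described above is based; the function name and interface are my own assumptions, not the paper's.

```python
import numpy as np
from scipy.special import i0e, i1e  # exponentially scaled modified Bessel functions


def mmse_stsa_gain(xi, gamma):
    """Classical MMSE-STSA spectral gain (Ephraim & Malah, 1984).

    xi    -- a priori SNR per frequency bin
    gamma -- a posteriori SNR per frequency bin
    Returns the gain to apply to the noisy spectral amplitude.
    """
    xi = np.asarray(xi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    nu = xi * gamma / (1.0 + xi)
    # i0e/i1e already include the exp(-nu/2) factor of the closed-form
    # gain, so the expression stays numerically stable for large nu.
    return (np.sqrt(np.pi) / 2.0) * (np.sqrt(nu) / gamma) * (
        (1.0 + nu) * i0e(nu / 2.0) + nu * i1e(nu / 2.0)
    )
```

In a full enhancement pipeline the gain would be computed per STFT bin, with `xi` typically estimated by the decision-directed approach, and then multiplied onto the noisy spectral magnitude before resynthesis; at high a priori SNR the gain approaches the Wiener gain, which is why speech distortion stays low.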

