Robust Speech Enhancement Based on Soft Decision Employing Spectral Deviation

  • Choi, Jae-Hun (Dept. of Electronics Engineering, Inha University)
  • Chang, Joon-Hyuk (Dept. of Electronics Engineering, Inha University)
  • Kim, Nam-Soo (School of Electrical Engineering and Computer Science, Seoul National University)
  • Received: 2010.03.11
  • Published: 2010.09.25

Abstract

In this paper, we propose a soft decision-based noise power estimation technique that incorporates the spectral deviation of the signal in order to enhance degraded speech in non-stationary background noise. The conventional soft decision-based technique estimates and updates the noise power spectrum using a fixed smoothing parameter chosen under the assumption that the noise is stationary. It is therefore not robust in non-stationary environments, such as restaurant noise, where the spectral characteristics of the noise change relatively quickly. The proposed method first classifies the environment as stationary or non-stationary based on the estimated spectral deviation, and then adaptively estimates and updates the noise power spectrum according to the classified noise type. The proposed algorithm is evaluated under various background noise conditions with the objective speech quality measure ITU-T P.862 perceptual evaluation of speech quality (PESQ), and it shows improved performance compared with the conventional soft decision-based speech enhancement method.
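The adaptive update described above can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the log-spectral distance used as the spectral deviation, the threshold `dev_threshold`, the two smoothing values, the fixed a priori SNR `xi`, and the function names themselves. The soft-decision update follows the common recursive form in which the noise power is smoothed toward the expected noise power given the observation; it is not claimed to be the authors' exact method.

```python
import math


def speech_absence_prob(noisy_power, noise_power, xi=5.0, prior_ratio=1.0):
    """Per-bin p(H0 | Y) under an assumed complex-Gaussian model.

    xi is an assumed fixed a priori SNR; prior_ratio is an assumed
    P(H1) / P(H0). Neither value comes from the paper.
    """
    gamma = noisy_power / max(noise_power, 1e-12)  # a posteriori SNR
    exponent = min(gamma * xi / (1.0 + xi), 50.0)  # clamp to avoid overflow
    likelihood_ratio = math.exp(exponent) / (1.0 + xi)
    return 1.0 / (1.0 + prior_ratio * likelihood_ratio)


def spectral_deviation(noisy_power, noise_power):
    """Mean absolute log-spectral distance in dB (a hypothetical measure)."""
    eps = 1e-12
    total = 0.0
    for y, n in zip(noisy_power, noise_power):
        total += abs(10.0 * math.log10((y + eps) / (n + eps)))
    return total / len(noisy_power)


def update_noise_power(noisy_power, noise_power, dev_threshold=6.0,
                       zeta_stationary=0.98, zeta_nonstationary=0.90):
    """One frame of soft-decision noise tracking with adaptive smoothing.

    A small deviation is treated as a stationary environment (slow update);
    a large deviation is treated as non-stationary (faster update).
    All numeric defaults are illustrative assumptions.
    """
    dev = spectral_deviation(noisy_power, noise_power)
    zeta = zeta_stationary if dev < dev_threshold else zeta_nonstationary
    updated = []
    for y, n in zip(noisy_power, noise_power):
        p0 = speech_absence_prob(y, n)
        # Expected noise power given the observation (soft decision).
        expected_noise = p0 * y + (1.0 - p0) * n
        updated.append(zeta * n + (1.0 - zeta) * expected_noise)
    return updated, zeta


if __name__ == "__main__":
    noise_estimate = [1.0, 1.0, 1.0, 1.0]
    frame = [8.0, 8.0, 8.0, 8.0]
    new_estimate, zeta_used = update_noise_power(frame, noise_estimate)
    print(zeta_used, new_estimate)
```

In this sketch the smoothing parameter switches between only two values; a smooth mapping from deviation to smoothing parameter would be an equally plausible design and is not ruled out by the abstract.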

References

  1. Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, pp. 1109-1121, Dec. 1984.
  2. S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, no. 2, pp. 113-120, Apr. 1979.
  3. R. J. McAulay and M. L. Malpass, "Speech enhancement using a soft-decision noise suppression filter," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, pp. 137-145, Apr. 1980.
  4. R. Martin, "Spectral subtraction based on minimum statistics," in Proc. 7th EUSIPCO'94, Edinburgh, U.K., pp. 1182-1185, Sept. 1994.
  5. J. Sohn and W. Sung, "A voice activity detector employing soft decision based noise spectrum adaptation," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, pp. 365-368, 1998.
  6. Y.-S. Park and J.-H. Chang, "A probabilistic combination method of minimum statistics and soft decision for robust noise power estimation in speech enhancement," IEEE Signal Processing Letters, vol. 15, pp. 95-98, Jan. 2008. https://doi.org/10.1109/LSP.2007.910309
  7. I. Cohen and B. Berdugo, "Noise estimation by minima controlled recursive averaging for robust speech enhancement," IEEE Signal Processing Letters, vol. 9, no. 1, pp. 12-15, Jan. 2002. https://doi.org/10.1109/97.988717
  8. R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Trans. Speech Audio Process., vol. 9, no. 5, pp. 504-512, July 2001. https://doi.org/10.1109/89.928915
  9. I. Cohen, "Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging," IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 466-475, Sep. 2003. https://doi.org/10.1109/TSA.2003.811544
  10. N. S. Kim and J. H. Chang, "Spectral enhancement based on global soft decision," IEEE Signal Processing Letters, vol. 7, no. 5, pp. 108-110, May 2000. https://doi.org/10.1109/97.841154
  11. ITU-T P.862, "Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs," 2001.
  12. TIA/EIA/IS-127, "Enhanced variable rate codec, speech service option 3 for wideband spread spectrum digital systems," 1996.