Design of the Noise Suppressor Using the Perceptual Model and Wavelet Packet Transform

  • Published: 2006.10.31

Abstract

This paper proposes a noise suppressor based on a perceptual model and the wavelet packet transform. The objective is to enhance single-channel speech corrupted by colored or non-stationary noise. For such noise, a subband approach is more efficient than a whole-band one, and the wavelet coefficient threshold (WCT) applied after the wavelet packet transform must be adjusted properly to avoid serious residual noise and speech distortion. In this paper, the subbands are designed by matching the scales of the wavelet packet transform to the critical bands, and the WCT is computed adaptively from the segmental signal-to-noise ratio (seg_SNR) and the noise masking threshold (NMT). As a result, the proposed suppressor performs similarly to the EVRC noise suppressor, a TTA standard, in PESQ-MOS, and its PESQ-MOS is about 0.29 (0.289) higher than that of the wavelet packet transform with the universal threshold applied to the wavelet coefficients. More importantly, it is more useful than EVRC for coded speech: after encoding and decoding, its PESQ-MOS is about 0.23 higher than that of the EVRC noise suppressor.
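The universal-threshold baseline that the abstract compares against can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. Haar filters stand in for whichever wavelet the paper uses, and every function name here is hypothetical. Leaving the lowest band unshrunk follows common wavelet-shrinkage practice and is an assumption. The paper's adaptive WCT would replace the single threshold `t` with per-subband values derived from seg_SNR and the NMT. The abstract does not give that rule, so it is not reproduced here.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def haar_split(x):
    """One orthonormal Haar step: returns (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_merge(approx, detail):
    """Inverse of haar_split."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def wavelet_packet(x, levels):
    """Full wavelet packet tree: every node is split at every level.
    Nodes come out in natural (filter-bank) order, not frequency order."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    nodes = [list(x)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes

def inverse_wavelet_packet(nodes):
    """Rebuild the signal by merging sibling nodes level by level."""
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[i], nodes[i + 1])
                 for i in range(0, len(nodes), 2)]
    return nodes[0]

def universal_threshold(sigma, n):
    """Universal threshold sigma * sqrt(2 ln n) for n samples."""
    return sigma * math.sqrt(2.0 * math.log(n))

def soft_threshold(c, t):
    """Shrink a coefficient toward zero by t; zero it if |c| <= t."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def denoise(noisy, sigma, levels=3, keep_lowest=True):
    """Soft-threshold wavelet packet coefficients with one global threshold.
    keep_lowest=True leaves the coarsest band untouched (an assumption)."""
    t = universal_threshold(sigma, len(noisy))
    nodes = wavelet_packet(noisy, levels)
    first = 1 if keep_lowest else 0
    for k in range(first, len(nodes)):
        nodes[k] = [soft_threshold(c, t) for c in nodes[k]]
    return inverse_wavelet_packet(nodes)

def segmental_snr(clean, estimate, frame=32):
    """Mean over frames of the per-frame SNR in dB (no clamping applied)."""
    values = []
    for start in range(0, len(clean) - frame + 1, frame):
        ref = clean[start:start + frame]
        est = estimate[start:start + frame]
        sig = sum(c * c for c in ref)
        err = sum((c - e) ** 2 for c, e in zip(ref, est))
        values.append(10.0 * math.log10((sig + 1e-12) / (err + 1e-12)))
    return sum(values) / len(values)

# Toy usage: a constant "signal" plus white Gaussian noise of known sigma.
random.seed(0)
clean = [4.0] * 64
noisy = [c + random.gauss(0.0, 0.1) for c in clean]
enhanced = denoise(noisy, sigma=0.1)
print(segmental_snr(clean, noisy), segmental_snr(clean, enhanced))
```

In this toy setup the noise level `sigma` is known. A real suppressor would have to estimate it, and would also have to reorder the packet nodes by frequency before grouping them into critical bands.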
