Improvement of Speech Intelligibility in Noisy Environments

Speech Intelligibility Enhancement Technique for Noisy Environments

  • Published : 2009.01.31

Abstract

In speech communication in noisy environments, intelligibility is seriously degraded by the masking effect of ambient noise. This paper proposes a new method to improve speech intelligibility in such environments. Based on the perceptual finding that the temporal envelope plays a major role in determining intelligibility, the proposed method applies a novel operation that enhances the fluctuation of the band-wise temporal envelope, and it also includes pitch enhancement to improve speech naturalness. In addition, a new subjective evaluation scheme employing binaural listening is proposed to measure performance more reliably. Subjective results obtained with this scheme show that the proposed method improves both intelligibility and naturalness in various environments, and that a function parameter controls the trade-off between intelligibility and naturalness.

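In speech communication under heavy ambient noise, speech intelligibility is severely degraded by the masking effect of that noise. This paper proposes a new method that raises call quality by improving speech intelligibility in noisy environments. According to auditory theory, the temporal envelope of speech plays an important role in determining intelligibility. Accordingly, the method enhances the fluctuation of the band-wise temporal envelope to improve intelligibility, and it includes a pitch enhancement operation to further improve sound quality. In addition, a new subjective evaluation method that uses both ears is proposed for accurate subjective performance evaluation under real call conditions. The proposed intelligibility enhancement technique was evaluated with this method; both intelligibility and sound quality were confirmed to improve, and adjusting an operating parameter was confirmed to shift the trade-off between intelligibility and sound quality.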
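
The abstract does not specify the envelope operation itself, so the following is only a minimal sketch of what "enhancing the fluctuation of a band-wise temporal envelope" could look like for a single band. It assumes a frame-wise RMS envelope and a log-domain expansion around the mean; the function names and the parameters `frame_len` and `alpha` are hypothetical and are not taken from the paper. Here `alpha` stands in for the kind of function parameter the abstract mentions: `alpha = 1` leaves the band unchanged and larger values deepen the envelope modulation.

```python
import math

FLOOR = 1e-9  # guards against log(0) and division by zero (assumed value)

def envelope(samples, frame_len):
    """Frame-wise RMS envelope of one band-limited signal."""
    n_frames = len(samples) // frame_len
    env = []
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        env.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return env

def expand_envelope(env, alpha):
    """Scale log-envelope deviations from their mean by alpha.

    alpha > 1 increases the envelope fluctuation; alpha == 1 is the identity.
    This is an illustrative choice, not the operation used in the paper.
    """
    log_env = [math.log(max(e, FLOOR)) for e in env]
    mean_log = sum(log_env) / len(log_env)
    return [math.exp(mean_log + alpha * (v - mean_log)) for v in log_env]

def modulation_depth(env):
    """(max - min) / (max + min) of an envelope, between 0 and 1."""
    hi, lo = max(env), min(env)
    return (hi - lo) / (hi + lo)

def enhance_band(samples, frame_len, alpha):
    """Apply a per-frame gain so the band's envelope follows the expanded one."""
    env = envelope(samples, frame_len)
    target = expand_envelope(env, alpha)
    out = []
    for k, (e, t) in enumerate(zip(env, target)):
        gain = t / max(e, FLOOR)
        out.extend(x * gain for x in samples[k * frame_len:(k + 1) * frame_len])
    # Samples that do not fill a whole frame are copied unchanged.
    out.extend(samples[len(env) * frame_len:])
    return out
```

In a full system this step would be applied to each band of a filter bank and the bands summed again; the filter bank, the pitch enhancement, and any smoothing of the per-frame gain are left out of this sketch.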
