
A Method of Sound Segmentation in Time-Frequency Domain Using Peaks and Valleys in Spectrogram for Speech Separation

  • 임성길 (Department of Computer Engineering, Kyung Hee University)
  • 이현수 (Department of Computer Engineering, Kyung Hee University)
  • Published: 2008.11.30

Abstract


In this paper, we propose an algorithm for frequency channel segmentation using peaks and valleys in the spectrogram. Frequency channel segmentation groups together frequency channels that are likely to have arisen from the same sound source. The proposed algorithm is based on the smoothed spectrum of the input sound: peaks in the smoothed spectrum mark the centers of segments, and valleys mark their boundaries. To assess the usefulness of the proposed segmentation before the grouping stage, which binds each segment to a single sound, is applied, we compare signals resynthesized with an ideal mask against those resynthesized from the proposed segmentation. Experiments are performed on speech mixed with narrowband noise, wideband noise, and other speech signals.
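The smoothing, peak, and valley steps described above can be sketched as follows. This is an illustrative sketch, not the authors' exact implementation: the moving-average smoother, the window width, and the function name `segment_channels` are all assumptions made for the example.

```python
import numpy as np

def segment_channels(spectrum, smooth_width=5):
    """Sketch of valley-based frequency channel segmentation.

    Peaks of the smoothed spectrum act as segment centers and valleys
    as segment boundaries. Returns a list of (start, end) channel
    index ranges. The moving-average smoother is an assumption; the
    paper does not specify its smoothing method here.
    """
    # Smooth the magnitude spectrum with a simple moving average.
    kernel = np.ones(smooth_width) / smooth_width
    smoothed = np.convolve(spectrum, kernel, mode="same")

    # A valley is a local minimum of the smoothed spectrum.
    interior = np.arange(1, len(smoothed) - 1)
    is_valley = ((smoothed[interior] < smoothed[interior - 1]) &
                 (smoothed[interior] <= smoothed[interior + 1]))
    valleys = interior[is_valley]

    # Segment boundaries are the spectrum edges plus the valleys;
    # each resulting range contains exactly one peak (its center).
    bounds = np.concatenate(([0], valleys, [len(spectrum)]))
    return [(int(bounds[i]), int(bounds[i + 1]))
            for i in range(len(bounds) - 1)
            if bounds[i + 1] > bounds[i]]

# Example: a spectrum with two spectral humps splits at the valley
# between them, yielding two channel segments.
two_humps = np.concatenate([np.hanning(32), np.hanning(32)])
print(segment_channels(two_humps))
```

The strict/non-strict inequality pair in the valley test ensures that a flat-bottomed valley produces only one boundary rather than two adjacent ones.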
