http://dx.doi.org/10.22156/CS4SMB.2017.7.6.159

The Study on Speaker Change Verification Using SNR based weighted KL distance  

Cho, Joon-Beom (Department of Nursing, Nambu University)
Lee, Ji-eun (Department of Living physical Training Special Study, Chunnam Techno University)
Lee, Kyong-Rok (Department of IT & Design, Nambu University)
Publication Information
Journal of Convergence for Information Technology / v.7, no.6, 2017, pp. 159-166
Abstract
In this paper, we conduct experiments to improve the verification performance of speaker change detection on broadcast news. The approach enhances the noisy input speech and applies a new KL distance $D_s$ that uses an SNR-based weighting function $w_m$. The baseline system (Experiment 0) verifies speaker changes using the GMM-UBM-based KL distance $D$. Experiment 1 adds enhancement of the noisy input speech using the MMSE Log-STSA estimator. Experiment 2 applies the new KL distance $D_s$ to the system of Experiment 1. All experiments were run at a 0% missed detection rate (MDR) so that no speaker change is lost. The false alarm rate (FAR) of Experiment 0 was 71.5%. The FAR of Experiment 1 was 67.3%, 4.2 percentage points lower than that of Experiment 0. The FAR of Experiment 2 was 60.7%, 10.8 percentage points lower than that of Experiment 0.
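The baseline distance and the 0% MDR operating point described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the two speech segments are assumed to be modeled by GMMs MAP-adapted (means only) from a shared diagonal-covariance UBM, in which case a symmetric KL distance is commonly bounded by a weighted sum of per-mixture squared Mahalanobis distances between the adapted means. The abstract does not give the formulas for $D$, $D_s$, or $w_m$, so the function names, the `weights` argument (standing in for either the UBM mixture weights or a hypothetical SNR-derived $w_m$), and the FAR definition used here (false alarms divided by the number of non-change candidates) are all assumptions.

```python
import numpy as np


def symmetric_kl_gmm(means_a, means_b, variances, weights):
    """Symmetric KL-based distance between two GMMs that share mixture weights
    and diagonal covariances (assumed: both MAP-adapted, means only, from one UBM).

    For one Gaussian pair with equal covariance, KL(a||b) + KL(b||a) equals the
    squared Mahalanobis distance between the means. The per-mixture terms are
    combined with `weights`: the UBM mixture weights for a baseline-style D, or
    a hypothetical SNR-derived w_m vector for a D_s-style variant.
    """
    diff = np.asarray(means_a, dtype=float) - np.asarray(means_b, dtype=float)  # (M, dim)
    mahal_sq = np.sum(diff ** 2 / np.asarray(variances, dtype=float), axis=1)    # (M,)
    return float(np.dot(np.asarray(weights, dtype=float), mahal_sq))


def far_at_zero_mdr(change_scores, nonchange_scores):
    """Set the threshold to the lowest distance seen at a true speaker change,
    so every true change is accepted (0% MDR), then return that threshold and
    the fraction of non-change candidates that also reach it (assumed FAR)."""
    threshold = float(np.min(change_scores))
    nonchange = np.asarray(nonchange_scores, dtype=float)
    false_alarms = int(np.sum(nonchange >= threshold))
    return threshold, false_alarms / len(nonchange)


if __name__ == "__main__":
    ubm_weights = np.array([0.5, 0.5])
    variances = np.ones((2, 2))
    mu_a = np.zeros((2, 2))
    mu_b = np.array([[1.0, 0.0], [0.0, 2.0]])
    print(symmetric_kl_gmm(mu_a, mu_b, variances, ubm_weights))    # 2.5
    print(far_at_zero_mdr([3.0, 5.0, 4.0], [1.0, 3.5, 2.0, 6.0]))  # (3.0, 0.5)
```

Fixing the threshold at the lowest true-change score is what holding MDR at 0% means in practice: every true change passes, so any FAR reduction (such as the drop from 71.5% to 60.7%) has to come from pushing non-change scores below that threshold.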
Keywords
Speaker Change Detection; Kullback-Leibler Distance; Speech Enhancement; Minimum Mean Square Error Log-Spectral Amplitude Estimator; Signal-to-Noise Ratio
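The MMSE log-spectral amplitude estimator listed in the keywords (the enhancement step of Experiment 1) applies a per-bin gain computed from the a priori SNR $\xi$ and the a posteriori SNR $\gamma$. The sketch below uses the Ephraim and Malah (1985) gain $G = \frac{\xi}{1+\xi}\exp\left(\tfrac{1}{2}E_1(v)\right)$ with $v = \frac{\xi\gamma}{1+\xi}$, and estimates $\xi$ with the decision-directed rule. It assumes a magnitude spectrogram and a noise power spectrum known in advance (for example averaged over leading non-speech frames); the smoothing constant `alpha`, the floor `xi_min`, and the function names are illustrative choices, not settings reported in this paper. It requires NumPy and SciPy (`scipy.special.exp1` computes the exponential integral $E_1$).

```python
import numpy as np
from scipy.special import exp1


def lsa_gain(xi, gamma):
    """MMSE log-spectral amplitude gain (Ephraim & Malah, 1985).

    xi: a priori SNR, gamma: a posteriori SNR (linear scale, per frequency bin).
    G = xi / (1 + xi) * exp(0.5 * E1(v)),  v = xi * gamma / (1 + xi).
    """
    xi = np.asarray(xi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    v = np.maximum(xi * gamma / (1.0 + xi), 1e-10)  # E1 diverges as v -> 0
    return xi / (1.0 + xi) * np.exp(0.5 * exp1(v))


def enhance_magnitudes(noisy_mag, noise_psd, alpha=0.98, xi_min=10 ** (-25 / 10)):
    """Enhance a magnitude spectrogram (frames x bins) frame by frame.

    The a priori SNR is estimated with the decision-directed rule from the
    previous frame's enhanced power; noise_psd is assumed known in advance.
    """
    noisy_mag = np.asarray(noisy_mag, dtype=float)
    noise_psd = np.asarray(noise_psd, dtype=float)
    enhanced = np.empty_like(noisy_mag)
    prev_clean_power = np.zeros(noisy_mag.shape[1])
    for t, frame in enumerate(noisy_mag):
        gamma = frame ** 2 / noise_psd  # a posteriori SNR per bin
        xi = alpha * prev_clean_power / noise_psd + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)
        xi = np.maximum(xi, xi_min)  # floor on the a priori SNR estimate
        enhanced[t] = lsa_gain(xi, gamma) * frame
        prev_clean_power = enhanced[t] ** 2
    return enhanced
```

Bins whose magnitude sits near the noise floor receive a small gain, while high-SNR bins pass almost unchanged; the enhanced magnitudes would typically be recombined with the noisy phase before features are extracted.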