Combination of Classifiers Decisions for Multilingual Speaker Identification

  • Nagaraja, B.G. (Dept. of E&CE, Jain Institute of Technology)
  • Jayanna, H.S. (Dept. of IS&E, Siddaganga Institute of Technology)
  • Received : 2014.01.27
  • Accepted : 2014.08.05
  • Published : 2017.08.31

Abstract

State-of-the-art speaker recognition systems may work better for English than for other languages. When the same system is used to recognize speakers of different languages, its performance may degrade. In this work, the decisions of a Gaussian mixture model-universal background model (GMM-UBM) classifier and a learning vector quantization (LVQ) classifier are combined to improve the recognition performance of a multilingual speaker identification system. The two classifiers differ in their modeling techniques: the former is based on a probabilistic approach, and the latter is based on the fine-tuning of neurons. Because the approaches differ, each modeling technique correctly identifies a different set of speakers from the same database. The decisions of the two classifiers may therefore be combined to improve performance. In this study, multitaper mel-frequency cepstral coefficients (MFCCs) are used as features, and monolingual and cross-lingual speaker identification experiments are conducted using the NIST-2003 database and our own database. The experimental results show that the combined system improves performance by nearly 10% compared with the individual classifiers.
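
The idea of combining two classifiers' decisions can be illustrated with a minimal sketch. The abstract does not state which combination rule is used, so the weighted sum rule with min-max score normalization shown here is an assumption chosen for illustration only. The function names, the weight parameter, and the toy scores are all hypothetical.

```python
def min_max_normalize(scores):
    """Rescale one classifier's per-speaker scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All speakers scored equally: this classifier carries no evidence.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine_decisions(gmm_scores, lvq_scores, weight=0.5):
    """Weighted sum-rule fusion (assumed rule, not taken from the paper).

    Returns the index of the best-scoring speaker and the fused scores.
    """
    g = min_max_normalize(gmm_scores)
    v = min_max_normalize(lvq_scores)
    fused = [weight * a + (1.0 - weight) * b for a, b in zip(g, v)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


# Made-up scores for four enrolled speakers (higher means a better match).
gmm_scores = [-41.2, -39.8, -40.1, -45.0]  # hypothetical GMM-UBM log-likelihood scores
lvq_scores = [0.30, 0.35, 0.80, 0.10]      # hypothetical LVQ match scores

speaker, fused = combine_decisions(gmm_scores, lvq_scores)
print("identified speaker index:", speaker)
```

In this toy case the GMM-UBM scores alone favor speaker 1 by a small margin, while the LVQ scores favor speaker 2 by a large margin, so the fused decision is speaker 2. Normalizing first matters because the two classifiers produce scores on unrelated scales.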
