GMM-Based Maghreb Dialect Identification System

  • Received : 2014.01.14
  • Accepted : 2014.09.01
  • Published : 2015.03.31

Abstract

While Modern Standard Arabic is the formal spoken and written language of the Arab world, dialects are the main mode of communication in everyday life. Identifying a speaker's dialect is therefore critical in the Arabic-speaking world for speech processing tasks such as automatic speech recognition or identification. In this paper, we examine two approaches that reduce the Universal Background Model (UBM) in an automatic dialect identification system covering five Arabic Maghreb dialects: Moroccan, Tunisian, and three Algerian dialects from the western (Oranian), central (Algiersian), and eastern (Constantinian) regions. We applied both approaches to a Maghreb dialect detection task built on a collection of 10-second utterances, and compared the identification precision of a baseline GMM-UBM system with that of our improved GMM-UBM system, which uses a Reduced UBM algorithm. Our experiments show that these approaches significantly improve identification performance over purely acoustic features, reaching an identification rate of 80.49%.
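As a rough illustration of the scoring step in a GMM-UBM identifier of the kind the abstract describes, the sketch below scores an utterance against one diagonal-covariance GMM per dialect and normalizes by a UBM score. It is a minimal sketch only: the function names, the two-dimensional features, and every model parameter are hypothetical toy values, not taken from the paper, and the Reduced UBM algorithm itself is not reproduced here.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def avg_loglik(frames, gmm):
    """Average per-frame log-likelihood of an utterance under a GMM."""
    return sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)

def identify_dialect(frames, dialect_gmms, ubm):
    """Return the dialect with the highest log-likelihood ratio against the UBM."""
    ubm_score = avg_loglik(frames, ubm)
    scores = {
        name: avg_loglik(frames, gmm) - ubm_score
        for name, gmm in dialect_gmms.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical toy models: two mixture components over 2-D features.
UBM = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
DIALECT_GMMS = {
    "Moroccan": [(0.5, [-0.5, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])],
    "Tunisian": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.5, 3.0], [1.0, 1.0])],
}

# Hypothetical utterance: three feature frames lying near the first "Moroccan" mean.
frames = [[-0.6, 0.1], [-0.4, -0.2], [-0.7, 0.0]]
best, scores = identify_dialect(frames, DIALECT_GMMS, UBM)
print(best, scores)
```

In a real system the frames would be acoustic feature vectors extracted from each 10-second utterance, and the dialect models would typically be adapted from the UBM rather than written by hand. Subtracting the UBM score shifts every dialect's score by the same amount, so it does not change which dialect wins; it matters when scores are compared against a threshold or across utterances.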
