Speaker Verification with the Constraint of Limited Data

Kumari, Thyamagondlu Renukamurthy Jayanthi;Jayanna, Haradagere Siddaramaiah;

doi:10.3745/JIPS.01.0030

Journal of Information Processing Systems

Volume 14, Issue 4 / Pages 807-823 / 2018 / 1976-913X (pISSN) / 2092-805X (eISSN)

Korea Information Processing Society

Kumari, Thyamagondlu Renukamurthy Jayanthi (Dept. of Information Science and Engineering, Siddaganga Institute of Technology) ;
Jayanna, Haradagere Siddaramaiah (Dept. of Information Science and Engineering, Siddaganga Institute of Technology)

Received: 2016.01.27
Reviewed: 2017.02.09
Published: 2018.08.31

Abstract

The performance of a speaker verification system depends on the utterance of each speaker. To verify a speaker, the important information must be captured from the utterance. Speaker verification becomes a challenging task under the constraint of limited data, where the training and testing data each last only a few seconds. The feature vectors extracted by single frame size and rate (SFSR) analysis are not sufficient for training and testing speakers in this setting, which leads to poor speaker modeling during training and may not yield good decisions during testing. We address the problem by increasing the number of feature vectors obtained from training and testing data of the same duration. To this end, we use multiple frame size (MFS), multiple frame rate (MFR), and multiple frame size and rate (MFSR) analysis techniques for speaker verification under the limited data condition. These techniques extract relatively more feature vectors during training and testing, which improves modeling and testing for limited data. To demonstrate this, we use mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) as features, and the Gaussian mixture model (GMM) and GMM-universal background model (GMM-UBM) for modeling the speaker. The database used is NIST-2003. The experimental results indicate that MFS, MFR, and MFSR analysis perform substantially better than SFSR analysis, and that LPCC-based MFSR analysis performs better than the other analysis techniques and feature extraction techniques.
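The effect of MFS, MFR, and MFSR analysis on the number of feature vectors can be illustrated with a minimal sketch that only counts the frames each scheme would produce from the same utterance. The sampling rate, utterance length, frame sizes, and frame shifts below are illustrative assumptions, not values taken from the abstract, and the helper names (frame_starts, count_frames, ms) are hypothetical. Feature extraction itself (MFCC or LPCC) is left out.

```python
def frame_starts(num_samples, frame_len, frame_shift):
    """Return the start index of every full frame of frame_len samples,
    advancing by frame_shift samples each time."""
    if frame_len <= 0 or frame_shift <= 0:
        raise ValueError("frame_len and frame_shift must be positive")
    if num_samples < frame_len:
        return []
    return list(range(0, num_samples - frame_len + 1, frame_shift))


def count_frames(num_samples, configs):
    """Total number of frames (one feature vector per frame) over all
    (frame_len, frame_shift) pairs in configs."""
    return sum(len(frame_starts(num_samples, fl, fs)) for fl, fs in configs)


SAMPLE_RATE = 8000  # assumed sampling rate in Hz (not stated in the abstract)


def ms(duration_ms):
    """Convert a duration in milliseconds to a number of samples."""
    return duration_ms * SAMPLE_RATE // 1000


num_samples = 3 * SAMPLE_RATE  # assumed 3-second utterance ("a few seconds")

# Assumed analysis settings, chosen only for illustration.
sizes_ms = (10, 15, 20, 25, 30)   # frame sizes used by MFS and MFSR
shifts_ms = (2, 4, 6, 8, 10)      # frame shifts used by MFR and MFSR

sfsr = [(ms(20), ms(10))]                              # one size, one shift
mfs = [(ms(s), ms(10)) for s in sizes_ms]              # many sizes, one shift
mfr = [(ms(20), ms(r)) for r in shifts_ms]             # one size, many shifts
mfsr = [(ms(s), ms(r)) for s in sizes_ms for r in shifts_ms]  # all combinations

for name, cfg in (("SFSR", sfsr), ("MFS", mfs), ("MFR", mfr), ("MFSR", mfsr)):
    print(name, count_frames(num_samples, cfg))
```

Under these assumed settings, each multi-resolution scheme yields more frames than SFSR for the same utterance, and MFSR yields the most because it pools every size and shift combination. Whether the extra vectors actually improve verification depends on the features and models used, which this sketch does not address.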
