• Title/Summary/Keyword: GMM-UBM


Scream Sound Detection Based on Universal Background Model Under Various Sound Environments (다양한 소리 환경에서 UBM 기반의 비명 소리 검출)

  • Chung, Yong-Joo
    • The Journal of the Korea institute of electronic communication sciences / v.12 no.3 / pp.485-492 / 2017
  • The GMM has been one of the most popular methods for scream sound detection. In the conventional approach, the training data is divided into scream and non-scream sounds, and a separate GMM is trained on each. Motivated by the observation that scream sound detection closely resembles speaker recognition, this study proposes applying the UBM, which has been used quite successfully in speaker recognition, to scream sound detection. The experimental results show that the UBM performs better than the traditional GMM.
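
The abstract above does not state its scoring rule. As a minimal sketch, assuming the log-likelihood-ratio test that is standard in GMM-UBM speaker verification, a segment can be scored against a target model and a UBM as follows. The 1-D toy models, the zero threshold, and all function names are hypothetical and chosen only for illustration.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log p(x | gmm) for a list of (weight, mean, var) components (log-sum-exp)."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def llr_score(frames, target_gmm, ubm):
    """Average per-frame log-likelihood ratio between the target model and the UBM."""
    total = sum(
        gmm_log_likelihood(x, target_gmm) - gmm_log_likelihood(x, ubm)
        for x in frames
    )
    return total / len(frames)

# Toy 1-D models with made-up parameters (not taken from the paper).
ubm = [(0.5, [0.0], [1.0]), (0.5, [4.0], [1.0])]
scream_gmm = [(0.5, [0.0], [1.0]), (0.5, [6.0], [1.0])]
frames = [[5.8], [6.1], [6.3]]

# Assumed decision rule: accept when the score exceeds a threshold (0.0 here).
is_scream = llr_score(frames, scream_gmm, ubm) > 0.0
```

In practice the threshold would be tuned on held-out data rather than fixed at zero.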

GMM-Based Maghreb Dialect Identification System

  • Nour-Eddine, Lachachi;Abdelkader, Adla
    • Journal of Information Processing Systems / v.11 no.1 / pp.22-38 / 2015
  • While Modern Standard Arabic is the formal spoken and written language of the Arab world, dialects are the major mode of communication in everyday life. Identifying a speaker's dialect is therefore critical in the Arabic-speaking world for speech processing tasks such as automatic speech recognition or identification. In this paper, we examine two approaches that reduce the Universal Background Model (UBM) in an automatic dialect identification system covering five Arabic Maghreb dialects: Moroccan, Tunisian, and three dialects from the western (Oranian), central (Algiersian), and eastern (Constantinian) regions of Algeria. We applied our approaches to a Maghreb dialect detection domain consisting of 10-second utterances, and we compared the precision obtained on the dialect samples by a baseline GMM-UBM system with that of our improved GMM-UBM system, which uses a Reduced UBM algorithm. Our experiments show that our approaches significantly improve identification performance over purely acoustic features, with an identification rate of 80.49%.

The Study on the Verification of Speaker Change using GMM-UBM based KL distance (GMM-UBM 기반 KL 거리를 활용한 화자변화 검증에 대한 연구)

  • Cho, Joon-Beom;Lee, Ji-eun;Lee, Kyong-Rok
    • Journal of Convergence Society for SMB / v.6 no.4 / pp.71-77 / 2016
  • In this paper, we propose a method for verifying speaker changes that uses a KL distance based on GMM-UBM to improve the performance of conventional BIC-based Speaker Change Detection (SCD). We verify the output of conventional BIC-based SCD with KL-distance-based SCD, which is more robust than BIC-based SCD to differences in the amount of information, and we apply GMM-UBM to compensate for asymmetric amounts of information. Conventional BIC-based SCD consists of two steps. Step 1 detects the Speaker Change Candidate Points (SCCPs); an SCCP is a positive local maximum of the dissimilarity d. Step 2 determines the Speaker Change Points (SCPs): if $\Delta$BIC at an SCCP is positive, the SCCP is declared an SCP. We then verify each SCP using the GMM-UBM-based KL distance D: if the value of D at an SCP is higher than a threshold, that point is accepted as a final SCP. Under the experimental condition in which the MDR (Missed Detection Rate) is 0, the FAR (False Alarm Rate) improved to 60.7% at a threshold of 0.028.
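
The two-stage decision described in the abstract above (a positive $\Delta$BIC followed by a KL-distance check) can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: features are 1-D, $\Delta$BIC uses the common single-Gaussian form with a tunable penalty weight, and the GMM-UBM-based distance D is replaced by a symmetric KL divergence between single Gaussians, since the abstract does not define D. All names and the threshold value are hypothetical.

```python
import math

def mean_var(xs):
    """Maximum-likelihood mean and (biased) variance of a list of scalars."""
    mu = sum(xs) / len(xs)
    return mu, sum((x - mu) ** 2 for x in xs) / len(xs)

def delta_bic(left, right, penalty_weight=1.0):
    """Delta-BIC for 1-D features; positive values favor a speaker change."""
    both = left + right
    n, n1, n2 = len(both), len(left), len(right)
    _, v = mean_var(both)
    _, v1 = mean_var(left)
    _, v2 = mean_var(right)
    gain = 0.5 * (n * math.log(v) - n1 * math.log(v1) - n2 * math.log(v2))
    # Splitting adds one mean and one variance: 0.5 * 2 * log(n) in 1-D.
    penalty = 0.5 * 2 * math.log(n)
    return gain - penalty_weight * penalty

def symmetric_kl(left, right):
    """Symmetric KL divergence between single Gaussians fitted to each segment."""
    m1, v1 = mean_var(left)
    m2, v2 = mean_var(right)
    kl12 = 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl21 = 0.5 * (math.log(v1 / v2) + (v2 + (m1 - m2) ** 2) / v1 - 1.0)
    return kl12 + kl21

def is_speaker_change(left, right, kl_threshold):
    """Accept a candidate point only if both the BIC and the KL tests pass."""
    return delta_bic(left, right) > 0.0 and symmetric_kl(left, right) > kl_threshold

# Toy segments (made-up values): same speaker vs. a clearly shifted speaker.
seg_a = [0.0, 0.1, -0.1, 0.2, -0.2] * 4
seg_b = [x + 5.0 for x in seg_a]
changed = is_speaker_change(seg_a, seg_b, kl_threshold=1.0)
```

The threshold of 1.0 applies only to this stand-in distance; it is unrelated to the 0.028 reported in the abstract, which refers to the paper's own distance D.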

Fast Speaker Identification Using a Universal Background Model Clustering Method (Universal Background Model 클러스터링 방법을 이용한 고속 화자식별)

  • Park, Jumin;Suh, Youngjoo;Kim, Hoirin
    • The Journal of the Acoustical Society of Korea / v.33 no.3 / pp.216-224 / 2014
  • In this paper, we propose a new method that drastically reduces computational complexity in Gaussian Mixture Model (GMM)-based Speaker Identification (SI). In general, the computational complexity of GMM-based SI systems is very high and grows in proportion to the length of the test utterance, the number of enrolled speakers, and the GMM size. These factors make SI systems difficult to use in many real applications despite their broad applicability, so the trade-off between computational complexity and identification accuracy is a primary issue in practice. To reduce computational complexity sharply with only a small loss of accuracy, we introduce a method based on Universal Background Model (UBM) clustering and show that it can be used successfully in real-time applications. In experiments with the proposed algorithm, we obtained a speed-up factor of 6 with a negligible loss of accuracy.

A Study on SVM-Based Speaker Classification Using GMM-supervector (GMM-supervector를 사용한 SVM 기반 화자분류에 대한 연구)

  • Lee, Kyong-Rok
    • Journal of IKEEE / v.24 no.4 / pp.1022-1027 / 2020
  • In this paper, we experiment with SVM-based speaker classification using the GMM-supervector. To create speaker clusters, conventional speaker change detection is performed with the KL distance using an SNR-based weighting function. SVM-based speaker classification consists of two steps. In the first step, SVM-based classification between the UBM and the speaker models is performed, speaker information is indexed in each cluster, and the clusters are then grouped by speaker. In the second step, SVM-based classification between the UBM and the speaker models is performed with the speaker cluster groups as input. Linear and RBF kernels are applied as the kernel functions for SVM-based classification. In the first step, the linear kernel performed better than the RBF kernel, with 148 speaker clusters, MDR 0, FAR 47.3, and ER 50.7. In the second step, the linear kernel also gave the best performance, with 109 speaker clusters, MDR 1.3, FAR 28.4, and ER 32.1.

Research of Hybrid GMM/SVM Approach for Speaker Verification (화자 확인을 위한 하이브리드 GMM/SVM 방식에 대한 연구)

  • Yoon, You-Sun
    • Annual Conference of KIPS / 2008.05a / pp.139-140 / 2008
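  • We propose a new approach that combines GMM and SVM for text-independent speaker verification by extracting the features for the SVM from an adapted GMM. Because of its good measurability, the adapted GMM has been used to extract a small number of representative feature vectors from large amounts of speech data for SVM speaker verification. With this new approach, the proposed speaker verification system performed much better than the conventional GMM-UBM system.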

Speaker Verification with the Constraint of Limited Data

  • Kumari, Thyamagondlu Renukamurthy Jayanthi;Jayanna, Haradagere Siddaramaiah
    • Journal of Information Processing Systems / v.14 no.4 / pp.807-823 / 2018
  • The performance of a speaker verification system depends on the utterance of each speaker. To verify the speaker, important information has to be captured from the utterance. Speaker verification under the constraint of limited data has become a challenging task: in the limited-data condition, the training and testing data last only a few seconds. The feature vectors extracted by single frame size and rate (SFSR) analysis are not sufficient for training and testing speakers in speaker verification. This leads to poor speaker modeling during training and may not yield a good decision during testing. The problem can be addressed by increasing the number of feature vectors extracted from training and testing data of the same duration. To that end, we use multiple frame size (MFS), multiple frame rate (MFR), and multiple frame size and rate (MFSR) analysis techniques for speaker verification under the limited-data condition. These analysis techniques extract relatively more feature vectors during training and testing and give improved modeling and testing for limited data. To demonstrate this, we use mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) as features. The Gaussian mixture model (GMM) and the GMM-universal background model (GMM-UBM) are used to model the speaker. The database used is NIST-2003. The experimental results indicate that MFS, MFR, and MFSR analysis perform substantially better than SFSR analysis, and that LPCC-based MFSR analysis performs better than the other analysis techniques and feature extraction techniques.

SVM Based Speaker Verification Using Sparse Maximum A Posteriori Adaptation

  • Kim, Younggwan;Roh, Jaeyoung;Kim, Hoirin
    • IEIE Transactions on Smart Processing and Computing / v.2 no.5 / pp.277-281 / 2013
  • Modern speaker verification systems based on support vector machines (SVMs) use Gaussian mixture model (GMM) supervectors as their input feature vectors, and maximum a posteriori (MAP) adaptation is the conventional method for generating speaker-dependent GMMs by adapting a universal background model (UBM). MAP adaptation requires a sufficient amount of input speech because of the number of model parameters to be estimated. With limited utterances, MAP adaptation becomes unreliable, which causes adaptation noise even though the Bayesian priors used in MAP adaptation smooth the movements between the UBM and the speaker-dependent GMMs. This paper proposes a sparse MAP adaptation method, which is known to perform well in the automatic speech recognition area. By introducing sparse MAP adaptation to the GMM-SVM-based speaker verification system, the adaptation noise can be mitigated effectively. The proposed method utilizes the L0 norm as a regularizer to induce sparsity. The experimental results on the TIMIT database showed that the sparse MAP-based GMM-SVM speaker verification system yields a 42.6% relative reduction in the equal error rate with few additional computations.
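
For readers unfamiliar with the conventional MAP step mentioned in the abstract above, the following is a minimal sketch of mean-only relevance MAP adaptation on 1-D features. It illustrates the conventional baseline only; the sparse, L0-regularized variant proposed in the paper is not implemented here. The relevance factor of 16, the toy UBM, and the function names are assumptions made for illustration.

```python
import math

def posteriors(x, ubm):
    """Component responsibilities for scalar x under a 1-D GMM of (weight, mean, var)."""
    dens = [
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in ubm
    ]
    total = sum(dens)
    return [d / total for d in dens]

def map_adapt_means(frames, ubm, relevance=16.0):
    """Return UBM means shifted toward the data by mean-only relevance MAP."""
    counts = [0.0] * len(ubm)
    sums = [0.0] * len(ubm)
    for x in frames:
        for k, g in enumerate(posteriors(x, ubm)):
            counts[k] += g       # soft count n_k
            sums[k] += g * x     # first-order statistic
    adapted = []
    for k, (_, mean, _) in enumerate(ubm):
        alpha = counts[k] / (counts[k] + relevance)  # data-dependent mixing weight
        data_mean = sums[k] / counts[k] if counts[k] > 0.0 else mean
        adapted.append(alpha * data_mean + (1.0 - alpha) * mean)
    return adapted

# Toy UBM with made-up parameters; 16 frames all close to the second component.
ubm = [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)]
supervector = map_adapt_means([3.0] * 16, ubm)
```

With 16 frames and a relevance factor of 16, the second mean moves about halfway from 2.0 toward 3.0, while the first mean, which receives almost no soft count, stays near -2.0. The list of adapted means plays the role of the GMM supervector fed to the SVM.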


Forensic Automatic Speaker Identification System for Korean Speakers (과학수사를 위한 한국인 음성 특화 자동화자식별시스템)

  • Kim, Kyung-Wha;So, Byung-Min;Yu, Ha-Jin
    • Phonetics and Speech Sciences / v.4 no.3 / pp.95-101 / 2012
  • In this paper, we introduce the automatic speaker identification system 'SPO (Supreme Prosecutors Office) Verifier'. SPO Verifier is an automatic speaker recognition system based on GMM (Gaussian mixture model)-UBM (universal background model) and was developed using Korean speakers' utterances. The system uses a channel compensation algorithm to compensate for recording device characteristics. It also lets users manage reference models built from utterances recorded in various environments to obtain more accurate recognition results. To evaluate the performance of SPO Verifier on Korean speakers, we compared it with one of the most widely used commercial systems in the forensic field. The results showed that SPO Verifier achieves a lower EER (equal error rate) than the commercial system.

Performance Improvement of a Text-Independent Speaker Identification System Using MCE Training (MCE 학습 알고리즘을 이용한 문장독립형 화자식별의 성능 개선)

  • Kim Tae-Jin;Choi Jae-Gil;Kwon Chul-Hong
    • MALSORI / no.57 / pp.165-174 / 2006
  • In this paper, we use the MCE (Minimum Classification Error) training algorithm to improve the performance of a text-independent speaker identification system. The MCE training scheme takes account of possible competing speaker hypotheses and tries to reduce the probability of incorrect hypotheses. Experiments performed on a small-set speaker identification task show that discriminative training with MCE can reduce identification errors by up to 54% relative to a baseline system that uses Bayesian adaptation to derive GMM (Gaussian Mixture Model) speaker models from a UBM (Universal Background Model).
