• Title/Summary/Keyword: Speaker Identification

A Robust Speaker Identification Using Optimized Confidence and Modified HMM Decoder (최적화된 관측 신뢰도와 변형된 HMM 디코더를 이용한 잡음에 강인한 화자식별 시스템)

  • Tariquzzaman, Md.; Kim, Jin-Young; Na, Seung-Yu
    • MALSORI / no.64 / pp.121-135 / 2007
  • Speech signals are distorted by channel characteristics or additive noise, which severely degrades the performance of speaker and speech recognition. To cope with the noise problem, we propose a modified HMM decoder algorithm using SNR-based observation confidence, an approach that was previously applied successfully to GMMs in a speaker identification task. The modification weights observation probabilities with reliability values obtained from the SNR. We also apply particle swarm optimization (PSO) to the confidence function to maximize speaker identification performance. To evaluate the proposed method, we used the ETRI database for speaker recognition. The experimental results showed that the modified HMM decoder algorithm improved performance.
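
The weighting idea in this abstract can be illustrated with a minimal sketch. It assumes the reliability value scales each frame's log observation term inside a Viterbi recursion; the sigmoid `confidence` function and its `slope` and `midpoint` parameters are hypothetical placeholders, since the abstract does not give the confidence function that the authors tuned with PSO.

```python
import math

def confidence(snr_db, slope=0.5, midpoint=10.0):
    """Hypothetical sigmoid mapping a frame SNR (dB) to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint)))

def weighted_viterbi_score(log_init, log_trans, log_obs, weights):
    """Best-path log score; frame t's log observation term is scaled by weights[t].

    log_init[j]     : log initial probability of state j
    log_trans[i][j] : log transition probability from state i to state j
    log_obs[t][j]   : log observation probability of frame t in state j
    weights[t]      : confidence of frame t (1.0 gives the standard Viterbi score)
    """
    n_states = len(log_init)
    delta = [log_init[j] + weights[0] * log_obs[0][j] for j in range(n_states)]
    for t in range(1, len(log_obs)):
        delta = [
            max(delta[i] + log_trans[i][j] for i in range(n_states))
            + weights[t] * log_obs[t][j]
            for j in range(n_states)
        ]
    return max(delta)
```

With all weights set to 1.0 this reduces to the ordinary Viterbi score, and a weight near 0 makes a noisy frame contribute almost nothing to the decision. A speaker would then be chosen by comparing this score across speaker models.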

Speaker Variation in Number Production by Males (남성의 숫자음 발성에 나타난 화자변이)

  • Yang, Byung-Gon
    • Speech Sciences / v.8 no.3 / pp.93-104 / 2001
  • The author analyzed acoustic parameters of ten Korean numbers produced by ten male students using Praat. Variations of f0, F1, F2 and F3 within and between speakers were examined by determining the average and standard deviation of the parameters for each number and by comparing the acoustic values with one another. Results showed that each subject produced the numbers within a certain range of variation across time. Thus, speaker identification can be made more reliable by using the dynamic information of the acoustic parameters within each vocalic segment. Also, the percent difference between within-subject variation and between-subject variation can be used to determine which sounds would be better stimuli for speaker identification. According to this criterion, the number '2' proved the best stimulus while the number '7' was the worst. Future studies will be necessary to explore robust methods of speaker identification under noisy conditions.
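
One way to read the within-subject versus between-subject criterion is sketched below. The abstract does not give the exact formula, so this is an assumption: within-speaker variation is taken as the mean of the per-speaker standard deviations, between-speaker variation as the standard deviation of the speaker means, and the first is expressed as a percentage of the second. The function name `within_between_percent` is hypothetical.

```python
from statistics import mean, stdev

def within_between_percent(samples_by_speaker):
    """Within-speaker variation as a percentage of between-speaker variation.

    samples_by_speaker maps a speaker label to repeated measurements of one
    acoustic parameter (for example f0 in Hz) for one stimulus.
    """
    within = mean(stdev(values) for values in samples_by_speaker.values())
    between = stdev(mean(values) for values in samples_by_speaker.values())
    return 100.0 * within / between
```

Under this reading, a lower percentage marks a stimulus on which each speaker is consistent while speakers differ from one another, which is the property a good identification stimulus needs.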

Speaker Identification Based on Vowel Classification and Vector Quantization (모음 인식과 벡터 양자화를 이용한 화자 인식)

  • Lim, Chang-Heon; Lee, Hwang-Soo; Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / v.8 no.4 / pp.65-73 / 1989
  • In this paper, we propose a text-independent speaker identification algorithm based on vector quantization (VQ) and vowel classification, and compare its performance with that of a conventional VQ-based speaker identification algorithm. The proposed algorithm is composed of three processes: vowel segmentation, vowel recognition and average distortion calculation. Vowel segmentation is performed automatically using RMS energy, BTR (Back-to-Total cavity volume Ratio) and SFBR (Signed Front-to-Back maximum area Ratio) extracted from the input speech signal. If the input speech signal is noisy, particularly when the SNR is around 20 dB, the proposed algorithm performs better than the reference algorithm, provided that the vowel segmentation is correct. The same result is obtained when noisy telephone speech is used as the input.
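
The average distortion calculation used in VQ-based identification can be sketched as follows. This shows only the conventional VQ decision rule under the assumption of squared Euclidean distortion; the vowel segmentation and vowel recognition stages of the proposed algorithm are not reproduced, and the function names are hypothetical.

```python
def average_distortion(frames, codebook):
    """Mean squared distance from each feature frame to its nearest codeword."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)

def identify_speaker(frames, codebooks):
    """Return the speaker whose codebook quantizes the frames with least distortion.

    codebooks maps a speaker label to that speaker's list of codewords.
    """
    return min(codebooks, key=lambda spk: average_distortion(frames, codebooks[spk]))
```

In a vowel-dependent variant, one plausible arrangement is to keep a separate codebook per speaker and vowel class and to compute the distortion only against the codebook of the recognized vowel.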

Speaker Identification Using GMM Based on Local Fuzzy PCA (국부 퍼지 클러스터링 PCA를 갖는 GMM을 이용한 화자 식별)

  • Lee, Ki-Yong
    • Speech Sciences / v.10 no.4 / pp.159-166 / 2003
  • To reduce the high dimensionality of the feature vectors used in training for speaker identification, we propose an efficient GMM based on local PCA with fuzzy clustering. The proposed method first partitions the data space into several disjoint clusters by fuzzy clustering, and then performs PCA using the fuzzy covariance matrix in each cluster. Finally, the GMM for each speaker is obtained from the transformed, reduced-dimension feature vectors in each cluster. Compared to the conventional GMM with diagonal covariance matrix, the proposed method needs less storage and runs faster at the same level of performance.

Speaker Identification Using GMM Based on LPCA (LPCA에 기반한 GMM을 이용한 화자 식별)

  • Seo, Chang-Woo; Lee, Youn-Jeong; Lee, Ki-Yong
    • Speech Sciences / v.12 no.2 / pp.171-182 / 2005
  • An efficient GMM (Gaussian mixture modeling) method based on LPCA (local principal component analysis) with VQ (vector quantization) is proposed for speaker identification. To reduce the dimension and correlation of the feature vectors, this paper proposes a speaker identification method based on principal component analysis. The proposed method first partitions the data space into several disjoint regions by VQ, and then performs PCA in each region. Finally, the GMM for the speaker is obtained from the transformed feature vectors in each region. Compared to the conventional GMM method with diagonal covariance matrix, the proposed method requires less storage and complexity and shows faster results while maintaining the same performance.

Estimation of Mixture Numbers of GMM for Speaker Identification (화자 식별을 위한 GMM의 혼합 성분의 개수 추정)

  • Lee, Youn-Jeong; Lee, Ki-Yong
    • Speech Sciences / v.11 no.2 / pp.237-245 / 2004
  • In general, a Gaussian mixture model (GMM) is used to estimate the speaker model for speaker identification. The GMM parameters are estimated with the expectation-maximization (EM) algorithm for maximum likelihood (ML) estimation. However, if the number of mixtures is not chosen well, the parameters are estimated poorly. Finding the number of components is therefore important for estimating the optimal parameters of a mixture model. In this paper, to estimate the optimal number of mixtures, we propose a method that starts from a sufficiently large number of mixtures and then reduces that number by examining the mutual information between mixtures. As a result, the optimal number of mixtures can be estimated. The effectiveness of the proposed method is shown by an experiment using artificial data. We also performed speaker identification with the proposed method and compared it with other approaches.

Parameters Comparison in the speaker Identification under the Noisy Environments (화자식별을 위한 파라미터의 잡음환경에서의 성능비교)

  • Choi, Hong-Sub
    • Speech Sciences / v.7 no.3 / pp.185-195 / 2000
  • This paper compares the feature parameters used in speaker identification systems under noisy environments. The feature parameters compared are LP cepstrum (LPCC), cepstral mean subtraction (CMS), pole-filtered CMS (PFCMS), adaptive component weighted cepstrum (ACW) and postfilter cepstrum (PF). A GMM-based text-independent speaker identification system is designed for this purpose. A series of experiments shows that the LPCC parameter is adequate for modelling the speaker when the training and test environments match. Under mismatched training and testing conditions, however, the modified parameters are preferable to LPCC. In particular, the CMS and PFCMS parameters are more effective under microphone mismatch, while the ACW and PF parameters are better for noisier mismatches.
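
Of the parameters listed, cepstral mean subtraction is the simplest to illustrate. A fixed channel such as a microphone acts as a convolution on the signal, which becomes an additive constant in the cepstral domain, so subtracting the per-coefficient mean over an utterance removes it. The sketch below shows plain CMS only; the function name is hypothetical, and the pole-filtered variant is not shown.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the utterance-level mean of each cepstral coefficient.

    frames is a list of equal-length cepstral vectors, one per analysis frame.
    """
    n_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]
```

Two recordings of the same frames that differ only by a constant cepstral offset produce the same output, which is why CMS helps when the training and test microphones differ.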

Global Covariance based Principal Component Analysis for Speaker Identification (화자식별을 위한 전역 공분산에 기반한 주성분분석)

  • Seo, Chang-Woo; Lim, Young-Hwan
    • Phonetics and Speech Sciences / v.1 no.1 / pp.69-73 / 2009
  • This paper proposes an efficient global covariance-based principal component analysis (GCPCA) for speaker identification. Principal component analysis (PCA) is a feature extraction method that reduces the dimension of the feature vectors and the correlation among them by projecting the original feature space onto a smaller subspace through a transformation. However, finding the eigenvalue and eigenvector matrices from the full covariance matrix of each speaker requires a larger amount of training data. The proposed method first calculates the global covariance matrix using the training data of all speakers. It then finds the eigenvalue matrix and the corresponding eigenvector matrix from the global covariance matrix. Compared to conventional PCA and Gaussian mixture model (GMM) methods, the proposed method shows better performance while requiring less storage space and complexity in speaker identification.

Combination of Classifiers Decisions for Multilingual Speaker Identification

  • Nagaraja, B.G.; Jayanna, H.S.
    • Journal of Information Processing Systems / v.13 no.4 / pp.928-940 / 2017
  • State-of-the-art speaker recognition systems may work better for the English language. However, if the same system is used to recognize speakers of other languages, it may perform poorly. In this work, the decisions of a Gaussian mixture model-universal background model (GMM-UBM) and a learning vector quantization (LVQ) classifier are combined to improve the recognition performance of a multilingual speaker identification system. The two classifiers differ in their modeling techniques: the former is based on a probabilistic approach and the latter on the fine-tuning of neurons. Since the approaches are different, each modeling technique identifies a different set of speakers for the same database. Therefore, the decisions of the classifiers may be combined to improve performance. In this study, multitaper mel-frequency cepstral coefficients (MFCCs) are used as the features, and monolingual and cross-lingual speaker identification studies are conducted using NIST-2003 and our own database. The experimental results show that the combined system improves performance by nearly 10% compared with the individual classifiers.
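
The abstract does not state which rule is used to combine the two classifiers. As one common possibility, the sketch below fuses per-speaker scores by min-max normalizing each classifier's scores and taking a weighted sum. The function names and the `weight_a` parameter are hypothetical, and both classifiers are assumed to output one score per enrolled speaker where higher means a better match.

```python
def min_max_normalize(scores):
    """Rescale a {speaker: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {spk: 0.0 for spk in scores}
    return {spk: (s - lo) / (hi - lo) for spk, s in scores.items()}

def fuse_and_identify(scores_a, scores_b, weight_a=0.5):
    """Pick the speaker with the highest weighted sum of normalized scores."""
    norm_a = min_max_normalize(scores_a)
    norm_b = min_max_normalize(scores_b)
    fused = {
        spk: weight_a * norm_a[spk] + (1.0 - weight_a) * norm_b[spk]
        for spk in norm_a
    }
    return max(fused, key=fused.get)
```

Normalization matters here because a log-likelihood from a GMM-UBM and a distance-derived score from LVQ live on different scales, and summing them raw would let one classifier dominate.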

Speaker Identification Using Dynamic Time Warping Algorithm (동적 시간 신축 알고리즘을 이용한 화자 식별)

  • Jeong, Seung-Do
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.5 / pp.2402-2409 / 2011
  • The voice carries distinguishable acoustic properties of the speaker in addition to transmitting information. Speaker recognition is the task of determining who is speaking from the acoustic differences between speakers. It is roughly divided into two categories: speaker verification and speaker identification. Speaker verification verifies the speaker's identity based only on his or her voice. Speaker identification, in contrast, finds the speaker by searching for the most similar model in a database previously built from multiple subordinate sentences. This paper composes feature vectors from extracted MFCC coefficients and uses the dynamic time warping algorithm to compare the similarity between features. To describe common characteristics based on the phonological features of spoken words, two subordinate sentences for each speaker are used as training data. Thus, it is possible to identify a speaker even when the spoken words differ from those previously stored in the database.
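
The dynamic time warping comparison mentioned in this abstract can be sketched as below. This is the textbook formulation with a Euclidean local cost and unconstrained steps, not necessarily the variant used in the paper; MFCC extraction is assumed to have been done elsewhere, and the function names are hypothetical.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best time alignment between two feature sequences."""
    def euclid(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the best accumulated cost aligning seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclid(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def identify_speaker(query, templates):
    """Return the label of the stored template closest to the query under DTW."""
    return min(templates, key=lambda spk: dtw_distance(query, templates[spk]))
```

Because the alignment may stretch or compress time, two utterances with the same feature trajectory but different speaking rates still get a small distance.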