• Title/Summary/Keyword: Speaker Identification

Search Result 152, Processing Time 0.03 seconds

Double Compensation Framework Based on GMM For Speaker Recognition (화자 인식을 위한 GMM기반의 이중 보상 구조)

  • Kim Yu-Jin;Chung Jae-Ho
    • MALSORI
    • /
    • no.45
    • /
    • pp.93-105
    • /
    • 2003
  • In this paper, we present a single framework based on GMM for speaker recognition. The proposed framework can simultaneously minimize environmental variations on mismatched conditions and adapt the bias free and speaker-dependent characteristics of claimant utterances to the background GMM to create a speaker model. We compare the closed-set speaker identification for conventional method and the proposed method both on TIMIT and NTIMIT. In the several sets of experiments we show the improved recognition rates on a simulated channel and a telephone channel condition by 7.2% and 27.4% respectively.

  • PDF

RPCA-GMM for Speaker Identification (화자식별을 위한 강인한 주성분 분석 가우시안 혼합 모델)

  • 이윤정;서창우;강상기;이기용
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.7
    • /
    • pp.519-527
    • /
    • 2003
  • Speech is much influenced by the existence of outliers which are introduced by such an unexpected happenings as additive background noise, change of speaker's utterance pattern and voice detection errors. These kinds of outliers may result in severe degradation of speaker recognition performance. In this paper, we proposed the GMM based on robust principal component analysis (RPCA-GMM) using M-estimation to solve the problems of both ouliers and high dimensionality of training feature vectors in speaker identification. Firstly, a new feature vector with reduced dimension is obtained by robust PCA obtained from M-estimation. The robust PCA transforms the original dimensional feature vector onto the reduced dimensional linear subspace that is spanned by the leading eigenvectors of the covariance matrix of feature vector. Secondly, the GMM with diagonal covariance matrix is obtained from these transformed feature vectors. We peformed speaker identification experiments to show the effectiveness of the proposed method. We compared the proposed method (RPCA-GMM) with transformed feature vectors to the PCA and the conventional GMM with diagonal matrix. Whenever the portion of outliers increases by every 2%, the proposed method maintains almost same speaker identification rate with 0.03% of little degradation, while the conventional GMM and the PCA shows much degradation of that by 0.65% and 0.55%, respectively This means that our method is more robust to the existence of outlier.

Speaker Identification on Various Environments Using an Ensemble of Kernel Principal Component Analysis (커널 주성분 분석의 앙상블을 이용한 다양한 환경에서의 화자 식별)

  • Yang, Il-Ho;Kim, Min-Seok;So, Byung-Min;Kim, Myung-Jae;Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.31 no.3
    • /
    • pp.188-196
    • /
    • 2012
  • In this paper, we propose a new approach to speaker identification technique which uses an ensemble of multiple classifiers (speaker identifiers). KPCA (kernel principal component analysis) enhances features for each classifier. To reduce the processing time and memory requirements, we select limited number of samples randomly which are used as estimation set for each KPCA basis. The experimental result shows that the proposed approach gives a higher identification accuracy than GKPCA (greedy kernel principal component analysis).

Improvement in Supervector Linear Kernel SVM for Speaker Identification Using Feature Enhancement and Training Length Adjustment (특징 강화 기법과 학습 데이터 길이 조절에 의한 Supervector Linear Kernel SVM 화자식별 개선)

  • So, Byung-Min;Kim, Kyung-Wha;Kim, Min-Seok;Yang, Il-Ho;Kim, Myung-Jae;Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.6
    • /
    • pp.330-336
    • /
    • 2011
  • In this paper, we propose a new method to improve the performance of supervector linear kernel SVM (Support Vector Machine) for speaker identification. This method is based on splitting one training datum into several pieces of utterances. We use four different databases for evaluating performance and use PCA (Principal Component Analysis), GKPCA (Greedy Kernel PCA) and KMDA (Kernel Multimodal Discriminant Analysis) for feature enhancement. As a result, the proposed method shows improved performance for speaker identification using supervector linear kernel SVM.

A Robust Speaker Identification Method Based on the Wavelet Filter Banks (웨이블렛 필터뱅크에 기반을 둔 강인한 화자식별 기법)

  • Lee, Dae-Jong;Gwak, Geun-Chang;Yu, Jeong-Ung;Jeon, Myeong-Geun
    • The KIPS Transactions:PartC
    • /
    • v.9C no.4
    • /
    • pp.459-466
    • /
    • 2002
  • This paper proposes a robust speaker identification algorithm based on the wavelet filter banks and multiple decision-making scheme. Since the proposed speaker identification algorithm has a structure performing the identification algorithm independently for each subband, the noise effect of an subband can be localized. Through this process, we can obtain more robust results for the environmental noises which generally have band limited frequency. In the experiments, the proposed method showed more 15∼60% improvement than the vector quantization method for the various noisy environments.

Character-Based Video Summarization Using Speaker Identification (화자 인식을 통한 등장인물 기반의 비디오 요약)

  • Lee Soon-Tak;Kim Jong-Sung;Kang Chan-Mi;Baek Joong-Hwan
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.6 no.4
    • /
    • pp.163-168
    • /
    • 2005
  • In this paper, we propose a character-based summarization algorithm using speaker identification method from the dialog in video. First, we extract the dialog of shots containing characters' face and then, classify the scene according to actor/actress by performing speaker identification. The classifier is based on the GMM(Gaussian Mixture Model) using the 24 values of MFCC(Mel Frequency Cepstrum Coefficient). GMM is trained to recognize one actor/actress among four who are all trained by GMM. Our experiment result shows that GMM classifier obtains the error rate of 0.138 from our video data.

  • PDF

Text-independent Speaker Identification by Bagging VQ Classifier

  • Kyung, Youn-Jeong;Park, Bong-Dae;Lee, Hwang-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.2E
    • /
    • pp.17-24
    • /
    • 2001
  • In this paper, we propose the bootstrap and aggregating (bagging) vector quantization (VQ) classifier to improve the performance of the text-independent speaker recognition system. This method generates multiple training data sets by resampling the original training data set, constructs the corresponding VQ classifiers, and then integrates the multiple VQ classifiers into a single classifier by voting. The bagging method has been proven to greatly improve the performance of unstable classifiers. Through two different experiments, this paper shows that the VQ classifier is unstable. In one of these experiments, the bias and variance of a VQ classifier are computed with a waveform database. The variance of the VQ classifier is compared with that of the classification and regression tree (CART) classifier[1]. The variance of the VQ classifier is shown to be as large as that of the CART classifier. The other experiment involves speaker recognition. The speaker recognition rates vary significantly by the minor changes in the training data set. The speaker recognition experiments involving a closed set, text-independent and speaker identification are performed with the TIMIT database to compare the performance of the bagging VQ classifier with that of the conventional VQ classifier. The bagging VQ classifier yields improved performance over the conventional VQ classifier. It also outperforms the conventional VQ classifier in small training data set problems.

  • PDF

Noise Robust Text-Independent Speaker Identification for Ubiquitous Robot Companion (지능형 서비스 로봇을 위한 잡음에 강인한 문맥독립 화자식별 시스템)

  • Kim, Sung-Tak;Ji, Mi-Kyoung;Kim, Hoi-Rin;Kim, Hye-Jin;Yoon, Ho-Sub
    • 한국HCI학회:학술대회논문집
    • /
    • 2008.02a
    • /
    • pp.190-194
    • /
    • 2008
  • This paper presents a speaker identification technique which is one of the basic techniques of the ubiquitous robot companion. Though the conventional mel-frequency cepstral coefficients guarantee high performance of speaker identification in clean condition, the performance is degraded dramatically in noise condition. To overcome this problem, we employed the relative autocorrelation sequence mel-frequency cepstral coefficient which is one of the noise robust features. However, there are two problems in relative autocorrelation sequence mel-frequency cepstral coefficient: 1) the limited information problem. 2) the residual noise problem. In this paper, to deal with these drawbacks, we propose a multi-streaming method for the limited information problem and a hybrid method for the residual noise problem. To evaluate proposed methods, noisy speech is used in which air conditioner noise, classic music, and vacuum noise are artificially added. Through experiments, proposed methods provide better performance of speaker identification than the conventional methods.

  • PDF

Speaker Change Detection Based on a Graph-Partitioning Criterion

  • Seo, Jin-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.2
    • /
    • pp.80-85
    • /
    • 2011
  • Speaker change detection involves the identification of time indices of an audio stream, where the identity of the speaker changes. In this paper, we propose novel measures for the speaker change detection based on a graph-partitioning criterion over the pairwise distance matrix of feature-vector stream. Experiments on both synthetic and real-world data were performed and showed that the proposed approach yield promising results compared with the conventional statistical measures.

Speaker Identification Using an Ensemble of Feature Enhancement Methods (특징 강화 방법의 앙상블을 이용한 화자 식별)

  • Yang, IL-Ho;Kim, Min-Seok;So, Byung-Min;Kim, Myung-Jae;Yu, Ha-Jin
    • Phonetics and Speech Sciences
    • /
    • v.3 no.2
    • /
    • pp.71-78
    • /
    • 2011
  • In this paper, we propose an approach which constructs classifier ensembles of various channel compensation and feature enhancement methods. CMN and CMVN are used as channel compensation methods. PCA, kernel PCA, greedy kernel PCA, and kernel multimodal discriminant analysis are used as feature enhancement methods. The proposed ensemble system is constructed with the combination of 15 classifiers which include three channel compensation methods (including 'without compensation') and five feature enhancement methods (including 'without enhancement'). Experimental results show that the proposed ensemble system gives highest average speaker identification rate in various environments (channels, noises, and sessions).

  • PDF