• Title/Summary/Keyword: speaker identification

Search Result 152, Processing Time 0.028 seconds

Forensic Automatic Speaker Identification System for Korean Speakers (과학수사를 위한 한국인 음성 특화 자동화자식별시스템)

  • Kim, Kyung-Wha;So, Byung-Min;Yu, Ha-Jin
    • Phonetics and Speech Sciences
    • /
    • v.4 no.3
    • /
    • pp.95-101
    • /
    • 2012
  • In this paper, we introduce the automatic speaker identification system 'SPO(Supreme Prosecutors Office) Verifier'. SPO Verifier is a GMM(Gaussian mixture model)-UBM(universal background model) based automatic speaker recognition system and has been developed using Korean speakers' utterances. This system uses a channel compensation algorithm to compensate recording device characteristics. The system can give the users the ability to manage reference models with utterances from various environments to get more accurate recognition results. To evaluate the performance of SPO Verifier on Korean speakers, we compared this system with one of the most widely used commercial systems in the forensic field. The results showed that SPO Verifier shows lower EER(equal error rate) than that of the commercial system.

Histogram Equalization Using Background Speakers' Utterances for Speaker Identification (화자 식별에서의 배경화자데이터를 이용한 히스토그램 등화 기법)

  • Kim, Myung-Jae;Yang, Il-Ho;So, Byung-Min;Kim, Min-Seok;Yu, Ha-Jin
    • Phonetics and Speech Sciences
    • /
    • v.4 no.2
    • /
    • pp.79-86
    • /
    • 2012
  • In this paper, we propose a novel approach to improve histogram equalization for speaker identification. Our method collects all speech features of UBM training data to make a reference distribution. The ranks of the feature vectors are calculated in the sorted list of the collection of the UBM training data and the test data. We use the ranks to perform order-based histogram equalization. The proposed method improves the accuracy of the speaker recognition system with short utterances. We use four kinds of speech databases to evaluate the proposed speaker recognition system and compare the system with cepstral mean normalization (CMN), mean and variance normalization (MVN), and histogram equalization (HEQ). Our system reduced the relative error rate by 33.3% from the baseline system.

Dysarthric speaker identification with different degrees of dysarthria severity using deep belief networks

  • Farhadipour, Aref;Veisi, Hadi;Asgari, Mohammad;Keyvanrad, Mohammad Ali
    • ETRI Journal
    • /
    • v.40 no.5
    • /
    • pp.643-652
    • /
    • 2018
  • Dysarthria is a degenerative disorder of the central nervous system that affects the control of articulation and pitch; therefore, it affects the uniqueness of sound produced by the speaker. Hence, dysarthric speaker recognition is a challenging task. In this paper, a feature-extraction method based on deep belief networks is presented for the task of identifying a speaker suffering from dysarthria. The effectiveness of the proposed method is demonstrated and compared with well-known Mel-frequency cepstral coefficient features. For classification purposes, the use of a multi-layer perceptron neural network is proposed with two structures. Our evaluations using the universal access speech database produced promising results and outperformed other baseline methods. In addition, speaker identification under both text-dependent and text-independent conditions are explored. The highest accuracy achieved using the proposed system is 97.3%.

Frame Selection, Hybrid, Modified Weighting Model Rank Method for Robust Text-independent Speaker Identification (강건한 문맥독립 화자식별을 위한 프레임 선택방법, 복합방법, 수정된 가중모델순위 방법)

  • 김민정;오세진;정호열;정현열
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.8
    • /
    • pp.735-743
    • /
    • 2002
  • In this paper, we propose three new text-independent speaker identification methods. At first, to exclude the frames not having enough features of speaker's vocal from calculation of the maximum likelihood, we propose the FS(Frame Selection) method. This approach selects the important frames by evaluating the difference between the biggest likelihood and the second in each frame, and uses only the frames in calculating the score of likelihood. Our secondly proposed, called the Hybrid, is a combined version of the FS and WMR(Weighting Model Rank). This method determines the claimed speaker using exponential function weights, instead of likelihood itself, only on the selected frames obtained from the FS method. The last proposed, called MWMR (Modified WMR), considers both original likelihood itself and its relative position, when the claimed speaker is determined. It is different from the WMR that take into account only the relative position of likelihood. Through the experiments of the speaker identification, we show that the all the proposed have higher identification rates than the ML. In addition, the Hybrid and MWMR have higher identification rate about 2% and about 3% than WMR, respectively.

Text-Independent Speaker Identification System Using Speaker Decision Network Based on Delayed Summing (지연누적에 기반한 화자결정회로망이 도입된 구문독립 화자인식시스템)

  • 이종은;최진영
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.8 no.2
    • /
    • pp.82-95
    • /
    • 1998
  • In this paper, we propose a text-independent speaker identification system which has a classifier composed of two parts; to calculate the degree of likeness of each speech frame and to select the most probable speaker from the entire speech duration. The first part is realized using RBFN which is selforganized through learning and in the second part the speaker is determined using a con-tbination of MAXNET and delayed summings. And we use features from linear speech production model and features from fractal geometry. Closed-set speaker identification experiments on 13 male homogeneous speakers show that the proposed techniques can achieve the identification ratio of 100% as the number of delays increases.

  • PDF

Improvement of Reliability based Information Integration in Audio-visual Person Identification (시청각 화자식별에서 신뢰성 기반 정보 통합 방법의 성능 향상)

  • Tariquzzaman, Md.;Kim, Jin-Young;Hong, Joon-Hee
    • MALSORI
    • /
    • no.62
    • /
    • pp.149-161
    • /
    • 2007
  • In this paper we proposed a modified reliability function for improving bimodal speaker identification(BSI) performance. The convectional reliability function, used by N. Fox[1], is extended by introducing an optimization factor. We evaluated the proposed method in BSI domain. A BSI system was implemented based on GMM and it was tested using VidTIMIT database. Through speaker identification experiments we verified the usefulness of our proposed method. The experiments showed the improved performance, i.e., the reduction of error rate by 39%.

  • PDF

Speaker Identification Using Score-based Confidence in Noisy Environments (스코어 기반 관측신뢰도를 이용한 잡음환경하 화자식별)

  • Min, So-Hee;Song, Min-Gyu;Na, Seung-You;Choi, Seung-Ho;Kim, Jin-Young
    • Speech Sciences
    • /
    • v.14 no.4
    • /
    • pp.145-156
    • /
    • 2007
  • The performance of speaker identification is severely degraded in noisy environments. Recently probability weighting method based on observation membership was proposed for overcoming the noise problem[1]. In the paper[1] the observation confidence was calculated from SNR with sigmoid function. However, estimating SNR needs additive calculation amount and estimated SNR is corrupted in dynamic noisy environments. In this paper we propose estimation methods of the observation confidence based on score-based reliabilities (SBR) of entropy and dispersion measures. Generally SBRs are obtained from speaker models' probabilities. The proposed methods are evaluated with ETRI speaker recognition DB. We compared the performances of the proposed methods with those in [1][8]. The experimental results show that the proposed methods can be successfully applied for the case where SNR is not available.

  • PDF

On the Use of Various Resolution Filterbanks for Speaker Identification

  • Lee, Bong-Jin;Kang, Hong-Goo;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.3E
    • /
    • pp.80-86
    • /
    • 2007
  • In this paper, we utilize generalized warped filterbanks to improve the performance of speaker recognition systems. At first, the performance of speaker identification systems is analyzed by varying the type of warped filterbanks. Based on the results that the error pattern of recognition system is different depending on the type of filterbank used, we combine the likelihood values of the statistical models that consist of the features extracting from multiple warped filterbanks. Simulation results with TIMIT and NTIMIT database verify that the proposed system shows relative improvement of identification rate by 31.47% and 15.14% comparing it to the conventional system.

Speaker Identification Using Greedy Kernel PCA (Greedy Kernel PCA를 이용한 화자식별)

  • Kim, Min-Seok;Yang, Il-Ho;Yu, Ha-Jin
    • MALSORI
    • /
    • no.66
    • /
    • pp.105-116
    • /
    • 2008
  • In this research, we propose a speaker identification system using a kernel method which is expected to model the non-linearity of speech features well. We have been using principal component analysis (PCA) successfully, and extended to kernel PCA, which is used for many pattern recognition tasks such as face recognition. However, we cannot use kernel PCA for speaker identification directly because the storage required for the kernel matrix grows quadratically, and the computational cost grows linearly (computing eigenvector of $l{\times}l$ matrix) with the number of training vectors I. Therefore, we use greedy kernel PCA which can approximate kernel PCA with small representation error. In the experiments, we compare the accuracy of the greedy kernel PCA with the baseline Gaussian mixture models using MFCCs and PCA. As the results with limited enrollment data show, the greedy kernel PCA outperforms conventional methods.

  • PDF

On a robust text-dependent speaker identification over telephone channels (전화음성에 강인한 문장종속 화자인식에 관한 연구)

  • Jung, Eu-Sang;Choi, Hong-Sub
    • Speech Sciences
    • /
    • v.2
    • /
    • pp.57-66
    • /
    • 1997
  • This paper studies the effects of the method, CMS(Cepstral Mean Subtraction), (which compensates for some of the speech distortion. caused by telephone channels), on the performance of the text-dependent speaker identification system. This system is based on the VQ(Vector Quantization) and HMM(Hidden Markov Model) method and chooses the LPC-Cepstrum and Mel-Cepstrum as the feature vectors extracted from the speech data transmitted through telephone channels. Accordingly, we can compare the correct recognition rates of the speaker identification system between the use of LPC-Cepstrum and Mel-Cepstrum. Finally, from the experiment results table, it is found that the Mel-Cepstrum parameter is proven to be superior to the LPC-Cepstrum and that recognition performance improves by about 10% when compensating for telephone channel using the CMS.

  • PDF