• Title/Summary/Keyword: Speaker Verification

Search Result 162, Processing Time 0.025 seconds

Frame Selection, Hybrid, Modified Weighting Model Rank Method for Robust Text-independent Speaker Identification (강건한 문맥독립 화자식별을 위한 프레임 선택방법, 복합방법, 수정된 가중모델순위 방법)

  • 김민정;오세진;정호열;정현열
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.8
    • /
    • pp.735-743
    • /
    • 2002
  • In this paper, we propose three new text-independent speaker identification methods. At first, to exclude the frames not having enough features of speaker's vocal from calculation of the maximum likelihood, we propose the FS(Frame Selection) method. This approach selects the important frames by evaluating the difference between the biggest likelihood and the second in each frame, and uses only the frames in calculating the score of likelihood. Our secondly proposed, called the Hybrid, is a combined version of the FS and WMR(Weighting Model Rank). This method determines the claimed speaker using exponential function weights, instead of likelihood itself, only on the selected frames obtained from the FS method. The last proposed, called MWMR (Modified WMR), considers both original likelihood itself and its relative position, when the claimed speaker is determined. It is different from the WMR that take into account only the relative position of likelihood. Through the experiments of the speaker identification, we show that the all the proposed have higher identification rates than the ML. In addition, the Hybrid and MWMR have higher identification rate about 2% and about 3% than WMR, respectively.

Implementation of a Robust Speaker Recognition System in Noisy Environment Using AR HMM with Duration-term (지속시간항을 갖는 AR HMM을 이용한 잡음환경에서의 강인 화자인식 시스템 구현)

  • 이기용;임재열
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.6
    • /
    • pp.26-33
    • /
    • 2001
  • Though speaker recognition based on conventional AR HMM shows good performance, its lack of modeling the environmental noise makes its performance degraded in case of practical noisy environment. In this paper, a robust speaker recognition system based on AR HMM is proposed, where noise is considered in the observation signal model for practical noisy environment and duration-term is considered to increase performance. Experimental results, using the digits database from 100 speakers (77 males and 23 females) under white noise and car noise, show improved performance.

  • PDF

A Study on Channel Mis-match Compensation Technique for Robust Speaker Verification System (강인한 화자확인 시스템을 위한 채널 불일치 보상 기법에 관한 연구)

  • 강철호;정희석
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.3
    • /
    • pp.228-234
    • /
    • 2004
  • In this paper, we proposed the compensation technique that overcomes the limitations of the conventional approaches through summing up the bias terms between world's codebook and individual codebook vectors of feature parameters. But, mean compensation without condition can bring higher false acceptance. Therefore, the proposed technique compensates the channel mis-match condition by weighted bias sum using nonlinear function regarding to the distortion between speech and silence. The simulation results show that the FRR (flase reject rate) is decreased 14.95% when the proposed algorithm was applied.

A Novel Two-Level Pitch Detection Approach for Speaker Tracking in Robot Control

  • Hejazi, Mahmoud R.;Oh, Han;Kim, Hong-Kook;Ho, Yo-Sung
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2005.06a
    • /
    • pp.89-92
    • /
    • 2005
  • Using natural speech commands for controlling a human-robot is an interesting topic in the field of robotics. In this paper, our main focus is on the verification of a speaker who gives a command to decide whether he/she is an authorized person for commanding. Among possible dynamic features of natural speech, pitch period is one of the most important ones for characterizing speech signals and it differs usually from person to person. However, current techniques of pitch detection are still not to a desired level of accuracy and robustness. When the signal is noisy or there are multiple pitch streams, the performance of most techniques degrades. In this paper, we propose a two-level approach for pitch detection which in compare with standard pitch detection algorithms, not only increases accuracy, but also makes the performance more robust to noise. In the first level of the proposed approach we discriminate voiced from unvoiced signals based on a neural classifier that utilizes cepstrum sequences of speech as an input feature set. Voiced signals are then further processed in the second level using a modified standard AMDF-based pitch detection algorithm to determine their pitch periods precisely. The experimental results show that the accuracy of the proposed system is better than those of conventional pitch detection algorithms for speech signals in clean and noisy environments.

  • PDF

A Method on the Learning Speed Improvement of the Online Error Backpropagation Algorithm in Speech Processing (음성처리에서 온라인 오류역전파 알고리즘의 학습속도 향상방법)

  • 이태승;이백영;황병원
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.5
    • /
    • pp.430-437
    • /
    • 2002
  • Having a variety of good characteristics against other pattern recognition techniques, the multilayer perceptron (MLP) has been widely used in speech recognition and speaker recognition. But, it is known that the error backpropagation (EBP) algorithm that MLP uses in learning has the defect that requires restricts long learning time, and it restricts severely the applications like speaker recognition and speaker adaptation requiring real time processing. Because the learning data for pattern recognition contain high redundancy, in order to increase the learning speed it is very effective to use the online-based learning methods, which update the weight vector of the MLP by the pattern. A typical online EBP algorithm applies the fixed learning rate for each update of the weight vector. Though a large amount of speedup with the online EBP can be obtained by choosing the appropriate fixed rate, firing the rate leads to the problem that the algorithm cannot respond effectively to different learning phases as the phases change and the number of patterns contributing to learning decreases. To solve this problem, this paper proposes a Changing rate and Omitting patterns in Instant Learning (COIL) method to apply the variable rate and the only patterns necessary to the learning phase when the phases come to change. In this paper, experimentations are conducted for speaker verification and speech recognition, and results are presented to verify the performance of the COIL.

Automatic Speech Style Recognition Through Sentence Sequencing for Speaker Recognition in Bilateral Dialogue Situations (양자 간 대화 상황에서의 화자인식을 위한 문장 시퀀싱 방법을 통한 자동 말투 인식)

  • Kang, Garam;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.2
    • /
    • pp.17-32
    • /
    • 2021
  • Speaker recognition is generally divided into speaker identification and speaker verification. Speaker recognition plays an important function in the automatic voice system, and the importance of speaker recognition technology is becoming more prominent as the recent development of portable devices, voice technology, and audio content fields continue to expand. Previous speaker recognition studies have been conducted with the goal of automatically determining who the speaker is based on voice files and improving accuracy. Speech is an important sociolinguistic subject, and it contains very useful information that reveals the speaker's attitude, conversation intention, and personality, and this can be an important clue to speaker recognition. The final ending used in the speaker's speech determines the type of sentence or has functions and information such as the speaker's intention, psychological attitude, or relationship to the listener. The use of the terminating ending has various probabilities depending on the characteristics of the speaker, so the type and distribution of the terminating ending of a specific unidentified speaker will be helpful in recognizing the speaker. However, there have been few studies that considered speech in the existing text-based speaker recognition, and if speech information is added to the speech signal-based speaker recognition technique, the accuracy of speaker recognition can be further improved. Hence, the purpose of this paper is to propose a novel method using speech style expressed as a sentence-final ending to improve the accuracy of Korean speaker recognition. To this end, a method called sentence sequencing that generates vector values by using the type and frequency of the sentence-final ending appearing in the utterance of a specific person is proposed. To evaluate the performance of the proposed method, learning and performance evaluation were conducted with a actual drama script. The method proposed in this study can be used as a means to improve the performance of Korean speech recognition service.

Performance Analysis of Speech Parameters and a New Decision Logic for Speaker Recognition (화자인식을 위한 음성 요소들의 성능분석 및 새로운 판단 논리)

  • Lee, Hyuk-Jae;Lee, Byeong-Gi
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.26 no.7
    • /
    • pp.146-156
    • /
    • 1989
  • This paper discusses how to choose speech parameters and decision logics to improve the performance of speaker recognition systems. It also considers the influence of the reference patterns on the speaker recognition. It is observed from the performance analysis based on LPSs, PARCOR coefficients and LPC-cepstrum coefficients that LPC-cepstrum coefficients are superior to the others in speaker recognition without regard to the reference patterns. In order to improve the recognition performance, a new decision logic is proposed based on a generalized-distance concept. It differs from the existing methods in that it considers the statistics of customer and impostors at the same time. It turns out from a speaker verification test that the proposed decision logic ferforms better than the existing ones.

  • PDF

An Implementation of Security System Using Speaker Recognition Algorithm (화자인식 알고리즘을 이용한 보안 시스템 구축)

  • Shin, You-Shik;Park, Kee-Young;Kim, Chong-Kyo
    • Journal of the Korean Institute of Telematics and Electronics T
    • /
    • v.36T no.4
    • /
    • pp.17-23
    • /
    • 1999
  • This paper described a security system using text-independent speaker recognition algorithm. Security system is based on PIC16F84 and sound card. Speaker recognition algorithm applied a k-means based model and weighted cepstrum for speech features. As the experimental results, recognition rate of the training data is 100%, non-training data is 99%. Also false rejection rate is 1%, false acceptance rate is 0% and verification mean error rate is 0.5% for registered 5 persons.

  • PDF

Driver Verification System Using Biometrical GMM Supervector Kernel (생체기반 GMM Supervector Kernel을 이용한 운전자검증 기술)

  • Kim, Hyoung-Gook
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.9 no.3
    • /
    • pp.67-72
    • /
    • 2010
  • This paper presents biometrical driver verification system in car experiment through analysis of speech, and face information. We have used Mel-scale Frequency Cesptral Coefficients (MFCCs) for speaker verification using speech information. For face verification, face region is detected by AdaBoost algorithm and dimension-reduced feature vector is extracted by using principal component analysis only from face region. In this paper, we apply the extracted speech- and face feature vectors to an SVM kernel with Gaussian Mixture Models(GMM) supervector. The experimental results of the proposed approach show a clear improvement compared to a simple GMM or SVM approach.

Performance Improvement of Speaker Verification System By Speaker Information Weighting (개인성 정보의 가중화에 의한 화자확인의 성능향상)

  • 김세현;장길진;오영환
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 1999.10b
    • /
    • pp.539-541
    • /
    • 1999
  • 기존의 문장종속형 화자인식 기법에서는 음성 신호의 각 분석 프레임이 같은 기여도를 갖는 것으로 간주한다. 화자인식 시스템의 성능향상을 위해서는 음운정보보다는 인식의 단서가 되는 화자의 개인성 정보가 잘 반영되도록 하는 것이 중요하다. 본 논문에서는 HMM (hidden Markov model)을 기반으로 한 문장종속형 화자확인 시스템의 성능향상을 위해 프레임별로 인식의 단서가 되는 개인성 정보의 양을 측정하는 방법과, 이를 화자확인 시스템에 적용하는 기법을 제안한다. 제안한 방법을 적용한 결과, 기존의 우도비(likelihood ratio) 정규화 점수를 사용하는 방법에 비해 동일오류율(EER; equal error rate)을 평균 34% 감소시켜 인식율 향상을 얻을 수 있다.

  • PDF