• Title/Summary/Keyword: robust speaker verification


A Phase-related Feature Extraction Method for Robust Speaker Verification (열악한 환경에 강인한 화자인증을 위한 위상 기반 특징 추출 기법)

  • Kwon, Chul-Hong
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.3 / pp.613-620 / 2010
  • Additive noise and channel distortion strongly degrade the performance of speaker verification systems because they distort the speech features. This distortion causes a mismatch between training and recognition conditions, so acoustic models trained on clean speech do not accurately model noisy, channel-distorted speech. This paper presents a phase-related feature extraction method to improve the robustness of speaker verification systems. The instantaneous frequency is computed from the phase of the speech signal, and features are obtained from the histogram of the instantaneous frequency. Experimental results show that the proposed technique offers significant improvements over standard techniques in both clean and adverse test environments.
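The abstract does not give the exact extraction pipeline, so the following is only a minimal sketch under stated assumptions: the instantaneous frequency is taken as the time derivative of the unwrapped phase of the analytic signal (built with an FFT-based Hilbert transform), and the feature is a normalized histogram of those values over 0 to fs/2. The function names, the 16-bin resolution, and the clipping step are illustrative choices, not the paper's parameters.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via FFT: keep DC (and Nyquist), double positive bins, zero negative bins."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)


def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz: derivative of the unwrapped analytic-signal phase."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)


def if_histogram(frame, fs, n_bins=16):
    """Hypothetical feature: normalized histogram of instantaneous frequency over [0, fs/2]."""
    inst_f = np.clip(instantaneous_frequency(frame, fs), 0.0, fs / 2.0)
    counts, _ = np.histogram(inst_f, bins=n_bins, range=(0.0, fs / 2.0))
    return counts / counts.sum()
```

For a pure 1100 Hz tone sampled at 8 kHz, all of the histogram mass falls into the bin covering 1000 to 1250 Hz; on real speech the histogram would be computed per analysis frame.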

Implementation of a Robust Speaker Recognition System in Noisy Environment Using AR HMM with Duration-term (지속시간항을 갖는 AR HMM을 이용한 잡음환경에서의 강인 화자인식 시스템 구현)

  • 이기용;임재열
    • The Journal of the Acoustical Society of Korea / v.20 no.6 / pp.26-33 / 2001
  • Although speaker recognition based on a conventional AR HMM performs well, its performance degrades in practical noisy environments because it does not model environmental noise. In this paper, we propose a robust AR HMM-based speaker recognition system in which noise is included in the observation signal model to handle practical noisy environments, and a duration term is added to increase performance. Experimental results on a digit database from 100 speakers (77 male and 23 female) under white noise and car noise show improved performance.
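A full AR HMM with a noise-aware observation model and a duration term is beyond a short example. The sketch below shows only the ingredient the name refers to: the per-state observation log-likelihood of a frame under an autoregressive model, i.e. a Gaussian on the linear-prediction residual. The function name and the conditional formulation (the first p samples are not scored) are assumptions; the paper's additive-noise term and duration term are not modeled.

```python
import numpy as np


def ar_frame_loglik(x, a, sigma2):
    """Conditional Gaussian log-likelihood of frame x under AR(p) coefficients a.

    Model: x[n] = sum_k a[k] * x[n-1-k] + e[n], with e[n] ~ N(0, sigma2).
    """
    x = np.asarray(x, dtype=float)
    p = len(a)
    n_total = len(x)
    # Linear prediction of x[p:], built from lagged slices of the frame
    pred = np.zeros(n_total - p)
    for k in range(p):
        pred += a[k] * x[p - 1 - k : n_total - 1 - k]
    resid = x[p:] - pred
    m = len(resid)
    return -0.5 * m * np.log(2.0 * np.pi * sigma2) - 0.5 * np.sum(resid ** 2) / sigma2
```

In an HMM, each state would carry its own (a, sigma2), and this density would replace the usual GMM emission term in the forward or Viterbi recursion.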


A Robust Method for Speech Replay Attack Detection

  • Lin, Lang;Wang, Rangding;Yan, Diqun;Dong, Li
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.1 / pp.168-182 / 2020
  • Spoofing attacks, especially replay attacks, pose great security challenges to automatic speaker verification (ASV) systems. Current work on replay attack detection has primarily focused on either developing new features or improving classifier performance, ignoring the effects of feature variability such as channel variability. In this paper, we first establish a mathematical model of replayed speech and introduce a method for eliminating the negative interference of the channel. A novel feature is then proposed to detect replay attacks. To further boost detection performance, four post-processing methods based on normalization techniques are investigated. We evaluate the proposed method on the ASVspoof 2017 dataset. The experimental results show that our approach outperforms the competing methods in detection accuracy. More interestingly, we find that the proposed normalization strategy can also improve the performance of existing algorithms.
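The four post-processing methods are not specified in the abstract. As one generic illustration of why normalization helps against channel variability: if a replay channel acts approximately as a fixed filter, it adds a roughly constant offset to log-spectral or cepstral features, and per-utterance mean and variance normalization (MVN) cancels that offset. This sketch (hypothetical function name) is a standard MVN, not the paper's feature or its normalization strategy.

```python
import numpy as np


def mean_variance_normalize(features, eps=1e-10):
    """Per-utterance MVN of a (frames x dims) feature matrix.

    A constant per-dimension offset (e.g. a fixed channel in the log domain)
    is removed by the mean subtraction.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)
```

Because the offset cancels, the normalized features of a clean utterance and of the same utterance with an added constant channel offset are identical.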

Speaker Verification System with Hybrid Model Improved by Adapted Continuous Wavelet Transform

  • Kim, Hyoungsoo;Yang, Sung-il;Younghun Kwon;Kyungjoon Cha
    • The Journal of the Acoustical Society of Korea / v.18 no.3E / pp.30-36 / 1999
  • In this paper, we develop a hybrid speaker recognition system [1] enhanced by a pre-recognizer and a post-recognizer. The pre-recognizer consists of general speech recognition systems, and the post-recognizer is a pitch detection system that uses an adapted continuous wavelet transform (ACWT) to improve the performance of the hybrid speaker recognition system. Two schemes for designing the ACWT are considered. The first searches a basis library covering the whole band of the speech fundamental frequency (pitch). The second determines which basis is best, using an information cost functional as the criterion. The ACWT classifies speech pitch reliably even when the speech signal is badly damaged by environmental noise.
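The abstract names an information cost functional as the best-basis criterion without defining it. A common choice in best-basis selection is the Shannon entropy cost of the normalized coefficient energies (Coifman and Wickerhauser), which is assumed here. The candidate coefficient vectors would come from analyzing one frame with each wavelet in the basis library, which is not shown; function names are hypothetical.

```python
import numpy as np


def entropy_cost(coeffs):
    """Shannon entropy of normalized coefficient energies (lower = more compact)."""
    c2 = np.asarray(coeffs, dtype=float) ** 2
    total = c2.sum()
    if total == 0:
        return 0.0
    p = c2[c2 > 0] / total
    return float(-np.sum(p * np.log(p)))


def best_basis(candidate_coeffs):
    """Index of the candidate basis with the lowest entropy cost, plus all costs."""
    costs = [entropy_cost(c) for c in candidate_coeffs]
    return int(np.argmin(costs)), costs
```

A basis that concentrates the frame's energy in a few coefficients (as a well-matched wavelet would around the pitch) gets a lower cost than one that spreads it evenly.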


Noise-Robust Speaker Recognition Using Subband Likelihoods and Reliable-Feature Selection

  • Kim, Sung-Tak;Ji, Mi-Kyong;Kim, Hoi-Rin
    • ETRI Journal / v.30 no.1 / pp.89-100 / 2008
  • We consider the feature recombination technique in a multiband approach to speaker identification and verification. To overcome the ineffectiveness of conventional feature recombination in broadband noisy environments, we propose a new subband feature recombination that uses subband likelihoods, together with a subband reliable-feature selection technique with an adaptive noise model. In the decision step of speaker recognition, a few very low likelihood scores from unreliable features can cause a speaker recognition system to make an incorrect decision. To overcome this problem, reliable-feature selection adjusts the likelihood scores of an unreliable feature by comparing them with those of an adaptive noise model, which is estimated by maximum a posteriori (MAP) adaptation using noise features obtained directly from the noisy test speech. To evaluate the proposed methods in noisy environments, we use the TIMIT database and the NTIMIT database, the corresponding telephone version of TIMIT. The proposed subband feature recombination with subband reliable-feature selection achieves better performance than the conventional feature recombination system with reliable-feature selection.
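A minimal sketch of the two mechanisms described, under assumptions the abstract leaves open: the noise model is a single Gaussian whose mean is MAP-adapted with a relevance factor (16 here, a value often used in GMM-UBM systems), and an unreliable subband score is replaced by the noise model's score whenever the noise model explains that subband better. The subband scores are then summed, as in likelihood-domain recombination. Function names and the flooring rule are hypothetical; the paper's exact adjustment may differ.

```python
import numpy as np


def map_adapt_mean(prior_mean, noise_frames, relevance=16.0):
    """MAP update of a single Gaussian mean toward noise frames from the test utterance."""
    noise_frames = np.atleast_2d(np.asarray(noise_frames, dtype=float))
    n = noise_frames.shape[0]
    alpha = n / (n + relevance)  # data weight grows with the number of noise frames
    return alpha * noise_frames.mean(axis=0) + (1.0 - alpha) * np.asarray(prior_mean, dtype=float)


def reliable_subband_score(speaker_ll, noise_ll):
    """Sum subband log-likelihoods, flooring each at the noise-model score.

    A subband whose speaker score falls below the noise model's score is treated
    as unreliable, so one very low score cannot dominate the decision.
    """
    adjusted = np.maximum(np.asarray(speaker_ll, dtype=float), np.asarray(noise_ll, dtype=float))
    return float(adjusted.sum())
```

With speaker scores [-1, -50] and noise scores [-10, -5], the second subband is floored at -5 and the recombined score is -6 instead of -51.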


The Study on the Verification of Speaker Change using GMM-UBM based KL distance (GMM-UBM 기반 KL 거리를 활용한 화자변화 검증에 대한 연구)

  • Cho, Joon-Beom;Lee, Ji-eun;Lee, Kyong-Rok
    • Journal of Convergence Society for SMB / v.6 no.4 / pp.71-77 / 2016
  • In this paper, we propose verifying speaker changes with a GMM-UBM-based KL distance to improve the performance of conventional BIC-based speaker change detection (SCD). The change points found by conventional BIC-based SCD are verified with a KL-distance-based SCD, which is more robust than BIC-based SCD to differences in the amount of information in the compared segments, and GMM-UBM is applied to compensate for this asymmetric amount of information. Conventional BIC-based SCD consists of two steps. Step 1 detects speaker change candidate points (SCCPs), which are positive local maxima of the dissimilarity d. Step 2 determines the speaker change points (SCPs): an SCCP whose ΔBIC is positive is declared an SCP. We then verify each SCP using the GMM-UBM-based KL distance D: if the value of D at an SCP is higher than a threshold, the point is accepted as a final SCP. Under the experimental condition where the MDR (missed detection rate) is 0, the FAR (false alarm rate) at a threshold of 0.028 was improved to 60.7%.
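The two-step pipeline can be sketched as follows. ΔBIC uses the standard full-covariance Gaussian formulation (positive values favor a change point), and the KL distance D is replaced by a common closed-form upper bound between two GMMs that were MAP-adapted (means only) from the same diagonal-covariance UBM. Both the bound and the function names are assumptions; the paper's D and its 0.028 threshold are defined on its own scale, so the threshold is left as a parameter here.

```python
import numpy as np


def delta_bic(x1, x2, lam=1.0):
    """delta-BIC for modeling x1 and x2 with separate full-covariance Gaussians.

    Positive values mean two Gaussians fit better than one pooled Gaussian,
    i.e. a speaker change is likely between the segments.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    x = np.vstack([x1, x2])
    n, d = x.shape

    def n_logdet(seg):
        # Segment length times log-determinant of its ML covariance estimate
        return len(seg) * np.linalg.slogdet(np.cov(seg, rowvar=False, bias=True))[1]

    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return 0.5 * (n_logdet(x) - n_logdet(x1) - n_logdet(x2)) - lam * penalty


def kl_upper_bound(means_a, means_b, ubm_weights, ubm_diag_vars):
    """KL upper bound between two mean-adapted GMMs sharing UBM weights and diagonal variances."""
    diff = np.asarray(means_a, dtype=float) - np.asarray(means_b, dtype=float)
    per_component = np.sum(diff ** 2 / np.asarray(ubm_diag_vars, dtype=float), axis=1)
    return 0.5 * float(np.dot(ubm_weights, per_component))


def accept_change_point(x1, x2, kl_distance, threshold):
    """Step 2 (delta-BIC > 0) followed by the KL verification (D > threshold)."""
    return bool(delta_bic(x1, x2) > 0 and kl_distance > threshold)
```

The KL check only removes points that BIC already accepted, which is why it can lower the false alarm rate without adding missed detections beyond those BIC already has.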

A Study on Channel Mis-match Compensation Technique for Robust Speaker Verification System (강인한 화자확인 시스템을 위한 채널 불일치 보상 기법에 관한 연구)

  • 강철호;정희석
    • The Journal of the Acoustical Society of Korea / v.23 no.3 / pp.228-234 / 2004
  • In this paper, we propose a compensation technique that overcomes the limitations of conventional approaches by summing the bias terms between the world codebook and the individual codebook vectors of the feature parameters. However, unconditional mean compensation can increase false acceptances. The proposed technique therefore compensates for channel mismatch with a weighted bias sum, using a nonlinear function of the distortion between speech and silence. Simulation results show that the FRR (false rejection rate) decreases by 14.95% when the proposed algorithm is applied.
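The abstract leaves the nonlinear function and the exact bias definition unspecified, so the following is a hypothetical sketch only: the bias is the summed difference between paired world and speaker codewords, normalized by the number of codewords; a logistic function of a speech/silence distortion measure sets the compensation weight; and the weighted bias is added to the test features. The logistic shape, its midpoint and slope, and the sign convention are all illustrative assumptions.

```python
import numpy as np


def codebook_bias(world_codebook, speaker_codebook):
    """Summed world-minus-speaker codeword bias, normalized by the number of codewords."""
    diff = np.asarray(world_codebook, dtype=float) - np.asarray(speaker_codebook, dtype=float)
    return diff.sum(axis=0) / len(diff)


def compensation_weight(distortion, midpoint=1.0, slope=4.0):
    """Hypothetical logistic weight: little compensation at low distortion, more at high."""
    return 1.0 / (1.0 + np.exp(-slope * (distortion - midpoint)))


def compensate(features, world_codebook, speaker_codebook, distortion):
    """Add the distortion-weighted bias to every frame (sign convention assumed)."""
    w = compensation_weight(distortion)
    return np.asarray(features, dtype=float) + w * codebook_bias(world_codebook, speaker_codebook)
```

Scaling the bias by a weight instead of always applying it mirrors the abstract's point that unconditional mean compensation can raise false acceptances.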

Frame Selection, Hybrid, Modified Weighting Model Rank Method for Robust Text-independent Speaker Identification (강건한 문맥독립 화자식별을 위한 프레임 선택방법, 복합방법, 수정된 가중모델순위 방법)

  • 김민정;오세진;정호열;정현열
    • The Journal of the Acoustical Society of Korea / v.21 no.8 / pp.735-743 / 2002
  • In this paper, we propose three new text-independent speaker identification methods. First, to exclude frames that carry too little speaker-specific information from the maximum-likelihood calculation, we propose the FS (Frame Selection) method. It selects important frames by evaluating, in each frame, the difference between the largest and the second-largest likelihood, and uses only those frames to compute the likelihood score. The second method, called Hybrid, combines FS with WMR (Weighting Model Rank): it determines the claimed speaker using exponential weights, rather than the likelihoods themselves, computed only on the frames selected by FS. The third method, MWMR (Modified WMR), considers both the likelihood itself and its relative rank when determining the claimed speaker, unlike WMR, which considers only the relative rank of the likelihood. Speaker identification experiments show that all the proposed methods achieve higher identification rates than the ML method. In addition, Hybrid and MWMR achieve identification rates about 2% and 3% higher than WMR, respectively.
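Frame selection and the Hybrid rule can be sketched on a matrix of per-frame log-likelihoods (frames by speakers). The exponential rank weight exp(-alpha * rank), the fixed margin threshold, and the function names are assumptions for illustration; the abstract does not give the exact weighting, and MWMR is not shown because its combination of likelihood and rank is not specified.

```python
import numpy as np


def select_frames(loglik, margin_threshold):
    """FS: keep frames whose best-minus-second-best log-likelihood margin is large."""
    top2 = np.sort(loglik, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    return margin >= margin_threshold


def hybrid_identify(loglik, margin_threshold=0.5, alpha=1.0):
    """Hybrid sketch: sum exponential rank weights over FS-selected frames, pick the max."""
    loglik = np.asarray(loglik, dtype=float)
    mask = select_frames(loglik, margin_threshold)
    # Per-frame rank of each speaker model: 0 = most likely
    ranks = np.argsort(np.argsort(-loglik, axis=1), axis=1)
    weights = np.exp(-alpha * ranks[mask])
    return int(np.argmax(weights.sum(axis=0))), mask
```

Using rank weights instead of raw likelihoods keeps a single frame with an extreme likelihood from deciding the result, while FS drops frames in which the top two models are nearly tied.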

A Novel Two-Level Pitch Detection Approach for Speaker Tracking in Robot Control

  • Hejazi, Mahmoud R.;Oh, Han;Kim, Hong-Kook;Ho, Yo-Sung
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2005.06a / pp.89-92 / 2005
  • Using natural speech commands to control a robot is an interesting topic in robotics. In this paper, we focus on verifying the speaker who gives a command, to decide whether he or she is authorized to issue commands. Among the possible dynamic features of natural speech, the pitch period is one of the most important for characterizing speech signals, and it usually differs from person to person. However, current pitch detection techniques have not yet reached the desired level of accuracy and robustness: when the signal is noisy or contains multiple pitch streams, the performance of most techniques degrades. In this paper, we propose a two-level approach to pitch detection that, compared with standard pitch detection algorithms, not only increases accuracy but also makes performance more robust to noise. In the first level, voiced and unvoiced signals are discriminated by a neural classifier that uses cepstrum sequences of speech as its input feature set. Voiced signals are then processed further in the second level by a modified standard AMDF-based pitch detection algorithm that determines their pitch periods precisely. The experimental results show that the proposed system is more accurate than conventional pitch detection algorithms for speech signals in both clean and noisy environments.
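The second level is described as a modified AMDF-based detector; the sketch below is only the standard AMDF it builds on, D(τ) = mean |x[n] - x[n+τ]|, with the pitch period taken as the lag of the deepest valley within an assumed 60 to 400 Hz search range. The neural voiced/unvoiced classifier of the first level and the paper's modifications are not reproduced; function names are hypothetical.

```python
import numpy as np


def amdf(frame, min_lag, max_lag):
    """Average magnitude difference function over lags min_lag..max_lag (inclusive)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame) - max_lag  # same number of terms for every lag
    lags = np.arange(min_lag, max_lag + 1)
    values = np.array([np.mean(np.abs(frame[:n] - frame[lag:lag + n])) for lag in lags])
    return lags, values


def amdf_pitch(frame, fs, f0_min=60.0, f0_max=400.0):
    """Pitch period (samples) and F0 (Hz) from the deepest AMDF valley in the search range."""
    min_lag = int(fs / f0_max)
    max_lag = int(fs / f0_min)
    lags, values = amdf(frame, min_lag, max_lag)
    period = int(lags[np.argmin(values)])
    return period, fs / period
```

Taking the global minimum can select a multiple of the true period on near-periodic signals; practical detectors add safeguards against such errors.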


A Study of Cepstrum Normalization Using World Model for Robust Speaker Verification (강인한 화자 확인 시스템을 위한 World 모델을 이용한 켑스트럼 정규화 연구)

  • Kim Yu-Jin;Chung Jae-Ho
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.55-58 / 2000
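  • This paper describes a new normalization method that addresses the performance degradation caused by channel mismatch between the enrollment and verification stages of a speaker verification system. The proposed method aims, first, to estimate and compensate for the channel effectively from the input speech and, second, to make effective use, in channel estimation and speaker model generation, of the difference from the world model, which serves as the impostor model in score normalization. To this end, we estimate the Phone-Dependent Difference Cepstrum, a channel cepstrum that depends on the phone sequence, from the difference between the cepstrum of the input speech and the mean cepstra that are parameters of the HMM world model. The phone sequence of the input speech can be obtained as part of computing the world model score. Channel estimation experiments confirmed that, compared with the channel estimated by CMS, the most common channel normalization method, the proposed method produces a channel estimate that is closer to the actual channel and does not distort the speaker's inherent characteristics.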
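The contrast with CMS can be sketched in a simplified setting, assuming one mean cepstrum per phone in the world model (instead of full HMM state distributions) and a known frame-to-phone alignment, which the abstract says comes from world-model scoring. CMS takes the long-term cepstral mean as the channel, so it also absorbs the average phonetic content; subtracting the aligned world-model means first removes that content. Averaging the differences into a single channel vector is an assumption, and the function names are hypothetical.

```python
import numpy as np


def cms_channel_estimate(cepstra):
    """CMS: the long-term cepstral mean of the utterance is taken as the channel."""
    return np.asarray(cepstra, dtype=float).mean(axis=0)


def difference_cepstrum_channel_estimate(cepstra, phone_labels, world_means):
    """Hypothetical sketch: average of frame cepstra minus the aligned phone's world-model mean."""
    cepstra = np.asarray(cepstra, dtype=float)
    world_means = np.asarray(world_means, dtype=float)
    diffs = cepstra - world_means[np.asarray(phone_labels)]
    return diffs.mean(axis=0)
```

In the noise-free synthetic case where each frame equals its phone's world-model mean plus a fixed channel cepstrum, the difference-based estimate recovers the channel exactly, while the CMS estimate is offset by the average of the phone means.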
