• Title/Summary/Keyword: Speaker normalization

A Robust Method for Speech Replay Attack Detection

  • Lin, Lang;Wang, Rangding;Yan, Diqun;Dong, Li
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.1 / pp.168-182 / 2020
  • Spoofing attacks, especially replay attacks, pose great security challenges to automatic speaker verification (ASV) systems. Current work on replay attack detection has focused primarily on either developing new features or improving classifier performance, ignoring the effects of feature variability such as channel variability. In this paper, we first establish a mathematical model for replay speech and introduce a method for eliminating the negative interference of the channel. A novel feature is then proposed to detect replay attacks. To further boost detection performance, four post-processing methods using normalization techniques are investigated. We evaluate the proposed method on the ASVspoof 2017 dataset. The experimental results show that our approach outperforms the competing methods in terms of detection accuracy. We also find that the proposed normalization strategy can improve the performance of existing algorithms.

Voice Verification System for m-Commerce on CDMA Network

  • Kyung, Youn-Jeong
    • The Journal of the Acoustical Society of Korea / v.22 no.4E / pp.176-182 / 2003
  • As the need for wireless Internet service increases, so does the need for secure m-commerce. Conventional security techniques are being reinforced by biometric techniques, and this paper uses voice as the biometric. We developed a speaker verification system for m-commerce (mobile commerce) over the wireless Internet and the wireless application protocol (WAP), and named it mVprotek. The system is implemented as a client-server architecture; the clients are a mobile phone simulator and a personal digital assistant (PDA). The verification results were obtained by integrating the mVprotek system with SK Telecom's code division multiple access (CDMA) system. F-ratio weighting combined with virtual cohort model normalization performed much better than the conventional background model normalization technique.

A Study on Speaker Normalization using VTN (VTN을 이용한 화자 정규화에 관한 연구)

  • 손창희;손종목;배건성
    • Proceedings of the IEEK Conference / 2001.09a / pp.499-502 / 2001
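  • To reduce the degradation in speech recognition performance caused by differences in vocal tract length across speakers, this study applies VTN (Vocal Tract Normalization) to a speech recognition system and evaluates recognition performance through address recognition experiments. Recognition experiments were also run with VTN and CMN applied together. To reflect inter-speaker differences in vocal tract length, a filter-bank-based linear warping method was applied over 13 warping factors. Compared with the baseline recognition system, applying VTN reduced the WER (Word Error Rate) by 1.24%. Applying CMN and VTN together reduced the WER by 0.33% relative to the baseline, but this was 0.91% higher than with VTN alone.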
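The filter-bank linear warping over 13 factors described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the factor range 0.88 to 1.12, the example center frequencies, the Nyquist value, and the function names are all assumptions made here.

```python
def warp_factors(n=13, low=0.88, high=1.12):
    """Return n evenly spaced warping factors.

    The range 0.88-1.12 is an assumed grid; the abstract only states
    that 13 factors were used.
    """
    step = (high - low) / (n - 1)
    return [round(low + i * step, 4) for i in range(n)]


def warp_center_frequencies(centers_hz, alpha, nyquist_hz):
    """Linear warping: scale each filter center by alpha, clipped at Nyquist."""
    return [min(alpha * f, nyquist_hz) for f in centers_hz]


factors = warp_factors()
# Hypothetical filter-bank center frequencies for an 8 kHz sampling rate.
centers = [250.0, 500.0, 1000.0, 2000.0, 3800.0]
warped = warp_center_frequencies(centers, factors[-1], nyquist_hz=4000.0)
```

In a full system, features would be computed once per factor and the factor giving the best recognition score for a speaker would be kept; that selection step is omitted here.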

Modified SNR-Normalization Technique for Robust Speech Recognition

  • Jung, Hoi-In;Shim, Kab-Jong;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.16 no.3E / pp.14-18 / 1997
  • One of the major problems in speech recognition is the mismatch between training and testing environments. Recently, an SNR normalization technique, which normalizes the dynamic range of the frequency channels in a mel-scaled filterbank, was proposed[1]. While it showed improved robustness against additive noise, it requires a reliable speech detection mechanism and several adaptation parameters to be optimized. In this paper, we propose a modified SNR normalization technique in which we simply take the maximum of the filterbank output and a predetermined masking constant for each frequency band. In speaker-independent isolated word recognition experiments in car noise environments, the proposed modification yields better recognition performance than the original SNR normalization method, with reduced complexity.
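The per-band masking step described in this abstract amounts to an element-wise maximum. The sketch below shows that single step only; the function name and the numeric values are hypothetical, and how the masking constants are chosen is not stated in the abstract.

```python
def mask_filterbank(frame, masking_constants):
    """Floor each filterbank output at its band's masking constant.

    frame: filterbank outputs for one frame, one value per band.
    masking_constants: predetermined constant per band (values assumed here).
    """
    if len(frame) != len(masking_constants):
        raise ValueError("frame and masking_constants must have the same length")
    return [max(energy, floor) for energy, floor in zip(frame, masking_constants)]


# Hypothetical log filterbank outputs for a three-band frame.
masked = mask_filterbank([2.0, 7.5, 4.0], [5.0, 5.0, 5.0])
```

Bands whose output falls below the constant are raised to it, so low-energy bands dominated by noise look the same in training and testing.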

A New Method of Selecting Cohort for Speaker Verification (화자검증을 위한 새로운 코호트 선택 방법)

  • 김성준;계영철
    • The Journal of the Acoustical Society of Korea / v.22 no.5 / pp.383-387 / 2003
  • This paper addresses speaker verification, which conventionally relies on a cohort of fixed size. A new variable-size cohort based on the distance between speaker models is proposed: the method takes into account the density of neighboring speaker models within a fixed distance of each speaker. High density increases the cohort size, improving the speaker verification rate; low density decreases it, reducing the amount of computation. Simulation results show that the proposed method outperforms the conventional one, achieving a reduction in the EER.
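One way to read the variable-size cohort idea is to take as the cohort every speaker model that lies within a fixed radius of the claimed speaker, so that dense neighborhoods give larger cohorts. The sketch below makes that assumption explicit; representing each model by a single vector, using Euclidean distance, and the names `select_cohort`, `models`, and `radius` are illustrative choices, not details from the paper.

```python
import math


def select_cohort(target, models, radius):
    """Return the speakers whose models lie within `radius` of the target model.

    models: dict mapping speaker name to a model vector (an assumed
    simplification of a real speaker model).
    The result is sorted from nearest to farthest.
    """
    center = models[target]
    neighbors = []
    for name, vector in models.items():
        if name == target:
            continue
        distance = math.dist(center, vector)
        if distance <= radius:
            neighbors.append((distance, name))
    return [name for _, name in sorted(neighbors)]


# Hypothetical two-dimensional speaker models.
models = {
    "A": (0.0, 0.0),
    "B": (1.0, 0.0),
    "C": (0.0, 2.0),
    "D": (5.0, 5.0),
    "E": (6.0, 5.0),
}
cohort_a = select_cohort("A", models, radius=2.5)
cohort_d = select_cohort("D", models, radius=2.5)
```

Speaker A sits in a denser region and gets a larger cohort than speaker D, which matches the trade-off between verification rate and computation described in the abstract.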

On-Line Blind Channel Normalization for Noise-Robust Speech Recognition

  • Jung, Ho-Young
    • IEIE Transactions on Smart Processing and Computing / v.1 no.3 / pp.143-151 / 2012
  • A new data-driven method is proposed for designing a blind modulation frequency filter that suppresses slowly varying noise components. The method is based on temporal local decorrelation of the feature vector sequence and is applied on an utterance-by-utterance basis. Whereas conventional modulation frequency filtering approaches use the same filter form regardless of the task and environment conditions, the proposed method provides an adaptive modulation frequency filter that outperforms conventional methods for each utterance. In addition, the method ultimately performs channel normalization in the feature domain and can be applied to log-spectral parameters. Performance was evaluated in speaker-independent isolated-word recognition experiments under additive noise environments. The proposed method achieved outstanding improvement for speech recognition in environments with significant noise and was also effective across a range of feature representations.

Text-dependent Speaker Recognition System Using DTW & VQ (VQ와 DTW를 이용한 문장 의존형 화자인식 시스템)

  • Jung JongSoon;Oh SeYoung;Bae MyungJin
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.97-103 / 2001
  • Speaker recognition based on the DTW algorithm suffers from performance degradation as a speaker's voice varies over time, and many algorithms have been proposed to address this problem. This paper proposes a new method that builds a reference pattern tolerant of intra-speaker variation through reference pattern normalization. To avoid degrading recognition performance, the modified reference pattern is used to recognize the system user. The methods used are VQ and DTW. In simulation, a recognition accuracy of 97.5% was obtained.
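For readers unfamiliar with DTW, the matching step between a test pattern and a reference pattern can be sketched with the textbook dynamic-programming recurrence. This is the standard algorithm on one-dimensional sequences with an absolute-difference local cost, chosen here for brevity; it does not reproduce the paper's reference pattern normalization or its VQ stage.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of numbers.

    Uses the basic step pattern (insertion, deletion, match) and
    |x - y| as the local distance.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

The first pair differs only in timing, so the warped cost is zero; the second pair differs in value and keeps a nonzero cost.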

A Study on the Fast Enrollment of Text-Independent Speaker Verification for Vehicle Security (차량 보안을 위한 어구독립 화자증명의 등록시간 단축에 관한 연구)

  • Lee, Tae-Seung;Choi, Ho-Jin
    • Journal of Advanced Navigation Technology / v.5 no.1 / pp.1-10 / 2001
  • Speech is well suited to car drivers, who are busy with various operations, as a convenient means of handling and operating devices. Building on this, this work proposes a speaker verification method for protecting cars from theft and for identifying a person trying to access critical on-line services. It adopts continuant phoneme recognition, which uses the linguistic information in speech, and an MLP (multi-layer perceptron), which has some advantages over earlier stochastic methods. The recognition method, however, requires a large amount of computation for learning, which makes it difficult to adopt in speaker verification applications where speakers must enroll in real time. To relieve this problem, this work introduces speaker cohort models from an established speaker verification score normalization technique, dividing the background speakers into small cohorts in advance. The computational burden is then reduced by classifying the enrolling speaker into one of those cohorts and carrying out enrollment against only that cohort.

New Postprocessing Methods for Rejecting Out-of-Vocabulary Words

  • Song, Myung-Gyu
    • The Journal of the Acoustical Society of Korea / v.16 no.3E / pp.19-23 / 1997
  • The goal of postprocessing in automatic speech recognition is to improve recognition performance through utterance verification at the output of the recognition stage. This work focuses on the effective rejection of out-of-vocabulary words based on the confidence score of the hypothesized candidate word. We present two methods for computing confidence scores. Both are based on the distance between each observation vector and the representative code vector, defined as the most likely code vector at each state. While the first method employs simple time normalization, the second uses a normalization technique based on the concept of an on-line garbage model[1]. In a speaker-independent isolated word recognition experiment with a discrete density HMM, the second method outperforms both the first one and the conventional likelihood ratio scoring method[2].
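The first method's "simple time normalization" can be read as dividing the accumulated distance by the number of frames, so that long and short utterances are scored on the same scale. The sketch below assumes Euclidean distance and a one-to-one pairing of frames with representative code vectors; both are illustrative assumptions, as are the function name and the data.

```python
import math


def time_normalized_distance(observations, code_vectors):
    """Average per-frame distance between observations and code vectors.

    observations: one feature vector per frame.
    code_vectors: the representative code vector paired with each frame
    (the pairing is assumed to come from the recognizer's state alignment).
    """
    if not observations or len(observations) != len(code_vectors):
        raise ValueError("need equally long, non-empty sequences")
    total = sum(math.dist(o, c) for o, c in zip(observations, code_vectors))
    return total / len(observations)


# Hypothetical two-frame utterance.
score = time_normalized_distance([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
```

A candidate word would then be rejected when this score exceeds a threshold tuned on held-out data; the threshold itself is not given in the abstract.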

An Isolated Word Recognition Using the Mellin Transform (Mellin 변환을 이용한 격리 단어 인식)

  • 김진만;이상욱;고세문
    • Journal of the Korean Institute of Telematics and Electronics / v.24 no.5 / pp.905-913 / 1987
  • This paper presents a speaker-dependent isolated digit recognition algorithm using the Mellin transform. Since the Mellin transform converts scale information into phase information, attempts have been made to use this scale invariance property to reduce the need for the time-normalization procedure required in speech recognition. Good results were obtained by applying the Mellin transform to features such as ZCR, log energy, normalized autocorrelation coefficients, the first predictor coefficient, and the normalized prediction error. A difference function was used to evaluate the similarity between two patterns. When the proposed algorithm was tested on Korean digit words, a recognition rate of 83.3% was obtained. The recognition accuracy is not comparable to that of other techniques such as LPC distance; however, it is believed that the Mellin transform can effectively perform time normalization for speech recognition.
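The scale-to-phase property mentioned in this abstract follows from a short derivation. This is a sketch using the standard definition of the Mellin transform; the symbols f, a, s, and ω are chosen here for illustration and do not come from the paper.

```latex
\begin{align*}
\mathcal{M}\{f\}(s) &= \int_0^\infty f(t)\, t^{s-1}\, dt, \\
\mathcal{M}\{f(at)\}(s) &= \int_0^\infty f(at)\, t^{s-1}\, dt
  = a^{-s} \int_0^\infty f(u)\, u^{s-1}\, du
  = a^{-s}\, \mathcal{M}\{f\}(s), \qquad a > 0,\ u = at, \\
s = j\omega:\quad \left| a^{-j\omega} \right| &= \left| e^{-j\omega \ln a} \right| = 1
  \;\Rightarrow\; \left| \mathcal{M}\{f(at)\}(j\omega) \right| = \left| \mathcal{M}\{f\}(j\omega) \right| .
\end{align*}
```

A time-scaled pattern therefore changes only the phase of the transform along s = jω, which is why the magnitude can serve as a time-normalized feature.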
