Search | Korea Science

A Robust Method for Speech Replay Attack Detection

Lin, Lang;Wang, Rangding;Yan, Diqun;Dong, Li
- KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.1 / pp.168-182 / 2020
Spoofing attacks, especially replay attacks, pose great security challenges to automatic speaker verification (ASV) systems. Current work on replay attack detection has primarily focused on either developing new features or improving classifier performance, ignoring the effects of feature variability such as channel variability. In this paper, we first establish a mathematical model for replay speech and introduce a method for eliminating the negative interference of the channel. We then propose a novel feature for detecting replay attacks. To further boost detection performance, we investigate four post-processing methods based on normalization techniques. We evaluate the proposed method on the ASVspoof 2017 dataset. The experimental results show that our approach outperforms the competing methods in terms of detection accuracy. More interestingly, we find that the proposed normalization strategy can also improve the performance of existing algorithms.
https://doi.org/10.3837/tiis.2020.01.010

The Effects of the Methods of Disguised Voice on the Aural Decision (위장 발화 방법의 차이가 청취 판단에 미치는 영향)

Song Min-Chang;Shin Jiyoung;Kang SunMee
- MALSORI / no.46 / pp.25-35 / 2003
This study deals with disguised voice (voice disguise) in the field of forensic phonetics. Specifically, we studied the effects of voice disguise methods on aural decisions. Within the area of nonelectronic, deliberate voice disguise, the methods include lowered pitch, pinched nostrils, falsetto, and whisper. Ten Seoul speakers (five male, five female) recorded 16 sentences. In the aural test, 30 subjects listened to normal and disguised voices and were asked to decide whether or not the speakers could be identified. The results show that speaker verification was more difficult for falsetto and whisper than for lowered pitch and pinched nostrils.

A Study on the Vowel Formants in Disguised Speech (위장발화의 단모음 포만트 연구)

Noh, Seok-Eun;Park, Mi-Kyoung;Cho, Min-Ha;Shin, Ji-Young;Kang, Sun-Mee
- Proceedings of the KSPS conference / 2004.05a / pp.215-218 / 2004
The aim of this paper is to analyze the acoustic features of disguised voice. We examined features such as pitch range and vowel formants (F1, F2, F3, F4). The results of the analysis are as follows: (1) Pitch range and average pitch value are very important cues for speaker verification. (2) F3-F2 is also an important cue for speaker verification. (3) /a/ is verified better than other vowels.

Covariance Model Based on Multi-Band for Speaker Verification in Noise (잡음 환경에서 화자 확인을 위한 다중대역에 기반한 공분산 방법)

Choi Min Jung;Lee Ki Yong
- Proceedings of the Acoustical Society of Korea Conference / autumn / pp.127-130 / 2004
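Conventional speaker verification systems that extract feature parameters from the full band tend to lose speaker-specific information in the low or high bands. In addition, when the frequency spectrum is partially corrupted, the feature parameters are distorted, which degrades the performance of the speaker verification system. To address these problems, this paper proposes a multi-band covariance model. The proposed method divides the full band into several sub-bands in the frequency domain, extracts feature parameters independently from each sub-band, and computes a covariance model. To verify the performance of the proposed method, we conducted speaker verification experiments that measure the distance between covariance models. In a comparison under noisy conditions between the conventional full-band covariance model and the proposed method, the proposed method improved performance by about 2% over the conventional method. Moreover, because the proposed method divides the full-band parameter dimension among the sub-bands, it is efficient in terms of both computation and storage.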

The Hardware Implementation of Speaker Verification System Using Support Vector Machine (SVM을 이용한 화자인증 시스템 하드웨어 구현)

Hwang, Byung-Hee;Choi, Woo-Yong;Moon, Dae-Sung;Pan, Sung-Bum;Chung, Yong-Wha;Chung, Sang-Hwa
- Proceedings of the Korea Information Processing Society Conference / 2003.05c / pp.1933-1936 / 2003
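Interest in speaker verification, which authenticates users by their voice, has recently been growing, and among the various speaker verification methods, those applying SVMs show superior performance compared with other algorithms. However, SVM-based speaker verification is difficult to run in real time on portable devices such as mobile phones because of its complex computation. In this paper, we propose a hardware architecture for processing an SVM-based speaker verification algorithm in real time, analyze the experimental results obtained after modeling it in VHDL, and describe the overall system configuration.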

Short utterance speaker verification using PLDA model adaptation and data augmentation (PLDA 모델 적응과 데이터 증강을 이용한 짧은 발화 화자검증)

Yoon, Sung-Wook;Kwon, Oh-Wook
- Phonetics and Speech Sciences / v.9 no.2 / pp.85-94 / 2017
Conventional speaker verification systems using a time delay neural network, identity vectors, and probabilistic linear discriminant analysis (TDNN-Ivector-PLDA) are known to be very effective for verifying long-duration speech utterances. However, when test utterances are short, the duration mismatch between enrollment and test utterances significantly degrades the performance of TDNN-Ivector-PLDA systems. To compensate for the I-vector mismatch between long and short utterances, this paper proposes PLDA model adaptation with augmented data. A PLDA model is first trained on a vast amount of speech data, most of which is of long duration. The PLDA model is then adapted with I-vectors obtained from short-utterance data augmented using vocal tract length perturbation (VTLP). In computer experiments using the NIST SRE 2008 database, the proposed method achieves significantly better performance than conventional TDNN-Ivector-PLDA systems when there is a duration mismatch between enrollment and test utterances.
https://doi.org/10.13064/KSSS.2017.9.2.085

A Study On the Disguised Voice - From a prosodic point of view - (위장발화에 대한 연구 - 운율적 특성을 중심으로 -)

Cho Minha;Nho Seogeun;Song Minkyu;Shin Jiyoung;Kang Sunmee
- Proceedings of the KSPS conference / 2003.05a / pp.191-195 / 2003
The aim of this paper is to analyze the phonetic features of disguised voice. We examined features such as phonation type, pitch range, speech rate, intonation type, and boundary tones. The results of the analysis are as follows: (1) Phonation type is a very important means of voice disguise for male subjects. (2) Pitch range and average pitch value are very important cues for speaker verification. (3) Pitch contour, speech rate, and boundary tones can serve as secondary cues for speaker verification.

Efficient Speaker Verification in Noise Environment with Noise-added Speaker Model Composition (잡음 첨가된 화자 모델 구성에 의한 잡음 환경의 효과적인 화자확인)

안성주;강선미;고한석
- Proceedings of the Korean Information Science Society Conference / 1999.10b / pp.542-544 / 1999
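This paper proposes a noise-robust speaker verification method that constructs multiple speaker models. Because it is difficult to measure the SNR of input speech containing non-stationary noise, we construct several noise-added speaker models for each speaker by combining the speaker's noise-free model with noise models at various SNRs. During verification, the likelihood of the input speech is computed for each of these models, and only the largest likelihood is selected. Speaker verification is then performed using this value. Experimental results show that, in noisy environments where the SNR of the input speech is unknown, the proposed method performs much better than the usual approach of using a single model.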
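
The selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scalar Gaussian speaker model, the rule of adding a noise variance to the clean variance, and the names `noisy_models` and `verify` are all assumptions made for illustration.

```python
import math

def gaussian_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood of scalar features under N(mean, var)."""
    total = 0.0
    for x in frames:
        total += -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
    return total / len(frames)

def noisy_models(clean_mean, clean_var, noise_vars):
    """Assumed composition rule: one model per noise level, formed by adding
    the noise variance to the clean speaker variance."""
    return [(clean_mean, clean_var + nv) for nv in noise_vars]

def verify(frames, models, threshold):
    """Score the input against every noise-added model, keep only the
    largest log-likelihood, and compare it with the threshold."""
    best = max(gaussian_log_likelihood(frames, m, v) for m, v in models)
    return best, best >= threshold

# Hypothetical usage: the input is noisier than the clean model expects.
models = noisy_models(0.0, 1.0, [0.0, 1.0, 3.0])
score, accepted = verify([2.0, -2.0, 2.0, -2.0], models, threshold=-2.5)
```

In this toy case the clean model alone would fall below the threshold, while the best of the noise-added models passes it, which is the effect the abstract attributes to not needing to know the input SNR.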

A Study for Effective Speaker Adaptation and a priori Threshold Updating in Speaker Verification (화자 인증에서의 효과적인 화자 적응과 a priori Threshold Updating에 관한 연구)

조영훈;이수호;홍대희;고한석
- Proceedings of the IEEK Conference / 2001.09a / pp.491-494 / 2001
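The biggest problems in designing a practical speaker verifier are twofold. First, because the speaker model is built from a small amount of enrollment data, the verifier's performance degrades considerably over time. Second, because the threshold is set using only previously collected training data, variations that arise later in actual use are not taken into account, which also degrades performance. To solve these problems, this paper applies the MAP method to construct the speaker model and applies a method for resetting the threshold. The proposed method is shown to reduce the HTER by about 23%.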
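
As a rough illustration of the two ideas named in this abstract, the sketch below shows the standard relevance-factor form of MAP mean adaptation together with a simple threshold update. The abstract does not give the paper's formulas, so the scalar model, the `relevance` value, and the `update_threshold` rule (including its `rate` and `margin` parameters) are hypothetical.

```python
def map_adapt_mean(prior_mean, new_frames, relevance=16.0):
    """MAP-style mean update: interpolate between the prior mean and the
    mean of the new data, weighting the data more as more frames arrive."""
    n = len(new_frames)
    if n == 0:
        return prior_mean
    sample_mean = sum(new_frames) / n
    alpha = n / (n + relevance)  # data weight; tends to 1 as n grows
    return alpha * sample_mean + (1.0 - alpha) * prior_mean

def update_threshold(threshold, accepted_score, rate=0.1, margin=1.0):
    """Hypothetical resetting rule: move the threshold a small step toward
    the latest accepted score minus a safety margin."""
    return (1.0 - rate) * threshold + rate * (accepted_score - margin)

# Hypothetical usage after one accepted verification attempt.
new_mean = map_adapt_mean(0.0, [2.0] * 16, relevance=16.0)
new_threshold = update_threshold(0.0, accepted_score=3.0, rate=0.5)
```

With little new data the adapted mean stays close to the enrolled model, which is why this kind of update is commonly used when enrollment data are scarce.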

Quantization Based Speaker Normalization for DHMM Speech Recognition System (DHMM 음성 인식 시스템을 위한 양자화 기반의 화자 정규화)

신옥근
- The Journal of the Acoustical Society of Korea / v.22 no.4 / pp.299-307 / 2003
There have been many studies on speaker normalization, which aims to minimize the effects of the speaker's vocal tract length on the recognition performance of speaker-independent speech recognition systems. In this paper, we propose a simple linear-warping speaker normalization method based on a vector quantizer, motivated by the observation that vector quantizers can be used successfully for speaker verification. We first generate an optimal codebook to serve as the basis of the speaker normalization, and then extract the warping factor of the unknown speaker by comparing the feature vectors with the codebook. Finally, the extracted warping factor is used to linearly warp the Mel-scale filter bank used in the MFCC calculation. To test the performance of the proposed method, a series of recognition experiments was conducted on a discrete HMM with thirteen monosyllabic Korean number utterances. The results showed that the word error rate can be reduced by about 29%, and that the proposed warping factor extraction method is useful because of its simplicity compared with other line-search warping methods.
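
The final step mentioned in this abstract, linearly warping a Mel-scale filter bank, can be sketched as follows. This covers only the warping of filter center frequencies once a warping factor is known; the codebook-based extraction of the factor is not shown. The common `2595 * log10(1 + f / 700)` Mel formula, the clipping at `f_max`, and all function names are assumptions rather than details taken from the paper.

```python
import math

def hz_to_mel(f_hz):
    """Common Mel-scale approximation."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies (Hz) of filters spaced evenly on the Mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

def warp_centers(centers_hz, alpha, f_max):
    """Linear warping: scale every center by alpha, clipped to f_max."""
    return [min(alpha * f, f_max) for f in centers_hz]

# Hypothetical usage: 10 filters up to 4 kHz, warping factor 1.1.
centers = mel_center_frequencies(10, 0.0, 4000.0)
warped = warp_centers(centers, alpha=1.1, f_max=4000.0)
```

A factor of 1.0 leaves the filter bank unchanged, while factors above or below 1.0 shift all filters up or down in frequency by the same proportion.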
