Search | Korea Science

Group-based speaker embeddings for text-independent speaker verification (문장 독립 화자 검증을 위한 그룹기반 화자 임베딩)

Jung, Youngmoon;Eom, Youngsik;Lee, Yeonghyeon;Kim, Hoirin
The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.496-502 / 2021
Recently, the deep speaker embedding approach, which performs better than the traditional i-vector approach, has been widely used in text-independent speaker verification. In this work, to improve the deep speaker embedding approach, we propose a novel method called group-based speaker embedding, which incorporates group information. We cluster all speakers in the training data into a predefined number of groups in an unsupervised manner, so that each group is represented by a fixed-length group embedding. A Group Decision Network (GDN) produces group weights, and an aggregated group embedding is generated as the sum of the group embeddings weighted by the group weights. Finally, we generate a group-based embedding by adding the aggregated group embedding to the deep speaker embedding. By incorporating group information in this way, a speaker embedding can narrow the search space of speaker identities and can thereby flexibly represent a large number of speakers. Experiments on the VoxCeleb1 database show that the proposed approach improves on previous approaches.
https://doi.org/10.7776/ASK.2021.40.5.496
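The abstract describes the aggregation step only in words. The following is a minimal sketch of that step under several assumptions:

- The GDN outputs one raw score per group, normalized here with a softmax. The abstract does not say how the group weights are normalized.
- All embeddings share one dimension D.

The function and parameter names are hypothetical. The unsupervised clustering and the networks themselves are not shown.

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def group_based_embedding(speaker_emb, group_embs, group_logits):
    """Combine a deep speaker embedding with group information.

    speaker_emb:  deep speaker embedding, length D
    group_embs:   K fixed-length group embeddings, each of length D
    group_logits: K raw GDN scores (assumed; softmax-normalized here)
    Returns (group-based embedding, group weights).
    """
    weights = softmax(group_logits)
    dim = len(speaker_emb)
    # Aggregated group embedding: group embeddings weighted by the group weights.
    aggregated = [
        sum(w * g[d] for w, g in zip(weights, group_embs)) for d in range(dim)
    ]
    # Group-based embedding: speaker embedding plus aggregated group embedding.
    return [s + a for s, a in zip(speaker_emb, aggregated)], weights
```

With two groups and equal GDN scores, each group embedding gets weight 0.5. The speaker embedding is therefore shifted by the midpoint of the two group embeddings.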

A Blind Segmentation Algorithm for Speaker Verification System (화자확인 시스템을 위한 분절 알고리즘)

김지운;김유진;민홍기;정재호
The Journal of the Acoustical Society of Korea / v.19 no.3 / pp.45-50 / 2000
This paper proposes a delta energy method based on Parameter Filtering (PF), a speech segmentation algorithm for a text-dependent speaker verification system operating over telephone lines. Our parametric filter bank uses a variable bandwidth with a fixed center frequency. Compared with other methods, the proposed method proves very robust to channel noise and background noise. Using this method, we segment an utterance into consecutive subword units and build a model for each subword unit. In terms of EER, the speaker verification system based on whole-word models achieves 6.1%, whereas the system based on subword models achieves 4.0%, an improvement of about 2 percentage points.
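The results above are stated as equal error rates (EER). The EER is the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). The reported gain is 6.1 − 4.0 = 2.1 percentage points.

The following is a minimal sketch of estimating an EER from verification scores. It assumes that higher scores mean "same speaker" and sweeps every observed score as a threshold. This is a common approximation, not necessarily the authors' evaluation procedure, and the function names are hypothetical.

```python
def far_frr(genuine, impostor, threshold):
    # A trial is accepted when its score is >= threshold.
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # clients rejected
    return far, frr

def equal_error_rate(genuine, impostor):
    """Approximate the EER: try every observed score as a threshold and
    return the mean of FAR and FRR where the two are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```

For example, with genuine scores `[0.9, 0.8, 0.7, 0.4]` and impostor scores `[0.1, 0.2, 0.3, 0.75]`, a threshold of 0.7 gives FAR = FRR = 0.25.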

Speaker Verification Model Using Short-Time Fourier Transform and Recurrent Neural Network (STFT와 RNN을 활용한 화자 인증 모델)

Kim, Min-seo;Moon, Jong-sub
Journal of the Korea Institute of Information Security & Cryptology / v.29 no.6 / pp.1393-1401 / 2019
Recently, as voice authentication functions are built into systems, accurately authenticating speakers is becoming more important. Accordingly, various speaker verification models have been proposed. In this paper, we propose a new speaker verification method using the Short-Time Fourier Transform (STFT). Unlike the existing Mel-Frequency Cepstral Coefficient (MFCC) extraction method, we apply a window function with an overlap of around 66.1%. The speaker's speech characteristics, together with their temporal characteristics, are then learned by a deep learning model: a Recurrent Neural Network (RNN) with LSTM cells. The proposed model achieves an accuracy of around 92.8%, approximately 5.5% higher than that of the existing speaker verification model.
https://doi.org/10.13089/JKIISC.2019.29.6.1393
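The abstract gives only the overlap (around 66.1%). In general, consecutive frames then start `win_len * (1 - overlap)` samples apart.

The following is a minimal sketch of a magnitude STFT with that overlap. The window length, the symmetric Hann window, the naive DFT and the function names are all assumptions made for illustration. They are not details from the paper.

```python
import cmath
import math

def frame_signal(signal, win_len, overlap):
    """Split a signal into frames of win_len samples whose consecutive
    frames overlap by the given fraction (e.g. 0.661 for ~66.1%)."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    hop = max(1, round(win_len * (1 - overlap)))  # samples between frame starts
    starts = range(0, len(signal) - win_len + 1, hop)
    return [signal[i:i + win_len] for i in starts], hop

def hann(n):
    # Symmetric Hann window of length n (n >= 2).
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def stft_magnitude(signal, win_len, overlap):
    """Magnitude STFT: one list of win_len // 2 + 1 bin magnitudes per frame."""
    frames, hop = frame_signal(signal, win_len, overlap)
    window = hann(win_len)
    spectrogram = []
    for frame in frames:
        x = [s * w for s, w in zip(frame, window)]
        # Naive O(N^2) DFT kept for clarity; a real system would use an FFT.
        mags = [
            abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                    for n in range(win_len)))
            for k in range(win_len // 2 + 1)
        ]
        spectrogram.append(mags)
    return spectrogram, hop
```

With a 16-sample window and 66.1% overlap, the hop is round(16 × 0.339) = 5 samples. The resulting sequence of per-frame spectra is the kind of time-ordered input an LSTM-based RNN would consume.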

Improving Speaker Enrolling Speed for Speaker Verification Systems Based on Multilayer Perceptrons by Using a Qualitative Background Speaker Selection (정질적 기준을 이용한 다층신경망 기반 화자증명 시스템의 등록속도 단축방법)

이태승;황병원
The Journal of the Acoustical Society of Korea / v.22 no.5 / pp.360-366 / 2003
Although multilayer perceptrons (MLPs) offer several advantages over other pattern recognition methods, MLP-based speaker verification systems suffer from slow enrollment because many background speakers are needed to achieve a low verification error. To address this, the quantitative discriminative cohort speakers (QnDCS) method introduced cohort speakers into such systems, reducing the number of background speakers required to enroll a speaker. Although QnDCS achieved this goal to some extent, the improvement in enrollment speed was still unsatisfactory. To further improve enrollment speed, this paper proposes qualitative DCS (QlDCS), which introduces a qualitative criterion to select fewer background speakers. Both methods are evaluated experimentally using an MLP- and continuant-based speaker verification system and a speech database. The results show that the proposed QlDCS method enrolls speakers in half the time QnDCS takes, relative to the online error backpropagation (EBP) method.

Impostor Detection in Speaker Recognition Using Confusion-Based Confidence Measures

Kim, Kyu-Hong;Kim, Hoi-Rin;Hahn, Min-Soo
ETRI Journal / v.28 no.6 / pp.811-814 / 2006
In this letter, we introduce confusion-based confidence measures for detecting impostors in speaker recognition that do not require an alternative hypothesis. Most traditional speaker verification methods are based on a hypothesis test, and their performance depends on the robustness of the alternative hypothesis. Compared with the conventional Gaussian mixture model-universal background model (GMM-UBM) scheme, our confusion-based measures perform better on noise-corrupted speech. The additional computation our methods require for detecting or rejecting impostors is negligible.
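The abstract does not describe how the confusion-based measures are computed, so they are not sketched here. For contrast, the following is a minimal sketch of the conventional GMM-UBM hypothesis test they are compared against:

- Each frame is scored under a claimed-speaker GMM and under a universal background model, which plays the role of the alternative hypothesis.
- The average log-likelihood ratio is compared with a threshold.

Scalar (1-D) features, hand-set mixture parameters and the function names are simplifying assumptions. Real systems use multidimensional cepstral features and adapted models.

```python
import math

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of a scalar feature x under a 1-D Gaussian mixture,
    combined with log-sum-exp for numerical stability."""
    logs = [
        math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(lg - top) for lg in logs))

def llr_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio log p(x|speaker) - log p(x|UBM).
    Each model is a (weights, means, variances) tuple."""
    total = sum(gmm_loglik(x, *speaker_gmm) - gmm_loglik(x, *ubm) for x in frames)
    return total / len(frames)

def verify(frames, speaker_gmm, ubm, threshold=0.0):
    # Accept the identity claim when the speaker model explains the frames
    # better than the background model by at least `threshold`.
    return llr_score(frames, speaker_gmm, ubm) >= threshold
```

The decision depends on how well the UBM models the alternative hypothesis. The abstract identifies this dependence as the weakness that the confusion-based measures avoid.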

Segment unit shuffling layer in deep neural networks for text-independent speaker verification (문장 독립 화자 인증을 위한 세그멘트 단위 혼합 계층 심층신경망)

Heo, Jungwoo;Shim, Hye-jin;Kim, Ju-ho;Yu, Ha-Jin
The Journal of the Acoustical Society of Korea / v.40 no.2 / pp.148-154 / 2021
Text-independent speaker verification requires extracting text-independent speaker embeddings to improve generalization performance. However, deep neural networks, which depend on their training data, can overfit to text information instead of learning speaker information when they repeatedly learn from identical time series. In this paper, to prevent this overfitting, we propose a segment unit shuffling layer that divides the input layer or a hidden layer into segments along the time axis and rearranges them, thereby mixing the time-series information. Because the segment unit shuffling layer can be applied to hidden layers as well as the input layer, it can serve as a hidden-layer generalization technique, which is known to be more effective than input-layer generalization, and it can be used together with data augmentation. In addition, the degree of distortion can be controlled by adjusting the segment unit size. We observe that applying the proposed segment unit shuffling layer improves text-independent speaker verification performance over the baseline.
https://doi.org/10.7776/ASK.2021.40.2.148
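The following is a minimal sketch of the shuffling operation on a time-major sequence. The sequence could hold input feature frames or hidden-layer activations. `segment_size` plays the role of the segment unit size: smaller segments distort the time order more.

The following details are assumptions, not taken from the abstract:

- keeping a shorter trailing segment;
- preserving frame order inside each segment;
- the name `segment_shuffle`.

```python
import random

def segment_shuffle(frames, segment_size, rng=None):
    """Split a time-major sequence into consecutive segments of segment_size
    frames and return them in random order. Frame order inside each segment is
    kept; a shorter trailing segment is shuffled along with the rest."""
    if segment_size < 1:
        raise ValueError("segment_size must be >= 1")
    rng = rng or random.Random()
    segments = [frames[i:i + segment_size]
                for i in range(0, len(frames), segment_size)]
    rng.shuffle(segments)
    # Flatten back into a single sequence of the original length.
    return [frame for segment in segments for frame in segment]
```

A segment size of 1 fully permutes the frames. A segment size at least as long as the sequence leaves it unchanged.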

Speaker Verification System Using Continuants and Multilayer Perceptrons (지속음 및 다층신경망을 이용한 화자증명 시스템)

Lee, Tae-Seung;Park, Sung-Won;Hwang, Byong-Won
Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2003.10a / pp.1015-1020 / 2003
Among techniques that protect private information using biometrics, speaker verification is expected to be widely used because of its convenience and low implementation cost. Speaker verification should achieve high reliability in its verification scores, flexibility in the speech text used, and efficiency in system complexity. Continuants have excellent speaker-discriminating power and form a category with a modest number of phonemes, and multilayer perceptrons (MLPs) have superior recognition ability and fast operation. Together, the two offer a viable way for a speaker verification system to obtain these properties. This paper implements a system that applies continuants and MLPs and evaluates it on a Korean speech database. The experimental results show that continuants and MLPs enable the system to acquire all three properties.

Speaker Verification System Using Support Vector Machine with Genetic Algorithms (유전자 알고리즘을 결합한 Support Vector Machine의 화자인증에서의 성능분석)

최우용;이경희;반성범
Proceedings of the IEEK Conference / 2003.11a / pp.557-560 / 2003
Voice is a promising biometric because it is one of the most convenient ways humans distinguish one person from another. The goal of speaker verification is to separate clients from impostors. The Support Vector Machine (SVM) has drawn attention as a binary classifier, which makes it well suited to speaker verification. In this paper, we combine SVM with a genetic algorithm (GA) to reduce the dimensionality of the input features. Experiments were conducted on a Korean connected-digit database using different feature dimensions. The verification accuracy of SVM with GA is slightly lower than that of SVM alone, but the proposed algorithm is better suited to memory-limited systems.

On a Method Which Improves Text Independent Speaker Verification Performance through Limiting Speech Production Loudness (성량제한을 적용한 어구독립 화자증명 성능향상 방안)

이태승;최호진
Proceedings of the Korean Information Science Society Conference / 2001.10b / pp.457-459 / 2001
We propose a speaker recognition method for text-independent speaker verification, where speakers are discriminated on the basis of continuants. The method achieves a higher recognition rate by limiting the loudness of the input speech.

A Phase-related Feature Extraction Method for Robust Speaker Verification (열악한 환경에 강인한 화자인증을 위한 위상 기반 특징 추출 기법)

Kwon, Chul-Hong
Journal of the Korea Institute of Information and Communication Engineering / v.14 no.3 / pp.613-620 / 2010
Additive noise and channel distortion strongly degrade the performance of speaker verification systems because they distort the speech features. This distortion causes a mismatch between training and recognition conditions, so acoustic models trained on clean speech do not accurately model noisy, channel-distorted speech. This paper presents a phase-related feature extraction method to improve the robustness of speaker verification systems. The instantaneous frequency is computed from the phase of the speech signal, and features are obtained from the histogram of the instantaneous frequency. Experimental results show that the proposed technique offers significant improvements over standard techniques in both clean and adverse test environments.
https://doi.org/10.6109/jkiice.2010.14.3.613
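The following is a minimal sketch of deriving instantaneous frequency from the phase of the analytic signal and binning it into a histogram. The method has three steps:

1. Negative frequencies are zeroed in the DFT to obtain the analytic signal.
2. The phase increment between consecutive samples, converted to Hz, gives the instantaneous frequency.
3. The instantaneous frequencies are counted into bins over [0, fs/2).

The DFT-based analytic signal, single-frame processing, the bin layout and all function names are assumptions. The abstract does not specify which features are taken from the histogram.

```python
import cmath
import math

def analytic_signal(x):
    """Analytic signal via the DFT: keep DC (and Nyquist for even n), double
    positive frequencies, zero negative ones. Naive O(N^2) DFT for clarity."""
    n = len(x)
    X = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
         for k in range(n)]
    h = [0.0] * n
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        for k in range(1, n // 2):
            h[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            h[k] = 2.0
    Z = [Xk * hk for Xk, hk in zip(X, h)]
    return [sum(Z[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz between consecutive samples."""
    z = analytic_signal(x)
    # phase(b * conj(a)) is the phase increment wrapped to (-pi, pi].
    return [cmath.phase(b * a.conjugate()) * fs / (2 * math.pi)
            for a, b in zip(z, z[1:])]

def if_histogram(freqs, fs, n_bins):
    """Count instantaneous frequencies into n_bins equal bins over [0, fs/2);
    values outside that range (e.g. negative ones) are ignored."""
    counts = [0] * n_bins
    width = (fs / 2) / n_bins
    for f in freqs:
        if 0 <= f < fs / 2:
            counts[min(int(f / width), n_bins - 1)] += 1
    return counts
```

For a clean 1 kHz tone sampled at 8 kHz, every instantaneous-frequency value is 1000 Hz, so the histogram collapses into a single bin. Noise and channel distortion would spread these values, and that spread is what a histogram-based feature would summarize.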
