Search | Korea Science

Deep neural networks for speaker verification with short speech utterances (짧은 음성을 대상으로 하는 화자 확인을 위한 심층 신경망)

Yang, IL-Ho;Heo, Hee-Soo;Yoon, Sung-Hyun;Yu, Ha-Jin
- The Journal of the Acoustical Society of Korea
- /
- v.35 no.6
- /
- pp.501-509
- /
- 2016
We propose a method to improve the robustness of speaker verification on short test utterances. The accuracy of the state-of-the-art i-vector/probabilistic linear discriminant analysis systems can be degraded when testing utterance durations are short. The proposed method compensates for utterance variations of short test feature vectors using deep neural networks. We design three different types of DNN (Deep Neural Network) structures which are trained with different target output vectors. Each DNN is trained to minimize the discrepancy between the feed-forwarded output of a given short utterance feature and its original long utterance feature. We use short 2-10 s condition of the NIST (National Institute of Standards Technology, U.S.) 2008 SRE (Speaker Recognition Evaluation) corpus to evaluate the method. The experimental results show that the proposed method reduces the minimum detection cost relative to the baseline system.
https://doi.org/10.7776/ASK.2016.35.6.501 인용 PDF KSCI

An Adaptive Utterance Verification Framework Using Minimum Verification Error Training

Shin, Sung-Hwan;Jung, Ho-Young;Juang, Biing-Hwang
- ETRI Journal
- /
- v.33 no.3
- /
- pp.423-433
- /
- 2011
This paper introduces an adaptive and integrated utterance verification (UV) framework using minimum verification error (MVE) training as a new set of solutions suitable for real applications. UV is traditionally considered an add-on procedure to automatic speech recognition (ASR) and thus treated separately from the ASR system model design. This traditional two-stage approach often fails to cope with a wide range of variations, such as a new speaker or a new environment which is not matched with the original speaker population or the original acoustic environment that the ASR system is trained on. In this paper, we propose an integrated solution to enhance the overall UV system performance in such real applications. The integration is accomplished by adapting and merging the target model for UV with the acoustic model for ASR based on the common MVE principle at each iteration in the recognition stage. The proposed iterative procedure for UV model adaptation also involves revision of the data segmentation and the decoded hypotheses. Under this new framework, remarkable enhancement in not only recognition performance, but also verification performance has been obtained.
https://doi.org/10.4218/etrij.11.0110.0489 인용 PDF KSCI

Voice Similarities between Sisters

Ko, Do-Heung
- Speech Sciences
- /
- v.8 no.3
- /
- pp.43-50
- /
- 2001
This paper deals with voice similarities between sisters who are supposed to have common physiological characteristics from a single biological mother. Nine pairs of sisters who are believed to have similar voices participated in this experiment. The speech samples obtained from one pair of sisters were eliminated in the analysis because their perceptual score was relatively low. The words were measured in both isolation and context, and the subjects were asked to read the text five times with about three seconds of interval between readings. Recordings were made at natural speed in a quiet room. The data were analyzed in pitch and formant frequencies using CSL (Computerized Speech Lab) and PCQuirer. It was found that data of the initial vowels are much more similar and homogeneous than those of vowels in other positions. The acoustic data showed that voice similarities are strikingly high in both pitch and formant frequencies. It is assumed that statistical data obtained from this experiment can be used as a guideline for modelling speaker identification and speaker verification.
PDF

Forensic Automatic Speaker Identification System for Korean Speakers (과학수사를 위한 한국인 음성 특화 자동화자식별시스템)

Kim, Kyung-Wha;So, Byung-Min;Yu, Ha-Jin
- Phonetics and Speech Sciences
- /
- v.4 no.3
- /
- pp.95-101
- /
- 2012
In this paper, we introduce the automatic speaker identification system 'SPO(Supreme Prosecutors Office) Verifier'. SPO Verifier is a GMM(Gaussian mixture model)-UBM(universal background model) based automatic speaker recognition system and has been developed using Korean speakers' utterances. This system uses a channel compensation algorithm to compensate recording device characteristics. The system can give the users the ability to manage reference models with utterances from various environments to get more accurate recognition results. To evaluate the performance of SPO Verifier on Korean speakers, we compared this system with one of the most widely used commercial systems in the forensic field. The results showed that SPO Verifier shows lower EER(equal error rate) than that of the commercial system.
https://doi.org/10.13064/KSSS.2012.4.3.095 인용 PDF

Implementation of Speaker Verification Security System Using DSP Processor(TMS320C32) (DSP Processor(TMS320C32)를 이용한 화자인증 보안시스템의 구현)

Haam, Young-Jun;Kwon, Hyuk-Jae;Choi, Soo-Young;Jeong, lk-Joo
- Journal of Industrial Technology
- /
- v.21 no.B
- /
- pp.107-116
- /
- 2001
The speech includes various kinds of information : language information, speaker's information, affectivity, hygienic condition, utterance environment etc. when a person communicates with others. All technologies to utilize in real life processing this speech are called the speech technology. The speech technology contains speaker's information that among them and it includes a speech which is known as a speaker recognition. DTW(Dynamic Time Warping) is the speaker recognition technology that seeks the pattern of standard speech signal and the similarity degree in an inputted speech signal using dynamic programming. ln this study, using TMS320C32 DSP processor, we are to embody this DTW and to construct a security system.
PDF

On-Line Linear Combination of Classifiers Based on Incremental Information in Speaker Verification

Huenupan, Fernando;Yoma, Nestor Becerra;Garreton, Claudio;Molina, Carlos
- ETRI Journal
- /
- v.32 no.3
- /
- pp.395-405
- /
- 2010
A novel multiclassifier system (MCS) strategy is proposed and applied to a text-dependent speaker verification task. The presented scheme optimizes the linear combination of classifiers on an on-line basis. In contrast to ordinary MCS approaches, neither a priori distributions nor pre-tuned parameters are required. The idea is to improve the most accurate classifier by making use of the incremental information provided by the second classifier. The on-line multiclassifier optimization approach is applicable to any pattern recognition problem. The proposed method needs neither a priori distributions nor pre-estimated weights, and does not make use of any consideration about training/testing matching conditions. Results with Yoho database show that the presented approach can lead to reductions in equal error rate as high as 28%, when compared with the most accurate classifier, and 11% against a standard method for the optimization of linear combination of classifiers.
https://doi.org/10.4218/etrij.10.0109.0301 인용 PDF KSCI

Fast Sequential Probability Ratio Test Method to Obtain Consistent Results in Speaker Verification (화자확인에서 일정한 결과를 얻기 위한 빠른 순시 확률비 테스트 방법)

Kim, Eun-Young;Seo, Chang-Woo;Jeon, Sung-Chae
- Phonetics and Speech Sciences
- /
- v.2 no.2
- /
- pp.63-68
- /
- 2010
A new version of sequential probability ratio test (SPRT) which has been investigated in utterance-length control is proposed to obtain uniform response results in speaker verification (SV). Although SPRTs can obtain fast responses in SV tests, differences in the performance may occur depending on the compositions of consonants and vowels in the sentences used. In this paper, a fast sequential probability ratio test (FSPRT) method that shows consistent performances at all times regardless of the compositions of vocalized sentences for SV will be proposed. In generating frames, the FSPRT will first conduct SV test processes with only generated frames without any overlapping and if the results do not satisfy discrimination criteria, the FSPRT will sequentially use frames applied with overlapping. With the progress of processes as such, the test will not be affected by the compositions of sentences for SV and thus fast response outcomes and even consistent performances can be obtained. Experimental results show that the FSPRT has better performance to the SPRT method while requiring less complexity with equal error rates (EER).
PDF

A Study on the Mixed Model Approach and Symbol Probability Weighting Function for Maximization of Inter-Speaker Variation (화자간 변별력 최대화를 위한 혼합 모델 방식과 심볼 확률 가중함수에 관한 연구)

Chin Se-Hoon;Kang Chul-Ho
- The Journal of the Acoustical Society of Korea
- /
- v.24 no.7
- /
- pp.410-415
- /
- 2005
Recently, most of the speaker verification systems are based on the pattern recognition approach method. And performance of the pattern-classifier depends on how to classify a variety of speakers' feature parameters. In order to classify feature parameters efficiently and effectively, it is of great importance to enlarge variations between speakers and effectively measure distances between feature parameters. Therefore, this paper would suggest the positively mixed model scheme that can enlarge inter-speaker variation by searching the individual model with world model at the same time. During decision procedure, we can maximize inter-speaker variation by using the proposed mixed model scheme. We also make use of a symbol probability weighting function in this system so as to reduce vector quantization errors by measuring symbol probability derived from the distance rate of between the world codebook and individual codebook. As the result of our experiment using this method, we could halve the Detection Cost Function (DCF) of the system from $2.37\%\;to\;1.16\%$.
PDF KSCI

Authentication Performance Optimization for Smart-phone based Multimodal Biometrics (스마트폰 환경의 인증 성능 최적화를 위한 다중 생체인식 융합 기법 연구)

Moon, Hyeon-Joon;Lee, Min-Hyung;Jeong, Kang-Hun
- Journal of Digital Convergence
- /
- v.13 no.6
- /
- pp.151-156
- /
- 2015
In this paper, we have proposed personal multimodal biometric authentication system based on face detection, recognition and speaker verification for smart-phone environment. Proposed system detect the face with Modified Census Transform algorithm then find the eye position in the face by using gabor filter and k-means algorithm. Perform preprocessing on the detected face and eye position, then we recognize with Linear Discriminant Analysis algorithm. Afterward in speaker verification process, we extract the feature from the end point of the speech data and Mel Frequency Cepstral Coefficient. We verified the speaker through Dynamic Time Warping algorithm because the speech feature changes in real-time. The proposed multimodal biometric system is to fuse the face and speech feature (to optimize the internal operation by integer representation) for smart-phone based real-time face detection, recognition and speaker verification. As mentioned the multimodal biometric system could form the reliable system by estimating the reasonable performance.
https://doi.org/10.14400/JDC.2015.13.6.151 인용 PDF KSCI

SoC Design of Speaker Connection System by Efficient Cosimulation (효율적인 통합시뮬레이션에 의한 스피커 연결 시스템의 SoC 설계)

Song, Moon-Vin;Song, The-Hoon;Oh, Chae-Gon;Chung, Yun-Mo
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.43 no.10 s.352
- /
- pp.68-73
- /
- 2006
This, paper proposes a cosimulation methodology that results in an efficient SoC design as well as fast verification by integrating HDL, SystemC, and algorithm-level abstraction using the design tools Active-HDL and Matlab's Simulink. To demonstrate the proposed design methodology, we implemented the design technique on a serial connection multi-channel speaker system. We have demonstrated the proposed cosimulation method utilizing an ARM processor based SoC Master board with the AMBA bus interface and a Xilinx Vertex4 FPGA. The proposed method has the advantage of simultaneous simulation verification of both software and hardware parts in high levels of abstraction mixed with some performance critical parts in more concrete RTL codes. This allows relatively fast and easy design of a speaker connection system which typically requires significant amount of data processing for verification.
PDF KSCI

Search Result 162, Processing Time 0.026 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)