• Title/Summary/Keyword: Speaker Recognition


Effective Recognition of Velopharyngeal Insufficiency (VPI) Patient's Speech Using Simulated Speech Model (모의 음성 모델을 이용한 효과적인 구개인두부전증 환자 음성 인식)

  • Sung, Mee Young;Kwon, Tack-Kyun;Sung, Myung-Whun;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.5 / pp.1243-1250 / 2015
  • This paper presents an effective method for recognizing the speech of velopharyngeal insufficiency (VPI) patients, intended for a VPI speech reconstruction system. Speaker adaptation is employed to improve VPI speech recognition. Because only a small amount of VPI speech is available for model adaptation, the paper proposes generating the initial model for speaker adaptation from simulated speech. Applying MLLR for speaker adaptation yields an average word accuracy of 83.60%. The proposed adaptation method using the simulated speech model brings a 6.38% improvement in average accuracy. The experimental results demonstrate that the proposed method is highly effective for developing a recognition system for VPI speech, for which a large speech database is difficult to construct.

Speaker Recognition Using Dynamic Time Variation of Orthogonal Parameters (직교인자의 동적 특성을 이용한 화자인식)

  • 배철수
    • The Journal of Korean Institute of Communications and Information Sciences / v.17 no.9 / pp.993-1000 / 1992
  • Recently, many researchers have reported high speaker recognition rates when statistically processing orthogonal parameters, which are derived from analysis of the speech signal and carry much of the speaker's identity. This approach, however, suffers from problems caused by speaking rate and the time-varying features of speech. To solve these problems, this paper proposes two speaker recognition methods that combine the DTW algorithm with orthogonal parameters extracted by the Karhunen-Loève transform: one applies the orthogonal parameters as feature vectors to the DTW algorithm, and the other applies the orthogonal parameters to the optimal path. In addition, we compare the speaker recognition rates of the two proposed methods with that of the conventional method based on statistical processing of orthogonal parameters. The orthogonal parameters used in this paper are derived from both the linear prediction coefficients and the partial correlation coefficients of the speech signal.

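The DTW step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the paper's local distance or path constraints, so the Euclidean distance and the three-way step pattern below are assumptions, and the one-dimensional tuples merely stand in for orthogonal-parameter feature vectors.

```python
import math

def dtw_distance(a, b):
    """Accumulated cost of the best time alignment between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local Euclidean distance (assumed)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# The same contour spoken at two different speeds (hypothetical data)
slow = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,)]
fast = [(0.0,), (1.0,), (2.0,)]
print(dtw_distance(slow, fast))
```

Because the alignment may repeat frames, the slow and fast versions of the same contour align at zero cost, which is the property that makes DTW useful against variation in speaking rate.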

On a Performance Improvement of Speaker Recognition by using the Auditory Characteristics of Speech (음성의 청각특성을 이용한 화자식별시스템의 성능향상에 관한 연구)

  • 이윤주;오세영;배재옥;배명진
    • Proceedings of the IEEK Conference / 1998.10a / pp.1223-1226 / 1998
  • The pre-emphasis filter used in the conventional method emphasizes all high-frequency components, which reflect speaker characteristics. However, this filter does not reflect the auditory characteristics of the speaker's speech. To emphasize the perceptual characteristics, we propose a speaker recognition system that uses perceptual weighting as the preprocessor, because human hearing is sensitive to formant peaks. This filter both de-emphasizes the low formants and emphasizes the high formants. With the proposed method, the total recognition rate improves by 1.7% over the conventional method.

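For reference, the conventional pre-emphasis filter that the abstract above takes as its baseline is usually a first-order high-pass filter. The sketch below shows only that baseline, not the paper's perceptual weighting filter, and the coefficient 0.97 is a commonly used value assumed here rather than one taken from the paper.

```python
def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    out = [signal[0]]  # the first sample has no predecessor and is passed through
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out

# A constant (low-frequency) input is attenuated; an alternating (high-frequency) one is boosted
print(pre_emphasis([1.0, 1.0, 1.0]))
print(pre_emphasis([1.0, -1.0, 1.0]))
```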

New Data Extraction Method using the Difference in Speaker Recognition (화자인식에서 차분을 이용한 새로운 데이터 추출 방법)

  • Seo, Chang-Woo;Ko, Hee-Ae;Lim, Yong-Hwan;Choi, Min-Jung;Lee, Youn-Jeong
    • Speech Sciences / v.15 no.3 / pp.7-15 / 2008
  • This paper proposes a method for extracting new feature vectors in speaker recognition (SR) using the difference between the cepstrum, which represents static characteristics, and the delta cepstrum, which represents dynamic characteristics. The proposed difference vector (DV) is an intermediate feature vector that contains the static and dynamic characteristics simultaneously, and it can serve as a new feature vector. Compared with the conventional method, the proposed method obtains the new feature vector without adding parameters; it requires only the computation of the difference between the cepstrum and the delta cepstrum. Experimental results show that the proposed method improves performance by more than 2.03% on average over the conventional method in speaker identification (SI).

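The difference-vector idea in the abstract above can be sketched as follows. The abstract does not say how the delta cepstrum is computed, so the central difference with clamped edges is an assumption, and the function names are hypothetical.

```python
def delta(frames):
    """Delta features as a central difference over neighbouring frames (edges clamped)."""
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev_f = frames[max(t - 1, 0)]
        next_f = frames[min(t + 1, last)]
        out.append([(nx - pv) / 2.0 for pv, nx in zip(prev_f, next_f)])
    return out

def difference_vectors(frames):
    """Per-frame difference between the static cepstrum and its delta (same dimension as the input)."""
    return [[c - d for c, d in zip(f, df)] for f, df in zip(frames, delta(frames))]

# Three frames of two hypothetical cepstral coefficients each
cepstra = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
print(difference_vectors(cepstra))
```

The output has the same dimension as the static cepstrum, which matches the abstract's point that no new parameters are added.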

Automatic Log-in System by the Speaker Certification

  • Sohn, Young-Sun
    • International Journal of Fuzzy Logic and Intelligent Systems / v.4 no.2 / pp.176-181 / 2004
  • This paper introduces a Web site login system that uses the user's own voice, removing the bother of remembering an ID and password to log in to the Web site. A DTW method that applies fuzzy inference is used as the speaker recognition algorithm. To block recorded voices, which are a problem for speaker recognition, we obtain the ACC (Average Cepstrum Coefficient) membership function for each order using LPC, which models the vocal cords. We infer whether a voice is recorded on the basis of the number of zeros among the values of the ACC membership function and on the basis of the average value of the ACC membership function. In experiments on six Web sites with six subjects, the system blocked about 98% of voices recorded with a digital recorder.

Speech Signal Processing for Analysis of Chaos Pattern (카오스 패턴 발견을 위한 음성 데이터의 처리 기법)

  • Kim, Tae-Sik
    • Speech Sciences / v.8 no.3 / pp.149-157 / 2001
  • Based on chaos theory, this paper presents a new method of representing speech signals. The method can be used for pattern matching tasks such as speaker recognition. Attractors are represented very well by logistic maps, which exhibit chaotic phenomena. In speaker recognition, a speaker's vocal habits can be a very important matching parameter. An attractor configuration built from the change values of the speech signal can be used to analyze how voice undulation at one point on the vocal loudness scale influences the next point. The attractors arranged by this method could be used in speech recognition research because they also contain information unique to each speaker.

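The logistic map named in the abstract above is the recurrence x[n+1] = r·x[n]·(1 − x[n]). The sketch below iterates it and pairs each value with its successor. The abstract does not describe exactly how the paper builds attractors from speech samples, so the `return_map` pairing is only an assumed illustration of relating one point to the next.

```python
def logistic_orbit(r, x0, steps):
    """Iterate x[n+1] = r * x[n] * (1 - x[n]) and return all visited values."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit

def return_map(samples):
    """Pair each sample with its successor; plotting the pairs reveals the attractor shape."""
    return list(zip(samples[:-1], samples[1:]))

print(logistic_orbit(2.0, 0.5, 3))           # r = 2 settles on a fixed point
print(return_map(logistic_orbit(3.9, 0.2, 5)))  # r = 3.9 lies in the chaotic regime
```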

A study on the spoken digit recognition performance of the Two-Stage recurrent neural network (2단 회귀신경망의 숫자음 인식에관한 연구)

  • 안점영
    • The Journal of Korean Institute of Communications and Information Sciences / v.25 no.3B / pp.565-569 / 2000
  • We construct a two-stage recurrent neural network that feeds the signals of both the hidden layer and the output layer back to the hidden layer. It is tested on syllables of the Korean spoken digits from /gong/ to /gu/. For these experiments, we adjust the number of neurons in the hidden layer, the prediction order of the input data, and the self-recurrent coefficient of the decision state layer. In the experimental results, the recognition rate of this network is between 91% and 97.5% in the speaker-dependent case and between 80.75% and 92% in the speaker-independent case. In the speaker-dependent case, the network's recognition performance is equivalent to that of the Jordan and Elman networks, but in the speaker-independent case it shows improved performance.


A Study on Background Speaker Selection Method in Speaker Verification System (화자인증 시스템에서 선정 방법에 관한 연구)

  • Choi, Hong-Sub
    • Speech Sciences / v.9 no.2 / pp.135-146 / 2002
  • Generally, a speaker verification system improves its recognition rate by normalizing the log-likelihood ratio, using the model of the speaker to be verified and its background speaker model. The speaker-based cohort method is widely used for selecting the background speaker model. Recently, the Gaussian-based cohort model has been suggested as a virtually synthesized cohort model: unlike a speaker-based model, it chooses only the probability distributions closest to the target speaker's probability distribution from among those of several neighboring speakers and thereby synthesizes a new virtual speaker model. It gives better results than the existing speaker-based method. This study compares the existing speaker-based background speaker models with virtual speaker models and then constructs new virtual background speaker model groups that combine them in a certain ratio. For this purpose, the study builds a speaker verification system based on the GMM (Gaussian Mixture Model) and finds that the suggested method of selecting the virtual background speaker model improves performance.

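The normalized log-likelihood ratio described in the abstract above can be sketched as the claimed speaker's log-likelihood minus the log of the average likelihood under the background (cohort) models. This is one common form of cohort normalization, assumed here for illustration; the function names, the threshold, and the example scores are hypothetical, and the GMM scoring itself is left out.

```python
import math

def normalized_score(target_loglik, cohort_logliks):
    """Log-likelihood ratio of the claimed speaker against the average cohort likelihood."""
    # log(mean(exp(l))) computed stably with the log-sum-exp trick
    peak = max(cohort_logliks)
    total = sum(math.exp(l - peak) for l in cohort_logliks)
    log_mean = peak + math.log(total / len(cohort_logliks))
    return target_loglik - log_mean

def accept(target_loglik, cohort_logliks, threshold=0.0):
    """Accept the identity claim when the normalized score exceeds the threshold."""
    return normalized_score(target_loglik, cohort_logliks) > threshold

print(normalized_score(-10.0, [-12.0, -12.0]))
print(accept(-15.0, [-12.0, -12.0]))
```

Which background models go into `cohort_logliks` is exactly the selection question the study addresses; the normalization step itself is unchanged whether they are speaker-based, virtual, or a mixture of both.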

Vector Quantizer Based Speaker Normalization for Continuous Speech Recognition (연속음성 인식기를 위한 벡터양자화기 기반의 화자정규화)

  • Shin Ok-keun
    • The Journal of the Acoustical Society of Korea / v.23 no.8 / pp.583-589 / 2004
  • Proposed is a vector-quantizer-based speaker normalization method for continuous speech recognition (CSR) systems that makes no use of acoustic information. The proposed method, an improvement on a previously reported speaker normalization scheme for a simple digit recognizer, builds a canonical codebook by iterative training, increasing the codebook size after each iteration from a relatively small initial size. Once the codebook is established, the warp factors of the speakers are estimated by exhaustively comparing warped versions of each speaker's utterances with the codebook. Two sets of phones are used to estimate the warp factors: one consisting of vowels only, and the other of all the phonemes. A piecewise linear warping function corresponding to the estimated warp factor is adopted to warp the power spectrum of the utterance. The warped feature vectors are then extracted and used to train and test the speech recognizer. The effectiveness of the proposed method is investigated in a set of recognition experiments using the TIMIT corpus and the HTK speech recognition toolkit. The experimental results show recognition rate improvements comparable to those of the formant-based warping method.
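A piecewise linear warping function of the kind mentioned in the abstract above can be sketched as follows. The abstract does not give the breakpoint or the range of warp factors, so `break_ratio=0.8` and the 8000 Hz Nyquist frequency (half of a 16 kHz sampling rate) are assumptions, and `warp_frequency` is a hypothetical name.

```python
def warp_frequency(f, alpha, nyquist=8000.0, break_ratio=0.8):
    """Piecewise linear warp: slope alpha up to the breakpoint, then a straight line to (nyquist, nyquist)."""
    f0 = break_ratio * nyquist  # breakpoint frequency (assumed)
    if f <= f0:
        return alpha * f
    # Second segment joins (f0, alpha * f0) to (nyquist, nyquist) so the bandwidth is preserved
    slope = (nyquist - alpha * f0) / (nyquist - f0)
    return alpha * f0 + slope * (f - f0)

for f in (1000.0, 6400.0, 8000.0):
    print(f, warp_frequency(f, alpha=1.1))
```

In an exhaustive search like the one the abstract describes, each candidate `alpha` would be applied in turn and the one whose warped features best match the codebook would be kept.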

A study on the Method of the Keyword Spotting Recognition in the Continuous speech using Neural Network (신경 회로망을 이용한 연속 음성에서의 keyword spotting 인식 방식에 관한 연구)

  • Yang, Jin-Woo;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.15 no.4 / pp.43-49 / 1996
  • This research proposes a speaker-independent Korean continuous speech recognition system for 247 DDD area names using a keyword spotting technique. The recognition algorithm is the Dynamic Programming Neural Network (DPNN), which integrates DP with a multi-layer perceptron as a model that handles time-axis distortion and spectral pattern variation in speech. To improve performance, we divide the word models into keyword models and non-keyword models. We also experiment with a postprocessing procedure to evaluate system performance. The experimental results are as follows: the recognition rate for isolated words is 93.45% in the speaker-dependent case and 84.05% in the speaker-independent case, and the recognition rate for simple dialogue sentences in the keyword spotting experiment is 77.34% in the speaker-dependent case and 70.63% in the speaker-independent case.
