• Title/Summary/Keyword: speaker

Search Result 1,679, Processing Time 0.028 seconds

A Study on the Improvement of DTW with Speech Silence Detection (음성의 묵음구간 검출을 통한 DTW의 성능개선에 관한 연구)

  • Kim, Jong-Kuk;Jo, Wang-Rae;Bae, Myung-Jin
    • Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.117-124
    • /
    • 2003
  • Speaker recognition is the technology that confirms the identification of speaker by using the characteristic of speech. Such technique is classified into speaker identification and speaker verification: The first method discriminates the speaker from the preregistered group and recognize the word, the second verifies the speaker who claims the identification. This method that extracts the information of speaker from the speech and confirms the individual identification becomes one of the most efficient technology as the service via telephone network is popularized. Some problems, however, must be solved for the real application as follows; The first thing is concerning that the safe method is necessary to reject the imposter because the recognition is not performed for the only preregistered customer. The second thing is about the fact that the characteristic of speech is changed as time goes by, So this fact causes the severe degradation of recognition rate and the inconvenience of users as the number of times to utter the text increases. The last thing is relating to the fact that the common characteristic among speakers causes the wrong recognition result. The silence parts being included the center of speech cause that identification rate is decreased. In this paper, to make improvement, We proposed identification rate can be improved by removing silence part before processing identification algorithm. The methods detecting speech area are zero crossing rate, energy of signal detect end point and starting point of the speech and process DTW algorithm by using two methods in this paper. As a result, the proposed method is obtained about 3% of improved recognition rate compare with the conventional methods.

  • PDF

Sequential Speaker Classification Using Quantized Generic Speaker Models (양자화 된 범용 화자모델을 이용한 연속적 화자분류)

  • Kwon, Soon-Il
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.44 no.1
    • /
    • pp.26-32
    • /
    • 2007
  • In sequential speaker classification, the lack of prior information about the speakers poses a challenge for model initialization. To address the challenge, a predetermined generic model set, called Sample Speaker Models, was previously proposed. This approach can be useful for accurate speaker modeling without requiring initial speaker data. However, an optimal method for sampling the models from a generic model pool is still required. To solve this problem, the Speaker Quantization method, motivated by vector quantization, is proposed. Experimental results showed that the new approach outperformed the random sampling approach with 25% relative improvement in error rate on switchboard telephone conversations.

SVM Based Speaker Verification Using Sparse Maximum A Posteriori Adaptation

  • Kim, Younggwan;Roh, Jaeyoung;Kim, Hoirin
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.2 no.5
    • /
    • pp.277-281
    • /
    • 2013
  • Modern speaker verification systems based on support vector machines (SVMs) use Gaussian mixture model (GMM) supervectors as their input feature vectors, and the maximum a posteriori (MAP) adaptation is a conventional method for generating speaker-dependent GMMs by adapting a universal background model (UBM). MAP adaptation requires the appropriate amount of input utterance due to the number of model parameters to be estimated. On the other hand, with limited utterances, unreliable MAP adaptation can be performed, which causes adaptation noise even though the Bayesian priors used in the MAP adaptation smooth the movements between the UBM and speaker dependent GMMs. This paper proposes a sparse MAP adaptation method, which is known to perform well in the automatic speech recognition area. By introducing sparse MAP adaptation to the GMM-SVM-based speaker verification system, the adaptation noise can be mitigated effectively. The proposed method utilizes the L0 norm as a regularizer to induce sparsity. The experimental results on the TIMIT database showed that the sparse MAP-based GMM-SVM speaker verification system yields a 42.6% relative reduction in the equal error rate with few additional computations.

  • PDF

An Amplitude Warping Approach to Intra-Speaker Normalization for Speech Recognition (음성인식에서 화자 내 정규화를 위한 진폭 변경 방법)

  • Kim Dong-Hyun;Hong Kwang-Seok
    • Journal of Internet Computing and Services
    • /
    • v.4 no.3
    • /
    • pp.9-14
    • /
    • 2003
  • The method of vocal tract normalization is a successful method for improving the accuracy of inter-speaker normalization. In this paper, we present an intra-speaker warping factor estimation based on pitch alteration utterance. The feature space distributions of untransformed speech from the pitch alteration utterance of intra-speaker would vary due to the acoustic differences of speech produced by glottis and vocal tract. The variation of utterance is two types: frequency and amplitude variation. The vocal tract normalization is frequency normalization among inter-speaker normalization methods. Therefore, we have to consider amplitude variation, and it may be possible to determine the amplitude warping factor by calculating the inverse ratio of input to reference pitch. k, the recognition results, the error rate is reduced from 0.4% to 2.3% for digit and word decoding.

  • PDF

A Study on Speaker Identification Parameter Using Difference and Correlation Coeffieicent of Digit_sound Spectrum (숫자음의 스펙트럼 차이값과 상관계수를 이용한 화자인증 파라미터 연구)

  • Lee, Hoo-Dong;Kang, Sun-Mee;Chang, Moon-Soo;Yang, Byung-Gon
    • Speech Sciences
    • /
    • v.11 no.3
    • /
    • pp.131-142
    • /
    • 2004
  • Speaker identification system basically functions by comparing spectral energy of an individual production model with that of an input signal. This study aimed to develop a new speaker identification system from two parameters from the spectral energy of numeric sounds: difference sum and correlation coefficient. A narrow-band spectrogram yielded more stable spectral energy across time than a wide-band one. In this paper, we collected empirical data from four male speakers and tested the speaker identification system. The subjects produced 18 combinations of three-digit numeric. sounds !en times each. Five productions of each three-digit number were statistically averaged to make a model for each speaker. Then, the remaining five productions were tested on the system. Results showed that when the threshold for the absolute difference sum was set to 1200, all the speakers could not pass the system while everybody could pass if set to 2800. The minimum correlation coefficient to allow all to pass was 0.82 while the coefficient of 0.95 rejected all. Thus, both threshold levels can be adjusted to the need of speaker identification system, which is desirable for further study.

  • PDF

A Noble Decoding Algorithm Using MLLR Adaptation for Speaker Verification (MLLR 화자적응 기법을 이용한 새로운 화자확인 디코딩 알고리듬)

  • 김강열;김지운;정재호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.2
    • /
    • pp.190-198
    • /
    • 2002
  • In general, we have used the Viterbi algorithm of Speech recognition for decoding. But a decoder in speaker verification has to recognize same word of every speaker differently. In this paper, we propose a noble decoding algorithm that could replace the typical Viterbi algorithm for the speaker verification system. We utilize for the proposed algorithm the speaker adaptation algorithms that transform feature vectors into the region of the client' characteristics in the speech recognition. There are many adaptation algorithms, but we take MLLR (Maximum Likelihood Linear Regression) and MAP (Maximum A-Posterior) adaptation algorithms for proposed algorithm. We could achieve improvement of performance about 30% of EER (Equal Error Rate) using proposed algorithm instead of the typical Viterbi algorithm.

Text-independent Speaker Identification by Bagging VQ Classifier

  • Kyung, Youn-Jeong;Park, Bong-Dae;Lee, Hwang-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.2E
    • /
    • pp.17-24
    • /
    • 2001
  • In this paper, we propose the bootstrap and aggregating (bagging) vector quantization (VQ) classifier to improve the performance of the text-independent speaker recognition system. This method generates multiple training data sets by resampling the original training data set, constructs the corresponding VQ classifiers, and then integrates the multiple VQ classifiers into a single classifier by voting. The bagging method has been proven to greatly improve the performance of unstable classifiers. Through two different experiments, this paper shows that the VQ classifier is unstable. In one of these experiments, the bias and variance of a VQ classifier are computed with a waveform database. The variance of the VQ classifier is compared with that of the classification and regression tree (CART) classifier[1]. The variance of the VQ classifier is shown to be as large as that of the CART classifier. The other experiment involves speaker recognition. The speaker recognition rates vary significantly by the minor changes in the training data set. The speaker recognition experiments involving a closed set, text-independent and speaker identification are performed with the TIMIT database to compare the performance of the bagging VQ classifier with that of the conventional VQ classifier. The bagging VQ classifier yields improved performance over the conventional VQ classifier. It also outperforms the conventional VQ classifier in small training data set problems.

  • PDF

Design and Performance Characteristics of a Broadband Underwater Speaker System (광대역 수중 스피커 시스템의 설계 및 성능 특성)

  • Lee, Dae-Jae
    • Korean Journal of Fisheries and Aquatic Sciences
    • /
    • v.44 no.5
    • /
    • pp.543-549
    • /
    • 2011
  • An underwater speaker was developed for use as an acoustic deterrent device that transmits acoustic energy through the water omnidirectionally over a broadband frequency range to eliminate marine mammal attacks and to prevent physical damage to the inshore and coastal fishing grounds of Korea. The underwater speaker was constructed of two vibration caps machined from 6061-T6 aluminum alloy and a stack of PZ 26 piezoelectric ceramic rings (Ferroperm Piezoceramics A/S) connected mechanically in series and electrically in parallel. The performance characteristics of the underwater speaker were measured and analyzed in an experimental water tank of $5\;m{\times}5\;m{\times}6\;m$. The peak transmitting voltage response (TVR) was measured at 11.16 kHz with 163.45 dB re $1\;{\mu}Pa$/V at 1m. The underwater speaker showed a near omnidirectional beam pattern at the peak TVR resonance frequency. The usable frequency range was 4-25 kHz with a lower TVR limit of approximately 140 dB. We conclude that this underwater speaker could be satisfactorily used as an acoustic deterrent device against marine mammals, particularly the bottlenose dolphin, to protect catches and fishing grounds as well as the mammals themselves, for example, by keeping them away from fishing gear and/or vessels.

Speaker Verification Performance Improvement Using Weighted Residual Cepstrum (가중된 예측 오차 파라미터를 사용한 화자 확인 성능 개선)

  • 위진우;강철호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.5
    • /
    • pp.48-53
    • /
    • 2001
  • In speaker verification based on LPC analysis the prediction residues are ignored and LPCC(LPC cepstrum) are only used to compose feature vectors. In this study, LPCC and RCEP (residual cepstrum) extracted from residues are used as feature parameters in the various environmental speaker verification. We propose the weighting function which can enlarge inter-speaker variation by weighting pitch, speaker inherent vector, included in residual cepstrum. Simulation results show that the average speaker verification rate is improved in the rate of 6% with RCEP and LPCC at the same time and is improved in the rate of 2.45% with the proposed weighted RCEP and LPCC at the same time compared with no weighting.

  • PDF

Real Time Speaker Close-Up System using The Lip Motion Informations (입술 움직임 정보를 이용한 실시간 화자 클로즈업 시스템 구현)

  • 권혁봉;장언동;윤태승;안재형
    • Journal of Korea Multimedia Society
    • /
    • v.4 no.6
    • /
    • pp.510-517
    • /
    • 2001
  • In this paper, we implement a real time speaker close-up system using lip motion information from input images having some people. After detecting a speaker from input moving pictures through one color CCD camera, the other camera closes up the speaker by using lip motion information. The implemented system detects a face and lip area of each person by means of a facial color and a morphological information, and then finds out a speaker by using lip area variation. A PTZ(Pan/Tilt/Zoom) camera is used in order to close up the detected speaker and it is controlled by RS-232C serial port. Consequently, we can exactly detect a speaker in input moving pictures including more than three people.

  • PDF