• Title/Summary/Keyword: Speaker Adaptation

An Utterance Verification using Vowel String (모음 열을 이용한 발화 검증)

  • 유일수;노용완;홍광석
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2003.06a / pp.46-49 / 2003
  • The use of confidence measures for word/utterance verification has become an essential component of any speech input application. Confidence measures have applications to a number of problems such as rejection of incorrect hypotheses, speaker adaptation, or adaptive modification of the hypothesis score during search in continuous speech recognition. In this paper, we present a new utterance verification method using vowel strings. Using subword HMMs of VCCV units, we create anti-models that include the vowel strings of the hypothesis words. The experimental results show that the utterance verification rate of the proposed method is about 79.5%.
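
One common way to turn a model and an anti-model into a confidence measure is a frame-normalized log-likelihood ratio test. The sketch below illustrates that general idea only; the abstract does not state the paper's exact decision rule, and the function name, the threshold value, and the example log-likelihoods are all hypothetical.

```python
def verify_utterance(target_loglik, anti_loglik, num_frames, threshold=0.0):
    """Accept or reject a recognition hypothesis.

    target_loglik: log-likelihood of the utterance under the hypothesis model
    anti_loglik:   log-likelihood under the corresponding anti-model
    num_frames:    utterance length in frames, used for normalization
    threshold:     hypothetical decision threshold (tuned on held-out data)
    """
    # Frame-normalized log-likelihood ratio used as the confidence score.
    llr = (target_loglik - anti_loglik) / num_frames
    return llr > threshold, llr


# Illustrative values only: the target model scores better than the anti-model.
accepted, score = verify_utterance(-420.0, -450.0, num_frames=100, threshold=0.1)
```

Raising the threshold rejects more incorrect hypotheses at the cost of rejecting more correct ones, so in practice it would be chosen from a development set.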

Noisy Environmental Adaptation for Word Recognition System Using Maximum a Posteriori Estimation (최대사후확률 추정법을 이용한 단어인식기의 잡음환경적응화)

  • Lee, Jung-Hoon;Lee, Shi-Wook;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea / v.16 no.2 / pp.107-113 / 1997
  • To achieve a Korean word recognition system that is robust to both channel distortion and additive noise, this paper proposes maximum a posteriori (MAP) estimation adaptation and investigates the effectiveness of environmental adaptation for improving recognition performance. Recognition experiments using MAP adaptation are carried out on three kinds of speech: 1) speech with channel distortion, 2) speech with added environmental noise, and 3) speech with both channel distortion and additive noise. The effectiveness of additional feature parameters, such as regression coefficients and durations, for environmental adaptation is also investigated. In speaker-independent 100-word recognition tests, recognition improved by 9.0% in case 1), by more than 75% in case 2), and by 11% to 61.4% in case 3), indicating that MAP environmental adaptation is effective for both channel-distorted and noise-added speech recognition. However, duration information used as an additional feature parameter did not play an important role in the tests.
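
As a rough illustration of MAP adaptation, the textbook MAP update for a single Gaussian mean interpolates between the prior mean and the adaptation data. This is a minimal sketch under that assumption; the abstract does not give the paper's model structure, and the prior weight `tau` and the example values are hypothetical.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP estimate of a Gaussian mean vector.

    prior_mean: mean vector of the original model
    frames:     adaptation feature vectors from the new environment
    tau:        hypothetical prior weight; larger values trust the prior more
    """
    n = len(frames)
    dim = len(prior_mean)
    # Per-dimension sum of the adaptation frames.
    sums = [sum(frame[d] for frame in frames) for d in range(dim)]
    # Weighted combination of the prior mean and the observed data.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


# Two adaptation frames pull the mean halfway toward the data when tau == 2.
adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0], [2.0, 4.0]], tau=2.0)
```

With no adaptation frames the estimate stays at the prior mean, and as more frames arrive it moves toward their sample mean.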

An Adaptive Utterance Verification Framework Using Minimum Verification Error Training

  • Shin, Sung-Hwan;Jung, Ho-Young;Juang, Biing-Hwang
    • ETRI Journal / v.33 no.3 / pp.423-433 / 2011
  • This paper introduces an adaptive and integrated utterance verification (UV) framework using minimum verification error (MVE) training as a new set of solutions suitable for real applications. UV is traditionally considered an add-on procedure to automatic speech recognition (ASR) and thus treated separately from the ASR system model design. This traditional two-stage approach often fails to cope with a wide range of variations, such as a new speaker or a new environment which is not matched with the original speaker population or the original acoustic environment that the ASR system is trained on. In this paper, we propose an integrated solution to enhance the overall UV system performance in such real applications. The integration is accomplished by adapting and merging the target model for UV with the acoustic model for ASR based on the common MVE principle at each iteration in the recognition stage. The proposed iterative procedure for UV model adaptation also involves revision of the data segmentation and the decoded hypotheses. Under this new framework, remarkable enhancement in not only recognition performance, but also verification performance has been obtained.

A study imitating human auditory system for tracking the position of sound source (인간의 청각 시스템을 응용한 음원위치 추정에 관한 연구)

  • Bae, Jeen-Man;Cho, Sun-Ho;Park, Chong-Kuk
    • Proceedings of the KIEE Conference / 2003.11c / pp.878-881 / 2003
  • To acquire a designated speaker's clear voice signal from a surveillance camera, a video conference, or a hands-free microphone while eliminating interfering noise, the speaker's position must first be estimated automatically. The basic algorithm for estimating a sound source position measures the TDOA (Time Difference Of Arrival) of the same signal at two microphones. This work uses the ADF (Adaptive Delay Filter) [4] and the CPS (Cross Power Spectrum) [5], which are among the most important TDOA analysis methods. Based on these analyses, this work proposes real-time sound source position estimation and an improved model, NI-ADF, which makes it possible to estimate the sound source position in both directions. NI-ADF builds on the observation that the human auditory system accepts a sound through activated nerves when the sound exceeds a specified level at a specified frequency. NI-ADF also provides a practical algorithm for real-time position estimation in both directions: when the microphones are mounted on a given system, it uses the sound level difference caused by diffraction around that system. With the existing bidirectional adaptive filter algorithm, measuring a sound source takes more than twice the effort of measuring in one direction. To overcome this weakness, this work proposes an improved algorithm that estimates the position in both directions in real time.
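
The TDOA idea can be sketched with a plain time-domain cross-correlation between two microphone signals. This is not the ADF, CPS, or NI-ADF method from the abstract, only a minimal illustration of measuring a delay between two channels; the function name, the toy signals, and the 8000 Hz sample rate are assumptions.

```python
def estimate_tdoa(mic_a, mic_b, sample_rate, max_lag):
    """Return the delay in seconds of mic_b relative to mic_a.

    Searches lags in [-max_lag, max_lag] samples and picks the one with the
    highest cross-correlation. A positive result means the signal reaches
    mic_b later than mic_a.
    """
    n = min(len(mic_a), len(mic_b))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += mic_a[i] * mic_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


# Toy signals: the same pulse arrives two samples later at the second microphone.
pulse = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
tdoa = estimate_tdoa(pulse, delayed, sample_rate=8000, max_lag=4)
```

The sign of the returned delay indicates which microphone the sound reached first. Turning the delay into an angle would additionally require the microphone spacing and the speed of sound.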

A Method on the Learning Speed Improvement of the Online Error Backpropagation Algorithm in Speech Processing (음성처리에서 온라인 오류역전파 알고리즘의 학습속도 향상방법)

  • 이태승;이백영;황병원
    • The Journal of the Acoustical Society of Korea / v.21 no.5 / pp.430-437 / 2002
  • Because it has a variety of advantages over other pattern recognition techniques, the multilayer perceptron (MLP) has been widely used in speech recognition and speaker recognition. However, the error backpropagation (EBP) algorithm that the MLP uses for learning is known to require a long learning time, which severely restricts applications such as speaker recognition and speaker adaptation that require real-time processing. Because the learning data for pattern recognition contain high redundancy, online learning methods, which update the weight vector of the MLP pattern by pattern, are very effective for increasing the learning speed. A typical online EBP algorithm applies a fixed learning rate to each update of the weight vector. Although a large speedup can be obtained with online EBP by choosing an appropriate fixed rate, fixing the rate means that the algorithm cannot respond effectively as the learning phases change and the number of patterns contributing to learning decreases. To solve this problem, this paper proposes the Changing rate and Omitting patterns in Instant Learning (COIL) method, which applies a variable rate and only the patterns necessary for the current learning phase as the phases change. Experiments are conducted on speaker verification and speech recognition, and the results are presented to verify the performance of COIL.

Noisy Speech Recognition Based on Spectral Mapping Techniques (스펙트럼사상기법을 기초로 한 잡음음성인식)

  • Lee, Ki-Young
    • The Journal of the Acoustical Society of Korea / v.14 no.1E / pp.39-45 / 1995
  • This paper presents a noisy speech recognition method based on the spectral mapping techniques used in speaker adaptation. In the presented method, spectral mapping training reduces the spectral distortion of noisy speech, and, for more accurate spectral mapping, the slope of the adjustment window is made adaptive to the word length. In recognition experiments, the recognition rate is higher than that of the conventional method using VQ and DTW without noise processing. Even at an SNR of 0 dB, the recognition rate is 10 times that of the conventional method. These results confirm that the speaker adaptation technique using spectral mapping training can improve recognition performance for noisy speech.

Adaptation of Classification Model for Improving Speech Intelligibility in Noise (음성 명료도 향상을 위한 분류 모델의 잡음 환경 적응)

  • Jung, Junyoung;Kim, Gibak
    • Journal of Broadcast Engineering / v.23 no.4 / pp.511-518 / 2018
  • This paper deals with improving speech intelligibility by applying a binary mask to the time-frequency units of speech in noise. The binary mask is set to "0" or "1" according to whether speech or noise is dominant, which is determined by comparing the signal-to-noise ratio with a pre-defined threshold. A Bayesian classifier trained with a Gaussian mixture model is used to estimate the binary mask of each time-frequency unit. The binary-mask-based noise suppressor improves speech intelligibility only in noise conditions that are included in the training data. In this paper, speaker adaptation techniques for speech recognition are applied to adapt the Gaussian mixture model to a new noise environment. Experiments with noise-corrupted speech are conducted to demonstrate the improvement in speech intelligibility obtained by employing adaptation techniques in a new noise environment.
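
The thresholding step can be sketched as follows for the case where speech and noise power are known separately, as they would be when preparing training targets. This is a minimal sketch, not the paper's classifier: the convention that "1" marks speech-dominant units, the 0 dB threshold, and the example power values are assumptions.

```python
import math


def binary_mask(speech_power, noise_power, threshold_db=0.0, eps=1e-12):
    """Build a binary mask over time-frequency units.

    speech_power, noise_power: 2-D lists (time x frequency) of power values
    threshold_db: hypothetical local-SNR threshold in dB
    Returns 1 where the local SNR exceeds the threshold (assumed to mean
    speech-dominant) and 0 elsewhere.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            # eps guards against division by zero and log of zero.
            snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask


# Toy 2x2 example: local SNRs are roughly +6, -10, -3, and +4.8 dB.
speech = [[4.0, 0.1], [1.0, 9.0]]
noise = [[1.0, 1.0], [2.0, 3.0]]
mask = binary_mask(speech, noise)
```

At test time the clean speech and noise are not available separately, which is why the abstract estimates the mask with a classifier instead of computing it directly.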

Korean Broadcast News Transcription Using Morpheme-based Recognition Units

  • Kwon, Oh-Wook;Alex Waibel
    • The Journal of the Acoustical Society of Korea / v.21 no.1E / pp.3-11 / 2002
  • Broadcast news transcription is one of the hardest tasks in speech recognition because broadcast speech signals vary widely in speech quality, channel, and background conditions. We developed a Korean broadcast news speech recognizer. We used a morpheme-based dictionary and language model to reduce the out-of-vocabulary (OOV) rate. We concatenated original morpheme pairs of short length or high frequency in order to reduce insertion and deletion errors due to short morphemes. We used a lexicon with multiple pronunciations to reflect inter-morpheme pronunciation variations without severe modification of the search tree. By using the merged morphemes as recognition units, we achieved an OOV rate of 1.7% with a 64k vocabulary, comparable to that of European languages. We implemented a hidden Markov model-based recognizer with vocal tract length normalization and online speaker adaptation by maximum likelihood linear regression. Experimental results showed that the recognizer yielded a 21.8% morpheme error rate for anchor speech and 31.6% for mostly noisy reporter speech.

Genetic Algorithm for Speaker Adaptation in Speech Recognition (유전자 알고리듬을 이용한 화자 적응적 음성인식)

  • 임동철
    • Proceedings of the Acoustical Society of Korea Conference / 1998.06c / pp.107-110 / 1998
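  • This paper presents a method that uses a GA (Genetic Algorithm) to generate better-adapted pattern vector sequences from the vector sequences used as reference patterns in speech recognition based on DTW (Dynamic Time Warping). The motivation for this work is as follows. DTW is used as one of the main engines of speech recognition [1]. DTW finds the optimal path between reference patterns and test patterns in order to identify the most similar pattern. However, even for the same pronunciation, speech appears in diverse patterns depending on the speaker's utterance length, the condition of the throat, and so on, and the same word spoken by the same speaker also changes with time and environment. A method that adapts to these dynamic characteristics of speech is therefore needed. As a solution to this problem, this paper develops a method that uses a GA to generate more suitable and adaptive reference patterns.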
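
For readers unfamiliar with DTW, the optimal-path search between a reference pattern and a test pattern can be sketched with the standard dynamic-programming recurrence. This is a minimal sketch that assumes one-dimensional features and an absolute-difference local cost; it does not include the genetic-algorithm adaptation step, and the example sequences are made up.

```python
def dtw_distance(reference, test):
    """Accumulated cost of the optimal DTW alignment between two sequences.

    reference, test: sequences of scalar features (a real system would use
    feature vectors and a vector distance instead of abs()).
    """
    inf = float("inf")
    rows, cols = len(reference), len(test)
    # cost[i][j] is the best accumulated cost aligning the first i reference
    # frames with the first j test frames.
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = abs(reference[i - 1] - test[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # reference frame advances alone
                cost[i][j - 1],      # test frame advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[rows][cols]


# The same pattern spoken more slowly (one frame repeated) aligns at zero cost.
same_word = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
other_word = dtw_distance([0.0, 0.0], [1.0, 1.0])
```

A recognizer built on this would compare the test pattern against each stored reference pattern and pick the one with the smallest accumulated cost.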

Efficient Rapid Speaker Adaptation Using Merging Eigenvoices (Eigenvoice 병합을 이용한 효율적인 고속 화자 적응)

  • Choi Dong-jin;Oh Yung-Hwan
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.115-118 / 2004
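  • In speech recognition, various efforts have been made to bring the performance of speaker-independent systems close to that of speaker-dependent systems through speaker adaptation. In particular, interest is growing in rapid speaker adaptation, which uses a very small amount of adaptation data, less than 30 seconds. The eigenvoice-based adaptation method, which is well suited to rapid speaker adaptation, requires too much computation and memory to construct the eigenvoices. This paper proposes a method that merges separately computed eigenvoices so that they have nearly the same accuracy as eigenvoices constructed all at once, and uses them for rapid speaker adaptation. With this method, when training data are added, eigenvoices are computed only for the added data and then merged, instead of being recomputed from scratch, which significantly reduces computation and memory. In the experiments, memory and computation decreased with the number of added speaker-dependent models, with almost no degradation in performance.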