• Title/Summary/Keyword: 화자독립

Search Result 231, Processing Time 0.02 seconds

Korean Word Recognition using the Transition Matrix of VQ-Code and DHMM (VQ코드의 천이 행렬과 이산 HMM을 이용한 한국어 단어인식)

  • Chung, Kwang-Woo;Hong, Kwang-Seok;Park, Byung-Chul
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.4
    • /
    • pp.40-49
    • /
    • 1994
  • In this paper, we propose methods for improving the performance of word recognition system. The ray stratey of the first method is to apply the inertia to the feature vector sequences of speech signal to stabilize the transitions between VQ cdoes. The second method is generating the new observation probabilities using the transition matrix of VQ codes as weights at the observation probability of the output symbol, so as to take into account the time relation between neighboring frames in DHMM. By applying the inertia to the feature vector sequences, we can reduce the overlapping of probability distribution of the response paths for each word and stabilize state transitions in the HMM. By using the transition matrix of VQ codes as weights in conventional DHMM. we can divide the probability distribution of feature vectors more and more, and restrict the feature distribution to a suitable region so that the performance of recognition system can improve. To evaluate the performance of the proposed methods, we carried out experiments for 50 DDD area names. As a result, the proposed methods improved the recognition rate by $4.2\%$ in the speaker-dependent test and $12.45\%$ in the speaker-independent test, respectively, compared with the conventional DHMM.

  • PDF

Speaker Adapted Real-time Dialogue Speech Recognition Considering Korean Vocal Sound System (한국어 음운체계를 고려한 화자적응 실시간 단모음인식에 관한 연구)

  • Hwang, Seon-Min;Yun, Han-Kyung;Song, Bok-Hee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.6 no.4
    • /
    • pp.201-207
    • /
    • 2013
  • Voice Recognition technique has been developed and it has been actively applied to various information devices such as smart phones and car navigation system. But the basic research technique related the speech recognition is based on research results in English. Since the lip sync producing generally requires tedious hand work of animators and it serious affects the animation producing cost and development period to get a high quality lip animation. In this research, a real time processed automatic lip sync algorithm for virtual characters in digital contents is studied by considering Korean vocal sound system. This suggested algorithm contributes to produce a natural lip animation with the lower producing cost and the shorter development period.

Robust Speech Recognition Parameters for Emotional Variation (감정 변화에 강인한 음성 인식 파라메터)

  • Kim Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.15 no.6
    • /
    • pp.655-660
    • /
    • 2005
  • This paper studied the feature parameters less affected by the emotional variation for the development of the robust speech recognition technologies. For this purpose, the effect of emotional variation on the speech recognition system and robust feature parameters of speech recognition system were studied using speech database containing various emotions. In this study, LPC cepstral coefficient, met-cepstral coefficient, root-cepstral coefficient, PLP coefficient, RASTA met-cepstral coefficient were used as a feature parameters. And CMS and SBR method were used as a signal bias removal techniques. Experimental results showed that the HMM based speaker independent word recognizer using RASTA met-cepstral coefficient :md its derivatives and CMS as a signal bias removal showed the best performance of $7.05\%$ word error rate. This corresponds to about a $52\%$ word error reduction as compare to the performance of baseline system using met - cepstral coefficient.

ImprovementofMLLRAlgorithmforRapidSpeakerAdaptationandReductionofComputation (빠른 화자 적응과 연산량 감소를 위한 MLLR알고리즘 개선)

  • Kim, Ji-Un;Chung, Jae-Ho
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.1C
    • /
    • pp.65-71
    • /
    • 2004
  • We improved the MLLR speaker adaptation algorithm with reduction of the order of HMM parameters using PCA(Principle Component Analysis) or ICA(Independent Component Analysis). To find a smaller set of variables with less redundancy, we adapt PCA(principal component analysis) and ICA(independent component analysis) that would give as good a representation as possible, minimize the correlations between data elements, and remove the axis with less covariance or higher-order statistical independencies. Ordinary MLLR algorithm needs more than 30 seconds adaptation data to represent higher word recognition rate of SD(Speaker Dependent) models than of SI(Speaker Independent) models, whereas proposed algorithm needs just more than 10 seconds adaptation data. 10 components for ICA and PCA represent similar performance with 36 components for ordinary MLLR framework. So, compared with ordinary MLLR algorithm, the amount of total computation requested in speaker adaptation is reduced by about 1/167 in proposed MLLR algorithm.

A Study on the Voice Dialing using HMM and Post Processing of the Connected Digits (HMM과 연결 숫자음의 후처리를 이용한 음성 다이얼링에 관한 연구)

  • Yang, Jin-Woo;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.5
    • /
    • pp.74-82
    • /
    • 1995
  • This paper is study on the voice dialing using HMM and post processing of the connected digits. HMM algorithm is widely used in the speech recognition with a good result. But, the maximum likelihood estimation of HMM(Hidden Markov Model) training in the speech recognition does not lead to values which maximize recognition rate. To solve the problem, we applied the post processing to segmental K-means procedure are in the recognition experiment. Korea connected digits are influenced by the prolongation more than English connected digits. To decrease the segmentation error in the level building algorithm some word models which can be produced by the prolongation are added. Some rules for the added models are applied to the recognition result and it is updated. The recognition system was implemented with DSP board having a TMS320C30 processor and IBM PC. The reference patterns were made by 3 male speakers in the noisy laboratory. The recognition experiment was performed for 21 sort of telephone number, 252 data. The recognition rate was $6\%$ in the speaker dependent, and $80.5\%$ in the speaker independent recognition test.

  • PDF

Robust Speech Recognition Using Missing Data Theory (손실 데이터 이론을 이용한 강인한 음성 인식)

  • 김락용;조훈영;오영환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.3
    • /
    • pp.56-62
    • /
    • 2001
  • In this paper, we adopt a missing data theory to speech recognition. It can be used in order to maintain high performance of speech recognizer when the missing data occurs. In general, hidden Markov model (HMM) is used as a stochastic classifier for speech recognition task. Acoustic events are represented by continuous probability density function in continuous density HMM(CDHMM). The missing data theory has an advantage that can be easily applicable to this CDHMM. A marginalization method is used for processing missing data because it has small complexity and is easy to apply to automatic speech recognition (ASR). Also, a spectral subtraction is used for detecting missing data. If the difference between the energy of speech and that of background noise is below given threshold value, we determine that missing has occurred. We propose a new method that examines the reliability of detected missing data using voicing probability. The voicing probability is used to find voiced frames. It is used to process the missing data in voiced region that has more redundant information than consonants. The experimental results showed that our method improves performance than baseline system that uses spectral subtraction method only. In 452 words isolated word recognition experiment, the proposed method using the voicing probability reduced the average word error rate by 12% in a typical noise situation.

  • PDF

A Study on Hybrid Structure of Semi-Continuous HMM and RBF for Speaker Independent Speech Recognition (화자 독립 음성 인식을 위한 반연속 HMM과 RBF의 혼합 구조에 관한 연구)

  • 문연주;전선도;강철호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.8
    • /
    • pp.94-99
    • /
    • 1999
  • It is the hybrid structure of HMM and neural network(NN) that shows high recognition rate in speech recognition algorithms. And it is a method which has majorities of statistical model and neural network model respectively. In this study, we propose a new style of the hybrid structure of semi-continuous HMM(SCHMM) and radial basis function(RBF), which re-estimates weighting coefficients probability affecting observation probability after Baum-Welch estimation. The proposed method takes account of the similarity of basis Auction of RBF's hidden layer and SCHMM's probability density functions so as to discriminate speech signals sensibly through the learned and estimated weighting coefficients of RBF. As simulation results show that the recognition rates of the hybrid structure SCHMM/RBF are higher than those of SCHMM in unlearned speakers' recognition experiment, the proposed method has been proved to be one which has more sensible property in recognition than SCHMM.

  • PDF

An Implementation of the Real Time Speech Recognition for the Automatic Switching System (자동 교환 시스템을 위한 실시간 음성 인식 구현)

  • 박익현;이재성;김현아;함정표;유승균;강해익;박성현
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.4
    • /
    • pp.31-36
    • /
    • 2000
  • This paper describes the implementation and the evaluation of the speech recognition automatic exchange system. The system provides government or public offices, companies, educational institutions that are composed of large number of members and parts with exchange service using speech recognition technology. The recognizer of the system is a Speaker-Independent, Isolated-word, Flexible-Vocabulary recognizer based on SCHMM(Semi-Continuous Hidden Markov Model). For real-time implementation, DSP TMS320C32 made in Texas Instrument Inc. is used. The system operating terminal including the diagnosis of speech recognition DSP and the alternation of speech recognition candidates makes operation easy. In this experiment, 8 speakers pronounced words of 1,300 vocabulary related to automatic exchange system over wire telephone network and the recognition system achieved 91.5% of word accuracy.

  • PDF

Speech Recognition in Noisy environment using Transition Constrained HMM (천이 제한 HMM을 이용한 잡음 환경에서의 음성 인식)

  • Kim, Weon-Goo;Shin, Won-Ho;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.2
    • /
    • pp.85-89
    • /
    • 1996
  • In this paper, transition constrained Hidden Markov Model(HMM) in which the transition between states occur only within prescribed time slot is proposed and the performance is evaluated in the noisy environment. The transition constrained HMM can explicitly limit the state durations and accurately de scribe the temporal structure of speech signal simply and efficiently. The transition constrained HMM is not only superior to the conventional HMM but also require much less computation time. In order to evaluate the performance of the transition constrained HMM, speaker independent isolated word recognition experiments were conducted using semi-continuous HMM with the noisy speech for 20, 10, 0 dB SNR. Experiment results show that the proposed method is robust to the environmental noise. The 81.08% and 75.36% word recognition rates for conventional HMM was increased by 7.31% and 10.35%, respectively, by using transition constrained HMM when two kinds of noises are added with 10dB SNR.

  • PDF

Lip Reading Method Using CNN for Utterance Period Detection (발화구간 검출을 위해 학습된 CNN 기반 입 모양 인식 방법)

  • Kim, Yong-Ki;Lim, Jong Gwan;Kim, Mi-Hye
    • Journal of Digital Convergence
    • /
    • v.14 no.8
    • /
    • pp.233-243
    • /
    • 2016
  • Due to speech recognition problems in noisy environment, Audio Visual Speech Recognition (AVSR) system, which combines speech information and visual information, has been proposed since the mid-1990s,. and lip reading have played significant role in the AVSR System. This study aims to enhance recognition rate of utterance word using only lip shape detection for efficient AVSR system. After preprocessing for lip region detection, Convolution Neural Network (CNN) techniques are applied for utterance period detection and lip shape feature vector extraction, and Hidden Markov Models (HMMs) are then used for the recognition. As a result, the utterance period detection results show 91% of success rates, which are higher performance than general threshold methods. In the lip reading recognition, while user-dependent experiment records 88.5%, user-independent experiment shows 80.2% of recognition rates, which are improved results compared to the previous studies.