• Title/Summary/Keyword: Speaker independent

A Study on the Removal of Unusual Feature Vectors in Speech Recognition (음성인식에서 특이 특징벡터의 제거에 대한 연구)

  • Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences / v.8 no.4 / pp.561-567 / 2013
  • Some of the feature vectors used for speech recognition are rare and unusual. These patterns lead to overfitting of the speech recognition system's parameters and, as a result, introduce structural risk into the system that hinders recognition performance. In this paper, as a method of removing these unusual patterns, we exclude vectors whose norms are larger than a specified cutoff value and then train the speech recognition system. The objective of this study is to exclude as many unusual feature vectors as possible without a significant increase in the speech recognition error rate. For this purpose, we introduce a cutoff parameter and investigate its effect on speaker-independent isolated word recognition using FVQ (Fuzzy Vector Quantization)/HMM (Hidden Markov Model). Experimental results showed that roughly 3%~6% of the feature vectors may be considered unusual and can therefore be excluded without deteriorating recognition accuracy.
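The norm-cutoff filtering above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the author's implementation. It assumes the Euclidean (L2) norm and a cutoff given directly as a norm threshold. `cutoff_for_fraction` is a hypothetical helper that picks the threshold removing a requested fraction of vectors, so that a range such as 3%~6% can be explored directly. The kept vectors would then serve as the training set for the recognizer.

```python
import math
from typing import List, Sequence, Tuple

# Sketch only: the L2 norm and a plain threshold cutoff are assumptions, not the paper's definitions.
Vector = Sequence[float]


def norm(v: Vector) -> float:
    """Euclidean (L2) norm of one feature vector."""
    return math.sqrt(sum(x * x for x in v))


def remove_unusual(vectors: List[Vector], cutoff: float) -> Tuple[List[Vector], float]:
    """Keep vectors whose norm is at most `cutoff`; also return the fraction removed."""
    kept = [v for v in vectors if norm(v) <= cutoff]
    return kept, 1.0 - len(kept) / len(vectors)


def cutoff_for_fraction(vectors: List[Vector], fraction: float) -> float:
    """Hypothetical helper: threshold that removes about `fraction` of the vectors."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    norms = sorted(norm(v) for v in vectors)
    n_remove = min(int(round(fraction * len(norms))), len(norms) - 1)
    # Largest norm still kept once the n_remove largest norms are dropped (ties keep more).
    return norms[len(norms) - n_remove - 1]
```

For example, `remove_unusual(train_vectors, cutoff_for_fraction(train_vectors, 0.05))` would drop roughly the 5% of vectors with the largest norms before training.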

A Study on Isolated Word Recognition using Improved Multisection Vector Quantization Recognition System (개선된 MSVQ 인식 시스템을 이용한 단독어 인식에 관한 연구)

  • An, Tae-Ok;Kim, Nam-Joong;Song, Chul;Kim, Soon-Hyeob
    • The Journal of Korean Institute of Communications and Information Sciences / v.16 no.2 / pp.196-205 / 1991
  • This paper studies speaker-independent isolated word recognition and proposes an improved MSVQ (multisection vector quantization) recognition system that refines the classical MSVQ recognition system. The difference is that the test pattern has one more section than the reference pattern. In the recognition system, 146 DDD area names are selected as the recognition vocabulary, and 12th-order LPC cepstral coefficients are used as the feature parameters. When the codebook is generated, MINSUM and MINMAX are used to find the centroid. The experimental results show that this method outperforms VQ (vector quantization) recognition, DTW (dynamic time warping) pattern matching, and classical MSVQ in both recognition rate and recognition time.
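One reading of the centroid rules is that each cluster picks the member vector with the smallest total distance to the others (MINSUM) or the smallest worst-case distance (MINMAX). The sketch below is written under that assumption. It also assumes one codeword per time section, Euclidean distance, and average per-frame distortion as the matching score. All function names are hypothetical, and the improved system's extra test section is not modeled.

```python
import math
from typing import Dict, List, Sequence

# Sketch under assumptions: one codeword per section, Euclidean distance, average distortion score.
Frame = Sequence[float]


def split_sections(frames: List[Frame], n_sections: int) -> List[List[Frame]]:
    """Cut an utterance into n_sections contiguous, near-equal sections (needs len >= n_sections)."""
    n = len(frames)
    bounds = [k * n // n_sections for k in range(n_sections + 1)]
    return [frames[bounds[k]:bounds[k + 1]] for k in range(n_sections)]


def select_centroid(vectors: List[Frame], rule: str = "minsum") -> Frame:
    """Pick the member minimizing the sum (MINSUM) or the maximum (MINMAX) of distances to the others."""
    if rule not in ("minsum", "minmax"):
        raise ValueError("rule must be 'minsum' or 'minmax'")
    combine = sum if rule == "minsum" else max

    def cost(c: Frame) -> float:
        return combine(math.dist(c, v) for v in vectors)

    return min(vectors, key=cost)


def build_reference(utterances: List[List[Frame]], n_sections: int, rule: str) -> List[Frame]:
    """One codeword per section, pooled over all training utterances of a word."""
    pooled: List[List[Frame]] = [[] for _ in range(n_sections)]
    for utt in utterances:
        for k, section in enumerate(split_sections(utt, n_sections)):
            pooled[k].extend(section)
    return [select_centroid(p, rule) for p in pooled]


def msvq_distortion(frames: List[Frame], reference: List[Frame]) -> float:
    """Average distance of each test frame to the codeword of its own section."""
    sections = split_sections(frames, len(reference))
    total = sum(math.dist(f, c) for sec, c in zip(sections, reference) for f in sec)
    return total / len(frames)


def recognize(frames: List[Frame], references: Dict[str, List[Frame]]) -> str:
    """Return the vocabulary word whose reference gives the lowest distortion."""
    return min(references, key=lambda word: msvq_distortion(frames, references[word]))
```

Because MINSUM and MINMAX both choose an actual training vector rather than an average, the codeword always lies on observed data, and MINMAX is less tolerant of a single far-away member.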

Speech Recognition and Its Learning by Neural Networks (신경회로망을 이용한 음성인식과 그 학습)

  • 이권현
    • The Journal of Korean Institute of Communications and Information Sciences / v.16 no.4 / pp.350-357 / 1991
  • A neural-network-based speech recognition system intended for telephone number services was tested. Because two different cardinal number systems, a native Korean one and a Sino-Korean one, are in use in Korea, the system must be able to recognize 22 discrete words. A two-layer network was used, and three-layer structures with one hidden layer of 11, 22, or 44 hidden units were also tested. The back-propagation (BP) algorithm was applied during the learning phase. Learning can be influenced by the learning factor and by the presentation method (for instance, random or cyclic). The best speaker-independent recognition rate, 96%, was obtained with the two-layer network. Recognition dropped with overtraining, and this phenomenon appeared more clearly with the three-layer network. These phenomena are described in detail; in particular, the influence of the network construction and of the various stages of the learning phase is examined.
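The back-propagation training described above can be sketched with a small pure-Python network. This is an illustration under stated assumptions, not the authors' setup. It assumes sigmoid units, squared-error loss, per-sample updates, and one-hot targets, for example over the 22 words. `lr` plays the role of the learning factor, and `order` switches between random and cyclic presentation. The class covers the three-layer case; a two-layer network in the paper's sense would have no hidden layer.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class MLP:
    """Input -> sigmoid hidden layer -> sigmoid outputs, trained by per-sample back-propagation (assumed setup)."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        self.rng = random.Random(seed)
        # Each weight row ends with a bias weight; the matching input is a constant 1.0.
        self.w1 = [[self.rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[self.rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x: Sequence[float]):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_step(self, x: Sequence[float], target: List[float], lr: float) -> None:
        xb, hb, y = self.forward(x)
        # Output deltas for squared error with sigmoid units.
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, y)]
        # Hidden deltas, computed with the weights before this step's update.
        d_hid = [
            hb[j] * (1.0 - hb[j]) * sum(d * row[j] for d, row in zip(d_out, self.w2))
            for j in range(len(self.w1))
        ]
        for d, row in zip(d_out, self.w2):
            for j in range(len(row)):
                row[j] += lr * d * hb[j]
        for d, row in zip(d_hid, self.w1):
            for i in range(len(row)):
                row[i] += lr * d * xb[i]

    def fit(self, xs, labels, n_classes: int, epochs: int, lr: float, order: str = "random") -> None:
        """order='random' reshuffles samples every epoch; order='cyclic' keeps a fixed order."""
        idx = list(range(len(xs)))
        for _ in range(epochs):
            if order == "random":
                self.rng.shuffle(idx)
            for i in idx:
                target = [1.0 if c == labels[i] else 0.0 for c in range(n_classes)]
                self.train_step(xs[i], target, lr)

    def predict(self, x: Sequence[float]) -> int:
        """Index of the output unit with the largest activation."""
        y = self.forward(x)[2]
        return max(range(len(y)), key=lambda k: y[k])
```

To observe the overtraining effect the abstract mentions, one would track accuracy on held-out speakers after each epoch rather than only on the training data.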

Performance Improvement of Mel-Cepstrum Through Optimizing Filter Banks (필터 뱅크 최적화에 의한 멜켑스트럼의 성능 향상)

  • 현동훈;이철희
    • The Journal of the Acoustical Society of Korea / v.18 no.1 / pp.78-85 / 1999
  • In this paper we propose a method to improve the performance of the mel-cepstrum, which is widely used in speech recognition. Typically, the mel-cepstrum is obtained from critical band filters that have fixed center spacing and bandwidth. However, different filter characteristics produce a different mel-cepstrum, resulting in different performance. In this paper we analyze triangular-shaped and rectangular-shaped filters. By changing filter characteristics such as center frequency and bandwidth, we analyze the performance of the mel-cepstrum. Then, utilizing the simplex method, we propose a method to optimize the critical band filters. Using dynamic time warping, we performed speaker-independent recognition experiments with Korean digit words pronounced by 10 males and 10 females. The experiments show that the rectangular-shaped filters perform well and that the mel-cepstrum obtained with the optimized filters performs better than that obtained with filters of fixed center spacing and bandwidth.
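The sketch below shows how filter shape, center frequency, and bandwidth enter the mel-cepstrum computation. It rests on several assumptions. The power spectrum is given on `n_bins` equally spaced FFT bins from 0 Hz to Nyquist. Each filter is symmetric in Hz, which is a simplification, since the common mel bank uses asymmetric triangles spanning the neighboring centers. The cepstrum is an unnormalized DCT-II of the log filter energies. The simplex optimization of centers and bandwidths is not shown, and the function names are hypothetical.

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def make_filter(center_hz: float, bandwidth_hz: float, shape: str,
                n_bins: int, sample_rate: float) -> List[float]:
    """Weights over FFT bins 0..n_bins-1 (0 Hz to Nyquist) for one symmetric filter (assumed form)."""
    if shape not in ("triangular", "rectangular"):
        raise ValueError("shape must be 'triangular' or 'rectangular'")
    half = bandwidth_hz / 2.0
    weights = []
    for b in range(n_bins):
        offset = abs(b * sample_rate / (2.0 * (n_bins - 1)) - center_hz)
        if offset >= half:
            weights.append(0.0)
        elif shape == "triangular":
            weights.append(1.0 - offset / half)
        else:
            weights.append(1.0)
    return weights


def filter_bank(n_filters: int, f_low: float, f_high: float, shape: str,
                n_bins: int, sample_rate: float) -> List[List[float]]:
    """Conventional fixed layout: band edges equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    edges = [mel_to_hz(lo + (hi - lo) * k / (n_filters + 1)) for k in range(n_filters + 2)]
    return [make_filter(edges[k + 1], edges[k + 2] - edges[k], shape, n_bins, sample_rate)
            for k in range(n_filters)]


def mel_cepstrum(power_spectrum: List[float], bank: List[List[float]], n_ceps: int) -> List[float]:
    """Log filter-bank energies followed by an unnormalized DCT-II."""
    log_e = [math.log(max(sum(w * p for w, p in zip(filt, power_spectrum)), 1e-10)) for filt in bank]
    m = len(log_e)
    return [sum(e * math.cos(math.pi * q * (j + 0.5) / m) for j, e in enumerate(log_e))
            for q in range(n_ceps)]
```

An optimized bank would replace `filter_bank` with a list of `make_filter` calls whose centers and bandwidths are tuned parameters rather than fixed mel spacing.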

A Study on the Korean Broadcasting Speech Recognition (한국어 방송 음성 인식에 관한 연구)

  • 김석동;송도선;이행세
    • The Journal of the Acoustical Society of Korea / v.18 no.1 / pp.53-60 / 1999
  • This paper is a study on Korean broadcast speech recognition. We present methods for large vocabulary continuous speech recognition, with the main focus on language modeling and the search algorithm. The acoustic model is a uniphone semi-continuous hidden Markov model, and the linguistic model is an N-gram model. The search algorithm consists of three phases in order to use all available acoustic and linguistic information. First, a forward Viterbi beam search finds word end frames and estimates the related scores. Second, a backward Viterbi beam search finds word begin frames and estimates the related scores. Finally, an A* search combines these two results with the N-gram language model to produce the recognition results. Using these methods, a maximum word recognition rate of 96.0% and a syllable recognition rate of 99.2% were achieved for speaker-independent continuous speech recognition with a vocabulary of about 12,000 words.
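The forward pass of the search can be illustrated with a frame-synchronous Viterbi beam search over a toy HMM. At each frame, this sketch keeps only the hypotheses whose log score is within `beam` of the best one. The backward pass, word-boundary bookkeeping, and the A* combination with the N-gram model are not shown. The function name and the example model are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple


def viterbi_beam(obs: Sequence[Hashable], states: Sequence[str],
                 log_init: Dict[str, float],
                 log_trans: Dict[str, Dict[str, float]],
                 log_emit: Dict[str, Dict[Hashable, float]],
                 beam: float) -> Tuple[float, List[str]]:
    """Frame-synchronous Viterbi search that prunes hypotheses scoring below best - beam."""
    # Active hypotheses: state -> (log score, best state path ending there).
    active = {s: (log_init[s] + log_emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        best = max(score for score, _ in active.values())
        active = {s: h for s, h in active.items() if h[0] >= best - beam}
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for s, (score, path) in active.items():
            for t, lp in log_trans[s].items():
                cand = score + lp + log_emit[t][o]
                # Viterbi recursion: keep only the best predecessor for each state.
                if t not in nxt or cand > nxt[t][0]:
                    nxt[t] = (cand, path + [t])
        active = nxt
    return max(active.values(), key=lambda h: h[0])
```

A narrower `beam` makes decoding faster, but it risks pruning the path that would have won later. A wide beam reproduces exact Viterbi decoding.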

HMM-based Speech Recognition using FSVQ, Fuzzy Concept and Doubly Spectral Feature (FSVQ, 퍼지 개념 및 이중 스펙트럼 특징을 이용한 HMM에 기초를 둔 음성 인식)

  • 정의봉
    • Journal of the Korea Computer Industry Society / v.5 no.4 / pp.491-502 / 2004
  • In this paper, as a study of speaker-independent isolated word recognition, we propose an HMM using FSVQ (First Section VQ), fuzzy theory, and a doubly spectral feature. LPC cepstrum coefficients and the regression coefficients of the LPC cepstrum are used as the doubly spectral feature. The training data are divided into several sections, a VQ codebook is generated from the first section, and multi-observation sequences are then obtained from this first-section codebook in order of decreasing probability values based on a fuzzy rule. These first-section observation sequences are then used for training, and in recognition the word with the highest probability, obtained by the same procedure, is selected. In addition to experiments with the proposed method, other methods were tested with the same data and under the same conditions. Across all experiments, the proposed method achieved a higher recognition rate than the others.
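Regression coefficients of a cepstral sequence are commonly computed with the standard delta formula over a window of ±K frames: d_t = sum_k k(c[t+k] - c[t-k]) / (2 sum_k k^2). The sketch below assumes K = 2 and pads the edges by repeating frames, since the paper's exact window is not given here. The LPC cepstrum is taken as input, and the function names are hypothetical.

```python
from typing import List, Sequence

# Sketch: K = 2 and edge repetition are assumed choices, not taken from the paper.


def regression_coefficients(frames: Sequence[Sequence[float]], K: int = 2) -> List[List[float]]:
    """Delta features d_t = sum_k k*(c[t+k] - c[t-k]) / (2*sum_k k^2), with edge frames repeated."""
    T = len(frames)
    denom = 2.0 * sum(k * k for k in range(1, K + 1))
    out = []
    for t in range(T):
        out.append([
            sum(k * (frames[min(t + k, T - 1)][d] - frames[max(t - k, 0)][d])
                for k in range(1, K + 1)) / denom
            for d in range(len(frames[t]))
        ])
    return out


def doubly_spectral(frames: Sequence[Sequence[float]], K: int = 2) -> List[List[float]]:
    """Concatenate each static cepstral vector with its regression coefficients."""
    return [list(c) + d for c, d in zip(frames, regression_coefficients(frames, K))]
```

The delta term is a least-squares slope estimate, so a cepstral coefficient that rises linearly over time yields a constant delta of one unit per frame away from the edges.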

Speech Recognition Based on VQ/NN using Fuzzy (Fuzzy를 이용한 VQ/NN에 기초를 둔 음성 인식)

  • Ann, Tae-Ock
    • The Journal of the Acoustical Society of Korea / v.15 no.6 / pp.5-11 / 1996
  • This paper studies speaker-independent recognition of single vowels and proposes a speech recognition method using VQ (Vector Quantization)/NN (Neural Network). The method builds a VQ codebook, which is used to obtain the observation sequence, then calculates probability values by comparing each codeword with the data, and finally uses these probability values as the input to the neural network. Korean single vowels were selected for the recognition experiment, and ten male speakers pronounced eight single vowels ten times. We compare the performance of our method with that of fuzzy VQ/HMM and conventional VQ/NN. In the experiments, the recognition rate was 92.3% for conventional VQ/NN, 93.8% for VQ/HMM using fuzzy, and 95.7% for VQ/NN using fuzzy. The fuzzy VQ/NN therefore achieves a better recognition rate than both fuzzy VQ/HMM and conventional VQ/NN, which is attributed to its excellent learning ability.
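One common way to turn codeword distances into fuzzy probability values is the fuzzy c-means membership, u_i proportional to d_i^(-2/(m-1)), normalized so the memberships sum to 1. The sketch below assumes that rule with fuzzifier `m = 2` and Euclidean distance; the paper's exact fuzzy rule may differ. The full membership vector could serve as the neural network input, and the hypothetical helper `top_k` returns the ranked candidates used when building multi-observation sequences.

```python
import math
from typing import List, Sequence, Tuple

# Sketch: fuzzy c-means style memberships are an assumed stand-in for the paper's fuzzy rule.


def fuzzy_memberships(x: Sequence[float], codebook: Sequence[Sequence[float]],
                      m: float = 2.0) -> List[float]:
    """Membership of x in each codeword, proportional to d**(-2/(m-1)); values sum to 1."""
    d = [math.dist(x, c) for c in codebook]
    if min(d) == 0.0:
        # x coincides with a codeword: crisp membership, shared among exact hits.
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    p = 2.0 / (m - 1.0)
    inv = [di ** (-p) for di in d]
    total = sum(inv)
    return [v / total for v in inv]


def top_k(memberships: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """The k codewords with the largest membership, as (index, membership) pairs."""
    ranked = sorted(enumerate(memberships), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Compared with a hard VQ index, the membership vector keeps information about how close a frame was to the runner-up codewords. This is one plausible reason a fuzzy front end can help the downstream classifier.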

HMM-based Speech Recognition using DMS Model and Fuzzy Concept (DMS 모델과 퍼지 개념을 이용한 HMM에 기초를 둔 음성 인식)

  • Ann, Tae-Ock
    • Journal of the Korea Academia-Industrial cooperation Society / v.9 no.4 / pp.964-969 / 2008
  • As a study of speaker-independent speech recognition, this paper proposes an HMM-based recognition method that uses a DMSVQ (Dynamic Multi-Section Vector Quantization) codebook built with the DMS (Dynamic Multi-Section) model, together with a fuzzy concept. In the proposed method, the training data are divided into several dynamic sections. Multi-observation sequences are then obtained in which codewords receive appropriate probabilities by a fuzzy rule, in order of increasing distance from each section's DMSVQ codebook. An HMM is then built from these multi-observation sequences, and in recognition the word with the highest probability is selected. For comparison, various conventional recognition methods were tested on the same data under the same conditions. The experimental results show that the proposed method outperforms the conventional recognition methods.
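When each frame carries several candidate codewords with fuzzy weights, a discrete HMM can score the whole multi-observation sequence by mixing its emission probabilities. This sketch assumes that a frame's emission is the weighted sum of B[state][codeword] over its candidates and scores sequences with a scaled forward algorithm. The DMS sectioning and the fuzzy weights are taken as given, and the function names and toy models are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed representation: one frame = (codeword index, fuzzy weight) pairs whose weights sum to 1.
MultiObs = Sequence[Tuple[int, float]]
# A discrete HMM as (initial probs, transition matrix, emission matrix B[state][codeword]).
HMM = Tuple[List[float], List[List[float]], List[List[float]]]


def forward_loglik(seq: Sequence[MultiObs], pi, A, B) -> float:
    """Scaled forward algorithm; each frame's emission is a weighted mix over candidate codewords."""
    n = len(pi)

    def emit(j: int, frame: MultiObs) -> float:
        return sum(w * B[j][v] for v, w in frame)

    loglik = 0.0
    alpha: List[float] = []
    for t, frame in enumerate(seq):
        if t == 0:
            alpha = [pi[j] * emit(j, frame) for j in range(n)]
        else:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * emit(j, frame) for j in range(n)]
        scale = sum(alpha)  # assumed > 0: every frame must be reachable and emittable
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


def recognize(seq: Sequence[MultiObs], models: Dict[str, HMM]) -> str:
    """Select the word whose model gives the highest likelihood."""
    return max(models, key=lambda word: forward_loglik(seq, *models[word]))
```

Normalizing alpha at every frame and summing the log scale factors gives the exact log-likelihood while avoiding underflow on long utterances.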

The Study of Korean Speech Recognition for Various Continue HMM (다양한 연속밀도 함수를 갖는 HMM에 대한 우리말 음성인식에 관한 연구)

  • Woo, In-Sung;Shin, Chwa-Cheul;Kang, Heung-Soon;Kim, Suk-Dong
    • Journal of IKEEE / v.11 no.2 / pp.89-94 / 2007
  • This paper studies Korean continuous speech recognition using HMMs with continuous density functions. We look for the most efficient method of Korean continuous speech recognition among continuous HMMs with 2 to 44 density functions. Two acoustic models were used: a context-independent (CI) model with 36 uniphones and a context-dependent (CD) model with 3,000 triphones. The language model was an N-gram model. Using these models, 500 sentences and 6,486 words were recognized under speaker-independent conditions. With the CI model, the maximum word recognition rate was 94.4% and the sentence recognition rate was 64.6%. With the CD model, the word recognition rate was 98.2% and the sentence recognition rate was 73.6%. The recognition rate obtained with the CD model was stable.
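In a continuous HMM with several density functions per state, each state's output probability is typically a weighted mixture of Gaussians. The sketch below assumes diagonal covariances. It evaluates the log-likelihood with log-sum-exp so that mixtures with many components, such as the 44 tried here, stay numerically stable. The function names are hypothetical.

```python
import math
from typing import Sequence

# Sketch: diagonal-covariance Gaussian mixture as the state output density (assumed form).


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x: Sequence[float], weights: Sequence[float],
                       means: Sequence[Sequence[float]],
                       variances: Sequence[Sequence[float]]) -> float:
    """log sum_m c_m N(x; mu_m, diag(var_m)), computed with log-sum-exp for stability."""
    terms = [math.log(c) + log_gaussian_diag(x, mu, var)
             for c, mu, var in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

Raising the number of mixture components lets each state model a more complex feature distribution, at the cost of more parameters to estimate from the same training data.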

Robust Speech Recognition Parameters for Emotional Variation (감정 변화에 강인한 음성 인식 파라메터)

  • Kim Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.6 / pp.655-660 / 2005
  • This paper studies feature parameters that are less affected by emotional variation, with the goal of developing robust speech recognition technology. For this purpose, the effect of emotional variation on a speech recognition system and the robustness of various feature parameters were studied using a speech database containing various emotions. The feature parameters used were LPC cepstral coefficients, mel-cepstral coefficients, root-cepstral coefficients, PLP coefficients, and RASTA mel-cepstral coefficients. CMS (cepstral mean subtraction) and SBR (signal bias removal) were used as signal bias removal techniques. Experimental results showed that the HMM-based speaker-independent word recognizer using RASTA mel-cepstral coefficients and their derivatives, with CMS for signal bias removal, performed best, with a 7.05% word error rate. This corresponds to about a 52% word error reduction compared to the baseline system using mel-cepstral coefficients.
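Cepstral mean subtraction removes a fixed channel bias because a convolutional channel becomes an additive constant in the cepstral domain. The sketch below subtracts the per-utterance mean of each coefficient. RASTA filtering and SBR are not shown. `relative_reduction` is a hypothetical helper for the error-reduction arithmetic.

```python
from typing import List, Sequence

# Sketch: per-utterance CMS only; RASTA and SBR are outside its scope.


def cepstral_mean_subtraction(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean of each cepstral coefficient (removes a constant channel bias)."""
    T = len(frames)
    dims = len(frames[0])
    mean = [sum(f[d] for f in frames) / T for d in range(dims)]
    return [[f[d] - mean[d] for d in range(dims)] for f in frames]


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative word error reduction, e.g. 0.52 for a 52% reduction."""
    return (baseline_wer - new_wer) / baseline_wer
```

For example, if 7.05% corresponds to about a 52% relative reduction, the implied baseline word error rate is roughly 7.05 / (1 - 0.52) ≈ 14.7%.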