• Title/Summary/Keyword: speech recognition rate improvement

Search Result 94, Processing Time 0.029 seconds

Implementation of Automatic Microphone Volume Controller and Recognition Rate Improvement (자동 입력레벨 조절기의 구현 및 인식 성능 향상)

  • 김상진;한민수
    • Proceedings of the IEEK Conference
    • /
    • 2001.09a
    • /
    • pp.503-506
    • /
    • 2001
  • In this paper, we describe the implementation of a microphone input level control algorithm and the speech improvement with this level controller in personal computer environment. The volume of speech obtained through a microphone affects the speech recognition rate directly. Therefore, proper input volume level control is desired fur better recognition. We considered some conditions for the successful volume controller implementation firstly, then checked its usefulness on our speech recognition system with common office environment speech database. Cepstral mean subtraction is also utilized far the channel-effect compensation of the database. Our implemented controller achieved approximately 50% reduction, i.e., improvement in speech recognition error rate.

  • PDF

A Study on the Improvement of DTW with Speech Silence Detection (음성의 묵음구간 검출을 통한 DTW의 성능개선에 관한 연구)

  • Kim, Jong-Kuk;Jo, Wang-Rae;Bae, Myung-Jin
    • Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.117-124
    • /
    • 2003
  • Speaker recognition is the technology that confirms the identification of speaker by using the characteristic of speech. Such technique is classified into speaker identification and speaker verification: The first method discriminates the speaker from the preregistered group and recognize the word, the second verifies the speaker who claims the identification. This method that extracts the information of speaker from the speech and confirms the individual identification becomes one of the most efficient technology as the service via telephone network is popularized. Some problems, however, must be solved for the real application as follows; The first thing is concerning that the safe method is necessary to reject the imposter because the recognition is not performed for the only preregistered customer. The second thing is about the fact that the characteristic of speech is changed as time goes by, So this fact causes the severe degradation of recognition rate and the inconvenience of users as the number of times to utter the text increases. The last thing is relating to the fact that the common characteristic among speakers causes the wrong recognition result. The silence parts being included the center of speech cause that identification rate is decreased. In this paper, to make improvement, We proposed identification rate can be improved by removing silence part before processing identification algorithm. The methods detecting speech area are zero crossing rate, energy of signal detect end point and starting point of the speech and process DTW algorithm by using two methods in this paper. As a result, the proposed method is obtained about 3% of improved recognition rate compare with the conventional methods.

  • PDF

An Improvement of Korean Speech Recognition Using a Compensation of the Speaking Rate by the Ratio of a Vowel length (모음길이 비율에 따른 발화속도 보상을 이용한 한국어 음성인식 성능향상)

  • 박준배;김태준;최성용;이정현
    • Proceedings of the IEEK Conference
    • /
    • 2003.11b
    • /
    • pp.195-198
    • /
    • 2003
  • The accuracy of automatic speech recognition system depends on the presence of background noise and speaker variability such as sex, intonation of speech, and speaking rate. Specially, the speaking rate of both inter-speaker and intra-speaker is a serious cause of mis-recognition. In this paper, we propose the compensation method of the speaking rate by the ratio of each vowel's length in a phrase. First the number of feature vectors in a phrase is estimated by the information of speaking rate. Second, the estimated number of feature vectors is assigned to each syllable of the phrase according to the ratio of its vowel length. Finally, the process of feature vector extraction is operated by the number that assigned to each syllable in the phrase. As a result the accuracy of automatic speech recognition was improved using the proposed compensation method of the speaking rate.

  • PDF

A Study on the Speech Recognition for Commands of Ticketing Machine using CHMM (CHMM을 이용한 발매기 명령어의 음성인식에 관한 연구)

  • Kim, Beom-Seung;Kim, Soon-Hyob
    • Journal of the Korean Society for Railway
    • /
    • v.12 no.2
    • /
    • pp.285-290
    • /
    • 2009
  • This paper implemented a Speech Recognition System in order to recognize Commands of Ticketing Machine (314 station-names) at real-time using Continuous Hidden Markov Model. Used 39 MFCC at feature vectors and For the improvement of recognition rate composed 895 tied-state triphone models. System performance valuation result of the multi-speaker-dependent recognition rate and the multi-speaker-independent recognition rate is 99.24% and 98.02% respectively. In the noisy environment the recognition rate is 93.91%.

Recognition Performance Improvement for Noisy-speech by Parallel Model Compensation Adaptation Using Frequency-variant added with ML (최대우도를 부가한 주파수 변이 PMC 방법의 잡음 음성 인식 성능개선)

  • Choi, Sook-Nam;Chung, Hyun-Yeol
    • Journal of Korea Multimedia Society
    • /
    • v.16 no.8
    • /
    • pp.905-913
    • /
    • 2013
  • The Parallel Model Compensation Using Frequency-variant: FV-PMC for noise-robust speech recognition is a method to classify the noises, which are expected to be intermixed with input speech when recognized, into several groups of noises by setting average frequency variant as a threshold value; and to recognize the noises depending on the classified groups. This demonstrates the excellent performance considering noisy speech categorized as good using the standard threshold value. However, it also holds a problem to decrease the average speech recognition rate with regard to unclassified noisy speech, for it conducts the process of speech recognition, combined with noiseless model as in the existing PMC. To solve this problem, this paper suggests a enhanced method of recognition to prevent the unclassified through improving the extent of rating scales with use of maximum likelihood so that the noise groups, including input noisy speech, can be classified into more specific groups, which leads to improvement of the recognition rate. The findings from recognition experiments using Aurora 2.0 database showed the improved results compared with those from the method of the previous FV-PMC.

A Study on Design and Implementation of Speech Recognition System Using ART2 Algorithm

  • Kim, Joeng Hoon;Kim, Dong Han;Jang, Won Il;Lee, Sang Bae
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.4 no.2
    • /
    • pp.149-154
    • /
    • 2004
  • In this research, we selected the speech recognition to implement the electric wheelchair system as a method to control it by only using the speech and used DTW (Dynamic Time Warping), which is speaker-dependent and has a relatively high recognition rate among the speech recognitions. However, it has to have small memory and fast process speed performance under consideration of real-time. Thus, we introduced VQ (Vector Quantization) which is widely used as a compression algorithm of speaker-independent recognition, to secure fast recognition and small memory. However, we found that the recognition rate decreased after using VQ. To improve the recognition rate, we applied ART2 (Adaptive Reason Theory 2) algorithm as a post-process algorithm to obtain about 5% recognition rate improvement. To utilize ART2, we have to apply an error range. In case that the subtraction of the first distance from the second distance for each distance obtained to apply DTW is 20 or more, the error range is applied. Likewise, ART2 was applied and we could obtain fast process and high recognition rate. Moreover, since this system is a moving object, the system should be implemented as an embedded one. Thus, we selected TMS320C32 chip, which can process significantly many calculations relatively fast, to implement the embedded system. Considering that the memory is speech, we used 128kbyte-RAM and 64kbyte ROM to save large amount of data. In case of speech input, we used 16-bit stereo audio codec, securing relatively accurate data through high resolution capacity.

Performance Improvement of Connected Digit Recognition with Channel Compensation Method for Telephone speech (채널보상기법을 사용한 전화 음성 연속숫자음의 인식 성능향상)

  • Kim Min Sung;Jung Sung Yun;Son Jong Mok;Bae Keun Sung
    • MALSORI
    • /
    • no.44
    • /
    • pp.73-82
    • /
    • 2002
  • Channel distortion degrades the performance of speech recognizer in telephone environment. It mainly results from the bandwidth limitation and variation of transmission channel. Variation of channel characteristics is usually represented as baseline shift in the cepstrum domain. Thus undesirable effect of the channel variation can be removed by subtracting the mean from the cepstrum. In this paper, to improve the recognition performance of Korea connected digit telephone speech, channel compensation methods such as CMN (Cepstral Mean Normalization), RTCN (Real Time Cepatral Normalization), MCMN (Modified CMN) and MRTCN (Modified RTCN) are applied to the static MFCC. Both MCMN and MRTCN are obtained from the CMN and RTCN, respectively, using variance normalization in the cepstrum domain. Using HTK v3.1 system, recognition experiments are performed for Korean connected digit telephone speech database released by SITEC (Speech Information Technology & Industry Promotion Center). Experiments have shown that MRTCN gives the best result with recognition rate of 90.11% for connected digit. This corresponds to the performance improvement over MFCC alone by 1.72%, i.e, error reduction rate of 14.82%.

  • PDF

A Study on Word Selection Method and Device Improvement for Improving Speech Recognition Rate of Speech-Language-impaired in Severe Noise Environment (심한 소음환경에서 언어장애인 음성 인식률 향상을 위한 단어선정 방법 및 장치 개선에 관한 연구)

  • Yang, Ki-Woong;Lee, Hyung-keun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.5
    • /
    • pp.555-567
    • /
    • 2019
  • Speech recognition rate is lowered even in a noisy environment, and it is difficult for a person with a speech disability or an inconvenient language to use it in a social life. In addition to improving the inconvenience of using the language, 280 words were selected using the word selection method which was improved when the word was selected considering the pronunciation characteristics of the language impaired. The MEMS development device used in the experiment was made considering material, lead wire type, length and direction. We improved the speech recognition rate by using the developed word selection method and the MEMS device developed to improve the speech recognition rate due to incorrect pronunciation and severe noise. The new method of selecting words and the mems device were improved and the results were included.

Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

  • Shekofteh, Yasser;Almasganj, Farshad
    • ETRI Journal
    • /
    • v.35 no.1
    • /
    • pp.100-108
    • /
    • 2013
  • In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior-probability-based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments for a speech recognition task utilizing a toolkit based on hidden Markov models, over FARSDAT, a well-known Persian speech corpus. Through the proposed FE method, we gain 3.11% absolute phoneme error rate improvement in comparison to the baseline system, which exploits the mel-frequency cepstral coefficient FE method.

Performance Improvement of Microphone Array Speech Recognition Using Features Weighted Mahalanobis Distance (가중특징 Mahalanobis거리를 이용한 마이크 어레이 음석인식의 성능향상)

  • Nguyen, Dinh Cuong;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.1E
    • /
    • pp.45-53
    • /
    • 2010
  • In this paper, we present the use of the Features Weighted Mahalanobis Distance (FWMD) in improving the performance of Likelihood Maximizing Beamforming (Limabeam) algorithm in speech recognition for microphone array. The proposed approach is based on the replacement of the traditional distance measure in a Gaussian classifier with adding weight for different features in the Mahalanobis distance according to their distances after the variance normalization. By using Features Weighted Mahalanobis Distance for Limabeam algorithm (FWMD-Limabeam), we obtained correct word recognition rate of 90.26% for calibrate Limabeam and 87.23% for unsupervised Limabeam, resulting in a higher rate of 3% and 6% respectively than those produced by the original Limabearn. By implementing a HM-Net speech recognition strategy alternatively, we could save memory and reduce computation complexity.