• Title/Summary/Keyword: speech recognition rate improvement

Search Result 94

Improvement of Speech Recognition System Using the Trained Model of Speech Feature (음성특성 학습 모델을 이용한 음성인식 시스템의 성능 향상)

  • 송점동
    • The Journal of Information Technology / v.3 no.4 / pp.1-12 / 2000
  • Speech can be divided into high-frequency and low-frequency speech according to its characteristics. However, speech recognition research so far has built recognizers without considering this characteristic, which leads to relatively low recognition rates and requires large amounts of training data. In this paper, we propose a method that classifies a speaker's speech by this characteristic using formant frequencies, and a method that recognizes speech with recognizer models built to reflect the high- or low-frequency characteristics of the speaker's speech. For the experiment, we constructed recognizer models using 47 Korean monophones and trained them using speech from 20 women and 20 men, respectively. We classified the speech characteristics using a formant frequency table consisting of formant frequencies and pitch values, and then performed recognition with the model trained for the corresponding characteristic. As a result, the proposed system achieved a higher recognition rate than the existing method.

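The abstract does not spell out how the formant/pitch table is used to assign a speaker to the high- or low-frequency group. A minimal sketch, assuming nearest-centroid matching against a small table of (F1, F2, pitch) entries, is shown here; the table values, group names, and function names are hypothetical placeholders, not the paper's.

```python
import math

# Hypothetical formant/pitch table: one reference entry per speaker group,
# stored as (mean F1 in Hz, mean F2 in Hz, mean pitch in Hz).
# The numbers are illustrative placeholders, not values from the paper.
FORMANT_TABLE = {
    "low": (500.0, 1400.0, 120.0),
    "high": (600.0, 1700.0, 220.0),
}


def classify_speaker(f1: float, f2: float, pitch: float) -> str:
    """Return the group whose table entry is nearest in Euclidean distance."""
    features = (f1, f2, pitch)
    return min(FORMANT_TABLE, key=lambda group: math.dist(features, FORMANT_TABLE[group]))


def select_model(models: dict, f1: float, f2: float, pitch: float):
    """Pick the recognizer model trained for the speaker's frequency group."""
    return models[classify_speaker(f1, f2, pitch)]


models = {"low": "model_low", "high": "model_high"}  # stand-ins for trained HMM sets
print(select_model(models, 610.0, 1720.0, 230.0))  # model_high
```

Because raw Hz values give F2 more weight than pitch in the distance, a practical version would likely normalize each dimension first.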

Performance Improvement of SPLICE-based Noise Compensation for Robust Speech Recognition (강인한 음성인식을 위한 SPLICE 기반 잡음 보상의 성능향상)

  • Kim, Hyung-Soon;Kim, Doo-Hee
    • Speech Sciences / v.10 no.3 / pp.263-277 / 2003
  • One of the major problems in speech recognition is performance degradation due to the mismatch between training and test environments. Recently, Stereo-based Piecewise LInear Compensation for Environments (SPLICE), a frame-based bias removal algorithm for cepstral enhancement that uses stereo training data and models noisy speech as a mixture of Gaussians, was proposed and showed good performance in noisy environments. In this paper, we propose several methods to improve the conventional SPLICE. First, we apply Cepstral Mean Subtraction (CMS) as a preprocessor to SPLICE instead of as a postprocessor. Second, to compensate for residual distortion after SPLICE processing, we propose a two-stage SPLICE. Third, we employ phonetic information in training the SPLICE model. In experiments on the Aurora 2 database, the proposed method outperformed the conventional SPLICE and achieved a 50% reduction in word error rate relative to the Aurora baseline system.

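SPLICE, as usually formulated, corrects each noisy cepstral frame y by a posterior-weighted sum of per-Gaussian bias vectors r_k: x_hat = y + sum_k p(k|y) r_k, where the Gaussians model noisy speech and the biases are learned from stereo (clean/noisy) data. The sketch here applies CMS first, as the abstract proposes, and then one or more SPLICE stages; passing two parameter sets mimics the two-stage variant. It assumes the GMM parameters and bias vectors are already trained (training, and the phonetic information used in the third extension, are not shown), and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = List[float]
# One SPLICE stage: (mixture weights, means, diagonal variances, bias vectors r_k)
Stage = Tuple[Sequence[float], Sequence[Frame], Sequence[Frame], Sequence[Frame]]


def cepstral_mean_subtraction(frames: Sequence[Sequence[float]]) -> List[Frame]:
    """Remove the per-utterance mean from every cepstral frame (CMS)."""
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


def log_gaussian_diag(y: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (yi - mi) ** 2 / v for yi, mi, v in zip(y, mean, var)
    )


def splice_frame(y: Sequence[float], stage: Stage) -> Frame:
    """x_hat = y + sum_k p(k|y) * r_k, with p(k|y) from the noisy-speech GMM."""
    weights, means, variances, biases = stage
    log_post = [
        math.log(w) + log_gaussian_diag(y, m, v) for w, m, v in zip(weights, means, variances)
    ]
    peak = max(log_post)  # subtract the max before exp() for numerical stability
    unnorm = [math.exp(lp - peak) for lp in log_post]
    total = sum(unnorm)
    post = [u / total for u in unnorm]
    return [yi + sum(p * r[d] for p, r in zip(post, biases)) for d, yi in enumerate(y)]


def enhance(frames: Sequence[Sequence[float]], stages: Sequence[Stage]) -> List[Frame]:
    """CMS as a preprocessor, then each SPLICE stage in turn (two stages = two-stage SPLICE)."""
    out = cepstral_mean_subtraction(frames)
    for stage in stages:
        out = [splice_frame(f, stage) for f in out]
    return out
```

Running CMS before SPLICE means the GMM and biases see mean-normalized cepstra, so they would have to be trained on CMS-processed stereo data as well.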

An aerodynamic and acoustic characteristics of Clear Speech in patients with Parkinson's disease (파킨슨 환자의 클리어 스피치 전후 음향학적 공기역학적 특성)

  • Shin, Hee Baek;Ko, Do-Heung
    • Phonetics and Speech Sciences / v.9 no.3 / pp.67-74 / 2017
  • Clear Speech has been found to be more intelligible than conversational speech. Clear Speech is defined by a decreased articulation rate and more frequent, longer pauses. The objective of the present study was to investigate the immediate improvement in speech intelligibility when 10 patients with Parkinson's disease (age range: 46 to 75 years) used Clear Speech. The experiment was performed with the Phonatory Aerodynamic System 6600, using the participants' readings of the first sentence of the Sanchaek passage and of "List for Adults 1" from the Sentence Recognition Test (SRT) in both casual speech and Clear Speech. Acoustic and aerodynamic parameters that affect speech intelligibility were measured: mean F0, F0 range, intensity, speaking rate, mean airflow rate, and respiratory rate. For the Sanchaek passage, Clear Speech produced significant differences from casual speech in mean F0, F0 range, speaking rate, and respiratory rate. For the SRT list, significant differences were found in mean F0, F0 range, and speaking rate. These findings suggest that speech intelligibility can be affected by adjusting breathing and tone in Clear Speech. Future studies should identify the benefits of Clear Speech through auditory-perceptual studies and evaluate programs that use Clear Speech to increase intelligibility.

Improving transformer-based speech recognition performance using data augmentation by local frame rate changes (로컬 프레임 속도 변경에 의한 데이터 증강을 이용한 트랜스포머 기반 음성 인식 성능 향상)

  • Lim, Seong Su;Kang, Byung Ok;Kwon, Oh-Wook
    • The Journal of the Acoustical Society of Korea / v.41 no.2 / pp.122-129 / 2022
  • In this paper, we propose a method to improve the performance of Transformer-based speech recognizers using data augmentation that locally adjusts the frame rate. First, the start time and length of the part of the original speech data to be augmented are randomly selected. Then, the frame rate of the selected part is changed to a new frame rate using linear interpolation. Experimental results on the Wall Street Journal and LibriSpeech speech databases showed that training took longer to converge than the baseline, but recognition accuracy improved in most cases. To further improve performance, various parameters, such as the length and speed of the selected parts, were optimized. The proposed method achieved relative performance improvements of 11.8% and 14.9% over the baseline on the Wall Street Journal and LibriSpeech speech databases, respectively.
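
The augmentation step described above (random start and length, then linear interpolation to a new frame rate) can be sketched as follows. This is a sketch under stated assumptions: it operates on a list of feature frames rather than on the raw waveform, expresses the new frame rate as a speed ratio, and uses hypothetical parameter names (max_len, min_rate, max_rate); the paper's optimized values are not given in the abstract.

```python
import random
from typing import List, Sequence

Frame = List[float]


def resample_linear(segment: Sequence[Sequence[float]], new_len: int) -> List[Frame]:
    """Resample a run of feature frames to new_len frames by linear interpolation."""
    old_len = len(segment)
    if new_len == 1 or old_len == 1:
        return [list(segment[0]) for _ in range(new_len)]
    out = []
    for i in range(new_len):
        pos = i * (old_len - 1) / (new_len - 1)  # position on the original time axis
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b for a, b in zip(segment[lo], segment[hi])])
    return out


def local_frame_rate_augment(
    frames: List[Frame], max_len: int, min_rate: float, max_rate: float, rng=random
) -> List[Frame]:
    """Change the frame rate of one randomly chosen segment; frames outside it are untouched."""
    n = len(frames)
    seg_len = rng.randint(1, min(max_len, n))  # random length (hypothetical upper bound)
    start = rng.randint(0, n - seg_len)  # random start frame
    rate = rng.uniform(min_rate, max_rate)  # > 1 speeds the segment up, < 1 slows it down
    new_len = max(1, round(seg_len / rate))
    segment = frames[start:start + seg_len]
    return frames[:start] + resample_linear(segment, new_len) + frames[start + seg_len:]
```

Setting both rates to 1.0 leaves the utterance unchanged, which is a convenient sanity check; ratios above 1.0 shorten the chosen segment and ratios below 1.0 lengthen it.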

A Phonetics Based Design of PLU Sets for Korean Speech Recognition (한국어 음성인식을 위한 음성학 기반의 유사음소단위 집합 설계)

  • Hong, Hye-Jin;Kim, Sun-Hee;Chung, Min-Hwa
    • MALSORI / no.65 / pp.105-124 / 2008
  • This paper examines the effects of different phone-like unit (PLU) sets in order to propose an optimal PLU set for improving the performance of Korean automatic speech recognition (ASR) systems. An examination of 9 PLU sets currently in use indicates that most of them include a selection of allophones without a sufficient phonetic basis. In this paper, a total of 34 PLU sets are designed based on Korean phonetic characteristics, and the effect of each PLU set is evaluated through experiments. The results show that the accuracy rate of each phone is influenced by the phonetic constraint(s) that determine(s) the PLU set, and that an optimal PLU set can be anticipated through phonetic analysis of the given speech data.

A Study on Korean Digit Recognition by Using Phoneme Boundary Information (음소경계 정보를 이용한 한국어 숫자음 인식에 관한 연구)

  • Choi Goan Mook;Lim Dong Chul;Lee Haing Sei
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.117-120 / 2001
  • The recognition rate of Korean digits is lower than that of other words because the digits are composed of similar phonemes. In this paper, a new method is proposed that improves the recognition rate by using phoneme boundary information. In addition, the proposed method adds little cost because phoneme boundaries are found with a simple method. Experiments on speech data from one male speaker showed an improved recognition rate.

Voice Recognition Performance Improvement using the Convergence of Voice signal Feature and Silence Feature Normalization in Cepstrum Feature Distribution (음성 신호 특징과 셉스트럽 특징 분포에서 묵음 특징 정규화를 융합한 음성 인식 성능 향상)

  • Hwang, Jae-Cheon
    • Journal of the Korea Convergence Society / v.8 no.5 / pp.13-17 / 2017
  • Existing methods for extracting features from the speech signal produce recognition errors because the threshold separating speech from non-speech is unclear. This article proposes a modeling method for improving speech recognition performance that combines speech feature extraction with normalization of silence features in non-speech regions. The proposed method minimizes the effect of noise: its recognition model combines per-frame speech signal feature extraction with silence feature normalization. In addition, the method produces a speech signal whose energy spectrum is similar to its entropy, so the speech is less affected by noise. Silence feature normalization improves performance in terms of signal-to-noise ratio. A fixed reference value in the cepstrum was used to classify speech and non-speech frames. In a performance comparison with a continuous HMM (CHMM), the recognition rate improved by 2.7 percentage points in the speaker-dependent case and by 0.7 percentage points in the speaker-independent case.

Robust Speech Recognition with Car Noise based on the Wavelet Filter Banks (웨이블렛 필터뱅크를 이용한 자동차 소음에 강인한 고립단어 음성인식)

  • Lee, Dae-Jong;Kwak, Keun-Chang;Ryu, Jeong-Woong;Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems / v.12 no.2 / pp.115-122 / 2002
  • This paper proposes a robust speech recognition algorithm based on wavelet filter banks. Because noise severely degrades the performance of speech recognition systems, the proposed algorithm adopts a multiple-band decision-making scheme to achieve robustness to noise. To evaluate the proposed scheme, we compared it with a conventional VQ-based speech recognizer on the 10 isolated Korean digits corrupted by car noise. The proposed method improved the recognition rate by 9 to 27% over the conventional VQ algorithm across various car noise environments.
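
The abstract states only that decisions are made across multiple wavelet bands and compared against a VQ baseline. A minimal sketch of one plausible fusion rule follows, assuming each word has one VQ codebook per band, each band votes for the word with the lowest average distortion, and ties are broken by total distortion across bands. The wavelet decomposition itself is not shown (band features are taken as given), and the voting rule, function names, and the example words "il"/"i" are hypothetical, not the paper's scheme.

```python
from collections import Counter
from typing import Dict, List, Sequence

Codebook = Sequence[Sequence[float]]


def vq_distortion(frames: Sequence[Sequence[float]], codebook: Codebook) -> float:
    """Average squared distance from each frame to its nearest codeword."""
    total = 0.0
    for f in frames:
        total += min(sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook)
    return total / len(frames)


def recognize_multiband(
    band_features: List[Sequence[Sequence[float]]], codebooks: Dict[str, List[Codebook]]
) -> str:
    """band_features[b] holds the frames of band b; codebooks[word][b] is that word's band-b codebook."""
    votes: Counter = Counter()
    totals = {word: 0.0 for word in codebooks}
    for b, frames in enumerate(band_features):
        dists = {word: vq_distortion(frames, books[b]) for word, books in codebooks.items()}
        for word, d in dists.items():
            totals[word] += d
        votes[min(dists, key=dists.get)] += 1  # this band's vote
    best = max(votes.values())
    tied = [word for word, count in votes.items() if count == best]
    return min(tied, key=totals.get)  # break ties by total distortion
```

Voting lets a band that is badly corrupted by car noise be outvoted by cleaner bands, which is the usual motivation for multiband decisions.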

Improvement of Speech Recognition Performance in Running Car by Considering Wind Noise (바람잡음을 고려한 자동차에서의 음성인식 성능 향상)

  • Lee, Ki-Hoon;Lee, Chul-Hee;Kim, Chong-Kyo
    • Proceedings of the KSPS conference / 2004.05a / pp.231-234 / 2004
  • This paper describes an efficient method for improving the noise robustness of speech recognition in a running car by taking wind noise into account. In a moving car, three main kinds of noise (engine noise, tire noise, and wind noise) severely affect recognition performance. Wind noise is an especially important factor when driving with the windows open. We analyzed wind noise under various driving conditions: 60, 80, and 100 km/h with the window fully open and half open. We found that the recognition rate degrades significantly when the wind noise components in the frequency range above 200 Hz are large. We developed a preprocessing method that improves noise robustness in the presence of wind noise by adaptively changing the cutoff frequency of the front-end high-pass filter from 100 to 200 Hz according to the level of the wind noise components. With this method, the recognition rate improved considerably under all driving conditions.

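The adaptive front end can be sketched as a first-order high-pass filter whose cutoff moves between 100 and 200 Hz with the estimated wind noise level. The linear mapping, the 40/70 dB thresholds, and the assumption that the wind noise level is already available in dB are all hypothetical; the abstract does not state how the level is measured or which filter order is used.

```python
import math
from typing import List, Sequence


def adaptive_cutoff(wind_level_db: float, low_db: float = 40.0, high_db: float = 70.0) -> float:
    """Map a wind noise level to a cutoff in [100, 200] Hz, linear between hypothetical thresholds."""
    if wind_level_db <= low_db:
        return 100.0
    if wind_level_db >= high_db:
        return 200.0
    return 100.0 + 100.0 * (wind_level_db - low_db) / (high_db - low_db)


def high_pass(signal: Sequence[float], cutoff_hz: float, sample_rate: float) -> List[float]:
    """First-order (single-pole) high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_x = prev_y = 0.0
    for x in signal:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


# Example: a moderate wind noise level selects a 150 Hz cutoff.
filtered = high_pass([0.0] * 160, adaptive_cutoff(55.0), 8000.0)
```

A first-order filter keeps the front end cheap; a steeper filter would remove more low-frequency energy near the cutoff at the cost of more computation.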

Vector Quantizer Based Speaker Normalization for Continuos Speech Recognition (연속음성 인식기를 위한 벡터양자화기 기반의 화자정규화)

  • Shin Ok-keun
    • The Journal of the Acoustical Society of Korea / v.23 no.8 / pp.583-589 / 2004
  • We propose a speaker normalization method based on a vector quantizer for continuous speech recognition (CSR) systems that makes no use of acoustic information. The proposed method, an improvement of a previously reported speaker normalization scheme for a simple digit recognizer, builds a canonical codebook by iterative training, increasing the codebook size after each iteration from a relatively small initial size. Once the codebook is established, each speaker's warp factor is estimated by exhaustively comparing warped versions of the speaker's utterances with the codebook. Two sets of phones are used to estimate the warp factors: one consisting of vowels only, and the other of all phonemes. A piecewise linear warping function corresponding to the estimated warp factor is used to warp the power spectrum of the utterance. The warped feature vectors are then extracted and used to train and test the speech recognizer. The effectiveness of the proposed method is evaluated through recognition experiments using the TIMIT corpus and the HTK speech recognition toolkit. The results showed recognition rate improvements comparable to those of the formant-based warping method.
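
The warp factor search described above can be sketched as a grid search: warp each frame's power spectrum with a piecewise linear function, measure VQ distortion against the canonical codebook, and keep the factor with the lowest distortion. Assumptions: spectra are compared directly rather than the features the paper extracts after warping, the breakpoint sits at 80% of the band edge, and the function names and candidate grid are hypothetical.

```python
from typing import List, Sequence


def piecewise_linear_warp(freq: float, alpha: float, f_max: float, break_ratio: float = 0.8) -> float:
    """Scale by alpha up to a breakpoint, then connect linearly so that f_max maps to f_max."""
    f0 = break_ratio * f_max  # hypothetical breakpoint position
    if freq <= f0:
        return alpha * freq
    return alpha * f0 + (f_max - alpha * f0) * (freq - f0) / (f_max - f0)


def warp_spectrum(spectrum: Sequence[float], alpha: float) -> List[float]:
    """Sample the power spectrum at the warped bin positions, with linear interpolation."""
    last = len(spectrum) - 1
    out = []
    for k in range(len(spectrum)):
        pos = min(piecewise_linear_warp(float(k), alpha, float(last)), float(last))
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return out


def vq_distortion(frames: Sequence[Sequence[float]], codebook: Sequence[Sequence[float]]) -> float:
    """Average squared distance from each frame to its nearest codeword."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook) for f in frames
    ) / len(frames)


def estimate_warp_factor(
    frames: Sequence[Sequence[float]], codebook: Sequence[Sequence[float]], alphas: Sequence[float]
) -> float:
    """Exhaustive search: return the candidate alpha whose warped frames best match the codebook."""
    return min(alphas, key=lambda a: vq_distortion([warp_spectrum(f, a) for f in frames], codebook))
```

Restricting `frames` to vowel segments or using all phonemes corresponds to the two phone sets compared in the abstract; the chosen factor would then be applied before feature extraction for both training and testing.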