• Title/Summary/Keyword: Speech Rate

Speech Rate and the Acoustic Features of Korean Segments (발화속도와 한국어 분절음의 음향학적 특성)

  • 이숙향;고현주
    • The Journal of the Acoustical Society of Korea / v.23 no.2 / pp.162-172 / 2004
  • This study investigates three issues through a production experiment and acoustic analysis: 1) the relationship between speech rate and segment duration in Korean, 2) the relationship between speech rate and the spectral characteristics of vowels, i.e., undershoot, and 3) the correlation between vowel duration and undershoot. The results showed that the faster the speech rate was, the shorter the duration of syllables and segments was. A few speakers were affected by speech rate in the durational ratios between closure and aspiration in a stop and between vowel and consonant in a syllable. Closure duration and vowel duration were more affected than aspiration and consonant duration, respectively. Speakers differed in the extent to which speech rate affected vowel undershoot, implying that they used different production mechanisms for the spectral characteristics of vowels: some speakers sped up the movement of the articulatory organs as speech rate increased, while others kept it constant regardless of speech rate.

Speech Recognition Error Compensation using MFCC and LPC Feature Extraction Method (MFCC와 LPC 특징 추출 방법을 이용한 음성 인식 오류 보정)

  • Oh, Sang-Yeob
    • Journal of Digital Convergence / v.11 no.6 / pp.137-142 / 2013
  • When feature extraction feeds inaccurate vocabulary into a speech recognition system, the result is either unrecognized or recognized as a similar phoneme. Therefore, in this paper, we propose a speech recognition error correction method using phoneme similarity rate and reliability measures based on the characteristics of the phonemes. The phoneme similarity rate was obtained from phoneme learning models built with the MFCC and LPC feature extraction methods and was measured together with a reliability rate. Measuring the similarity rate and reliability of similar phonemes minimizes errors that would otherwise go unrecognized. Speech identified as erroneous during recognition then underwent error compensation. Applying the proposed system yielded a recognition rate of 98.3% and an error compensation rate of 95.5%.

Google speech recognition of an English paragraph produced by college students in clear or casual speech styles (대학생들이 또렷한 음성과 대화체로 발화한 영어문단의 구글음성인식)

  • Yang, Byunggon
    • Phonetics and Speech Sciences / v.9 no.4 / pp.43-50 / 2017
  • These days voice models of speech recognition software are sophisticated enough to process the natural speech of people without any previous training. However, not much research has reported on the use of speech recognition tools in the field of pronunciation education. This paper examined Google speech recognition of a short English paragraph produced by Korean college students in clear and casual speech styles in order to diagnose and resolve students' pronunciation problems. Thirty-three Korean college students participated in the recording of the English paragraph. The Google soundwriter was employed to collect data on the word recognition rates of the paragraph. Results showed that the total word recognition rate was 73% with a standard deviation of 11.5%. The word recognition rate of clear speech was around 77.3%, while that of casual speech amounted to 68.7%. The low recognition rate of casual speech was attributed to both individual pronunciation errors and the software itself, as shown in its fricative recognition. The distribution of unrecognized words varied by participant and by proficiency group. From the results, the author concludes that the speech recognition software is useful for diagnosing the pronunciation problems of an individual or a group. Further studies on progressive improvements of learners' erroneous pronunciations would be desirable.

Enhanced source controlled variable bit-rate scheme in a waveform interpolation coder (Source controlled variable bit-rate scheme을 이용한 파형 보간 부호화기의 음질 개선 기법)

  • Cho, Keun-Seok;Yang, Hee-Sik;Jeong, Sang-Bae;Hahn, Min-Soo
    • Proceedings of the KSPS conference / 2007.05a / pp.315-318 / 2007
  • This paper proposes methods to enhance the speech quality of a source controlled variable bit-rate coder based on waveform interpolation. The methods estimate and generate the parameters that are not transmitted from encoder to decoder, using repetition and extrapolation schemes. For the performance evaluation, PESQ (Perceptual Evaluation of Speech Quality) scores are measured. The experimental results show that the proposed methods outperform the conventional source controlled variable bit-rate coder. In particular, the extrapolation method performs better than the repetition method.

Coding Method of Variable Threshold Dual Rate ADPCM Speech Considering the Background Noise (배경 잡음환경에서 가변 임계값에 의한 Dual Rate ADPCM 음성 부호화 기법)

  • 한경호
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.17 no.6 / pp.154-159 / 2003
  • In this paper, we propose a variable threshold dual rate ADPCM coding method that adapts two coding rates of the standard ITU G.726 ADPCM to improve speech quality at a comparably low coding rate. The ZCR (Zero Crossing Rate) is computed for the speech data. In a noisy environment, noise-dominant regions showed a higher ZCR and speech-dominant regions showed a lower ZCR. Speech data with the higher ZCR is encoded at the low coding rate to reduce the amount of coded data, and speech data with the lower ZCR is encoded at the high coding rate to improve speech quality. For the coded data, 2 bits are assigned for the low coding rate of 16Kbps and 5 bits are assigned for the high coding rate of 40Kbps. The proposed idea is evaluated through simulation, which shows that the variable dual rate ADPCM coding technique gives good speech quality at a low coding rate.
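
The rate selection described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the fixed `zcr_threshold` of 0.3, and the toy frames are assumptions (the paper uses a variable threshold whose adaptation rule is not given in the abstract). The 8 kHz sampling rate is the one at which 2 and 5 bits per sample correspond to 16 and 40 Kbps.

```python
def zero_crossing_rate(frame):
    """Return the fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def select_bits_per_sample(frame, zcr_threshold=0.3):
    """Pick 2 bits (16 Kbps) for high-ZCR frames, 5 bits (40 Kbps) otherwise.

    zcr_threshold is a hypothetical fixed value used only for illustration.
    """
    return 2 if zero_crossing_rate(frame) > zcr_threshold else 5


def average_bit_rate_kbps(frames, zcr_threshold=0.3, sample_rate_hz=8000):
    """Average coding rate over equal-length frames, in Kbps."""
    bits = [select_bits_per_sample(f, zcr_threshold) for f in frames]
    return sum(bits) / len(bits) * sample_rate_hz / 1000


# Toy frames (made-up values): one alternating-sign frame, one smooth frame.
noise_like = [1, -1, 1, -1, 1, -1, 1, -1]
speech_like = [1, 2, 3, 2, 1, -1, -2, -3]
print(average_bit_rate_kbps([noise_like, speech_like]))  # 28.0
```

With one frame at each rate, the average falls halfway between 16 and 40 Kbps, which is the kind of data-rate saving the abstract attributes to coding noise-dominant regions at the low rate.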

Stuttering Reduction Rate during Sentence Reading: Choral Speech and Altered Auditory Feedback (문장읽기에서의 말더듬 감소율: 합독과 변조청각피드백)

  • Park, Jin;Park, Heeyoung
    • Phonetics and Speech Sciences / v.4 no.4 / pp.109-115 / 2012
  • This paper investigates how differently choral speech and altered auditory feedback (i.e., delayed auditory feedback, frequency-altered feedback) enhance speech fluency during sentence reading. To do this, a stuttering reduction rate was used to measure how much stuttering frequency was reduced during each of the fluency-enhancing conditions (i.e., typical choral reading, DAF, FAF) relative to typical solo reading. The results showed that stuttering frequency was reduced in all three fluency-enhancing conditions and that the highest mean stuttering reduction rate was observed during typical choral reading. The stuttering reduction rate observed during typical choral reading is discussed, along with some further speculation.
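
One plausible way to compute such a reduction rate is sketched below. The abstract does not give the exact formula, so the percentage definition, the function name, and the example frequencies are all assumptions made for illustration.

```python
def stuttering_reduction_rate(solo_frequency, condition_frequency):
    """Percentage reduction in stuttering frequency relative to solo reading.

    Assumed definition: (solo - condition) / solo * 100.
    """
    if solo_frequency <= 0:
        raise ValueError("solo_frequency must be positive")
    return (solo_frequency - condition_frequency) / solo_frequency * 100.0


# Hypothetical stuttering frequencies for one speaker (not data from the paper).
solo = 20.0
conditions = {"choral": 5.0, "DAF": 10.0, "FAF": 15.0}
for name, freq in conditions.items():
    print(name, stuttering_reduction_rate(solo, freq))
```

Normalizing by the solo-reading frequency makes speakers with very different baseline stuttering comparable across conditions.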

A Study on a Analysis and Comparison of Preprocessing Technique for the Speech Compression (음성압축을 위한 전처리기법의 비교 분석에 관한 연구)

  • Jang, Kyung-A;Min, So-Yeon;Bae, Myung-Jin
    • Speech Sciences / v.10 no.4 / pp.125-136 / 2003
  • Speech coding techniques have been studied to reduce complexity and bit rate while also improving sound quality. The CELP-type vocoder, which has been used as one of the standards, provides good sound quality even at a low bit rate. In this paper, the input speech is preprocessed to reduce the bit rate, which differs from the conventional vocoder. Different kinds of parameters can be used for the preprocessing, so this paper compares these parameters to find the most appropriate one for the vocoder. Because the parameters are used to synthesize the speech rather than to encode or decode it, we propose a simple algorithm that does not affect the processing time or the computation time. The parameters used in the preprocessing step are speaking rate, duration, and the PSOLA technique.

Speech Rate Analysis of Dysarthric Patients with Parkinson's Disease and Multiple System Atrophy (파킨슨병과 다계통위축증 환자군 간의 말속도 비교평가)

  • Kim, Hyang-Hee;Lee, Mi-Sook;Kim, Sun-Woo;Lee, Won-Yong
    • Speech Sciences / v.10 no.4 / pp.221-227 / 2003
  • The diadochokinetic (DDK) speech task has been used as an evaluation tool for speakers with dysarthria for many years. This study attempted to differentially diagnose multiple system atrophy (MSA) from idiopathic Parkinson's disease (IPD) using patients' performance on DDK (i.e., alternate motion rate (AMR)). The subjects included 11 cases of pathologically confirmed MSA and 16 IPD patients who commonly presented with parkinsonian syndrome. The speech sample of each patient was analyzed acoustically using the MSP™ (Motor Speech Profile, a module of CSL). The results showed that the average DDK rate was significantly faster in the IPD group than in the MSA group for all three syllables (i.e., /puh/, /tuh/, and /kuh/). We propose the average DDK rate variable as a core clinical trait in differentiating the two pathological conditions.

Implementation of G.726 ADPCM Dual Rate Speech Codec of 16Kbps and 40Kbps (16Kbps와 40Kbps의 Dual Rate G.726 ADPCM 음성 codec구현)

  • Kim Jae-Oh;Han Kyong-Ho
    • Journal of IKEEE / v.2 no.2 s.3 / pp.233-238 / 1998
  • This paper describes the implementation of a dual rate ADPCM codec using the G.726 16Kbps and 40Kbps speech codec algorithms. For small signals, the low rate 16Kbps coding algorithm shows almost the same SNR as the high rate 40Kbps coding algorithm, while for large signals the high rate 40Kbps coding algorithm shows a higher SNR than the low rate 16Kbps coding algorithm. To obtain a good trade-off between data rate and synthesized speech quality, we applied the low rate of 16Kbps to small signals and the high rate of 40Kbps to large signals. Various threshold values determining the rate were applied to find a good trade-off between data rate and speech quality. The simulation results show good speech quality at a low rate compared with 16Kbps and 40Kbps coding.
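
The amplitude-based switching described here can be sketched as below. This is an illustrative assumption, not the paper's codec: the use of frame peak amplitude as the "signal size" measure, the function names, the threshold values, and the sample frames are all hypothetical.

```python
def choose_rate_kbps(frame, amplitude_threshold):
    """Return 16 for a small-signal frame and 40 for a large-signal frame.

    "Small" is assumed to mean a peak absolute amplitude below the threshold.
    """
    peak = max((abs(sample) for sample in frame), default=0)
    return 16 if peak < amplitude_threshold else 40


def mean_rate_kbps(frames, amplitude_threshold):
    """Average coding rate over equal-length frames, in Kbps."""
    rates = [choose_rate_kbps(f, amplitude_threshold) for f in frames]
    return sum(rates) / len(rates)


# Made-up 16-bit PCM frames: two quiet, one loud.
frames = [[10, -20, 15], [1000, -3000, 2500], [5, 3, -8]]

# Sweeping the threshold trades data rate against quality:
# a higher threshold sends more frames to the 16 Kbps coder.
for threshold in (0, 500, 5000):
    print(threshold, mean_rate_kbps(frames, threshold))
```

Sweeping the threshold in this way mirrors the abstract's statement that various threshold values were tried to balance data rate against speech quality.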

Korean Broadcast News Transcription Using Morpheme-based Recognition Units

  • Kwon, Oh-Wook;Alex Waibel
    • The Journal of the Acoustical Society of Korea / v.21 no.1E / pp.3-11 / 2002
  • Broadcast news transcription is one of the hardest tasks in speech recognition because broadcast speech signals vary widely in speech quality, channel, and background conditions. We developed a Korean broadcast news speech recognizer. We used a morpheme-based dictionary and language model to reduce the out-of-vocabulary (OOV) rate. We concatenated original morpheme pairs of short length or high frequency in order to reduce insertion and deletion errors due to short morphemes. We used a lexicon with multiple pronunciations to reflect inter-morpheme pronunciation variations without severe modification of the search tree. By using the merged morphemes as recognition units, we achieved an OOV rate of 1.7% with a 64k vocabulary, comparable to that of European languages. We implemented a hidden Markov model-based recognizer with vocal tract length normalization and online speaker adaptation by maximum likelihood linear regression. Experimental results showed that the recognizer yielded a 21.8% morpheme error rate for anchor speech and 31.6% for mostly noisy reporter speech.
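
The OOV rate mentioned in this abstract is conventionally the share of test tokens missing from the recognizer's vocabulary. The sketch below shows that calculation on a toy example of why morpheme units can lower it. The function name, the tiny vocabularies, and the hand-segmented tokens are illustrative assumptions, not data or tooling from the paper.

```python
def oov_rate(tokens, vocabulary):
    """Percentage of tokens that do not appear in the vocabulary."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    vocab = set(vocabulary)
    unknown = sum(1 for token in tokens if token not in vocab)
    return unknown / len(tokens) * 100.0


# Toy example: the same two-word text tokenized as full words and as morphemes.
word_tokens = ["학교에", "집에"]
word_vocab = {"학교에"}  # the inflected form "집에" was never seen

morpheme_tokens = ["학교", "에", "집", "에"]  # hand-segmented for illustration
morpheme_vocab = {"학교", "집", "에"}

print(oov_rate(word_tokens, word_vocab))          # 50.0
print(oov_rate(morpheme_tokens, morpheme_vocab))  # 0.0
```

Because Korean words combine stems with many particles and endings, a morpheme inventory of a given size tends to cover more running text than a full-word inventory of the same size, which is the effect the toy numbers illustrate.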