• Title/Abstract/Keywords: Speech rate

Search results: 1,242

Channel Coder Implementation and Performance Analysis for Speech Coding: Considering bit Importance of Speech Information-part III

  • 강법주;김선영;김상천;김영식
    • 대한전자공학회논문지 (Journal of the Institute of Electronics Engineers of Korea) / Vol. 27, No. 4 / pp. 484-490 / 1990
  • In speech coding, information bits differ in their sensitivity to channel errors, so a channel coder combined with a speech coder should use variable coding rates that reflect the importance of each speech information bit. To realize a 4 kbps channel coder for 12 kbps speech, this paper selects the channel coding method by analyzing the hard-decision post-decoding error rate of RCPC (Rate Compatible Punctured Convolutional) codes and the bit error sensitivity of the 12 kbps speech. Performance analysis with coherent QPSK over a Rayleigh fading channel shows that 4-level unequal error protection yields a 10 dB gain in speech SEGSNR compared with no channel coding at a channel SNR of 7 dB.
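
The rate-compatible puncturing idea behind RCPC codes can be illustrated with a small sketch. Everything here is an assumption for illustration, not the paper's coder: a rate-1/2, constraint-length-3 mother code with octal generators (7, 5), a puncturing period of 4, and hypothetical nested patterns in `PATTERNS` giving rates 4/5, 2/3, 4/7 and 1/2. Rate compatibility means every bit kept at a higher rate is also kept at every lower rate, so all protection levels share one encoder.

```python
"""Illustrative RCPC-style encoder: one mother code, nested puncturing patterns.

Assumptions (not from the paper): K=3 mother code with generators (7, 5) octal,
puncturing period 4, and the hypothetical patterns in PATTERNS.
"""

GENERATORS = (0b111, 0b101)  # taps on (newest, middle, oldest) register bits
MEMORY = 2                   # K - 1 flush bits

# Hypothetical nested patterns: row = output stream, column = time step mod 4.
# Each lower-rate pattern keeps a superset of the bits kept at the higher rate.
PATTERNS = {
    "4/5": ((1, 1, 1, 1), (1, 0, 0, 0)),
    "2/3": ((1, 1, 1, 1), (1, 0, 1, 0)),
    "4/7": ((1, 1, 1, 1), (1, 1, 1, 0)),
    "1/2": ((1, 1, 1, 1), (1, 1, 1, 1)),
}


def conv_encode(bits):
    """Rate-1/2 convolutional encoding, terminated with MEMORY zero bits."""
    state = 0
    out = []
    for b in list(bits) + [0] * MEMORY:
        reg = (b << MEMORY) | state                   # newest bit in the MSB position
        for g in GENERATORS:
            out.append(bin(reg & g).count("1") % 2)   # parity of the tapped bits
        state = reg >> 1                              # drop the oldest bit
    return out


def puncture(coded, pattern):
    """Delete coded bits wherever the periodic pattern holds a 0."""
    period = len(pattern[0])
    n_streams = len(pattern)
    kept = []
    for t in range(len(coded) // n_streams):
        for s in range(n_streams):
            if pattern[s][t % period]:
                kept.append(coded[n_streams * t + s])
    return kept


def is_rate_compatible(high, low):
    """True if every bit kept by pattern `high` is also kept by pattern `low`."""
    return all(h <= l for rh, rl in zip(high, low) for h, l in zip(rh, rl))
```

For example, `conv_encode([1, 0, 1, 1])` returns `[1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]`, the textbook output of the (7, 5) code. In a 4-level unequal-error-protection scheme of this kind, the most error-sensitive speech bits would be punctured with the 1/2 pattern and the least sensitive with the 4/5 pattern.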

Scalable High-quality Speech Reconstruction in Distributed Speech Recognition Environments

  • 윤재삼;김홍국;강병옥
    • 대한전자공학회:학술대회논문집 (IEEK Conference Proceedings) / Proceedings of the IEEK 2007 Summer Conference / pp. 423-424 / 2007
  • In this paper, we propose a scalable high-quality speech reconstruction method for distributed speech recognition (DSR). Reconstructing high-quality speech from MFCCs at the DSR server is difficult. Depending on the bit rate available to the DSR system, additional speech-coding information can be sent to the DSR server, with the bit rate varying from 4.8 kbit/s to 11.4 kbit/s. Experimental results show that, at 11.4 kbit/s, the speech quality reproduced by the proposed method is comparable to that of ITU-T G.729 under both ideal and frame-error channel conditions, while DSR recognition performance is maintained at the level of wireline speech recognition.

Pauses Characteristics in Slowed Speech of Treated Stutterer

  • 전희숙
    • 음성과학 (Speech Sciences) / Vol. 15, No. 4 / pp. 189-197 / 2008
  • When a behavior-modification strategy that induces fluency by having the speaker intentionally slow their speech in the early stage of therapy is applied, fluency is acquired and speech rate increases over the course of treatment. The purpose of this study was therefore to investigate pause characteristics in the intentionally slowed speech of treated stutterers. Ten developmental stutterers with well-established speech fluency participated. For each participant, 200-syllable samples of much-slowed and slightly-slowed speech were collected in a reading task. Pause features were measured as the total frequency of pauses, total pause duration, average pause duration, and pause proportion. Both the total duration and the total frequency of pauses were higher in much-slowed speech than in slightly-slowed speech. However, neither the average pause duration nor the pause proportion of much-slowed speech was significantly higher than that of slightly-slowed speech.
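
Once pauses have been located, the four pause measures reduce to simple arithmetic. A minimal sketch, assuming the pause durations are already segmented (the abstract does not give the pause threshold or segmentation procedure) and defining pause proportion, as an assumption, as total pause time divided by total sample duration; `pause_measures` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class PauseMeasures:
    count: int          # total frequency of pauses
    total_s: float      # total pause duration (s)
    mean_s: float       # average pause duration (s)
    proportion: float   # share of the sample taken up by pauses (assumed definition)


def pause_measures(pauses_s, sample_duration_s):
    """Summarize the pauses of one reading sample (hypothetical helper)."""
    if sample_duration_s <= 0:
        raise ValueError("sample duration must be positive")
    count = len(pauses_s)
    total = float(sum(pauses_s))
    mean = total / count if count else 0.0
    return PauseMeasures(count, total, mean, total / sample_duration_s)
```

For a hypothetical 30 s sample containing pauses of 0.5 s, 1.0 s and 1.5 s, this gives 3 pauses, 3.0 s in total, a 1.0 s average, and a proportion of 0.1.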

Design of a variable rate speech codec for the W-CDMA system

  • 정우성
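    • 한국음향학회:학술대회논문집 (Acoustical Society of Korea Conference Proceedings) / Proceedings of the 15th Speech Communication and Signal Processing Workshop of the Acoustical Society of Korea, 1998 (KSCSP 98, Vol. 15, No. 1) / pp. 142-147 / 1998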
  • Recently, the 8 kb/s CS-ACELP coder G.729 was standardized by ITU-T SG15, and its speech quality has been reported to be better than or equal to that of 32 kb/s ADPCM. However, G.729 is a fixed-rate speech coder and does not exploit voice activity in two-way conversation. Exploiting voice activity can reduce the average bit rate by half without degrading speech quality. In this paper, we propose an efficient variable-rate algorithm for G.729. The variable-rate algorithm consists of two main parts: the rate determination algorithm and the design of the sub-rate coders. For rate determination, we combine the energy-thresholding method, a phonetic segmentation method that integrates various feature parameters obtained in the analysis procedure, and a variable hangover period method. Based on an analysis of noise features, a 1 kb/s sub-rate coder is designed for coding the background noise signal, and a 4 kb/s sub-rate coder is designed for the unvoiced parts. The variable-rate algorithm is evaluated by comparing its speech quality and average bit rate with those of G.729, and a subjective MOS test is also performed. The results verify that the proposed variable-rate CS-ACELP coder produces the same speech quality as G.729 at an average bit rate of 4.4 kb/s.
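
The average bit rate of a variable-rate coder is the mean of its sub-rates weighted by the fraction of frames coded at each rate. A minimal sketch; the frame shares in `example` are hypothetical, and using 8 kb/s for active voiced frames is an assumption based on the G.729 full rate, not a figure from the abstract.

```python
def average_bit_rate(frame_shares):
    """Average bit rate (kb/s) from a mapping {rate_kbps: fraction of frames}."""
    total = sum(frame_shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("frame fractions must sum to 1")
    return sum(rate * share for rate, share in frame_shares.items())


# Hypothetical frame mix: 50% background noise (1 kb/s),
# 20% unvoiced (4 kb/s), 30% voiced at an assumed full rate of 8 kb/s.
example = {1.0: 0.5, 4.0: 0.2, 8.0: 0.3}
```

With this hypothetical mix, `average_bit_rate(example)` is about 3.7 kb/s. The more frames the rate decision can route to the low-rate noise and unvoiced coders, the lower the average falls.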

Variable Time-Scale Modification of Speech Using Transient Information based on LPC Cepstral Distance

  • 이성주;김희동;김형순
    • 음성과학 (Speech Sciences) / Vol. 3 / pp. 167-176 / 1998
  • With conventional time-scale modification methods, the modified speech becomes less intelligible as the modification rate increases, because these methods ignore the effect of articulation rate on speech characteristics. Research on speech perception shows that the timing of the transient portions of a speech signal plays an important role in discriminating among different speech sounds. Inspired by this finding, we propose a novel time-scale modification scheme in which the timing of the transient portions of speech is preserved, while the steady portions are compressed or expanded somewhat more than the target rate so that the overall time-scale change is maintained. To identify the transient and steady portions of a speech signal, we use a simple method based on the LPC cepstral distance between neighboring frames. A subjective preference test indicates that the proposed method outperforms the conventional SOLA method, especially for very fast playback.
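
The transient-detection step can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: autocorrelation LPC via the Levinson-Durbin recursion on Hamming-windowed frames, a small white-noise correction on r[0] for numerical conditioning, the standard LPC-to-cepstrum recursion, and a plain Euclidean distance between the cepstral vectors of neighboring frames. The frame length, LPC order, number of coefficients, and `threshold` are hypothetical values.

```python
import numpy as np


def lpc(frame, order):
    """Predictor coefficients a[1..p] (x[n] ~ sum a_k x[n-k]) via Levinson-Durbin."""
    x = frame * np.hamming(len(frame))
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    r[0] *= 1.0 + 1e-4            # white-noise correction keeps the recursion stable
    a = np.zeros(order + 1)
    err = r[0]
    if err <= 0.0:                # silent frame: nothing to predict
        return a[1:]
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1 : 0 : -1])) / err   # reflection coefficient
        new_a = a.copy()
        new_a[i] = k
        new_a[1:i] = a[1:i] - k * a[i - 1 : 0 : -1]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1 / (1 - sum a_k z^-k)."""
    p = len(a)
    c = np.zeros(n_ceps + 1)      # c[0] (gain term) is not used
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def cepstral_distances(x, frame_len=240, order=10, n_ceps=12):
    """Euclidean LPC-cepstral distance between consecutive non-overlapping frames."""
    n_frames = len(x) // frame_len
    ceps = [lpc_to_cepstrum(lpc(x[i * frame_len : (i + 1) * frame_len], order), n_ceps)
            for i in range(n_frames)]
    return np.array([np.linalg.norm(ceps[i + 1] - ceps[i]) for i in range(n_frames - 1)])


def transient_frames(x, threshold=1.0, **kwargs):
    """Indices i where frames i and i+1 differ by more than `threshold` (hypothetical rule)."""
    return np.flatnonzero(cepstral_distances(x, **kwargs) > threshold)
```

Frames flagged as transient would keep their original timing, while the remaining steady frames absorb the extra compression or expansion needed to hit the overall target rate.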

Digital enhancement of pronunciation assessment: Automated speech recognition and human raters

  • Miran Kim
    • 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 15, No. 2 / pp. 13-20 / 2023
  • This study explores the potential of automated speech recognition (ASR) in assessing English learners' pronunciation. We employed ASR technology, acknowledged for its impartiality and consistent results, to analyze speech audio files, including synthesized speech, both native-like English and Korean-accented English, and speech recordings from a native English speaker. Through this analysis, we establish baseline values for the word error rate (WER). These were then compared with those obtained for human raters in perception experiments that assessed the speech productions of 30 first-year college students before and after taking a pronunciation course. Our sub-group analyses revealed positive training effects for Whisper, an ASR tool, and human raters, and identified distinct human rater strategies in different assessment aspects, such as proficiency, intelligibility, accuracy, and comprehensibility, that were not observed in ASR. Despite such challenges as recognizing accented speech traits, our findings suggest that digital tools such as ASR can streamline the pronunciation assessment process. With ongoing advancements in ASR technology, its potential as not only an assessment aid but also a self-directed learning tool for pronunciation feedback merits further exploration.
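
WER is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. A minimal dynamic-programming sketch; whitespace tokenization and the absence of any text normalization (case, punctuation) are simplifying assumptions, and `word_error_rate` is a hypothetical helper, not part of Whisper's API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (r != h))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

For example, the reference "the cat sat on the mat" against the hypothesis "the cat sit on mat" has one substitution and one deletion, giving a WER of 2/6.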

A Study on the Endpoint Detection Algorithm

  • 양진우
    • 한국음향학회:학술대회논문집 (Acoustical Society of Korea Conference Proceedings) / Proceedings of the 1984 Fall Conference of the Acoustical Society of Korea / pp. 66-69 / 1984
  • This paper studies endpoint detection for Korean speech recognition. For speech signal processing, the analysis parameters were the zero-crossing rate (ZCR), log energy (LE), and the energy of the prediction error (Ep), and the fundamental Korean spoken digits /영/ to /구/ (zero to nine) were selected as data for speech recognition. The main goal of this paper is to develop techniques and a system for speech input to machines. Based on an analysis of these parameters, log energy was chosen for endpoint detection, as it is a very effective parameter for classifying speech and non-speech segments. The analysis yielded an error rate of 1.43%.
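
A log-energy endpoint detector of the kind described can be sketched as follows. The decision rule is an assumption, not the paper's: the noise floor is estimated from the first few frames, a frame counts as speech when its log energy exceeds that floor by `margin_db`, and the endpoints are the first and last speech frames. Frame length, hop, and margin are hypothetical values.

```python
import numpy as np


def frame_log_energy(x, frame_len, hop):
    """Log energy (dB) of each frame of x."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)


def detect_endpoints(x, frame_len=256, hop=128, noise_frames=5, margin_db=10.0):
    """Return (start, end) sample indices of the detected speech, or None."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    le = frame_log_energy(np.asarray(x, dtype=float), frame_len, hop)
    threshold = np.mean(le[:noise_frames]) + margin_db   # noise floor + margin
    speech = np.flatnonzero(le > threshold)
    if speech.size == 0:
        return None
    start = int(speech[0]) * hop
    end = int(speech[-1]) * hop + frame_len
    return start, end
```

This sketch assumes the recording starts with non-speech; a practical detector would typically refine the boundaries with a second parameter such as the zero-crossing rate to recover weak fricatives.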

Acoustic characteristics of Motherese

  • Shim, Hee-Jeong;Lee, GeonJae;Hwang, JinKyung;Ko, Do-Heung
    • 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 6, No. 4 / pp. 189-194 / 2014
  • Objective: This study investigates the speech rate, pause length, habitual pitch, and voice intensity of motherese. Subjects and Methods: The participants were 20 mothers (mean age 33 years). Speech data were collected and analyzed using the Real-time Pitch software (KayPENTAX(R)). Results: The average speech rate was 5.33 syllables per second without their infant present and 4.26 syllables per second with their infant present. The average pause length was 1.09 s without their infant present and 1.56 s with their infant present. The average habitual pitch was 199.79 Hz without their infant present and 227.15 Hz with their infant present. The average voice loudness was 61.09 dB without their infant present and 64.49 dB with their infant present. Conclusion: This study provides clinical information for efficiently managing the speech therapy issues of infants and children, including appropriate acoustic and phonological information to recommend to primary caregivers.
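
Speech rate in syllables per second is a syllable count divided by speaking time. A minimal sketch that also shows the related articulation rate (pause time excluded), since the two are easily confused; which definition the study used is not stated in the abstract, and both helper names are hypothetical.

```python
def speech_rate(n_syllables, total_s):
    """Syllables per second over the whole sample, pauses included."""
    if total_s <= 0:
        raise ValueError("duration must be positive")
    return n_syllables / total_s


def articulation_rate(n_syllables, total_s, pause_s):
    """Syllables per second after removing pause time."""
    speaking_s = total_s - pause_s
    if speaking_s <= 0:
        raise ValueError("pause time must be shorter than total time")
    return n_syllables / speaking_s
```

For a hypothetical 40-syllable utterance lasting 10 s with 2 s of pauses, `speech_rate(40, 10.0)` is 4.0 and `articulation_rate(40, 10.0, 2.0)` is 5.0 syllables per second, so longer pauses (as reported with the infant present) lower the speech rate even if articulation speed is unchanged.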

Enhancement of Excitation in Low-bit-rate Speech Coders

  • 이미숙;김홍국;최승호;김도영
    • 대한전자공학회:학술대회논문집 (IEEK Conference Proceedings) / Proceedings of the 2003 Fall Conference of the IEEK Signal Processing Society / pp. 57-60 / 2003
  • In this paper, we propose a new excitation enhancement technique to improve the speech quality of low-bit-rate speech coders. The proposed technique is based on a harmonic model and is applied only in the decoding process, requiring no additional bits. We develop procedures for harmonic model parameter estimation and harmonic generation, and apply the technique to a state-of-the-art low-bit-rate speech coder, ITU-T G.729 Annex D. Its performance is measured with the ITU-T P.862 PESQ score and compared with that of the phase dispersion filter and the long-term postfilter applied to the decoded excitation. The results show that the proposed technique improves the quality of decoded speech and provides better quality for male speech than the other techniques.

Some Clinical Aspects of Dysarthria

  • 김현기;김완호;서정환;홍기환;신효근;고도흥
    • 음성과학 (Speech Sciences) / Vol. 3 / pp. 38-49 / 1998
  • Dysarthrias are neuromotor speech disorders caused by weakness of neuromotor control. Based on Mayo Clinic research, they are classified into six types: flaccid, spastic, ataxic, hypokinetic, hyperkinetic, and mixed. Five dysarthria types are investigated in this study. MRI, EMG, and neuropathological tests are essential diagnostic procedures, and Visi-Pitch, spectrography, and CSL are used to assess dysarthric speech. The speech evaluation parameters are maximum phonation time, diadochokinetic rate, voice onset time (VOT), and substitution rate. Maximum phonation time and diadochokinetic rate are lowest in spastic and ataxic dysarthria. Spastic dysarthria shows substitution with glottalized consonants, whereas flaccid, ataxic, and hypokinetic dysarthrias show substitution with aspirated consonants. VOT is longest in hypokinetic dysarthria and shortest in ataxic dysarthria. Jitter percentages are higher than in the control group. Speech evaluation using experimental phonetic instruments can help establish international standards for evaluating speech disorders.
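
Jitter quantifies cycle-to-cycle variation in the glottal period. A minimal sketch of one common definition, local jitter (the mean absolute difference between consecutive periods divided by the mean period, in percent, as in Praat's "jitter (local)"). The abstract does not state which jitter measure or instrument setting was used, so this definition is an assumption, and period extraction is assumed to have been done already.

```python
def local_jitter_percent(periods_s):
    """Local jitter (%) from a sequence of consecutive glottal periods in seconds."""
    if len(periods_s) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(a - b) for a, b in zip(periods_s, periods_s[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods_s) / len(periods_s)
    return 100.0 * mean_diff / mean_period
```

A perfectly regular voice gives 0%; periods alternating between 10 ms and 11 ms give roughly 9.5%, far above typical values for healthy voices.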