Search | Korea Science

Channel Coder Implementation and Performance Analysis for Speech Coding: Considering bit Importance of Speech Information-part III (음성 부호기용 채널 부호화기의 구현 및 성능 분석)

강법주;김선영;김상천;김영식
- Journal of the Korean Institute of Telematics and Electronics
- v.27 no.4
- pp.484-490
- 1990
In speech coding schemes, information bits differ in their sensitivity to channel errors, so a channel coder combined with a speech coder should use variable coding rates that reflect the importance of each speech information bit. To realize a 4 kbps channel coder for 12 kbps speech, this paper selects the channel coding method by analyzing the hard-decision post-decoding error rate of RCPC (Rate Compatible Punctured Convolutional) codes and the bit error sensitivity of the 12 kbps speech. Over a coherent QPSK Rayleigh fading channel, the performance analysis showed that 4-level unequal error protection yielded a 10 dB gain in speech segmental SNR (SEGSNR) compared with no channel coding at a channel SNR of 7 dB.
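The abstract does not give the code parameters, so the sketch below assumes a rate-1/2, constraint-length-3 convolutional mother code with the common (7, 5) octal generators and a hypothetical puncturing period of 4. It shows how one mother code yields several rates by deleting coded bits, and checks the rate-compatibility condition (every bit kept by a higher-rate pattern is also kept by each lower-rate pattern), which is what lets the rate change between bit classes for unequal error protection. Trellis termination and Viterbi decoding are omitted.

```python
def conv_encode(bits, generators=(0b111, 0b101), constraint_len=3):
    """Rate-1/n feedforward convolutional encoder starting from the zero state."""
    mask = (1 << constraint_len) - 1
    state = 0
    coded = []
    for b in bits:
        # Shift the new bit into the register; the newest bit sits in the LSB.
        state = ((state << 1) | b) & mask
        for g in generators:
            # Each output bit is the modulo-2 sum of the tapped register bits.
            coded.append(bin(state & g).count("1") % 2)
    return coded


def puncture(coded, pattern):
    """Keep coded bits where the periodic puncturing pattern has a 1.

    pattern[i][t] refers to encoder output i at time t (modulo the period).
    """
    n_out = len(pattern)
    period = len(pattern[0])
    kept = []
    for idx, c in enumerate(coded):
        t, i = divmod(idx, n_out)
        if pattern[i][t % period]:
            kept.append(c)
    return kept


# Hypothetical period-4 family: rates 4/8, 4/6 and 4/5 of the same mother code.
PATTERNS = {
    "1/2": [[1, 1, 1, 1], [1, 1, 1, 1]],
    "2/3": [[1, 1, 1, 1], [1, 0, 1, 0]],
    "4/5": [[1, 1, 1, 1], [1, 0, 0, 0]],
}


def is_rate_compatible(high, low):
    """True if every position kept by the higher-rate pattern is kept by the lower-rate one."""
    return all(h <= l for hr, lr in zip(high, low) for h, l in zip(hr, lr))
```

For example, `conv_encode([1, 0, 1, 1])` gives `[1, 1, 1, 0, 0, 0, 0, 1]`, and puncturing it with the "2/3" pattern leaves 6 bits. In a 4-level unequal-protection scheme, the most sensitive speech bits would be assigned the lowest-rate pattern; the class boundaries used in the paper are not stated in the abstract.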

Scalable High-quality Speech Reconstruction in Distributed Speech Recognition Environments (분산음성인식 환경에서 서버에서의 스케일러블 고품질 음성복원)

Yoon, Jae-Sam;Kim, Hong-Kook;Kang, Byung-Ok
- Proceedings of the IEEK Conference
- 2007.07a
- pp.423-424
- 2007
In this paper, we propose a scalable high-quality speech reconstruction method for distributed speech recognition (DSR). It is difficult to reconstruct high-quality speech from MFCCs at the DSR server. Depending on the bit rate available to the DSR system, additional information associated with speech coding can be sent to the DSR server, with the bit rate varying from 4.8 kbit/s to 11.4 kbit/s. The experimental results show that, at 11.4 kbit/s, the speech quality reproduced by the proposed method is comparable to that of ITU-T G.729 under both ideal-channel and frame-error-channel conditions, while DSR performance remains at the level of wireline speech recognition.

Pauses Characteristics in Slowed Speech of Treated Stutterer (치료 받은 말더듬 성인의 느린 구어에서 나타나는 휴지 특성)

Jeon, Hee-Sook
- Speech Sciences
- v.15 no.4
- pp.189-197
- 2008
In speech therapy, when a behavior modification strategy is applied that induces fluency in the early stage by intentionally slowing the speech rate, fluency is acquired and the speech rate gradually increases. The purpose of this study was therefore to investigate the pause characteristics of the intentionally slowed speech of treated stutterers. Ten developmental stutterers with well-established fluency participated. From each participant, a 200-syllable sample of much-slowed speech and a 200-syllable sample of slightly slowed speech were collected in a reading task. To characterize pauses, the total frequency of pauses, total duration of pauses, average duration of pauses, and proportion of pauses were measured. The findings were as follows: both the total duration and the total frequency of pauses were higher in much-slowed speech than in slightly slowed speech. However, neither the average duration nor the proportion of pauses was significantly higher in much-slowed speech than in slightly slowed speech.
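The four pause measures can be computed from annotated pause intervals roughly as follows. This sketch assumes each pause is given as a (start, end) time in seconds and that the proportion of pauses is total pause time divided by the sample duration; the abstract defines neither the pause detection criterion nor the exact proportion formula, and the names here are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PauseStats:
    frequency: int      # total number of pauses
    total_s: float      # total pause duration in seconds
    mean_s: float       # average pause duration in seconds
    proportion: float   # assumed: share of the sample duration spent in pauses


def pause_stats(pauses, sample_duration_s):
    """Summarize pauses given as (start_s, end_s) intervals within one speech sample."""
    durations = [end - start for start, end in pauses]
    if any(d < 0 for d in durations):
        raise ValueError("each pause must end after it starts")
    total = sum(durations)
    count = len(durations)
    mean = total / count if count else 0.0
    return PauseStats(count, total, mean, total / sample_duration_s)
```

With two pauses of 0.5 s and 1.0 s in a 10 s sample, the sketch reports 2 pauses, 1.5 s in total, 0.75 s on average, and a proportion of about 0.15. Because the mean is the total divided by the count, the total duration and frequency can both rise while the mean stays roughly constant, which is the pattern the abstract reports.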

Design of a variable rate speech codec for the W-CDMA system (W-CDMA 시스템을 위한 가변율 음성코덱 설계)

정우성
- Proceedings of the Acoustical Society of Korea Conference
- 1998.08a
- pp.142-147
- 1998
Recently, the 8 kb/s CS-ACELP coder G.729 was standardized by ITU-T SG15, and its speech quality has been reported to be better than or equal to that of 32 kb/s ADPCM. However, G.729 is a fixed-rate speech coder and does not exploit voice activity in two-way conversation. By exploiting voice activity, the average bit rate can be reduced by half without degrading speech quality. In this paper, we propose an efficient variable rate algorithm for G.729. The variable rate algorithm consists of two main parts: the rate determination algorithm and the design of the sub-rate coders. For the rate determination algorithm, we combine the energy-thresholding method, a phonetic segmentation method that integrates various feature parameters obtained in the analysis procedure, and a variable hangover period method. Based on an analysis of noise features, a 1 kb/s sub-rate coder is designed for coding the background noise signal, and a 4 kb/s sub-rate coder is designed for the unvoiced parts. The performance of the variable rate algorithm is evaluated by comparing its speech quality and average bit rate with those of G.729, and a subjective quality test is carried out using MOS. In conclusion, the proposed variable rate CS-ACELP coder is verified to produce the same speech quality as G.729 at an average bit rate of 4.4 kb/s.
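A minimal sketch of frame-level rate determination, assuming 10 ms frames of 80 samples at 8 kHz (the G.729 frame size). It combines energy thresholding with a fixed hangover and uses a zero-crossing-rate test as a hypothetical stand-in for the paper's phonetic segmentation; the thresholds, the fixed (not variable) hangover length, and all function names are illustrative assumptions. Only the 8, 4, and 1 kb/s rates come from the abstract.

```python
import math

RATE_SPEECH = 8.0     # kb/s, full-rate G.729 frames
RATE_UNVOICED = 4.0   # kb/s sub-rate coder for unvoiced parts
RATE_NOISE = 1.0      # kb/s sub-rate coder for background noise


def frame_energy_db(frame):
    """Mean-square frame energy in dB (small floor avoids log of zero)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign (frame needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def decide_rates(frames, energy_thr_db=-30.0, zcr_thr=0.3, hangover=3):
    """Assign a bit rate to each frame; hypothetical thresholds and hangover."""
    rates = []
    hang = 0
    for frame in frames:
        if frame_energy_db(frame) > energy_thr_db:
            hang = hangover  # re-arm the hangover on every active frame
            unvoiced = zero_crossing_rate(frame) > zcr_thr
            rates.append(RATE_UNVOICED if unvoiced else RATE_SPEECH)
        elif hang > 0:
            hang -= 1        # keep full rate briefly after speech ends
            rates.append(RATE_SPEECH)
        else:
            rates.append(RATE_NOISE)
    return rates


def average_rate(rates):
    """Average bit rate in kb/s, valid because all frames have equal duration."""
    return sum(rates) / len(rates)
```

The hangover keeps a few frames at full rate after the energy drops so that weak word endings are not coded as noise; the paper makes this period variable, which the sketch does not attempt.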

Variable Time-Scale Modification of Speech Using Transient Information based on LPC Cepstral Distance (LPC 켑스트럼 거리 기반의 천이구간 정보를 이용한 음성의 가변적인 시간축 변환)

Lee, Sung-Joo;Kim, Hee-Dong;Kim, Hyung-Soon
- Speech Sciences
- v.3
- pp.167-176
- 1998
Conventional time-scale modification methods have the problem that, as the modification rate increases, the time-scale-modified speech becomes less intelligible, because they ignore the effect of articulation rate on speech characteristics. Research on speech perception shows that the timing of the transient portions of a speech signal plays an important role in discriminating among different speech sounds. Motivated by this finding, we propose a novel scheme for modifying the time scale of speech. In the proposed scheme, the timing of the transient portions of speech is preserved, while the steady portions are compressed or expanded beyond the target rate so that the overall time-scale change is maintained. To identify the transient and steady portions of a speech signal, we employ a simple method based on the LPC cepstral distance between neighboring frames. A subjective preference test indicates that the proposed method outperforms the conventional SOLA method, especially for very fast playback.
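The sketch below illustrates the two ingredients under stated assumptions: LPC cepstra computed with the standard LPC-to-cepstrum recursion, frames labeled transient when the Euclidean cepstral distance to the previous frame exceeds a hypothetical threshold, and a scale factor for the steady frames. If T transient frames keep their length and S steady frames are scaled by s, meeting an overall ratio a requires T + s*S = a*(T + S), so s = (a*(T + S) - T)/S; for fast playback (a < 1) this gives s < a, which is why steady parts are compressed more than the overall rate. The actual overlap-add time scaling (e.g. SOLA) is not shown.

```python
import math


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n of the all-pole model 1 / (1 - sum_k a_k z^-k).

    a[k - 1] holds predictor coefficient a_k; the gain term c_0 is omitted.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


def cepstral_distance(c1, c2):
    """Euclidean distance between two cepstral vectors (one common choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1, c2)))


def label_transients(cepstra, threshold):
    """True where a frame differs from its predecessor by more than threshold.

    The first frame has no predecessor and is labeled steady.
    """
    if not cepstra:
        return []
    labels = [False]
    for prev, cur in zip(cepstra, cepstra[1:]):
        labels.append(cepstral_distance(prev, cur) > threshold)
    return labels


def steady_scale(n_transient, n_steady, target_ratio):
    """Scale s for steady frames so that T + s*S = target_ratio * (T + S)."""
    needed = target_ratio * (n_transient + n_steady) - n_transient
    if n_steady == 0 or needed <= 0:
        raise ValueError("target ratio cannot be met while keeping transients intact")
    return needed / n_steady
```

For instance, with 2 transient and 8 steady frames and a target ratio of 0.5 (double-speed playback), the steady frames must be scaled by 0.375.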

Digital enhancement of pronunciation assessment: Automated speech recognition and human raters

Miran Kim
- Phonetics and Speech Sciences
- v.15 no.2
- pp.13-20
- 2023
This study explores the potential of automated speech recognition (ASR) for assessing English learners' pronunciation. We employed ASR technology, acknowledged for its impartiality and consistent results, to analyze speech audio files, including synthesized speech (both native-like English and Korean-accented English) and speech recordings from a native English speaker. Through this analysis, we established baseline values for the word error rate (WER). These were then compared with the values obtained for human raters in perception experiments that assessed the speech productions of 30 first-year college students before and after taking a pronunciation course. Our sub-group analyses revealed positive training effects for both Whisper, an ASR tool, and human raters, and identified distinct human rater strategies across assessment aspects, such as proficiency, intelligibility, accuracy, and comprehensibility, that were not observed in ASR. Despite challenges such as recognizing accented speech, our findings suggest that digital tools such as ASR can streamline the pronunciation assessment process. With ongoing advances in ASR technology, its potential not only as an assessment aid but also as a self-directed learning tool for pronunciation feedback merits further exploration.
https://doi.org/10.13064/KSSS.2023.15.2.013
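Word error rate, the baseline measure in this study, is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR output, divided by the number of reference words. The sketch below uses plain whitespace tokenization with no case or punctuation normalization; that is an assumption, since the abstract does not describe how Whisper's output was normalized.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # match or substitution
        prev = cur
    return prev[-1] / len(ref)
```

For "the cat sat on the mat" recognized as "the cat sit on mat", there is one substitution and one deletion, so WER = 2/6. Because insertions count too, WER can exceed 1.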

A Study on the Endpoint Detection Algorithm (끝점 검출 알고리즘에 관한 연구)

양진우
- Proceedings of the Acoustical Society of Korea Conference
- 1984.12a
- pp.66-69
- 1984
This paper is a study on endpoint detection for Korean speech recognition. For speech signal processing, the zero-crossing rate (ZCR), log energy (LE), and energy of the prediction error (Ep) were examined as analysis parameters, and the basic Korean spoken digits /영/ (zero) through /구/ (nine) were selected as the data for speech recognition. The main goal of this paper is to develop techniques and a system for speech input to machines. For endpoint detection, this paper selects log energy from among the analyzed parameters, since log energy is a very effective parameter for classifying speech and nonspeech segments. The analysis yielded an error rate of 1.43%.
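A sketch of log-energy endpoint detection in the spirit of the abstract, assuming 20 ms frames (160 samples at 8 kHz), a noise floor estimated from the first five frames (which are assumed to be silence), and a hypothetical 10 dB margin above that floor. The paper's actual thresholds and decision rules are not stated in the abstract.

```python
import math


def log_energy(frame):
    """Frame log energy in dB (small floor avoids log of zero)."""
    return 10.0 * math.log10(sum(x * x for x in frame) + 1e-10)


def detect_endpoints(samples, frame_len=160, noise_frames=5, margin_db=10.0):
    """Return (start_sample, end_sample) of the detected speech, or None."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if len(frames) < noise_frames:
        raise ValueError("signal too short to estimate the noise floor")
    energies = [log_energy(f) for f in frames]
    # Assumption: the leading frames contain only background noise.
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    threshold = noise_floor + margin_db
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

Setting the threshold relative to a measured noise floor, rather than as an absolute level, keeps the detector usable when the recording gain changes.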

Acoustic characteristics of Motherese

Shim, Hee-Jeong;Lee, GeonJae;Hwang, JinKyung;Ko, Do-Heung
- Phonetics and Speech Sciences
- v.6 no.4
- pp.189-194
- 2014
Objective: This study aims to investigate the speech rate, pause length, habitual pitch, and voice intensity of motherese. Subjects and Methods: The participants were 20 mothers (mean age 33 years). Speech data were collected and analyzed using the Real-time Pitch software (KayPENTAX®). Results: The average speech rate was 5.33 syllables per second without their infant present and 4.26 syllables per second with their infant present. The average pause length was 1.09 s without their infant present and 1.56 s with their infant present. The average habitual pitch was 199.79 Hz without their infant present and 227.15 Hz with their infant present. The average voice loudness was 61.09 dB without their infant present and 64.49 dB with their infant present. Conclusion: This study provides clinical information for efficiently managing speech therapy issues in infants and children, including appropriate acoustic and phonological information to recommend to primary caregivers.
https://doi.org/10.13064/KSSS.2014.6.4.189

Enhancement of Excitation in Low-bit-rate Speech Coders (저 전송률 음성 부호화기를 위한 여기 신호 개선 알고리즘에 관한 연구)

이미숙;김홍국;최승호;김도영
- Proceedings of the IEEK Conference
- 2003.11a
- pp.57-60
- 2003
In this paper, we propose a new excitation enhancement technique to improve the speech quality of low-bit-rate speech coders. The proposed technique is based on a harmonic model and is employed only in the decoding process, without any additional bits. We develop procedures for harmonic model parameter estimation and harmonic generation, and apply the technique to a state-of-the-art low-bit-rate speech coder, ITU-T G.729 Annex D. Its performance is measured with the ITU-T P.862 PESQ score and compared with those of the phase dispersion filter and the long-term postfilter applied to the decoded excitation. The results show that the proposed excitation enhancement technique improves the quality of decoded speech and provides better quality for male speech than the other techniques.
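As an illustration of the harmonic-generation step only, the sketch below synthesizes a harmonic excitation as a sum of cosines at integer multiples of the fundamental frequency, dropping harmonics at or above Nyquist. The amplitudes and phases are assumed inputs; in the paper they would come from the parameter-estimation procedure, which the abstract does not detail, and how the generated harmonics are combined with the G.729 Annex D decoded excitation is not described either.

```python
import math


def harmonic_excitation(f0_hz, amplitudes, fs_hz, n_samples, phases=None):
    """Sum of cosines at k * f0 for k = 1..len(amplitudes) (harmonic model)."""
    if phases is None:
        phases = [0.0] * len(amplitudes)
    out = [0.0] * n_samples
    for k, (amp, phi) in enumerate(zip(amplitudes, phases), start=1):
        freq = k * f0_hz
        if freq >= fs_hz / 2:
            break  # higher harmonics would alias, so stop here
        w = 2.0 * math.pi * freq / fs_hz
        for n in range(n_samples):
            out[n] += amp * math.cos(w * n + phi)
    return out
```

With f0 = 100 Hz at 8 kHz, the generated signal repeats every 80 samples, one pitch period.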

Some Clinical Aspects of Dysarthria (마비성 조음장애의 임상적 양상에 관한 고찰)

Kim, H.G.;Kim, W.H.;Seo, J.H.;Hong, K.H.;Shin, H.K.;Ko, D.H.
- Speech Sciences
- v.3
- pp.38-49
- 1998
Dysarthrias are a group of neuromotor speech disorders resulting from weakness of neuromotor control. On the basis of Mayo Clinic research, they are classified into six types: flaccid, spastic, ataxic, hypokinetic, hyperkinetic, and mixed. Five dysarthria types are investigated in this study. MRI, EMG, and neuropathological tests are essential diagnostic procedures. Visi-Pitch, spectrography, and CSL are used to assess dysarthric speech. Maximum phonation time, diadochokinetic rate, voice onset time (VOT), and substitution rate are used as speech evaluation parameters. Maximum phonation time and diadochokinetic rate are lowest in spastic and ataxic dysarthria. Spastic dysarthria shows substitutions involving glottalized consonants, whereas flaccid, ataxic, and hypokinetic dysarthria show substitutions involving aspirated consonants. VOT is longest in hypokinetic dysarthria and shortest in ataxic dysarthria. Jitter percentages are higher than in the control group. Speech evaluation using experimental phonetic instruments can help establish international standards for evaluating speech in speech disorders.
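The abstract reports jitter as a percentage. One common definition, local jitter (as in Praat's "jitter (local)"), is the mean absolute difference between consecutive pitch periods divided by the mean period. The sketch assumes pitch periods have already been extracted; whether the instruments used in the study compute exactly this quantity is an assumption.

```python
def local_jitter_percent(periods_s):
    """Mean absolute difference of consecutive periods over the mean period, in percent."""
    if len(periods_s) < 2:
        raise ValueError("need at least two pitch periods")
    diffs = [abs(b - a) for a, b in zip(periods_s, periods_s[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods_s) / len(periods_s)
    return 100.0 * mean_diff / mean_period
```

Periods alternating between 10 ms and 11 ms give a jitter of about 9.5%, while perfectly regular periods give 0%.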