• Title/Summary/Keyword: speech intelligibility (음성명료도)


On a Processing Time Reduction of Cepstrum-Based Pitch Alteration in Time-Frequency Hybrid Domain (켑스트럼 기반 혼성영역 피치변경법의 처리시간 단축에 관한 연구)

  • Jo, Wang-Rae;Kim, Jong-Kuk;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.29 no.1 / pp.41-47 / 2010
  • Pitch alteration techniques for voice conversion are classified into time-domain, frequency-domain, and hybrid-domain methods. The hybrid-domain method yields clear and natural pitch-altered speech but has the major drawback of a long processing time. In this paper, we propose a new method that reduces the processing time of pitch alteration in the time-frequency hybrid domain by omitting the bit-reversal step of the FFT and IFFT when changing the processing domain. As a result, the processing time is reduced by 86.26% compared with the conventional method, at the same quality.
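
One way to see why the reordering step can be dropped: a decimation-in-frequency (DIF) FFT naturally leaves its output in bit-reversed order, and a decimation-in-time (DIT) inverse FFT naturally consumes input in that same order. The Python sketch below pairs the two so that no bit-reversal permutation runs in either direction. It illustrates the general idea only and is not the authors' implementation; the function names are hypothetical, the length is assumed to be a power of two, and any processing placed between the two transforms would have to address bins in bit-reversed order.

```python
import cmath


def fft_dif_bitrev_output(x):
    """Radix-2 DIF FFT. Input in natural order, output left in bit-reversed order."""
    a = [complex(v) for v in x]
    n = len(a)  # assumed to be a power of two
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                u, v = a[start + k], a[start + k + span]
                a[start + k] = u + v
                a[start + k + span] = (u - v) * w
        span //= 2
    return a  # no bit-reversal permutation applied


def ifft_dit_bitrev_input(spectrum):
    """Radix-2 DIT inverse FFT. Input in bit-reversed order, output in natural order."""
    a = [complex(v) for v in spectrum]
    n = len(a)  # assumed to be a power of two
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(2j * cmath.pi * k / (2 * span))
                u, v = a[start + k], a[start + k + span] * w
                a[start + k] = u + v
                a[start + k + span] = u - v
        span *= 2
    return [v / n for v in a]  # scale by 1/n to complete the inverse


if __name__ == "__main__":
    frame = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
    restored = ifft_dit_bitrev_input(fft_dif_bitrev_output(frame))
    print([round(v.real, 6) for v in restored])
```

Each inverse stage undoes the matching forward stage, so the round trip recovers the original frame without ever sorting the spectrum into natural order.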

Minimum Classification Error Training to Improve Discriminability of PCMM-Based Feature Compensation (PCMM 기반 특징 보상 기법에서 변별력 향상을 위한 Minimum Classification Error 훈련의 적용)

  • Kim Wooil;Ko Hanseok
    • The Journal of the Acoustical Society of Korea / v.24 no.1 / pp.58-68 / 2005
  • In this paper, we propose a scheme to improve the discriminative property of feature compensation for robust speech recognition in noisy environments. The noisy speech model estimation used in existing feature compensation methods does not guarantee posterior probabilities that discriminate reliably among the Gaussian components. Estimating the posterior probabilities is a crucial step in determining the discriminative factor of the Gaussian models, which in turn determines the intelligibility of the restored speech signals. The proposed scheme employs minimum classification error (MCE) training to estimate the parameters of the noisy speech model. To apply MCE training, we propose identifying the 'competing components' that are expected to affect the discriminative ability. The proposed method is applied to feature compensation based on the parallel combined mixture model (PCMM). Performance is evaluated on the Aurora 2.0 database and on speech recorded inside a car under real driving conditions. The experimental results show improved recognition performance in both simulated environments and real-life conditions, verifying the effectiveness of the proposed scheme for robust speech recognition systems.

Speech synthesis engine for TTS (TTS 적용을 위한 음성합성엔진)

  • 이희만;김지영
    • The Journal of Korean Institute of Communications and Information Sciences / v.23 no.6 / pp.1443-1453 / 1998
  • This paper presents a speech synthesis engine that converts character strings held in computer memory into synthesized speech, enhancing intelligibility and naturalness by adapting a waveform processing method. The engine uses demisyllable speech segments and receives command streams for pitch modification and for duration and energy control. This command-based design separates the high-level processing (text normalization, letter-to-sound conversion, and lexical analysis) from the low-level processing (signal filtering and pitch processing). The TTS (Text-to-Speech) system built on the engine has three independent object modules, the Text-Normalizer, the Commander, and the Speech Synthesis Engine, each of which can easily be replaced by a compatible module. Separating the high-level and low-level processing gives the architecture expandability and portability because of its mix-and-match nature.


Developing a Low Power BWE Technique Based on the AMR Coder (AMR 기반 저 전력 인공 대역 확장 기술 개발)

  • Koo, Bon-Kang;Park, Hee-Wan;Ju, Yeon-Jae;Kang, Sang-Won
    • The Journal of the Acoustical Society of Korea / v.30 no.4 / pp.190-196 / 2011
  • Bandwidth extension (BWE) improves speech quality and intelligibility by extending 300-3400 Hz narrowband speech to 50-7000 Hz wideband speech. This paper designs an artificial bandwidth extension (ABE) module embedded in the AMR (adaptive multi-rate) decoder, which reduces the LPC/LSP analysis and the algorithmic delay of the ABE module. We also introduce a fast-search codebook mapping method for ABE and a weighted classified codebook mapping method for constructing the spectral envelope of the wideband speech signal, yielding a low-power BWE technique based on the AMR decoder. Compared with the traditional DTE (decode-then-extend) method, the proposed ABE method reduces computational complexity by 28% and algorithmic delay by 20 ms.
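
As a rough illustration of codebook mapping in general, the Python sketch below looks up the narrowband codeword nearest to an input feature vector and returns the wideband envelope stored at the same index. It is a plain exhaustive nearest-neighbour search, not the fast-search or weighted classified variants described in the abstract; the function name, feature dimensions, and codebook values are all hypothetical.

```python
def map_to_wideband(nb_features, nb_codebook, wb_codebook):
    """Return (index, wideband envelope) paired with the nearest narrowband codeword.

    nb_codebook[i] and wb_codebook[i] are assumed to have been trained as a pair.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Exhaustive search over all narrowband codewords (no fast-search shortcut).
    best = min(
        range(len(nb_codebook)),
        key=lambda i: squared_distance(nb_features, nb_codebook[i]),
    )
    return best, wb_codebook[best]


if __name__ == "__main__":
    # Toy paired codebooks with made-up values, for illustration only.
    narrowband = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]]
    wideband = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.8], [2.0, 0.5, 0.1]]
    print(map_to_wideband([0.9, 0.8], narrowband, wideband))
```

The cost of this search grows linearly with codebook size, which is the kind of cost a fast-search method aims to cut.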

Assessment of Synthesized Speech by Text-to-Speech Conversion (Text-to-Speech 합성음 품질 평가)

  • 정유현
    • Proceedings of the Acoustical Society of Korea Conference / 1993.06a / pp.98-101 / 1993
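  • As part of research on improving the sound quality of the Text-to-Speech Conversion System developed by the Speech Application Research Laboratory of the Electronics and Telecommunications Research Institute of Korea, this paper describes the results of intelligibility tests conducted on 110 phoneme-balanced words for both the system before improvement (V.1) and the system after improvement (V.2). The test is a diagnostic evaluation intended to quantify, from the developers' standpoint, the gains from improving the synthesized speech and to identify remaining problems. In a single listening test with 5 male and 5 female listeners, scores ranged from a minimum of 37.3% (41 words) to a maximum of 55.5% (61 words) for V.1, and from a minimum of 39.1% (43 words) to a maximum of 60.9% (67 words) for V.2.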


A comparison of techniques for measuring intelligibility of dysarthric speech : toward phonetic intelligibility testing in dysarthria. (뇌성마비 성인의 음소대조 낱말명료도와 문장명료도)

  • Kim Soo-Jin
    • Proceedings of the KSPS conference / 2002.11a / pp.141-144 / 2002
  • The relationship between word intelligibility and sentence intelligibility was tested in adults with cerebral palsy (athetoid type). Intelligibility is an important measure in the diagnosis and therapy of dysarthric patients. To develop a one-syllable phonetic contrast intelligibility test using specific phonetic contrasts, its correlation with sentence intelligibility was examined to establish validity. Pearson's correlation coefficient was .83, indicating a high correlation. In addition, a comparison of the range and standard deviation of the scores given by seven evaluators for each subject showed that, for patients of moderate intelligibility, word intelligibility was more reliable than sentence intelligibility.
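
For reference, Pearson's correlation coefficient between two sets of per-speaker scores can be computed as in the Python sketch below. The function name and the score values are hypothetical and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


if __name__ == "__main__":
    # Made-up intelligibility scores (percent) for five speakers.
    word_scores = [35.0, 48.0, 52.0, 67.0, 80.0]
    sentence_scores = [30.0, 55.0, 47.0, 72.0, 85.0]
    print(round(pearson_r(word_scores, sentence_scores), 2))
```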


Effects of the Types of Noise and Signal-to-Noise Ratios on Speech Intelligibility in Dysarthria (소음 유형과 신호대잡음비가 마비말장애인의 말명료도에 미치는 영향)

  • Lee, Young-Mee;Sim, Hyun-Sub;Sung, Jee-Eun
    • Phonetics and Speech Sciences / v.3 no.4 / pp.117-124 / 2011
  • This study investigated the effects of noise type and signal-to-noise ratio (SNR) on the speech intelligibility of an adult with dysarthria. Speech intelligibility was judged by 48 naive listeners using a word transcription task. A repeated-measures design was used, with noise type (multi-talker babble/environmental noise) and SNR (0, +10 dB, +20 dB) as within-subject factors. The dependent measure was the percentage of correctly transcribed words. Both main effects were statistically significant: listeners performed significantly worse in the multi-talker babble condition than in the environmental noise condition, and significantly better at higher SNRs. The results suggest that multi-talker babble and lower SNRs decrease the speech intelligibility of adults with dysarthria, and that speech-language pathologists should consider environmental factors such as noise type and SNR when evaluating the speech intelligibility of adults with dysarthria.
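
To make the SNR conditions concrete, the Python sketch below scales a noise signal so that the speech-to-noise power ratio matches a target value in dB before the two are mixed. It shows one common way to construct such stimuli, not the procedure used in the study; the function names and sample values are hypothetical.

```python
import math


def mean_power(signal):
    """Average power of a sampled signal."""
    return sum(v * v for v in signal) / len(signal)


def scale_noise_to_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db."""
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [gain * v for v in noise]


def mix_at_snr(speech, noise, snr_db):
    """Add the rescaled noise to the speech, sample by sample."""
    scaled = scale_noise_to_snr(speech, noise, snr_db)
    return [s + n for s, n in zip(speech, scaled)]


if __name__ == "__main__":
    # Toy signals with made-up sample values.
    speech = [1.0, -1.0, 1.0, -1.0]
    noise = [0.5, 0.5, -0.5, -0.5]
    for snr_db in (0, 10, 20):
        scaled = scale_noise_to_snr(speech, noise, snr_db)
        achieved = 10 * math.log10(mean_power(speech) / mean_power(scaled))
        print(snr_db, round(achieved, 6))
```

At 0 dB the noise carries as much power as the speech; each additional 10 dB reduces the noise power by a factor of ten.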


The Effects of Speaking Mode on Intelligibility of Dysarthric Speech (뇌성마비 성인의 발화유형에 따른 명료도)

  • Kim, Soo-Jin;Ko, Hyun-Ju
    • Phonetics and Speech Sciences / v.1 no.4 / pp.171-176 / 2009
  • Intelligibility measurement is one criterion for assessing the severity of speech disorders, especially in speakers with dysarthria. Rate control, usually rate reduction, is used with many dysarthric speakers to improve their intelligibility. The purpose of this study is to compare how the intelligibility of speech produced by speakers with cerebral palsy changes across three speaking conditions. Speech samples were collected from 10 adults with cerebral palsy, who were asked to speak under three conditions: (1) naturally (control), (2) more slowly (rate control), and (3) louder and more accurately (clear speech). In a perception test, three judges listened to the speech samples and wrote down whatever they heard. The results showed that the subjects fell into two subgroups according to how their intelligibility varied across the three speaking conditions. Some subjects' intelligibility increased greatly when they were asked to speak 'louder and more accurately', while the others showed no difference in intelligibility across conditions. The study suggests that it would be clinically useful to find the instruction that best improves intelligibility for each speaker with cerebral palsy.


Phonetic Contrasts of One-syllable Words and Speech Intelligibility in Adults with Hearing Impairments (청각장애 성인의 일음절 낱말대조 명료도 특성)

  • Kim Soo-Jin;Do Yeon-Ji
    • MALSORI / no.56 / pp.1-13 / 2005
  • This study examined the speech intelligibility of one-syllable words with phonetic contrasts and analyzed segmental factors that can predict overall speech intelligibility in hearing-impaired adults. To identify speech error characteristics, a Korean word list was audio-recorded by 7 hearing-impaired adults, and 35 listeners selected the word they heard from 5 choices. Based in part on previous studies of the speech of the hearing impaired, the word list consisted of monosyllabic consonant-vowel-consonant (CVC) real-word pairs. The stimulus words included 77 phonetic contrast pairs. The results showed that the percentage of errors for contrasts in final position (coda) was higher than in any other position in the syllable. The factors underlying the intelligibility deficit in phonetic contrasts were then analyzed through stepwise regression analysis. Overall intelligibility was predicted by the error rates of the manner contrast at coda, the voicing contrast (homorganic triplets) at onset, and the high-low contrast at nucleus.


Comparing the Intelligibility of Spastic and Flaccid Types (경직형과 이완형 마비말장애의 명료도 비교)

  • Kim Soo-Jin
    • MALSORI / no.48 / pp.1-17 / 2003
  • Among the types of dysarthria, the spastic and flaccid types are the most prominent manifestations. The objectives of the present research are (1) to discover the phonetic contrasts that differentiate spastic dysarthria from flaccid dysarthria, and (2) to analyze and compare how well each phonetic contrast predicts intelligibility in spastic and flaccid dysarthria. The 'phonemic contrast word intelligibility pairs' for dysarthric speakers were tested and proved useful for clinical assessment of and research on dysarthria. In the spastic group, the initial fricative vs. affricate and front vs. back vowel contrasts were transmitted less effectively than in the flaccid group. In the flaccid group, the initial glottal vs. null contrast was transmitted less effectively than in the spastic group. The overall intelligibility of spastic dysarthria was predicted by multiple regression analysis with 88% accuracy from three phonetic contrasts (initial fricative vs. affricate; front vs. back vowels; initial consonant correlates), and that of flaccid dysarthria was predicted with 60% accuracy from two phonetic contrasts (initial nasal vs. stop; front vs. back vowels).
