• Title/Summary/Keyword: speech quality


Dynamic Excitation Modeling Scheme Applied for Variable Low Bit-Rate Homomorphic Vocoder (가변 저 전송율 호모몰픽 보코더에 응용된 동적 음원 모델링 기법)

  • 정재호
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.12 / pp.2479-2488 / 1994
  • In this paper, a new dynamic excitation modeling scheme is proposed. Based on this scheme, two variable bit rate homomorphic vocoders are designed, with average bit rates of 3.8 Kbps and 4.4 Kbps. The performance of the proposed scheme is then evaluated through subjective listening tests, in which the two speech coders designed in this paper are compared with the 4.8 Kbps homomorphic vocoder designed by Chung and Schafer, which applies a conventional static excitation modeling scheme. The subjective listening tests show that the proposed dynamic excitation modeling scheme improves synthesized speech quality while lowering the average bit rate of the speech coders.


Diction Problem of Student Singers Based on the Vocal Tract Resonance (성도 공명을 중심으로 한 성악 전공 대학생의 발음법 연구)

  • Kim, Sun-Suk
    • Speech Sciences / v.7 no.4 / pp.59-72 / 2000
  • Vocal tract resonances are of paramount importance to voice sounds. Resonance frequencies determine vowel quality and personal voice timbre. The aim of this study was to develop an effective diction program based on tuning formant frequencies by adjusting the vocal tract shape in professional voice users. Twelve male and eleven female student singers participated in this study. The subjects repeated five simple vowels /a, e, i, o, u/ in normal speech and in singing. The formant frequencies and the singer's formant frequencies of the spoken and sung vowels were measured using CSL and DSP Sona-Graph. Separately, the Plot Formants program was used to draw the vowel chart. The results were as follows. (1) Total formant frequencies of female singers were 11% higher than those of male singers in singing. (2) The F1 and F3 of sung vowels increased compared to the F1 and F3 of spoken vowels. However, the F2 of sung vowels decreased in comparison with the F2 of spoken vowels. (3) The posterior vowel /u/ was moved anteriorly. This phenomenon seemed to be due to head voice singing training. (4) The singer's formant frequencies in student singers varied by voice part: 2560 Hz for baritone, 2760 Hz for tenor, 2821 Hz for mezzo-soprano, and 3420 Hz for soprano.


Implementation of A Morphological Analyzer Based on Pseudo-morpheme for Large Vocabulary Speech Recognizing (대어휘 음성인식을 위한 의사형태소 분석 시스템의 구현)

  • 양승원
    • Journal of Korea Society of Industrial Information Systems / v.4 no.2 / pp.102-108 / 1999
  • It is important to decide the processing unit in a large vocabulary speech recognition system. We propose the pseudo-morpheme as the recognition unit to resolve the problems of recognition systems that use the phrase or the general morpheme. We implement a morphological analysis system and tagger for pseudo-morphemes. A speech processing system using the pseudo-morpheme can obtain better results than systems using the phrase or the general morpheme, so the quality of the whole spoken language translation system can be improved. The analysis ratio of our implemented system is similar to that of common morphological analysis systems.


Real-Time H/W Implementation of RPE-LTP Speech Coder for Digital Mobile Communications (디지틀 이동 통신용 RPE-LTP 음성 부호화기의 실시간 H/W 구현)

  • 김선영;김재공
    • The Journal of Korean Institute of Communications and Information Sciences / v.16 no.1 / pp.85-100 / 1991
  • In digital mobile communication systems, a high-quality, low bit rate speech coder is essential for overcoming the limited availability of radio spectrum and thereby enhancing communication services. In this paper we present the implementation and performance evaluation of a 13 kbps RPE-LTP speech coder. A real-time full-duplex coder running on a single DSP chip at a 75% DSP loading rate is demonstrated, and fixed-point simulations for the H/W implementation are performed. Finally, an analysis of the relative bit importance of each transmitted parameter is presented for use in channel coding.


Acoustic analysis of fricatives in dysarthric speakers with cerebral palsy

  • Hernandez, Abner;Lee, Ho-young;Chung, Minhwa
    • Phonetics and Speech Sciences / v.11 no.3 / pp.23-29 / 2019
  • This study acoustically examines the quality of fricatives produced by ten dysarthric speakers with cerebral palsy. Previous similar studies tend to focus only on sibilants, but to obtain a better understanding of how dysarthria affects fricatives we selected a range of samples with different places of articulation and voicing. The Universal Access (UA) Speech database was used to select thirteen words beginning with one of the English fricatives (/f/, /v/, /s/, /z/, /ʃ/, /ð/). The following four measurements were taken for both dysarthric and healthy speakers: phoneme duration, mean spectral peak, variance, and skewness. Results show that even speakers with mild dysarthria have significantly longer fricatives and a lower mean spectral peak than healthy speakers. Furthermore, mean spectral peak and variance showed significant group effects for both healthy and dysarthric speakers. Mean spectral peak and variance were also useful for discriminating several places of articulation for both groups. Lastly, spectral measurements displayed important group differences when severity was taken into account. These findings show that, in general, the production of fricatives is degraded in dysarthric speakers, but the difference depends on the severity of dysarthria and the type of measurement taken.

Guidance to the Praat, a Software for Speech and Acoustic Analysis (음성 및 음향분석 프로그램 Praat의 임상적 활용법)

  • Seong, Cheol Jae
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.33 no.2 / pp.64-76 / 2022
  • Praat is a useful analysis tool for linguists, engineers, doctors, speech-language pathologists, music majors, and natural scientists. Basic parameters including duration, pitch, and energy, as well as perturbation parameters such as jitter and shimmer, can be easily measured and manipulated in the sound editor. When a more in-depth analysis is needed, it is recommended to understand the advanced menus of the object window and learn how to use them. Among the object window menus, vowel formant analysis, spectrum analysis, and cepstrum analysis are particularly useful in the clinical field. The spectrum object is useful for voice quality measurement and for the diagnosis of patients with voice disorders because it shows the energy distribution along the frequency axis (domain). A cepstrum object is useful for speech analysis when the periodicity of the sound object is not measurable. The low-to-high ratio obtained from the spectrum object and the CPPs measured from the cepstrum object have attracted many researchers, and the CPPs measured in Praat have been shown to perform relatively well.
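The perturbation parameters mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used "local" definitions: jitter is the mean absolute difference between consecutive glottal periods divided by the mean period, and shimmer is the same ratio computed on per-cycle peak amplitudes. The function name `local_perturbation` and the sample values are hypothetical, and the sketch does not reproduce Praat's own pitch-period detection.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive cycles,
    divided by the mean value, expressed in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


# Hypothetical measurements for four consecutive glottal cycles.
periods = [0.0100, 0.0102, 0.0099, 0.0101]   # cycle durations in seconds
peak_amps = [0.80, 0.78, 0.82, 0.79]         # per-cycle peak amplitudes

jitter_pct = local_perturbation(periods)     # local jitter (%)
shimmer_pct = local_perturbation(peak_amps)  # local shimmer (%)
print(f"jitter = {jitter_pct:.2f} %, shimmer = {shimmer_pct:.2f} %")
```

A perfectly periodic signal gives 0% for both measures; larger cycle-to-cycle variation gives larger percentages.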

A study on the improvement of generation speed and speech quality for a granularized emotional speech synthesis system (세밀한 감정 음성 합성 시스템의 속도와 합성음의 음질 개선 연구)

  • Um, Se-Yun;Oh, Sangshin;Jang, Inseon;Ahn, Chung-hyun;Kang, Hong-Goo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2020.07a / pp.453-455 / 2020
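  • This paper proposes a method for increasing synthesis speed while also improving the quality of the synthesized speech in an end-to-end emotional text-to-speech (TTS) synthesis system that generates an emotional speech caption service for the visually impaired. The previously used emotional speech synthesis method based on the Global Style Token (GST) has the advantage of being able to express a variety of emotions, but it requires a long time to generate synthesized speech, and unless the dynamic range of the training data is handled effectively, speech quality degrades, for example through clipping in the synthesized speech. To address these issues, this paper introduces a new data preprocessing procedure and replaces the existing vocoder, WaveNet, with WaveRNN, showing improvements in both generation speed and speech quality.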


A Study on the Performance of Companding Algorithms for Digital Hearing Aid Users (디지털 보청기 사용자를 위한 압신 알고리즘의 성능 연구)

  • Hwang, Y.S.;Han, J.H.;Ji, Y.S.;Hong, S.H.;Lee, S.M.;Kim, D.W.;Kim, In-Young;Kim, Sun-I.
    • Journal of Biomedical Engineering Research / v.32 no.3 / pp.218-229 / 2011
  • Companding algorithms have been used to enhance speech recognition in noise for cochlear implant users. The efficiency of companding for digital hearing aid users has not yet been validated. The purpose of this study is to evaluate the performance of companding for digital hearing aid users across various hearing loss cases. Using HeLPS, a hearing loss simulator, two different sensorineural hearing loss conditions were simulated: mild gently sloping hearing loss (HL1) and moderate to steeply sloping hearing loss (HL2). In addition, non-linear compression was simulated in HeLPS to compensate for hearing loss using National Acoustic Laboratories Non-Linear version 1 (NAL-NL1). For companding, four different strategies were used, varying the Q values (q1, q2) of the pre-filter (F filter) and post-filter (G filter). First, five IEEE sentences presented with speech-shaped noise at different SNRs (0, 5, 10, 15 dB) were processed by companding. Second, the processed signals were applied to HeLPS. For comparison, signals not processed by companding were also applied to HeLPS. For the processed signals, the log-likelihood ratio (LLR) and cepstral distance (CEP) were measured to evaluate speech quality. In addition, fourteen normal-hearing listeners performed a speech reception threshold (SRT) test to evaluate speech intelligibility. The results show that signals processed with both companding and NAL-NL1 performed better than those processed with NAL-NL1 alone in the sensorineural hearing loss conditions. Moreover, a higher ratio of Q values yielded better LLR and CEP scores. In the SRT test, the signals processed with companding (SRT = -13.33 dB SPL) showed significantly better speech perception in noise than those processed using only NAL-NL1 (SRT = -11.56 dB SPL).
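As an illustration of the cepstral distance (CEP) measure named in this abstract, the following is a minimal sketch under stated assumptions. It uses a widely cited form of the measure, d = (10 / ln 10) · sqrt(2 · Σ (c_ref(k) − c_test(k))²), computed per frame on cepstral coefficients that exclude c0 and capped at 10. The abstract does not specify the exact variant the authors used, so the cap, the function name `cepstral_distance`, and the sample coefficients are all assumptions.

```python
import math


def cepstral_distance(c_ref, c_test, cap=10.0):
    """Cepstral distance between two frames of cepstral coefficients.

    Both inputs are assumed to hold c1..cp (c0 excluded). The cap of 10
    is an assumed convention, not something stated in the abstract.
    """
    if len(c_ref) != len(c_test):
        raise ValueError("coefficient vectors must have equal length")
    squared_error = sum((a - b) ** 2 for a, b in zip(c_ref, c_test))
    distance = (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared_error)
    return min(distance, cap)


# Hypothetical coefficients for one clean frame and one processed frame.
clean_frame = [0.5, -0.2, 0.1]
processed_frame = [0.4, -0.2, 0.1]

cep = cepstral_distance(clean_frame, processed_frame)
print(f"cepstral distance = {cep:.3f}")
```

Lower values indicate that the processed frame is spectrally closer to the clean reference; a per-utterance score is typically obtained by averaging over frames.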

Classification of muscle tension dysphonia (MTD) female speech and normal speech using cepstrum variables and random forest algorithm (켑스트럼 변수와 랜덤포레스트 알고리듬을 이용한 MTD(근긴장성 발성장애) 여성화자 음성과 정상음성 분류)

  • Yun, Joowon;Shim, Heejeong;Seong, Cheoljae
    • Phonetics and Speech Sciences / v.12 no.4 / pp.91-98 / 2020
  • This study investigated the acoustic characteristics of the sustained vowel /a/ and sentence utterances produced by patients with muscle tension dysphonia (MTD) using cepstrum-based acoustic variables. Thirty-six women diagnosed with MTD and the same number of women with normal voices participated in the study, and the data were recorded and measured with ADSV™. The results demonstrated that, among all of the variables, cepstral peak prominence (CPP) and CPP_F0 were statistically significantly lower in the MTD group than in the control group. On the GRBAS scale, overall severity (G) was most prominent in the voice quality of MTD patients, followed in order by roughness (R), breathiness (B), and strain (S). As these characteristics increased, a statistically significant negative correlation with CPP was observed. We attempted to classify the MTD and control groups using the CPP and CPP_F0 variables. Statistical modeling with a Random Forest machine learning algorithm yielded much higher classification accuracy (100% on training data and 83.3% on test data) in the sentence reading task, with CPP playing the more crucial role in both the vowel and sentence reading tasks.

Prevalence of Voice Disorders and Characteristics of Korean Voice Handicap Index in the Elderly (노인 음성장애 출현율 및 음성장애지수 특성)

  • Song, Yun-Kyung
    • Phonetics and Speech Sciences / v.4 no.3 / pp.151-159 / 2012
  • The purpose of this study is to evaluate the prevalence of voice disorders and the Korean voice handicap index in the elderly. For this study, 169 elderly participants completed two types of questionnaires and a vowel /a/ prolongation task. Self-reported voice symptoms and the Korean voice handicap index were analyzed, and acoustic voice evaluation was performed with MDVP. The results showed that the self-reported prevalence of voice disorders in the elderly is significantly higher than that in adults. In the acoustic evaluation, 32.2% of elderly males and 40.9% of elderly females exceeded the thresholds for Jitter (%), Shimmer (%), and NHR. In addition, the Korean voice handicap index scores of elderly females are significantly higher than those of adult females. These findings indicate a high frequency of voice disorders in the elderly and the need to focus on this group. Additional studies on voice-related quality of life in the elderly are needed.