• Title/Summary/Keyword: 음성명료도

Search Result 189, Processing Time 0.031 seconds

A Study on Improving Voice Quality and Pitch Searching of the VSELP Coder (VSELP 부호화기의 음질 및 주기탐색 개선에 관한 연구)

  • 성기철;문상재
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.19 no.4
    • /
    • pp.740-749
    • /
    • 1994
  • This paper presents method for improving the performance of the VSELP speech coder. The hybrid method is employed for pitch period searching. Pitch searching time is reduced and pitch detection error, caused by quantization error of excitation signal of encoder in VSELP coder, is reduced by this method. This paper also adopts a pitch period enhancement filter and an adaptive first order filter. In this result, pitch period searching time is reduced to 26%, and MOS of reconstructed speech signal is increased by 3.19 to 4.04.

  • PDF

The Comparison of Room Acoustics of Elementary School Classrooms (초등학교 교실의 음환경 비교연구)

  • Moon Kyu-Chun;Park Kye-Kyun;Yeon Chul-Ho;Haan Chan-Hoon
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.293-296
    • /
    • 2001
  • 1990년대 중반 이후 우리나라 초등학교 교실은 열린교육과 다양성을 추구하는 새로운 열린교실을 표방하고 진행되고 있다. 또한 건축설계에 있어서도 과거의 종합 표준설계에 의하지 않고 각 학교마다 별도의 건축설계에 따라 지어지고 있다. 본 논문의 목적은 현재의 초등학교 교서의 차음환경을 조사하고 새로이 지어진 교사의 교실과 열린교실의 실내 음환경을 조사하여 현재 상황을 분석평가하고 이에 대한 대안을 제시하고자 하는 것이다. 이를 위하여 건축시공연대가 다른 초등학교 3개 학교를 선택하여 단위교실에서의 실내음향인자(RT60, C80, D50, RASTI)와 인접교실과의 차음성능(TL)을 측정하였다. 또한 해당교실의 학생과 교사에 대하여 설문조사를 실시하였다. 각 측정은 외부와 복도에 인접한 창문의 개폐여부와 실내의 위치별로 이루어 졌다. 실험결과, 열린교육에 따라 가변적 교실의 구성은 기존의 교실보다 차음성능이 평균$5\sim6dB$ 낮은 것으로 나타났다. 또한 신축교사의 RASTI값(0.73)이 10년 이상 된 학교(0.70)나 열린교실(0.64)보다 높은 것으로 나타났다. 음성명료도(D50) 역시 일반 신축교사가 열린교실 보다 높게 나타났다. 이것은 실의 기밀과 내부바닥의 마감자재로 비롯되었으며 최근의 열린교실은 음의 차음과 실내명료도에서 열악한 것으로 결론되었다.

  • PDF

Characteristics of voice quality on clear versus casual speech in individuals with Parkinson's disease (명료발화와 보통발화에서 파킨슨병환자 음성의 켑스트럼 및 스펙트럼 분석)

  • Shin, Hee-Baek;Shim, Hee-Jeong;Jung, Hun;Ko, Do-Heung
    • Phonetics and Speech Sciences
    • /
    • v.10 no.2
    • /
    • pp.77-84
    • /
    • 2018
  • The purpose of this study is to examine the acoustic characteristics of Parkinsonian speech, with respect to different utterance conditions, by employing acoustic/auditory-perceptual analysis. The subjects of the study were 15 patients (M=7, F=8) with Parkinson's disease who were asked to read out sentences under different utterance conditions (clear/casual). The sentences read out by each subject were recorded, and the recorded speech was subjected to cepstrum and spectrum analysis using Analysis of Dysphonia in Speech and Voice (ADSV). Additionally, auditory-perceptual evaluation of the recorded speech was conducted with respect to breathiness and loudness. Results indicate that in the case of clear speech, there was a statistically significant increase in the cepstral peak prominence (CPP), and a decrease in the L/H ratio SD (ratio of low to high frequency spectral energy SD) and CPP F0 SD values. In the auditory-perceptual evaluation, a decrease in breathiness and an increase in loudness were noted. Furthermore, CPP was found to be highly correlated to breathiness and loudness. This provides objective evidence of the immediate usefulness of clear speech intervention in improving the voice quality of Parkinsonian speech.

The Effects of Voice and Speech Intelligibility Improvements in Parkinson Disease by Training Loudness and Pitch: A Case Study (강도 및 음도 조절을 이용한 훈련이 파킨슨병 환자의 음성 및 발화명료도 개선에 미치는 효과: 사례연구)

  • Lee, Ok-Bun;Jeong, Ok-Ran;Ko, Do-Heung
    • Speech Sciences
    • /
    • v.8 no.3
    • /
    • pp.173-184
    • /
    • 2001
  • The purpose of this study was to examine the effects of manipulating loudness and pitch in terms of speech intelligibility and voice of a patient with Parkinson's Disease. The subject, who was diagnosed as a patient with Parkinson's disease 11 years ago, demonstrated a severely breath voice with low intensity. The accuracy of articulation in consonants was intelligible only at the single word level, and the overall intelligibility in continuous speech was low. The results showed that the subject's articulation accuracy and speech intelligibility was significantly improved after having loudness and pitch training. Habitual Fo, Jitter, Shimmer, Fo tremor, Amp tremor were decreased after training. In addition, the value of HNR also increased after training. It was shown that the changes of these acoustic parameters were closely related to the decrease of breathiness in Parkinson's voice, and this decrease of breathiness affected speech intelligibility considerably. Based on the experimental results, it was claimed that the vocal training by manipulating the loudness and pitch could be highly effective in improving the voice quality and speech intelligibility in Parkinson's Disease.

  • PDF

Artificial speech bandwidth extension technique based on opus codec using deep belief network (심층 신뢰 신경망을 이용한 오푸스 코덱 기반 인공 음성 대역 확장 기술)

  • Choi, Yoonsang;Li, Yaxing;Kang, Sangwon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.36 no.1
    • /
    • pp.70-77
    • /
    • 2017
  • Bandwidth extension is a technique to improve speech quality, intelligibility and naturalness, extending from the 300 ~ 3,400 Hz narrowband speech to the 50 ~ 7,000 Hz wideband speech. In this paper, an Artificial Bandwidth Extension (ABE) module embedded in the Opus audio decoder is designed using the information of narrowband speech to reduce the computational complexity of LPC (Linear Prediction Coding) and LSF (Line Spectral Frequencies) analysis and the algorithm delay of the ABE module. We proposed a spectral envelope extension method using DBN (Deep Belief Network), one of deep learning techniques, and the proposed scheme produces better extended spectrum than the traditional codebook mapping method.

Adaptive Noise Reduction using Standard Deviation of Wavelet Coefficients in Speech Signal (웨이브렛 계수의 표준편차를 이용한 음성신호의 적응 잡음 제거)

  • 황향자;정광일;이상태;김종교
    • Science of Emotion and Sensibility
    • /
    • v.7 no.2
    • /
    • pp.141-148
    • /
    • 2004
  • This paper proposed a new time adapted threshold using the standard deviations of Wavelet coefficients after Wavelet transform by frame scale. The time adapted threshold is set up using the sum of standard deviations of Wavelet coefficient in cA3 and weighted cDl. cA3 coefficients represent the voiced sound with low frequency and cDl coefficients represent the unvoiced sound with high frequency. From simulation results, it is demonstrated that the proposed algorithm improves SNR and MSE performance more than Wavelet transform and Wavelet packet transform does. Moreover, the reconstructed signals by the proposed algorithm resemble the original signal in terms of plosive sound, fricative sound and affricate sound but Wavelet transform and Wavelet packet transform reduce those sounds seriously.

  • PDF

Speech Intelligibility of Alaryngeal Voices and Pre/Post Operative Evaluation of Voice Quality using the Speech Recognition Program(HUVOIS) (음성인식프로그램을 이용한 무후두 음성의 말 명료도와 병적 음성의 수술 전후 개선도 측정)

  • Kim, Han-Su;Choi, Seong-Hee;Kim, Jae-In;Lee, Jae-Yol;Choi, Hong-Shik
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.15 no.2
    • /
    • pp.92-97
    • /
    • 2004
  • Background and Objectives : The purpose of this study was to examine objectively pre and post operative voice quality evaluation and intelligibility of alaryngeal voice using speech recognition program, HUVOIS. Materials and Methods : 2 laryngologists and 1 speech pathologist were evaluated 'G', 'R', 'B' in the GRBAS sclae and speech intelligibility using NTID rating scale from standard paragraph. And also acoustic estimates such as jitter, shimmer, HNR were obtained from Lx Speech Studio. Results : Speech recognition rate was not significantly different between pre and post operation for pathological vocie samples though voice quality(G, B) and acoustic values(Jitter, HNR) were significantly improved after post operation. In Alaryngeal voices, reed type electrolarynx 'Moksori' was the highest both speech intelligibility and speech recognition rate, whereas esophageal speech was the lowest. Coefficient correlation of speech intelligibility and speech recognition rate was found in alaryngeal voices, but not in pathological voices. Conclusion : Current study was not proved speech recognition program, HUVOIS during telephone program was not objective and efficient method for assisting subjective GRBAS scale.

  • PDF

Speech Reinforcement Based on G.729A Speech Codec Parameter Under Near-End Background Noise Environments (근단 배경 잡음 환경에서 G.729A 음성부호화기 파라미터에 기반한 새로운 음성 강화 기법)

  • Choi, Jae-Hun;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.4
    • /
    • pp.392-400
    • /
    • 2009
  • In this paper, we propose an effective speech reinforcement technique base on ITU-T G.729A CS-ACELP codec under the near-end background noise environments. In general, since the intelligibility of the far-end speech for the near-end listener is significantly reduced under near-end noise environments, we require a far-end speech reinforcement approach to avoid this phenomena. In contrast to the conventional speech reinforcement algorithm, we reinforce the excitation signal of the codec's parameters received from the far-end speech signal based on the G.729A speech codec under various background noise environments. Specifically, we first estimate the excitation signal of ambient noise at the near-end through the encoder of the G.729A speech codec, reinforcing the excitation signal of the far-end speech transmitted from the far-end. we specially propose a novel approach to directly reinforce the excitation signal of far-end speech signal based on the decoder of the G.729A. The performance of the proposed algorithm is evaluated by the CCR (Comparison Category Rating) test of the method for subjective determination of transmission quality in ITU-T P.800 under various noise environments and shows better performances compared with conventional SNR Recovery methods.

Corpus-based Korean Text-to-speech Conversion System (콜퍼스에 기반한 한국어 문장/음성변환 시스템)

  • Kim, Sang-hun; Park, Jun;Lee, Young-jik
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.3
    • /
    • pp.24-33
    • /
    • 2001
  • this paper describes a baseline for an implementation of a corpus-based Korean TTS system. The conventional TTS systems using small-sized speech still generate machine-like synthetic speech. To overcome this problem we introduce the corpus-based TTS system which enables to generate natural synthetic speech without prosodic modifications. The corpus should be composed of a natural prosody of source speech and multiple instances of synthesis units. To make a phone level synthesis unit, we train a speech recognizer with the target speech, and then perform an automatic phoneme segmentation. We also detect the fine pitch period using Laryngo graph signals, which is used for prosodic feature extraction. For break strength allocation, 4 levels of break indices are decided as pause length and also attached to phones to reflect prosodic variations in phrase boundaries. To predict the break strength on texts, we utilize the statistical information of POS (Part-of-Speech) sequences. The best triphone sequences are selected by Viterbi search considering the minimization of accumulative Euclidean distance of concatenating distortion. To get high quality synthesis speech applicable to commercial purpose, we introduce a domain specific database. By adding domain specific database to general domain database, we can greatly improve the quality of synthetic speech on specific domain. From the subjective evaluation, the new Korean corpus-based TTS system shows better naturalness than the conventional demisyllable-based one.

  • PDF

Automatic severity classification of dysarthria using voice quality, prosody, and pronunciation features (음질, 운율, 발음 특징을 이용한 마비말장애 중증도 자동 분류)

  • Yeo, Eun Jung;Kim, Sunhee;Chung, Minhwa
    • Phonetics and Speech Sciences
    • /
    • v.13 no.2
    • /
    • pp.57-66
    • /
    • 2021
  • This study focuses on the issue of automatic severity classification of dysarthric speakers based on speech intelligibility. Speech intelligibility is a complex measure that is affected by the features of multiple speech dimensions. However, most previous studies are restricted to using features from a single speech dimension. To effectively capture the characteristics of the speech disorder, we extracted features of multiple speech dimensions: voice quality, prosody, and pronunciation. Voice quality consists of jitter, shimmer, Harmonic to Noise Ratio (HNR), number of voice breaks, and degree of voice breaks. Prosody includes speech rate (total duration, speech duration, speaking rate, articulation rate), pitch (F0 mean/std/min/max/med/25quartile/75 quartile), and rhythm (%V, deltas, Varcos, rPVIs, nPVIs). Pronunciation contains Percentage of Correct Phonemes (Percentage of Correct Consonants/Vowels/Total phonemes) and degree of vowel distortion (Vowel Space Area, Formant Centralized Ratio, Vowel Articulatory Index, F2-Ratio). Experiments were conducted using various feature combinations. The experimental results indicate that using features from all three speech dimensions gives the best result, with a 80.15 F1-score, compared to using features from just one or two speech dimensions. The result implies voice quality, prosody, and pronunciation features should all be considered in automatic severity classification of dysarthria.