• Title/Summary/Keyword: speech quality

Voice Analysis before and after Swallowing a Raw Egg in Professional Voice Users (직업적 음성사용자에서 날달걀 먹기 전과 후의 음성 변화)

  • Kim, Kyung-A;Kwon, Soon-Bok;Kim, Sung-Won;Lee, Hyung-Shin;Hong, Jong-Cheol;Kim, Yong-Rok;Lee, Bong-Joo;Han, Yung-Jin;Yu, Tae-Hyun;Lee, Kang-Dae
    • Speech Sciences / v.14 no.2 / pp.43-53 / 2007
  • The purpose of this study was to observe the effect of eating a raw egg on the voice quality of professional and nonprofessional voice users, and the duration of that effect. Twenty professional and 20 nonprofessional voice users participated in the experiment; all underwent stroboscopy to confirm that they had no vocal or laryngeal disease. The voice exam was performed three times: before eating a raw egg (1st period), right after eating it (2nd period), and 10 minutes later (3rd period). Using the Multi-Dimensional Voice Program (MDVP), software for the Computerized Speech Lab 4500, the authors measured F0, jitter, shimmer, noise-to-harmonic ratio (NHR), and Voice Range Profile (VRP). The results were as follows. Vocal hygiene was good in 57.5% of all subjects and poor in 42.5%; 40% of professional voice users and 75% of nonprofessional voice users had good vocal hygiene. Vocal fatigue was present in 77.5% of all subjects and absent in 22.5%; 95% of professional voice users and 60% of nonprofessional voice users complained of vocal fatigue. 60% of all subjects reported a subjective vocal symptom; 65.0% of professional voice users and 70.0% of nonprofessional voice users reported a voice symptom. From these results, we suggest that eating a raw egg may improve the voice quality of professional voice users.
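
Jitter and shimmer, two of the measures named in this abstract, are commonly defined as the mean absolute cycle-to-cycle change in period (jitter) or peak amplitude (shimmer), expressed as a percentage of the mean. The sketch below assumes that common local definition; it is not necessarily the exact computation MDVP performs, and the function names and sample values are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive values, in % of the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


def jitter_percent(periods):
    """Cycle-to-cycle variation of the pitch period (frequency perturbation)."""
    return relative_perturbation(periods)


def shimmer_percent(amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (amplitude perturbation)."""
    return relative_perturbation(amplitudes)


# Hypothetical per-cycle measurements (periods in seconds, amplitudes unitless).
periods = [0.0100, 0.0102, 0.0099, 0.0101]
amplitudes = [0.80, 0.82, 0.79, 0.81]
print(f"jitter:  {jitter_percent(periods):.2f} %")
print(f"shimmer: {shimmer_percent(amplitudes):.2f} %")
```

Both functions share one helper because, under this definition, the two measures differ only in the quantity tracked per cycle.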

A study on sound source segregation of frequency domain binaural model with reflection (반사음이 존재하는 양귀 모델의 음원분리에 관한 연구)

  • Lee, Chai-Bong
    • Journal of the Institute of Convergence Signal Processing / v.15 no.3 / pp.91-96 / 2014
  • Among methods for sound source localization and separation, the Frequency Domain Binaural Model (FDBM) offers low computational cost and high separation performance. It localizes and separates sound sources by computing the Interaural Phase Difference (IPD) and Interaural Level Difference (ILD) in the frequency domain. In practical environments, however, reflections cause problems. To reduce the effect of reflections, a method is presented that simulates the localization of the direct sound, detects the first-arriving sound, determines its direction, and then separates the source. Simulation results show that the estimated direction lies within 10% of the sound source and that, in the presence of reflections, source separation improves, with higher coherence and PESQ (Perceptual Evaluation of Speech Quality) and lower directional damping than the existing FDBM. In the case of no reflection, the degree of separation was low.
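
The IPD and ILD cues mentioned in this abstract can be illustrated for a single frequency bin. The sketch below is not the FDBM itself: it only assumes the usual definitions (IPD as the phase of the left-right cross-spectrum, ILD as the left/right magnitude ratio in dB) and uses a naive single-bin DFT to stay dependency-free. All names and the test signal are hypothetical.

```python
import cmath
import math


def dft_bin(frame, k):
    """Naive DFT of `frame` evaluated at bin k only."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))


def ipd_ild(left, right, k, floor=1e-12):
    """Return (IPD in radians, ILD in dB) for bin k of a left/right frame pair."""
    spec_l = dft_bin(left, k)
    spec_r = dft_bin(right, k)
    # Phase of the cross-spectrum = phase(left) - phase(right), wrapped to (-pi, pi].
    ipd = cmath.phase(spec_l * spec_r.conjugate())
    # `floor` avoids log(0) and division by zero on silent bins.
    ild = 20.0 * math.log10(max(abs(spec_l), floor) / max(abs(spec_r), floor))
    return ipd, ild


# Hypothetical test signal: the right ear hears the same tone at half the
# amplitude and delayed by a quarter of pi in phase.
n, k = 64, 4
left = [math.cos(2 * math.pi * k * i / n) for i in range(n)]
right = [0.5 * math.cos(2 * math.pi * k * i / n - math.pi / 4) for i in range(n)]
ipd, ild = ipd_ild(left, right, k)
print(f"IPD = {ipd:.4f} rad, ILD = {ild:.2f} dB")
```

In a full system these two cues would be computed for every bin of a short-time spectrum and compared against direction-dependent reference values; that mapping is outside this sketch.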

Robust Tree Coding Combined with Harmonic Scaling of Speech at 4.8 Kbps (견실한 배음 축척과 결합된 4.8KBPS 트리 음성부호기)

  • 강상원;이인성;한경호
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.12 / pp.1806-1814 / 1993
  • Efficient speech coders using tree coding combined with harmonic scaling are designed at a rate of 4.8 kilobits/sec (kbps). A time domain harmonic scaling (TDHS) algorithm is used to compress the input speech by a factor of two. This allows the tree coder to use 1.5 bits/sample at 4.8 kbps for a 6.4 kHz sampling rate. The code generator of the backward adaptive tree coder has three components: a hybrid adaptive quantizer, a short-term predictor, and a pitch predictor. The robustness of the tree coder is achieved by carefully choosing the input to the short-term predictor adaptation. In addition, including a smoother in the pitch predictor improves the error performance of the tree coder over a noisy channel. Subjectively, tree coding combined with TDHS provides good-quality speech at 4.8 kbps.
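
The 1.5 bits/sample figure in this abstract follows from the stated rates: 2:1 compression halves the number of samples the tree coder must encode per second. A minimal check of that arithmetic, with a hypothetical helper name:

```python
def bits_per_sample(bit_rate_bps, sampling_rate_hz, compression_factor):
    """Bits available per coded sample after time-scale compression."""
    coded_samples_per_second = sampling_rate_hz / compression_factor
    return bit_rate_bps / coded_samples_per_second


# Values taken from the abstract: 4.8 kbps, 6.4 kHz sampling, 2:1 compression.
print(bits_per_sample(4800, 6400, 2))  # 1.5
```

Without the compression step the same bit rate would leave only 0.75 bits/sample.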

On a Pitch Alteration Technique in Time-Frequency Hybrid Domain for High Quality Prosody Control of Speech Signal (고음질 운율조절용 시간-주파수 혼성영역 피치변경법)

  • Lee, Sang-Hyo;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.16 no.4 / pp.106-109 / 1997
  • Among speech synthesis techniques, waveform coding methods maintain the intelligibility and naturalness of synthetic speech. To apply waveform coding to synthesis by rule, however, we must be able to alter the pitch for prosody control of the synthetic speech. In this paper, we propose a new pitch alteration technique in the time-frequency hybrid domain that compensates for the phase distortion of the cepstral pitch alteration method by using a time scaling method in the time domain. This method can remove some of the phase spectrum distortion that occurs at the junction between waveforms of consecutive frames. It also keeps magnitude spectrum distortion small, below 1.18% for a pitch alteration of 200%.

Improvement of Overlapped Codebook Search in QCELP (QCELP에서 중첩된 코드북 검색의 개선)

  • 박광철;한승진;이정현
    • The KIPS Transactions:PartC / v.8C no.1 / pp.105-112 / 2001
  • In this paper, we present an improved QCELP codebook search that raises speech quality and allows the QCELP vocoder to be used in noise-robust systems. Whereas conventional QCELP usually searches the stochastic codebook once, we tried 2 to 5 searches and found that searching twice is the most suitable for improving speech quality. The improved QCELP vocoder therefore represents the excitation signal in more detail through two passes of precise quantization, which improves speech quality. In our experiments, the input data are speech samples collected in everyday environments (lecture room, house, street, laboratory, etc.) without regard to noise, and speech quality is measured by SNR and segmental SNR (segSNR). The results show that the improved QCELP raises SNR and segSNR by 38.35% and 65.51%, respectively, compared with conventional QCELP.
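
SNR and segmental SNR, the two quality measures used in this abstract, can be sketched as follows. This assumes the usual definitions (signal energy over error energy in dB, computed either over the whole utterance or averaged over fixed-length frames); the frame length, the handling of silent or error-free frames, and all names are assumptions, not details taken from the paper.

```python
import math


def snr_db(reference, processed):
    """Overall SNR in dB between a reference signal and its coded version."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, processed))
    if signal == 0 or noise == 0:
        raise ValueError("SNR is undefined for zero signal or zero error energy")
    return 10.0 * math.log10(signal / noise)


def seg_snr_db(reference, processed, frame_len):
    """Mean of per-frame SNRs in dB over non-overlapping frames."""
    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        out = processed[start:start + frame_len]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, out))
        # Skip frames where the ratio is undefined (silence or exact match).
        if signal > 0 and noise > 0:
            frame_snrs.append(10.0 * math.log10(signal / noise))
    if not frame_snrs:
        raise ValueError("no frame with a defined SNR")
    return sum(frame_snrs) / len(frame_snrs)


# Hypothetical signal: a quiet frame followed by a loud one, same error in both.
reference = [10, 10, 10, 10, 100, 100, 100, 100]
processed = [x + 1 for x in reference]
print(f"SNR    = {snr_db(reference, processed):.2f} dB")
print(f"segSNR = {seg_snr_db(reference, processed, 4):.2f} dB")
```

In this example the overall SNR is dominated by the loud frame, while segSNR weights both frames equally, which is why the two measures can improve by different amounts.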

Formant frequency changes of female voice /a/, /i/, /u/ in real ear (실이에서 여자 음성 /ㅏ/, /ㅣ/, /ㅜ/의 포먼트 주파수 변화)

  • Heo, Seungdeok;Kang, Huira
    • Phonetics and Speech Sciences / v.9 no.1 / pp.49-53 / 2017
  • Formant frequencies depend on the position of the tongue, the shape of the lips, and the larynx. In the auditory system, the external ear canal is an open-ended resonator that can modify voice characteristics. This study investigates the effect of the real ear on formant frequencies. Fifteen subjects ranging from 22 to 30 years of age participated in the study. The study employed three corner vowels: the low central vowel /a/, the high front vowel /i/, and the high back vowel /u/. The voice of a well-educated undergraduate who majored in speech-language pathology was recorded with a high-performance condenser microphone placed in the upper pinna and in the ear canal. Paired t-tests showed significant differences in the formant frequencies F1, F2, F3, and F4 between the free field and the real ear. For /a/, all formant frequencies decreased significantly in the real ear. For /i/, F2 increased and F3 and F4 decreased. For /u/, F1 and F2 increased, but F3 and F4 decreased. These voice modifications in the real ear appear to contribute to interpreting voice quality and understanding speech, timbre, and individual characteristics, which are influenced by the shape of the outer ear and external ear canal in such a way that formant frequencies become centralized in the vowel space.

On a Pitch Change of the Waveform Coding by the Cepstrum Analysis of Speech Waveforms (켑스트럼 분석에 의한 파형부호화의 피치변경에 관한 연구)

  • Bae, Myung-Jin;Lee, Mi-Suk
    • The Journal of the Acoustical Society of Korea / v.11 no.4 / pp.14-21 / 1992
  • Waveform coding is concerned simply with preserving the wave shape of the speech signal through a redundancy reduction process. In speech synthesis, high-quality waveform coding is mainly used for synthesis by analysis. However, because the parameters of this coding are not separated into excitation parameters and vocal tract parameters, it is difficult to apply waveform coding to synthesis by rule. In this paper, we propose a new pitch alteration method that can change the pitch periods in waveform coding by using cepstrum analysis. This makes it possible to use waveform coding for synthesis by rule in speech processing.

Characteristics of Phoniatrics in Patients with Spastic Dysarthria (경직형 마비말장애의 음성언어의학적 특성)

  • Kim, Sook-Hee;Kim, Hyun-Gi
    • Speech Sciences / v.15 no.4 / pp.159-170 / 2008
  • The purpose of this study was to assess, by acoustic analysis, articulatory motor coordination and respiratory and laryngeal control in spastic dysarthria. Sustained phonation of the vowel /a/ and repetition of the syllable /pa/ were measured in 15 normal speakers and 10 speakers with spastic dysarthria. Multi-Speech, MDVP, and MSP were used for data recording and analysis. The mean DDK rate in the spastic group was significantly slower than in the normal group. The maximum phonation time in the spastic group ($4.80 \pm 1.94$) was shorter than in the normal group ($11.20 \pm 3.72$). DDKjit was significantly higher in the spastic group than in the normal group, and DDKsla was reduced. The mean syllable duration in the spastic group (146.2 ms) was significantly longer than in the normal group (75.8 ms). The mean energy was reduced in the spastic group. The range of Fo was greater than in the normal group. Frequency perturbation (jitter, vFo) and amplitude perturbation (shimmer, vAm) were higher than in the normal group, as was NHR. These parameters differed significantly between the spastic dysarthria group and the normal group (p<0.05). In summary, speakers with spastic dysarthria show short respiration, slow speech rate, and voice quality problems. These results will help in planning treatment and intervention.

The Phoneme Synthesis of Korean CV Mono-Syllables (한국어 CV단음절의 음소합성)

  • 안점영;김명기
    • The Journal of Korean Institute of Communications and Information Sciences / v.11 no.2 / pp.93-100 / 1986
  • We analyzed, by the PARCOR technique, Korean CV monosyllables formed by concatenating the consonants /k, t, p, g/ and their fortis and rough-sound counterparts with the vowels /a, e, o, u, I/, and then synthesized this speech by phoneme synthesis, controlling the analyzed data. In the speech analysis, the duration of the consonants decreases in the order rough sound, lenis, fortis, and their gain decreases in the same order. The pitch period of the following vowel increases in the order rough sound, fortis, lenis. We synthesized the lenis and the fortis by controlling the duration and gain of the rough sound, and synthesized the vowels following the fortis and the rough sound by controlling the pitch period and duration of the vowels following the lenis. As a result, the synthesized speech quality is good, and we confirmed that it is possible to formulate rules for phoneme synthesis of Korean speech.

Two-Channel Noise Reduction Using Beamforming and DOA-Based Masking (빔포밍 및 DOA 기반의 마스킹을 이용한 2채널 잡음제거)

  • Kim, Youngil;Jeong, Sangbae
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.1 / pp.32-40 / 2013
  • In this paper, we propose a multi-channel speech enhancement algorithm using beamforming and direction-of-arrival (DOA)-based masking. The proposed algorithm first enhances noisy speech with the linearly constrained minimum variance (LCMV) algorithm and then applies a mel-scale Wiener filter, designed using DOA-based masking, to remove the remaining noise. To improve performance, we optimize the learning rate of the adaptive filters in LCMV and the DOA threshold used to detect the target speech spectrum. As performance indices, the perceptual evaluation of speech quality (PESQ) score and the output SNR are measured. Experimental results show that the proposed algorithm outperforms the conventional LCMV beamformer by 0.09 in PESQ score and 5.75 dB in output SNR.