• Title/Summary/Keyword: speech quality


Noise Reduction Using the Standard Deviation of the Time-Frequency Bin and Modified Gain Function for Speech Enhancement in Stationary and Nonstationary Noisy Environments

  • Lee, Soo-Jeong;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.26 no.3E / pp.87-96 / 2007
  • In this paper we propose a new noise reduction algorithm for stationary and nonstationary noisy environments. Our algorithm classifies the speech and noise signal contributions in time-frequency bins, and is not based on a spectral algorithm or a minimum statistics approach. It relies on calculating the ratio of the standard deviation of the noisy power spectrum in time-frequency bins to its normalized time-frequency average. We show that good quality can be achieved for the enhanced speech signal by choosing appropriate values for $\delta_t$ and $\delta_f$. The proposed method greatly reduces the noise while providing enhanced speech with lower residual noise and somewhat higher mean opinion score (MOS), background intrusiveness (BAK) and signal distortion (SIG) scores than conventional methods.
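The ratio described in the abstract above can be illustrated with a minimal sketch. It assumes the statistic is computed over a small rectangular neighbourhood of each time-frequency bin; the function name `local_std_to_mean` and the window half-widths `half_t` and `half_f` are hypothetical and are not claimed to match the paper's $\delta_t$ and $\delta_f$ or its normalization.

```python
import math

def local_std_to_mean(power, half_t=1, half_f=1):
    """Return std/mean of the power spectrum in a window around each bin.

    power: list of frames, each a list of per-frequency power values.
    half_t, half_f: hypothetical window half-widths (frames, frequency bins).
    """
    n_t, n_f = len(power), len(power[0])
    ratio = [[0.0] * n_f for _ in range(n_t)]
    for t in range(n_t):
        for f in range(n_f):
            # Collect the neighbourhood, clipped at the spectrogram edges.
            window = [
                power[i][j]
                for i in range(max(0, t - half_t), min(n_t, t + half_t + 1))
                for j in range(max(0, f - half_f), min(n_f, f + half_f + 1))
            ]
            mean = sum(window) / len(window)
            var = sum((p - mean) ** 2 for p in window) / len(window)
            ratio[t][f] = math.sqrt(var) / mean if mean > 0 else 0.0
    return ratio

# Toy input: a flat (noise-like) spectrum and the same spectrum with one peak.
flat = [[1.0] * 5 for _ in range(5)]
peaky = [row[:] for row in flat]
peaky[2][2] = 50.0

flat_ratio = local_std_to_mean(flat)[2][2]
peaky_ratio = local_std_to_mean(peaky)[2][2]
print(flat_ratio, peaky_ratio)
```

In this toy case the flat region gives a ratio of zero while the region containing the peak gives a clearly larger value, which is the kind of contrast a speech/noise classifier could threshold.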

Speech Enhancement Using the Adaptive Noise Canceling Technique with a Recursive Time Delay Estimator (재귀적 지연추정기를 갖는 적응잡음제거 기법을 이용한 음성개선)

  • 강해동;배근성
    • Journal of the Korean Institute of Telematics and Electronics B / v.31B no.7 / pp.33-41 / 1994
  • A single-channel adaptive noise canceling (ANC) technique with a recursive time delay estimator (RTDE) is presented for removing the effects of additive noise on the speech signal. While the conventional method builds the reference signal for the adaptive filter from the pitch estimated on a frame basis from the input speech, the proposed method builds the reference signal from the delay estimated recursively on a sample-by-sample basis. As RTDEs, recursion formulae for the autocorrelation function (ACF) and the average magnitude difference function (AMDF) are derived. The normalized least mean square (NLMS) and recursive least squares (RLS) algorithms are applied for adaptation of the filter coefficients. Experimental results with noisy speech demonstrate that the proposed method improves the perceived speech quality as well as the signal-to-noise ratio and cepstral distance when compared with the conventional method.

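As a rough illustration of delay estimation with the AMDF mentioned in the abstract above, the sketch below uses the plain block-based AMDF and picks the lag with the smallest value. This is not the recursive sample-by-sample estimator derived in the paper; the function names, the test signal and the lag search range are assumptions made for the example.

```python
import math

def amdf(x, lag):
    """Average magnitude difference of x against itself delayed by `lag`."""
    n = len(x) - lag
    return sum(abs(x[i + lag] - x[i]) for i in range(n)) / n

def estimate_delay(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the smallest AMDF."""
    return min(range(min_lag, max_lag + 1), key=lambda k: amdf(x, k))

# Synthetic periodic signal: fundamental plus second harmonic, period 40 samples.
period = 40
signal = [
    math.sin(2 * math.pi * n / period) + 0.5 * math.sin(4 * math.pi * n / period)
    for n in range(400)
]

# The search range is limited to below two periods so the minimum is unique.
delay = estimate_delay(signal, 20, 70)
print(delay)  # 40
```

A reference signal for an adaptive filter could then be formed by delaying the input by `delay` samples.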

Tandemless Transcoding for AMR and EVRC Speech Coders (AMR과 EVRC 음성 부호화기간의 비탠덤 방식을 이용한 상호 부호화)

  • 이선일;유창동
    • The Journal of the Acoustical Society of Korea / v.21 no.6 / pp.531-542 / 2002
  • A novel tandemless transcoding method for AMR and EVRC speech coders is proposed in this paper. In contrast to the conventional tandem method, the parameters commonly used in speech coders that adopt the CELP algorithm are transcoded directly. The proposed algorithm is composed of LSP transcoding, pitch delay transcoding, gain transcoding and fixed codebook vector transcoding. Evaluation results show that the novel algorithm achieves better speech quality than the tandem method and reduces computational complexity and delay.

An acoustic study on the alaryngeal voice using the Multi-Speech (Multi-Speech를 통한 후두적출자의 발성에 대한 음향학적 분석)

  • Noh Dongwoo;Paik Euna;Kang Sookyoon
    • Proceedings of the KSPS conference / 2003.10a / pp.133-137 / 2003
  • The purpose of this study was to provide acoustic data on the voice of laryngectomized patients for more scientific and efficient voice rehabilitation. The phonation of prolonged /a/ by 9 electronic artificial larynx (AL) users, 5 esophageal (EP) speech users, and 2 tracheo-esophageal (TEP) voice users was recorded and analyzed using Multi-Speech. Habitual f0, mean f0, sd f0, max f0, min f0, jitter, shimmer, and NHR were compared among the groups of subjects using t-tests. The EP and TEP groups exhibited higher f0 than the AL group. The AL and TEP groups showed more stable f0 than the EP group. In addition, the quality of the TEP and EP voices was comparatively better in terms of jitter, shimmer, and NHR.

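Jitter and shimmer, two of the measures compared in the abstract above, can be sketched with a common "local" definition: the mean absolute difference between consecutive cycles divided by the mean, applied to pitch periods for jitter and to peak amplitudes for shimmer. Whether Multi-Speech uses exactly this variant is not stated in the abstract, so treat the formula, the function name and the numbers below as assumptions for illustration only.

```python
def relative_perturbation(values):
    """Mean absolute difference of consecutive values over their mean, in percent."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value

# Made-up cycle measurements, not data from the study.
periods_ms = [10.0, 11.0, 10.0, 11.0]   # pitch period of each cycle
amplitudes = [1.0, 1.0, 1.0, 1.0]       # peak amplitude of each cycle

jitter_percent = relative_perturbation(periods_ms)
shimmer_percent = relative_perturbation(amplitudes)
print(jitter_percent, shimmer_percent)
```

Lower values indicate a more regular cycle-to-cycle pattern, which is why the abstract reads lower jitter and shimmer as better voice quality.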

Speech Production Characteristics of Congenitally Deaf Children with Cochlear Implant (선천성심도 청각장애 아동의 와우이식 후 말산출 특성)

  • Yoon, Mi-Sun
    • Proceedings of the KSPS conference / 2007.05a / pp.302-304 / 2007
  • The purpose of this study was to evaluate the speech production ability of congenitally deaf children with cochlear implants. Forty children participated in the study. The results are as follows: (1) the mean speech intelligibility score was 3.05 on a 5-point scale, (2) the mean percentage of correct vowels was 86.19% and the mean percentage of correct consonants was 74.89%, and (3) voice profiles showed that the children's voices were high-pitched, hypernasal, and breathy, although 12.5% of the children were evaluated as having normal voice quality. Overall, the speech production abilities of children with cochlear implants were superior to the results for deaf children reported in the literature. However, their abilities were not equal to those of children with normal hearing.


Multi-Pulse Amplitude and Location Estimation by Maximum-Likelihood Estimation in MPE-LPC Speech Synthesis (MPE-LPC 음성합성에서 Maximum-Likelihood Estimation에 의한 Multi-Pulse의 크기와 위치 추정)

  • 이기용;최홍섭;안수길
    • Journal of the Korean Institute of Telematics and Electronics / v.26 no.9 / pp.1436-1443 / 1989
  • In this paper, we propose a maximum-likelihood estimation (MLE) method to obtain the location and the amplitude of the pulses in MPE (multi-pulse excitation)-LPC speech synthesis, which uses multi-pulses as the excitation source. This MLE method computes the value maximizing the likelihood function with respect to the unknown parameters (amplitude and position of the pulses) for the observed data sequence. Thus, in the case of overlapped pulses, the method is equivalent to Ozawa's cross-correlation method, resulting in the same amount of computation and the same sound quality as the cross-correlation method. We show by computer simulation that the multi-pulses obtained by the MLE method (1) are pseudo-periodic in pitch for voiced sound, (2) are random for unvoiced sound, and (3) change from random to periodic in the interval where the original speech signal changes from unvoiced to voiced. The short-time power spectra of the original speech and of the speech synthesized using multi-pulses as the excitation source are quite similar to each other at the formants.


A Study on Multi-Pulse Speech Coding Method by Using Individual Pitch Information (개별 피치정보를 이용한 멀티펄스 음성부호화 방식에 관한 연구)

  • Lee, See-Woo
    • The Journal of the Korea Contents Association / v.6 no.2 / pp.59-64 / 2006
  • In this paper, I propose a new multi-pulse coding method (IP-MPC) that uses individual pitch pulses in order to accommodate the changes in each pitch interval and reduce pitch errors. The extraction rate of individual pitch pulses was 85% for female voices and 96% for male voices. I evaluate the MPC, which uses pitch information from the autocorrelation method, and the IP-MPC, which uses individual pitch pulses. The results show that speech synthesized by the IP-MPC had better speech quality than speech synthesized by the MPC.


A Speech Coder using the Simplified Multi-mode Method (단순화된 다중 모드 방법을 이용한 음성 부호화기)

  • 강홍구
    • Proceedings of the Acoustical Society of Korea Conference / 1995.06a / pp.146-149 / 1995
  • This paper proposes an SM-CELP speech coder which applies different excitation signals according to the characteristics of the speech segment at bit rates below 4 kbps. The speech signal is classified into two modes, such as stationary voice and others, using as parameters the average energy of the short-time speech and the residual signal after long-term prediction. A structured multi-pulse method is used for the excitation of mode-A, and a Gaussian or pulse-like codebook for mode-B. The 4.8 kbps DoD-CELP coder is used to evaluate the performance of the proposed coder. As a result, the proposed method shows a 1 to 2 dB higher segmental signal-to-noise ratio and better subjective quality without increasing the amount of computation.

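The segmental signal-to-noise ratio quoted in the abstract above is usually computed as the average of per-frame SNR values in dB. The sketch below follows that common definition; the frame length of 160 samples, the function name and the test signal are assumptions, and refinements such as clamping each frame's SNR to a fixed range are left out.

```python
import math

def segmental_snr(reference, decoded, frame_len=160):
    """Average of per-frame SNR values (dB) between reference and decoded speech."""
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(r * r for r in ref)
        noise_energy = sum((r - d) ** 2 for r, d in zip(ref, dec))
        # Skip silent or perfectly reconstructed frames to avoid log of 0 or infinity.
        if signal_energy > 0 and noise_energy > 0:
            snrs.append(10 * math.log10(signal_energy / noise_energy))
    if not snrs:
        raise ValueError("no frame with measurable signal and error energy")
    return sum(snrs) / len(snrs)

# Toy check: a decoder that scales the input by 0.9 leaves an error of 0.1 * input,
# so every frame has an energy ratio of about 100, i.e. about 20 dB.
reference = [math.sin(0.1 * n) for n in range(800)]
decoded = [0.9 * r for r in reference]
seg_snr_db = segmental_snr(reference, decoded)
print(round(seg_snr_db, 3))
```

Because the average is taken over frames in the dB domain, quiet frames count as much as loud ones, which is the usual reason for preferring it to a single whole-utterance SNR when comparing coders.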

Pitch Modification based on a Voice Source Model (음원 모델에 기초한 합성음의 피치 조절)

  • Choi, Yong-Jin;Yeo, Su-Jin;Kim, Jin-Young;Sung, Koeng-Mo
    • Speech Sciences / v.3 / pp.132-147 / 1998
  • Previously developed methods for pitch modification have not been based on a voice source model. Therefore, the synthesized speech often sounds unnatural although it may be highly intelligible. The purpose of this paper is to analyze how the voice source signal changes with the pitch period and to establish a pitch-modification rule based on the result of this analysis. Using the excitation waveform, we examine how the intervals of the closing phase, closed phase and open phase change as the pitch increases. In comparison to previous methods, which operate directly on the speech signal, the pitch modification method based on a voice source model shows high intelligibility and naturalness. This study might benefit applications such as speaker identification and voice color conversion. Therefore the proposed method will provide high-quality synthetic speech.


Machine Scoring Methods Highly-correlated with Human Ratings in Speech Recognizer Detecting Mispronunciation of Foreign Language (한국인의 외국어 발화오류검출 음성인식기에서 청취판단과 상관관계가 높은 기계 스코어링 기법)

  • Bae, Min-Young;Kwon, Chul-Hong
    • Speech Sciences / v.11 no.2 / pp.217-226 / 2004
  • An automatic pronunciation correction system provides users with correction guidelines for each pronunciation error. For this purpose, we develop a speech recognition system which automatically classifies pronunciation errors when Koreans speak a foreign language. In this paper, we propose a machine scoring method for automatic assessment of pronunciation quality by the speech recognizer. Scores obtained from an expert human listener are used as the reference to evaluate the different machine scores and to provide targets when training some of the algorithms. We use a log-likelihood score and a normalized log-likelihood score as machine scoring methods. Experimental results show that the normalized log-likelihood score has a higher correlation with human scores than the log-likelihood score does.

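The comparison in the abstract above can be sketched as follows, assuming the normalized score is the log-likelihood divided by the number of frames (a common choice, but the abstract does not give the exact normalization) and that agreement with the human listener is measured with the Pearson correlation. All numbers are made-up toy values, not results from the paper.

```python
import math

def normalized_log_likelihood(log_likelihood, n_frames):
    """Per-frame log-likelihood (assumed normalization: divide by duration)."""
    return log_likelihood / n_frames

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical utterances: (total log-likelihood, number of frames, human rating).
utterances = [(-600.0, 200, 5), (-200.0, 50, 4), (-750.0, 150, 3), (-600.0, 100, 2)]

raw_scores = [ll for ll, _, _ in utterances]
norm_scores = [normalized_log_likelihood(ll, n) for ll, n, _ in utterances]
human_scores = [rating for _, _, rating in utterances]

raw_corr = pearson(raw_scores, human_scores)
norm_corr = pearson(norm_scores, human_scores)
print(raw_corr, norm_corr)
```

In this toy set the raw log-likelihood mostly tracks utterance length, so dividing by the frame count removes that confound and brings the machine score closer to the human ratings.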