• 제목/요약/키워드: Speech quality

검색결과 803건 처리시간 0.027초

The Optimum Fuzzy Vector Quantizer for Speech Synthesis

  • Lee, Jin-Rhee-;Kim, Hyung-Seuk-;Ko, Nam-kon;Lee, Kwang-Hyung-
    • 한국지능시스템학회:학술대회논문집
    • /
    • 한국퍼지및지능시스템학회 1993년도 Fifth International Fuzzy Systems Association World Congress 93
    • /
    • pp.1321-1325
    • /
    • 1993
  • This paper investigates the use of Fuzzy vector quantizer(FVQ) in speech synthesis. To compress speech data, we employ K-means algorithm to design codebook and then FVQ technique is used to analysize input speech vectors based on the codebook in an analysis part. In FVQ synthesis part, analysis data vectors generated in FVQ analysis is used to synthesize the speech. We have fined that synthesized speech quality depends on Fuzziness values in FVQ, and the optimum fuzziness values maximized synthesized speech SQNR are related with variance values of input speech vectors. This approach is tested on a sentence, and we compare synthesized speech by a convensional VQ with synthesized speech by a FVQ with optimum Fuzziness values.

  • PDF

반향 음성 신호의 하모닉 모델링을 이용한 음질 예측 알고리즘 (Speech Quality Estimation Algorithm using a Harmonic Modeling of Reverberant Signals)

  • 양재모;강홍구
    • 방송공학회논문지
    • /
    • 제18권6호
    • /
    • pp.919-926
    • /
    • 2013
  • 실내 환경에서 음성 신호는 음향 전달 함수에 의한 반향 신호를 포함한다. 이때 반향의 정도나 반향에 의한 음질 변화를 예측하는 것은 반향 제거 알고리즘 등에서 중요한 정보를 제공한다. 본 논문은 음성 신호의 하모닉 모델링 기법을 이용한 반향 환경에서의 자동 음질 예측 기법을 제안하다. 제안한 방법에서는 반향을 포함하는 음성 신호에 대한 하모닉 모델링 기법이 가능함을 보이고, 모델링된 하모닉 성분과 나머지 성분 사이의 통계적인 비율을 예측한다. 예측된 비율은 일반적인 방 환경에서의 음질 측정 표준 파라미터와 비 교하였다. 실험 결과 제안된 방법은 다양한 반향 환경 (반향 시간 0.2~1.0초)에서 표준 음질 파라미터를 정확하게 예측할 수 있음을 증명하였다.

Enhanced Spectral Hole Substitution for Improving Speech Quality in Low Bit-Rate Audio Coding

  • Lee, Chang-Heon;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea
    • /
    • 제29권3E호
    • /
    • pp.131-139
    • /
    • 2010
  • This paper proposes a novel spectral hole substitution technique for low bit-rate audio coding. The spectral holes frequently occurring in relatively weak energy bands due to zero bit quantization result in severe quality degradation, especially for harmonic signals such as speech vowels. The enhanced aacPlus (EAAC) audio codec artificially adjusts the minimum signal-to-mask ratio (SMR) to reduce the number of spectral holes, but it still produces noisy sound. The proposed method selectively predicts the spectral shapes of hole bands using either intra-band correlation, i.e. harmonically related coefficients nearby or inter-band correlation, i.e. previous frames. For the bands that have low prediction gain, only the energy term is quantized and spectral shapes are replaced by pseudo random values in the decoding stage. To minimize perceptual distortion caused by spectral mismatching, the criterion of the just noticeable level difference (JNLD) and spectral similarity between original and predicted shapes are adopted for quantizing the energy term. Simulation results show that the proposed method implemented into the EAAC baseline coder significantly improves speech quality at low bit-rates while keeping equivalent quality for mixed and music contents.

The Effects of Pitch Increasing Training (PIT) on Voice and Speech of a Patient with Parkinson's Disease: A Pilot Study

  • Lee, Ok-Bun;Jeong, Ok-Ran;Shim, Hong-Im;Jeong, Han-Jin
    • 음성과학
    • /
    • 제13권1호
    • /
    • pp.95-105
    • /
    • 2006
  • The primary goal of therapeutic intervention in dysarthric speakers is to increase the speech intelligibility. Decision of critical features to increase the intelligibility is very important in speech therapy. The purpose of this study is to know the effects of pitch increasing training (PIT) on speech of a subject with Parkinson's disease (PD). The PIT program is focused on increasing pitch while a vowel is sustained with the same loudness. The loudness level is somewhat higher than that of the habitual loudness. A 67-year-old female with PD participated in the study. Speech therapy was conducted for 4 sessions (200 minutes) for one week. Before and after the treatment, acoustic, perceptual and speech naturalness evaluation was peformed for data analysis. Speech and voice satisfaction index (SVSI) was obtained after the treatment. Results showed Improvements in voice quality and speech naturalness. In addition, the patient's satisfaction ratings (SVSI) indicated a positive relationship between improved speech production and their (the patient and care-givers) satisfaction.

  • PDF

Classical Tamil Speech Enhancement with Modified Threshold Function using Wavelets

  • Indra., J;Kasthuri., N;Navaneetha Krishnan., S
    • Journal of Electrical Engineering and Technology
    • /
    • 제11권6호
    • /
    • pp.1793-1801
    • /
    • 2016
  • Speech enhancement is a challenging problem due to the diversity of noise sources and their effects in different applications. The goal of speech enhancement is to improve the quality and intelligibility of speech by reducing noise. Many research works in speech enhancement have been accomplished in English and other European Languages. There has been limited or no such works or efforts in the past in the context of Tamil speech enhancement in the literature. The aim of the proposed method is to reduce the background noise present in the Tamil speech signal by using wavelets. New modified thresholding function is introduced. The proposed method is evaluated on several speakers and under various noise conditions including White Gaussian noise, Babble noise and Car noise. The Signal to Noise Ratio (SNR), Mean Square Error (MSE) and Mean Opinion Score (MOS) results show that the proposed thresholding function improves the speech enhancement compared to the conventional hard and soft thresholding methods.

Determining the Relative Differences of Emotional Speech Using Vocal Tract Ratio

  • Wang, Jianglin;Jo, Cheol-Woo
    • 음성과학
    • /
    • 제13권1호
    • /
    • pp.109-116
    • /
    • 2006
  • In this paper, our study focuses on obtaining the differences of emotional speech in three different vocal tract sections. The vocal tract area was computed from the area function of the emotional speech. The total vocal tract was divided into 3 sections (vocal fold section, middle section and lip section) to acquire the differences in each vocal tract section of emotional speech. The experiment data include 6 emotional speeches from 3 males and 3 females. The 6 emotions consist of neutral, happiness, anger, sadness, fear and boredom. The measured difference is computed by the ratio through comparing each emotional speech with the normal speech. The experimental results present that there is not a remarkable difference at lip section, but the fear and sadness have a great change at the vocal fold part.

  • PDF

MPEG-4TTS 현황 및 전망

  • 한민수
    • 전자공학회지
    • /
    • 제24권9호
    • /
    • pp.91-98
    • /
    • 1997
  • Text-to-Speech(WS) technology has been attracting a lot of interest among speech engineers because of its own benefits. Namely, the possible application areas of talking computers, emergency alarming systems in speech, speech output devices for speech-impaired, and so on. Hence, many researchers have made significant progresses in the speech synthesis techniques in the sense of their own languages and as a result, the quality of current speech synthesizers are believed to be acceptable to normal users. These are partly why the MPEG group had decided to include the WS technology as one of its MPEG-4 functionalities. ETRI has made major contributions to the current MPEG-4 775 appearing in various MPEG-4 documents with relatively minor contributions from AT&T and NW. Main MPEG-4 functionalities presently available are; 1) use of original prosody for synthesized speech output, 2) trick mode functions for general users without breaking synthesized speech prosody, 3) interoperability with Facial Animation(FA) tools, and 4) dubbing a moving/anlmated picture with lip-shape pattern informations.

  • PDF

난청인의 주파수 선택도와 비대칭적 청각 필터를 고려한 난청 시뮬레이터 개발에 관한 연구 (A Study on Development of a Hearing Impairment Simulator considering Frequency Selectivity and Asymmetrical Auditory Filter of the Hearing Impaired)

  • 주상익;강현덕;송영록;이상민
    • 전기학회논문지
    • /
    • 제59권4호
    • /
    • pp.831-840
    • /
    • 2010
  • In this paper, we propose a hearing impairment simulator considering reduced frequency selectivity and asymmetrical auditory filter of the hearing impaired, and we verified the reduced frequency selectivity and asymmetrical auditory filter affected in speech perception through experiments. The reduced frequency selectivity has made embodied by spectral smearing using LPC(linear prediction coding). The shapes of auditory filter are asymmetrical different with each center frequency. Hearing impaired person which has hearing loss was differently changed with that of normal hearing people and it has different value for speech of quality through auditory filter. The experiments confirmed subjective test and objective test. The subjective experiments are composed of 4 kinds of tests: pure tone test, SRT(speech reception threshold) test, and WRS(word recognition score) test without spectral smearing, and WRS test with spectral smearing. The experiment of the hearing impairment simulator was performed from 9 subjects who have normal ears. The amount of spectral smearing was controlled by LPC order. The asymmetrical auditory filter of proposed hearing impairment simulator was simulated and then some tests to estimate the filter's performance objectively were performed. The objective experiment as simulated auditory filter's performance evaluation method used PESQ(perceptual evaluation of speech quality) and LLR(log likelihood ratio) for speech through auditory filter. The processed speech was evaluated objective speech quality and distortion using PESQ and LLR value. When hearing loss processed, PESQ and LLR value have big difference according to asymmetrical auditory filter in hearing impairment simulator.

G.723.1 음성부호화기와 EVRC 음성부호화기의 상호 부호화 알고리듬 (An Efficient Transcoding Algorithm For G.723.1 and EVRC Speech Coders)

  • 김경태;정성교;윤성완;박영철;윤대희;최용수;강태익
    • 한국통신학회논문지
    • /
    • 제28권5C호
    • /
    • pp.548-554
    • /
    • 2003
  • 서로 다른 음성 부호화기를 사용하는 유/무선 통신망의 연동에서 각 음성 패킷간 효율적인 변환 과정이 필요하다. 이러한 패킷 변환 가정을 위해서 과거에는 이중 부/복호화 방식을 이용하였다. 그러나, 두 음성 부호화기가 이중 부/복호화 방식으로 연동될 경우, 음질 저하 및 계산량 증가, 부가적인 전달 지연 등의 문제가 발생한다. 이 논문에서는 유/무선 통신 시스템에서 널리 사용되는 ITU-T G.723.1[1]과 TIA IS-127 EVRC(Enhanced-Variable-Rate-Codec)[2]음성부호화기 간의 효과적인 연동을 위한 상호부호화 알고리듬을 제안하였다. 제안된 상호부호화 알고리듬은 크게 LSP(Line-Spectrum-Pairs) 변환, 개회로 피치 변환, 고속 적응코드북 검색, 고속 고정코드북 검색의 네 부분으로 나뉘어 진다. TMS320C62x DSP를 사용하여 구현해 본 결과, 제안된 상호부호화 알고리듬이 기존의 이중 부/복호화 과정에 비해 30%∼35% 정도 계산량을 개선하며, 적은 지연 시간으로 동등한 주/객관적 음질을 제공함을 확인하였다.

다중 주파수 밴드 간섭함수와 스펙트럼 차감법을 이용한 음성 향상 시스템 (Speech enhancement system using the multi-band coherence function and spectral subtraction method)

  • 오인규;이인성
    • 한국음향학회지
    • /
    • 제38권4호
    • /
    • pp.406-413
    • /
    • 2019
  • 본 논문은 두 개의 마이크로폰 환경에서 다중 주파수 대역 이득함수와 주파수 차감법을 결합하여 배경잡음을 억제하는 방법을 제안하였다. 다중 주파수 대역 신호대잡음비 추정을 통해 이득 함수를 얻는 음성 향상 방법은 두 채널 간 잡음신호의 상관성이 큰 경우 잡음 제거 성능이 떨어지는 단점을 가지고 있다. 하나의 채널 에서 스펙트럼 차감법을 통해 얻은 이득함수와 간섭함수 기반의 신호대잡음비 추정을 통해서 얻은 이득함수를 결합하여 가중된 이득함수를 사용하는 음성 향상 방법을 제안하였다. 제안된 방법은 ITU-T(International Telecommunications Union Telecommunication)의 객관적인 품질 평가 방법인 PESQ(Perceptual Evaluation of Speech Quality) 시험과 스펙트로그램을 사용하여 성능 평가 되어졌고 PESQ시험에서 최대 MOS 0.217의 음질 향상을 얻을 수 있었다.