• Title/Summary/Keyword: speech distortion

Search Result 227, Processing Time 0.024 seconds

An ACLMS-MPC Coding Method Integrated with ACFBD-MPC and LMS-MPC at 8kbps bit rate. (8kbps 비트율을 갖는 ACFBD-MPC와 LMS-MPC를 통합한 ACLMS-MPC 부호화 방식)

  • Lee, See-woo
    • Journal of Internet Computing and Services
    • /
    • v.19 no.6
    • /
    • pp.1-7
    • /
    • 2018
  • This paper present an 8kbps ACLMS-MPC(Amplitude Compensation and Least Mean Square - Multi Pulse Coding) coding method integrated with ACFBD-MPC(Amplitude Compensation Frequency Band Division - Multi Pulse Coding) and LMS-MPC(Least Mean Square - Multi Pulse Coding) used V/UV/S(Voiced / Unvoiced / Silence) switching, compensation in a multi-pulses each pitch interval and Unvoiced approximate-synthesis by using specific frequency in order to reduce distortion of synthesis waveform. In integrating several methods, it is important to adjust the bit rate of voiced and unvoiced sound source to 8kbps while reducing the distortion of the speech waveform. In adjusting the bit rate of voiced and unvoiced sound source to 8 kbps, the speech waveform can be synthesized efficiently by restoring the individual pitch intervals using multi pulse in the representative interval. I was implemented that the ACLMS-MPC method and evaluate the SNR of APC-LMS in coding condition in 8kbps. As a result, SNR of ACLMS-MPC was 15.0dB for female voice and 14.3dB for male voice respectively. Therefore, I found that ACLMS-MPC was improved by 0.3dB~1.8dB for male voice and 0.3dB~1.6dB for female voice compared to existing MPC, ACFBD-MPC and LMS-MPC. These methods are expected to be applied to a method of speech coding using sound source in a low bit rate such as a cellular phone or internet phone. In the future, I will study the evaluation of the sound quality of 6.9kbps speech coding method that simultaneously compensation the amplitude and position of multi-pulse source.

Speech Animation Synthesis based on a Korean Co-articulation Model (한국어 동시조음 모델에 기반한 스피치 애니메이션 생성)

  • Jang, Minjung;Jung, Sunjin;Noh, Junyong
    • Journal of the Korea Computer Graphics Society
    • /
    • v.26 no.3
    • /
    • pp.49-59
    • /
    • 2020
  • In this paper, we propose a speech animation synthesis specialized in Korean through a rule-based co-articulation model. Speech animation has been widely used in the cultural industry, such as movies, animations, and games that require natural and realistic motion. Because the technique for audio driven speech animation has been mainly developed for English, however, the animation results for domestic content are often visually very unnatural. For example, dubbing of a voice actor is played with no mouth motion at all or with an unsynchronized looping of simple mouth shapes at best. Although there are language-independent speech animation models, which are not specialized in Korean, they are yet to ensure the quality to be utilized in a domestic content production. Therefore, we propose a natural speech animation synthesis method that reflects the linguistic characteristics of Korean driven by an input audio and text. Reflecting the features that vowels mostly determine the mouth shape in Korean, a coarticulation model separating lips and the tongue has been defined to solve the previous problem of lip distortion and occasional missing of some phoneme characteristics. Our model also reflects the differences in prosodic features for improved dynamics in speech animation. Through user studies, we verify that the proposed model can synthesize natural speech animation.

Designing a Quantizer of LPC Parameters for the Narrowband Speech Coder using Block-Constrained Trellis Coded Quantization (블록 제한 트렐리스 부호화 양자화 기법을 이용한 협대역 음성 부호화기용 LPC 계수 양자화기 설계)

  • Jun, Ja-Kyoung;Park, Sang-Kuk;Kang, Sang-Won
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.3C
    • /
    • pp.234-240
    • /
    • 2007
  • In this paper, low complexity block constrained trellis coded quantization (BC-TCQ) structures are introduced, and a predictive BC TCQ encoding method is developed for quantization of line spectrum frequencies (LSF) parameters for narrowband speech coding applications. Trellis-coded quantization(TCQ) is a form of VQ that builds the VQ codebook from interleaved constituent scalar quantization codebooks. The performance is compared to the other VQ, demonstrating reduction in spectral distortion and significant reduction in encoding complexity. The predictive BC-TCQ is about 0.47107 dB superior to the IS-641 split-VQ, 26bits/frame, in spectral distortion sense. The BC-TCQ is 64.54%, 76.93%, 2.35% of the IS-641 split-VQ, respectively, in the complexity of the additions, multiplies, comparisons.

A Study on the Audio Compensation System (음향 보상 시스템에 관한 연구)

  • Jeoung, Byung-Chul;Won, Chung-Sang
    • The Journal of the Acoustical Society of Korea
    • /
    • v.32 no.6
    • /
    • pp.509-517
    • /
    • 2013
  • In this paper, we researched a method that makes a good acoustic-speech system using a digital signal processing technique with dynamic microphone as a transducer. Good acoustic-speech system should deliver the original sound input to electric signal without distortion. By measuring the frequency response of the microphone, adjustment factors are obtained by comparing measured data and standard frequency response of microphone for each frequency band. The final sound levels are obtained using the developed adjustment factors of frequency responses from the microphone and speaker to match the original sound levels using the digital signal processing technique. Then, we minimize the changes in the frequency response and level due to the variation of the distance from source to microphone, where the frequency responses were measured according to the distance changes.

Acoustic-phonetic characteristics of fricatives distortion in functional articulation disorders (기능적 조음음운장애아동의 치조 마찰음 왜곡의 음향음성학적 특성)

  • Yang, Minkyo;Choi, Yaelin;Kim, Eun Yeon;Yoo, Hyun Ji
    • Phonetics and Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.127-134
    • /
    • 2018
  • This study aims to explain the difficulties children with articulation and phonological disorders have in producing alveolar fricative sounds. The study will perform a comparative analysis revealing how ordinary children produce alveolar fricative sounds through five different acoustic variables, and consequently identifying objective differences, compared to children with articulation and phonological disorders. Therefore, this study compared and analyzed the differences between 10 children with articulation and phonological disorders and 10 ordinary children according to a phonation type of alveolar fricative sounds (/s/ and /$s^*$), a type of vowel (/i/, /ε/, /u/, /o/, /ɯ/, /ʌ/, /ɑ/), and a structure of syllables (CV, VCV) through acoustic variables including a central moment, skewness, kurtosis, a center of gravity and variance. That is, children with articulation and phonological disorders, when compared to ordinary children, have difficulties with concentrating an agile and momentary friction with strength when articulating alveolar fricative sounds, which uses strong energy and accompany tension. Furthermore, the values of alveolar fricative sounds of children with articulation and phonological disorders appeared to spread evenly over the average range, which means that the range of overall the standard deviation values for children with functional phonological disorders is wider than that of ordinary children. For a future study, if the mispronounced sounds relating to omission, substitution, and addition can be compared and analyzed for various target groups, it could be used effectively to help children with functional phonological disorders.

An Objective Estimation for Simulating of Asymmetrical Auditory Filter of the Hearing Impaired According to Hearing Loss Degree (난청인의 난청 정도에 따른 비대칭 청각 필터 구현의 객관적 평가)

  • Joo, S.I.;Jeon, Y.Y.;Song, Y.R.;Lee, S.M.
    • Journal of rehabilitation welfare engineering & assistive technology
    • /
    • v.3 no.1
    • /
    • pp.27-34
    • /
    • 2009
  • Hearing impaired person's hearing loss has personally various shape, so existing symmetrical auditory filter of frequency band method wasn't properly simulated the hearing impaired person's various hearing loss shape. The shapes of auditory filter are asymmetrical different with each center frequency and each input level. Hearing impaired person which has hearing loss was differently changed with that of normal hearing people and it has different value for speech of quality through auditory filter. In this study, the asymmetrical auditory filter was simulated and then some tests to estimate the filter's performance objectively were performed. The experiment as simulated auditory filter's performance evaluation method used perceptual evaluation of speech quality (PESQ) and log likelihood ratio (LLR) for speech through auditory filter. In the test, processed speech was evaluated objective speech quality and distortion using PESQ and LLR value. When hearing loss processed, PESQ and LLR value have big difference between symmetrical and asymmetrical auditory filter. It means that the difference of the shape auditory filter may affect to speech quality. Especially, when hearing loss existed, auditory filter changing according to asymmetrical shape for each center frequency affected to perceive speech quality of the hearing impaired.

  • PDF

Perceptual and Adaptive Quantization of Line Spectral Frequency Parameters (선 스펙트럼 주파수의 청각 적응 부호화)

  • 한우진;김은경;오영환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.8
    • /
    • pp.68-77
    • /
    • 2000
  • Line special frequency (LSF) parameters have been widely used in low bit-rate speech coding due to their efficiency for representing the short-time speech spectrum. In this paper, a new distance measure based on the masking properties of human ear is proposed for quantizing LSF parameters whereas most conventional quantization methods are based on the weighted Euclidean distance measure. The proposed method derives the perceptual distance measure from the definition of noise-to-mask ratio (NMR) which has high correspondence with the actual distortion received in the human ear and uses it for quantizing LSF parameters. In addition, we propose an adaptive bit allocation scheme, which allocates minimal bits to LSF parameters maintaining the perceptual transparency of given speech frame for reducing the average bit-rates. For the performance evaluation, we has shown the ratio of perceptually transparent frames and the corresponding average bit-rates for the conventional and proposed methods. By jointly combining the proposed distance measure and adaptive bit allocation scheme, the proposed system requires only 770 bps for obtaining 95.5% perceptually transparent frames, while the conventional systems produce 89.9% at even 1800 bps.

  • PDF

Improvement of Synthetic Speech Quality using a New Spectral Smoothing Technique (새로운 스펙트럼 완만화에 의한 합성 음질 개선)

  • 장효종;최형일
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.11
    • /
    • pp.1037-1043
    • /
    • 2003
  • This paper describes a speech synthesis technique using a diphone as an unit phoneme. Speech synthesis is basically accomplished by concatenating unit phonemes, and it's major problem is discontinuity at the connection part between unit phonemes. To solve this problem, this paper proposes a new spectral smoothing technique which reflects not only formant trajectories but also distribution characteristics of spectrum and human's acoustic characteristics. That is, the proposed technique decides the quantity and extent of smoothing by considering human's acoustic characteristics at the connection part of unit phonemes, and then performs spectral smoothing using weights calculated along a time axis at the border of two diphones. The proposed technique reduces the discontinuity and minimizes the distortion which is caused by spectral smoothing. For the purpose of performance evaluation, we tested on five hundred diphones which are extracted from twenty sentences using ETRI Voice DB samples and individually self-recorded samples.

A Study on APC-MPC in 8kbps of Convergence System (융복합 시스템의 8kbps에 있어서 APC-MPC에 관한 연구)

  • Lee, See-Woo
    • Journal of Digital Convergence
    • /
    • v.13 no.7
    • /
    • pp.177-182
    • /
    • 2015
  • In a MPC(Multi-Pulse Coding) using excitation source of voiced and unvoiced, it would be a distortion of voice waveform. This is caused by normalization of synthesis speech waveform of voiced in the process of restoration. To solve this problem, this paper present APC-MPC of amplitude-position compensation in a multi-pulses each pitch interval in order to reduce distortion of synthesis waveform. Also, I was implemented that the APC-MPC in coding system. And I evaluate the SNRseg of APC-MPC in 8kbps coding condition of convergence system. As a result, SNRseg of APC-MPC was 13.9dB for female voice and 14.3dB for male voice respectively. And so, I expect to be able to this method for cellular phone and smart phone using excitation source of low bit rate.

An Autoregressive Parameter Estimation from Noisy Speech Using the Adaptive Predictor (적응예측기를 이용하여 잡음섞인 음성신호로부터 autoregressive 계수를 추산하는 방법)

  • Koo, Bon-Eung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.3
    • /
    • pp.90-96
    • /
    • 1995
  • A new method for autoregressive parameter estimation from noisy observation sequence is presented. This method, termed the AP method, is a result of an attempt to make use of the adaptive predictor which is a simple and reliable way of parameter estimation. It is shown theoretically that, for noisy input, the parameter vector computed from the prediction sequence is closer to that of the original sequence than the noisy input sequence is, under the spectral distortion criterion. Simulation results with the Kalman filter as a noise reduction filter and real speech data supported the theory. Roughly speaking, the performance of the parameter set obtained by the AP method is better than noisy one but worse than the EM iteration results. When the simplicity is considered, it could provide a useful alternative to more complicated parameter estimation methods in some applications.

  • PDF