• Title/Summary/Keyword: speech distortion

Search Result 227, Processing Time 0.02 seconds

Consonant Confusions Matrices in Adults with Dysarthria Associated with Cerebral Palsy (뇌성마비로 인한 마비말장애 성인의 자음 오류 분석)

  • Lee, Youngmee;Sung, JeeEun;Sim, HyunSub
    • Phonetics and Speech Sciences
    • /
    • v.5 no.1
    • /
    • pp.47-54
    • /
    • 2013
  • The aim of this study was to analyze consonant articulation errors produced by 90 speakers with cerebral palsy (CP). Phonetic transcriptions were made for 37 single-word utterances containing 70 phonemes: 48 initial consonants and 22 final consonants. Errors of substitution, omission, and distortion were analyzed using a confusion matrix paradigm showing the visualization of error patterns. Results showed that substitution errors in initial and final consonants were most frequent, followed by omission and distortion. Consonant omission occurred more frequently on final consonants. In both initial and final consonants, the within-place errors were more prominent than the within-manner errors. The current results suggest that consonant confusion matrices for dysarthric speech may provide useful information for evaluating speech intelligibility and developing automatic speech recognition system of adults with CP associated dysarthria.

A SPECTRAL SUBTRACTION USING PHONEMIC AND AUDITORY PROPERTIES

  • Kang, Sun-Mee;Kim, Woo-Il;Ko, Han-Seok
    • Speech Sciences
    • /
    • v.4 no.2
    • /
    • pp.5-15
    • /
    • 1998
  • This paper proposes a speech state-dependent spectral subtraction method to regulate the blind spectral subtraction for improved enhancement. In the proposed method, a modified subtraction rule is applied over the speech selectively contingent to the speech state being voiced or unvoiced, in an effort to incorporate the acoustic characteristics of phonemes. In particular, the objective of the proposed method is to remedy the subtraction induced signal distortion attained by two state-dependent procedures, spectrum sharpening and minimum spectral bound. In order to remove the residual noise, the proposed method employs a procedure utilizing the masking effect. Proposed spectral subtraction including state-dependent subtraction and residual noise reduction using the masking threshold shows effectiveness in compensation of spectral distortion in the unvoiced region and residual noise reduction.

  • PDF

On a robust text-dependent speaker identification over telephone channels (전화음성에 강인한 문장종속 화자인식에 관한 연구)

  • Jung, Eu-Sang;Choi, Hong-Sub
    • Speech Sciences
    • /
    • v.2
    • /
    • pp.57-66
    • /
    • 1997
  • This paper studies the effects of the method, CMS(Cepstral Mean Subtraction), (which compensates for some of the speech distortion. caused by telephone channels), on the performance of the text-dependent speaker identification system. This system is based on the VQ(Vector Quantization) and HMM(Hidden Markov Model) method and chooses the LPC-Cepstrum and Mel-Cepstrum as the feature vectors extracted from the speech data transmitted through telephone channels. Accordingly, we can compare the correct recognition rates of the speaker identification system between the use of LPC-Cepstrum and Mel-Cepstrum. Finally, from the experiment results table, it is found that the Mel-Cepstrum parameter is proven to be superior to the LPC-Cepstrum and that recognition performance improves by about 10% when compensating for telephone channel using the CMS.

  • PDF

Nonlinear Speech Enhancement Method for Reducing the Amount of Speech Distortion According to Speech Statistics Model (음성 통계 모형에 따른 음성 왜곡량 감소를 위한 비선형 음성강조법)

  • Choi, Jae-Seung
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.16 no.3
    • /
    • pp.465-470
    • /
    • 2021
  • A robust speech recognition technology is required that does not degrade the performance of speech recognition and the quality of the speech when speech recognition is performed in an actual environment of the speech mixed with noise. With the development of such speech recognition technology, it is necessary to develop an application that achieves stable and high speech recognition rate even in a noisy environment similar to the human speech spectrum. Therefore, this paper proposes a speech enhancement algorithm that processes a noise suppression based on the MMSA-STSA estimation algorithm, which is a short-time spectral amplitude method based on the error of the least mean square. This algorithm is an effective nonlinear speech enhancement algorithm based on a single channel input and has high noise suppression performance. Moreover this algorithm is a technique that reduces the amount of distortion of the speech based on the statistical model of the speech. In this experiment, in order to verify the effectiveness of the MMSA-STSA estimation algorithm, the effectiveness of the proposed algorithm is verified by comparing the input speech waveform and the output speech waveform.

Entropy-Constrained Temporal Decomposition (엔트로피 제한 조건을 갖는 시간축 분할)

  • Lee Ki-Seung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.5
    • /
    • pp.262-270
    • /
    • 2005
  • In this paper, a new temporal decomposition method is proposed. where not oniy distortion but also entropy are involved in segmentation. The interpolation functions and the target feature vectors are determined by a dynamic Programing technique. where both distortion and entropy are simultaneously minimized. The interpolation functions are built by using a training speech corpus. An iterative method. where segmentation and estimation are iteratively performed. finds the locally optimum Points in the sense of minimizing both distortion and entropy. Simulation results -3how that in terms of both distortion and entropy. the Proposed temporal decomposition method Produced superior results to the conventional split vector-quantization method which is widely employed in the current speech coding methods. According to the results from the subjective listening test, the Proposed method reveals superior Performance in terms of qualify. comparing to the Previous vector quantization method.

On a Pitch Alteration Technique in Time-Frequency Hybrid Domain for High Quality Prosody Control of Speech Signal (고음질 운율조절용 시간-주파수 혼성영역 피치변경법)

  • Lee, Sang-Hyo;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.4
    • /
    • pp.106-109
    • /
    • 1997
  • In the area of the speech synthesis techniques, the waveform coding methods maintain the intelligibility and naturalness of synthetic speech. In order to apply the waveform coding techniques to synthesis by rule, however, we must be able to alter the pitches for prosody control of synthetic speech. In this paper, we propose a new pitch alteration technique in time-frequency hybrid domain, that compensates phase distortion of the cepstral pitch alteration method with time scaling method in the time domain. This method can remove some phase spectrum distortion which is occurred in conjunction point between the waveforms in continued frames. Also, we can obtain little magnitude spectrum distortion below 1.18% for pitch alteration of 200%.

  • PDF

A Phase-related Feature Extraction Method for Robust Speaker Verification (열악한 환경에 강인한 화자인증을 위한 위상 기반 특징 추출 기법)

  • Kwon, Chul-Hong
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.3
    • /
    • pp.613-620
    • /
    • 2010
  • Additive noise and channel distortion strongly degrade the performance of speaker verification systems, as it introduces distortion of the features of speech. This distortion causes a mismatch between the training and recognition conditions such that acoustic models trained with clean speech do not model noisy and channel distorted speech accurately. This paper presents a phase-related feature extraction method in order to improve the robustness of the speaker verification systems. The instantaneous frequency is computed from the phase of speech signals and features from the histogram of the instantaneous frequency are obtained. Experimental results show that the proposed technique offers significant improvements over the standard techniques in both clean and adverse testing environments.

Perceptual Characteristics of Korean Consonants Distorted by the Frequency Band Limitation (주파수 대역 제한에 의한 한국어 자음의 지각 특성 분석)

  • Kim, YeonWhoa;Choi, DaeLim;Lee, Sook-Hyang;Lee, YongJu
    • Phonetics and Speech Sciences
    • /
    • v.6 no.1
    • /
    • pp.95-101
    • /
    • 2014
  • This paper investigated the effects of frequency band limitation on perceptual characteristics of Korean consonants. Monosyllabic speech (144 syllables of CV type, 56 syllables of VC type, 8 syllables of V type) produced by two announcers were low- and high-pass filtered with cutoff frequencies ranging from 300 to 5000 Hz. Six listeners with normal hearing performed perception test by types of filter and cutoff frequencies. We reported phoneme recognition rates and types of perception error of band-limited Korean consonants to examine how frequency distortion in the process of speech transmission affect listener's perception. The results showed that recognition rates varied with the following factors: position in a syllable, manner of articulation, place of articulation, and phonation types. Consonants in the final position were stronger to the frequency band limitation than those in the initial position. Fricatives and Affricates are stronger than stops. Fortis consonants were less stronger than their lenis or aspirated counterparts. Types of perception error also varied depending on such factors as consonant's place of articulation: In case of bilabial stops, they were perceived as alveolar stops with while in cases of alveolar and velar stops, there were changes in phonation types without any change in the place of articulation.

A Study on Compensation of Amplitude in Multi Pulse (멀티펄스의 진폭보정에 관한 연구)

  • Lee, See-Woo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.9
    • /
    • pp.4119-4124
    • /
    • 2011
  • In a MPC coding using excitation source of voiced and unvoiced, it would be a distortion of speech waveform in case of increasing or decreasing of speech signal amplitude in a frame. This is caused by normalization of synthesis speech signal in the process of restoration the multi-pulses of representation section. To solve this problem, this paper present a method of amplitude compensation(AC-MPC) in a multi-pulses each pitch interval in order to reduce distortion of speech waveform. I was confirmed that the method can be synthesized close to the original speech waveform. And I evaluate the MPC and AC-MPC using amplitude compensation method. As a result, SNRseg of AC-MPC was improved 0.7dB for female voice and 0.7dB for male voice respectively. Compared to the MPC, SNRseg of AC-MPC has been improved that I was able to control the distortion of the speech waveform finally. And so, I expect to be able to this method for cellular phone and smart phone using excitation source of low bit rate.

An Adaptive Wind Noise Reduction Method Based on a priori SNR Estimation for Speech Eenhancement (음성 강화를 위한 a priori SNR 추정기반 적응 바람소리 저감 방법)

  • Seo, Ji-Hun;Lee, Seok-Pil
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.64 no.12
    • /
    • pp.1756-1760
    • /
    • 2015
  • This paper focuses on a priori signal to noise ratio (SNR) estimation method for the speech enhancement. There are many researches for speech enhancement with several ambient noise cancellation methods. The method based on spectral subtraction (SS) which is widely used in noise reduction has a trade-off between the performance and the distortion of the signals. So the need of adaptive method like an estimated a priori SNR being able to making a high performance and low distortion is increasing. The decision directed (DD) approach is used to determine a priori SNR in noisy speech signals. A priori SNR is estimated by using only the magnitude components and consequently follows a posteriori SNR with one frame delay. We propose a modified a priori SNR estimator and the weighted rational transfer function for speech enhancement with wind noises. The experimental result shows the performance of our proposed estimator is better Perceptual Evaluation of Speech Quality scores (PESQ, ITU-T P.862) compare to the conventional DD approach-based systems and different noise reduction methods.