• Title/Summary/Keyword: Speech spectrum

307 search results

A SPECTRAL SUBTRACTION USING PHONEMIC AND AUDITORY PROPERTIES

  • Kang, Sun-Mee; Kim, Woo-Il; Ko, Han-Seok
    • Speech Sciences, v.4 no.2, pp.5-15, 1998
  • This paper proposes a speech state-dependent spectral subtraction method that regulates blind spectral subtraction for improved enhancement. A modified subtraction rule is applied selectively, contingent on whether the speech state is voiced or unvoiced, so as to incorporate the acoustic characteristics of phonemes. In particular, the method aims to remedy subtraction-induced signal distortion by means of two state-dependent procedures: spectrum sharpening and a minimum spectral bound. To remove residual noise, it employs a procedure that exploits the masking effect. The proposed spectral subtraction, combining state-dependent subtraction with masking-threshold-based residual-noise reduction, proves effective at compensating spectral distortion in unvoiced regions and at reducing residual noise.

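For orientation, a minimal single-frame spectral subtraction in Python/NumPy is sketched below. The over-subtraction factor `alpha` and spectral floor `beta` are generic assumptions; the paper's state-dependent rule, spectrum sharpening, and masking-based residual-noise reduction are not reproduced here.

```python
import numpy as np

def spectral_subtract_frame(noisy_fft, noise_mag, alpha=2.0, beta=0.01):
    """Subtract an estimated noise magnitude from one STFT frame.

    alpha (over-subtraction) and beta (spectral floor) are generic
    knobs, not the paper's state-dependent parameters.
    """
    mag = np.abs(noisy_fft)
    phase = np.angle(noisy_fft)
    clean = mag - alpha * noise_mag          # blind subtraction step
    clean = np.maximum(clean, beta * mag)    # minimum spectral bound
    return clean * np.exp(1j * phase)        # resynthesize with noisy phase
```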

Statistical Error Compensation Techniques for Spectral Quantization

  • Choi, Seung-Ho; Kim, Hong-Kook
    • Speech Sciences, v.11 no.4, pp.17-28, 2004
  • In this paper, we propose a statistical approach to improving the spectral quantization performance of speech coders. The proposed techniques compensate for the distortion in a decoded line spectrum pair (LSP) vector using a statistical mapping function between the decoded LSP vector and its corresponding original LSP vector. We first develop two codebook-based probabilistic matching (CBPM) methods built on linear mapping functions, under different assumptions about the distribution of LSP vectors, and then propose an iterative procedure for the two CBPMs. Applied to the predictive vector quantizer of the IS-641 speech coder, the proposed techniques reduce the average spectral distortion by around 0.064 dB.

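A schematic reading of the codebook-based matching idea follows, assuming per-cell linear mappings `A[k]`, `b[k]` trained offline on pairs of decoded and original LSP vectors; the names and the training step are assumptions, not the paper's exact CBPM.

```python
import numpy as np

def cbpm_compensate(decoded_lsp, centroids, A, b):
    """Compensate a decoded LSP vector with a per-cell linear mapping.

    centroids: (K, d) codebook over decoded-LSP space; A: (K, d, d)
    and b: (K, d) hold mapping parameters assumed to be trained
    offline on (decoded, original) LSP pairs. Schematic only.
    """
    k = np.argmin(np.linalg.norm(centroids - decoded_lsp, axis=1))  # nearest cell
    return A[k] @ decoded_lsp + b[k]                                # linear compensation
```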

Nonlinear Speech Enhancement Method for Reducing the Amount of Speech Distortion According to Speech Statistics Model (음성 통계 모형에 따른 음성 왜곡량 감소를 위한 비선형 음성강조법)

  • Choi, Jae-Seung
    • The Journal of the Korea Institute of Electronic Communication Sciences, v.16 no.3, pp.465-470, 2021
  • Robust speech recognition technology is required that degrades neither recognition performance nor speech quality when recognition is performed in a real environment where speech is mixed with noise. As such technology develops, applications are needed that achieve stable, high recognition rates even in noise whose spectrum resembles that of human speech. This paper therefore proposes a speech enhancement algorithm that performs noise suppression based on the MMSA-STSA estimation algorithm, a short-time spectral amplitude method built on the minimum mean-square error. It is an effective nonlinear speech enhancement algorithm that operates on a single-channel input, offers high noise-suppression performance, and reduces speech distortion by relying on a statistical model of speech. In the experiments, the effectiveness of the algorithm is verified by comparing the input and output speech waveforms.
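
The estimator the abstract describes reads like the classic MMSE-STSA of Ephraim and Malah; a textbook form of that gain function (an assumption on my part, not necessarily the paper's exact variant) can be sketched as:

```python
import numpy as np
from scipy.special import i0e, i1e

def mmse_stsa_gain(gamma, xi):
    """Textbook MMSE-STSA gain (Ephraim-Malah) per STFT bin.

    gamma: a-posteriori SNR |Y|^2 / noise PSD; xi: a-priori SNR,
    e.g. from decision-directed smoothing. i0e/i1e are exponentially
    scaled Bessel functions, so the exp(-v/2) factor is built in.
    """
    gamma = np.maximum(gamma, 1e-6)
    v = xi * gamma / (1.0 + xi)
    return (np.sqrt(np.pi * v) / (2.0 * gamma)) * (
        (1.0 + v) * i0e(v / 2.0) + v * i1e(v / 2.0))
```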

Noise Estimation and Suppression Methods Based on Normalized Variance in Time-Frequency for Speech Enhancement (음성강화를 위한 시간 및 주파수 도메인의 분산정규화 기반 잡음예측 및 저감방법)

  • Lee, Soo-Jeong; Kim, Soon-Hyob
    • Journal of the Institute of Electronics Engineers of Korea SP, v.46 no.1, pp.87-94, 2009
  • Noise estimation and suppression are crucial to many speech communication and recognition systems. The algorithm proposed in this paper is based on the normalized-variance ratio of the noisy power spectrum in the time-frequency domain; it tracks a threshold and controls the trade-off between residual noise and distortion. Evaluated with the ITU-T P.835 signal distortion (SIG) measure and the segmental signal-to-noise ratio (SNR), it outperforms conventional methods.
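
One hedged reading of the variance-normalization criterion: per frequency bin, a low variance-to-mean ratio over recent frames suggests noise-only activity, so those bins update the noise estimate. The buffer shape, threshold, and smoothing constant below are assumptions, not the paper's values.

```python
import numpy as np

def update_noise_psd(power_spec, noise_psd, history, thresh=0.5, smooth=0.98):
    """Update a per-bin noise PSD where the normalized variance is low.

    history: (frames, bins) buffer of recent power spectra. Bins whose
    variance-to-squared-mean ratio stays under `thresh` are treated as
    noise-only and smoothed into the estimate.
    """
    mean = history.mean(axis=0) + 1e-12
    norm_var = history.var(axis=0) / mean**2          # normalized variance per bin
    noise_only = norm_var < thresh                    # steady bins -> noise
    noise_psd = noise_psd.copy()
    noise_psd[noise_only] = (smooth * noise_psd[noise_only]
                             + (1.0 - smooth) * power_spec[noise_only])
    return noise_psd
```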

Comparison of English and Korean speakers for the nasalization of English stops

  • Yun, Ilsung
    • Phonetics and Speech Sciences, v.7 no.3, pp.3-11, 2015
  • This study compared English and Korean speakers with regard to the nasalization of the English stops /b, d, g, p, t, k/ before a nasal, within and across a word boundary. Nine English and thirty Korean speakers participated in the experiment, using 37 speech items with different grammatical structures. Overall, the English informants rarely nasalized the stops, while the Korean informants generally nasalized them strongly, though with wide variation from no nasalization to almost complete nasalization. In general, voiced stops were more likely to be nasalized than voiceless stops. The alveolar stops /d, t/ tended to be nasalized the most, the bilabial stops /b, p/ the second most, and the velar stops /g, k/ the least. Moreover, the closer the grammatical relationship between neighboring words, the more likely stop nasalization was to occur. In contrast, Korean syllabification, the addition of the vowel /i/ to final stops, worked against stop nasalization. Different stress (accent) or rhythm effects of the two languages are assumed to contribute to the significantly different nasalization between English and Korean speakers. The spectrum of stop nasalization obtained from this study can be used as an index of how close a given Korean speaker's stop nasalization is to English speakers'.

On a Pitch Alteration Method using Scaling the Harmonics Compensated with the Phase for Speech Synthesis (위상 보상된 고조파 스케일링에 의한 음성합성용 피치변경법)

  • Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea, v.13 no.6, pp.91-97, 1994
  • In speech processing, waveform codings are concerned with preserving the signal waveform through a redundancy reduction process. For speech synthesis, high-quality waveform codings are mainly used for synthesis by analysis. Because the parameters of such coding are not separated into excitation and vocal tract components, it is difficult to apply waveform coding to synthesis by rule; to do so, the pitch must be alterable. In this paper, we propose a new pitch alteration method that changes the pitch period in waveform coding by dividing the speech signal into vocal tract and excitation parameters. It is a time-frequency domain method that preserves the phase component of the waveform in the time domain and the magnitude component in the frequency domain, making it possible to use waveform coding for synthesis by rule. Using the algorithm, we obtain a spectrum distortion of 2.94%, which is 5.06% lower than that of the time-domain pitch alteration method.

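A loose illustration of scaling harmonics along the frequency axis while keeping the time-domain phase, in the spirit of the abstract's time-frequency split; this is illustrative only, not the paper's exact procedure.

```python
import numpy as np

def scale_harmonics(frame, ratio):
    """Move a frame's harmonics by `ratio` along the frequency axis
    while keeping the original phase. A pitch ratio > 1 raises the
    perceived pitch; < 1 lowers it.
    """
    spec = np.fft.rfft(frame)
    mag, phase = np.abs(spec), np.angle(spec)
    bins = np.arange(len(mag), dtype=float)
    scaled = np.interp(bins / ratio, bins, mag, right=0.0)  # stretch/compress harmonics
    return np.fft.irfft(scaled * np.exp(1j * phase), n=len(frame))
```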

Comparison of Classification Performance Between Adult and Elderly Using Acoustic and Linguistic Features from Spontaneous Speech (자유대화의 음향적 특징 및 언어적 특징 기반의 성인과 노인 분류 성능 비교)

  • SeungHoon Han; Byung Ok Kang; Sunghee Dong
    • KIPS Transactions on Software and Data Engineering, v.12 no.8, pp.365-370, 2023
  • This paper compares the performance of classifying speech data into two groups, adult and elderly, based on the acoustic and linguistic characteristics that change with aging, such as respiratory patterns, phonation, pitch, frequency, and language expression ability. For acoustic features, we used attributes related to the frequency, amplitude, and spectrum of the speech signal. For linguistic features, we extracted hidden-state vector representations containing contextual information from transcriptions of the utterances using KoBERT, a Korean pre-trained language model with excellent performance on natural language processing tasks. The classification performance of models trained on the acoustic and linguistic features was evaluated, and the per-class F1 scores for adult and elderly were examined after addressing the class imbalance problem by down-sampling. The results showed that linguistic features classified adult and elderly better than acoustic features, and that even with equal class proportions, classification performance was higher for adult than for elderly.
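
As a sketch of the setup described above, with stand-in feature vectors in place of real KoBERT hidden states or acoustic attributes; the classifier choice and the data here are assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def downsample(X, y, rng):
    """Equalize the two class counts by down-sampling the majority class."""
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    n = min(len(idx0), len(idx1))
    keep = np.concatenate([rng.choice(idx0, n, replace=False),
                           rng.choice(idx1, n, replace=False)])
    return X[keep], y[keep]

# Stand-in features: in the paper these would be KoBERT hidden-state
# vectors (linguistic) or frequency/amplitude/spectrum attributes
# (acoustic). Labels: 0 = adult, 1 = elderly.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(400, 768)), rng.integers(0, 2, 400)
Xb, yb = downsample(X, y, rng)
X_tr, X_te, y_tr, y_te = train_test_split(Xb, yb, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f1_score(y_te, clf.predict(X_te), average=None))  # per-class F1
```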

On the Center Pitch Estimation by using the Spectrum Leakage Phenomenon for the Noise Corrupted Speech Signals (배경 잡음하에서 스펙트럼 누설현상을 이용한 음성신호의 중심 피치 검출)

  • Kang, Dong-Kyu; Bae, Myung-Jin; Ann, Sou-Guil
    • The Journal of the Acoustical Society of Korea, v.10 no.1, pp.37-46, 1991
  • The pitch estimation algorithms proposed to date have difficulty detecting pitch over a wide range regardless of age or sex. Only a small deviation is observed around the center pitch in the pitch distribution, since pitch is constrained by the physical limits of the coarticulation mechanism. If the center pitch is supplied to an accurate pitch extraction procedure, the algorithm can be both simplified and made more accurate. In this paper, we propose an algorithm that accurately detects the center pitch of noise-corrupted speech signals by using the spectrum leakage phenomenon.

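The leakage-based refinement itself is not specified in the abstract; for orientation only, a generic harmonic-summation pitch estimate from the magnitude spectrum looks like the sketch below (all parameters are assumptions).

```python
import numpy as np

def harmonic_sum_pitch(frame, fs, fmin=60.0, fmax=400.0, n_harm=5):
    """Generic harmonic-summation pitch estimate from the magnitude
    spectrum, shown for orientation only; the paper's leakage-based
    center-pitch detection is not reproduced here.
    """
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    cands = np.arange(fmin, fmax, 1.0)
    scores = [sum(np.interp(h * f0, freqs, spec) for h in range(1, n_harm + 1))
              for f0 in cands]
    return cands[int(np.argmax(scores))]
```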

Long Term Average Spectral Analysis for Acoustical Description of Korean Nasal Consonants (한국어 비음의 음향학적 세부 기술을 위한 장구간 스펙트럼(LTAS) 분석)

  • Choi, Soo-Nai; Seong, Cheol-Jae
    • Proceedings of the KSPS Conference, 2006.11a, pp.92-95, 2006
  • The purpose of this study is to find acoustic parameters in the frequency domain that distinguish the Korean nasals /m, n, ng/ from one another. Since it is not easy to characterize the antiformant in the frequency domain, we suggest new parameters calculated from the long-term average spectrum (LTAS). The maximum energy value and its frequency, and the minimum (zero) energy value and its frequency, are obtained from the spectrum. In addition, slope1, slope2, total energy, centroid, skewness, and kurtosis are proposed as parameters. The parameters revealed to differ with statistical significance are, roughly, peak1_a, zero_f, slope_1, slope_2, highENG, zero_ENG, and centroid.

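A minimal LTAS computation with a few of the derived parameters named above; the exact windowing and parameter definitions in the paper may differ, so treat this as a sketch.

```python
import numpy as np

def ltas_features(x, fs, nfft=1024, hop=512):
    """Long-term average spectrum plus total energy, centroid,
    skewness, and kurtosis; the slope and peak/zero measures
    would follow the same pattern.
    """
    win = np.hanning(nfft)
    frames = [x[i:i + nfft] * win for i in range(0, len(x) - nfft, hop)]
    ltas = np.mean([np.abs(np.fft.rfft(f)) ** 2 for f in frames], axis=0)
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    p = ltas / ltas.sum()                      # treat the LTAS as a distribution
    centroid = (freqs * p).sum()
    dev = freqs - centroid
    var = (dev ** 2 * p).sum()
    skewness = (dev ** 3 * p).sum() / var ** 1.5
    kurtosis = (dev ** 4 * p).sum() / var ** 2
    return ltas, ltas.sum(), centroid, skewness, kurtosis
```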

Voice Recognition Performance Improvement Using the Convergence of Voice Signal Feature and Silence Feature Normalization in Cepstrum Feature Distribution (음성 신호 특징과 셉스트럽 특징 분포에서 묵음 특징 정규화를 융합한 음성 인식 성능 향상)

  • Hwang, Jae-Cheon
    • Journal of the Korea Convergence Society
    • /
    • v.8 no.5
    • /
    • pp.13-17
    • /
    • 2017
  • With existing speech feature extraction methods, recognition errors arise because the threshold separating speech from non-speech is unclear. This article presents a modeling method for improving speech recognition performance that combines feature extraction for speech with silence feature normalization for non-speech. The proposed method minimizes the influence of noise by converging per-frame speech-signal feature extraction with silence feature normalization; because the method works from an energy spectrum of the original speech signal similar to entropy, the speech is less affected by noise, and the silence feature normalization improves performance in terms of signal-to-noise ratio. A fixed speech/non-speech decision value in the cepstral domain is used. Performance is analyzed by comparing results against a CHMM-based HMM system: the recognition rate improved by 2.7%p in the speaker-dependent task and by 0.7%p in the speaker-independent task.
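
A schematic of silence feature normalization as the abstract describes it: frames judged non-speech by a fixed threshold supply a bias that is removed from the cepstral features. The threshold and the normalization form are assumptions, not the paper's exact rule.

```python
import numpy as np

def silence_feature_normalize(cepstra, energies, thresh):
    """Remove the bias estimated on frames judged non-speech by a
    fixed energy threshold -- a schematic form of the silence feature
    normalization combined with speech-feature extraction above.
    """
    silence = energies < thresh               # non-speech frames
    if silence.any():
        bias = cepstra[silence].mean(axis=0)  # noise bias from silence
        cepstra = cepstra - bias
    return cepstra
```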