• Title/Summary/Keyword: speech enhancement

Enhancement of speech with time-variant and colored noise

  • Mine, Katsutoshi;Kitazaki, Masato;Wakabayashi, Katsuyoshi;Morimoto, Yuji
    • 제어로봇시스템학회:학술대회논문집 / 1990.10b / pp.1098-1102 / 1990
  • We consider a method for enhancing speech signals degraded by additive random noise that is time-variant and/or colored. For such noise, it is effective to exploit the properties of both the speech and the noise. The objective is to improve the overall quality and articulation of the degraded speech. In the proposed method, a distribution model of the speech spectrum is supplied as prior information to the noise reduction system. The proposed system improves the SNR by about 10 dB when the input SNR is 0 dB.

Excitation Enhancement Based on a Selective-Band Harmonic Model for Low-Bit-Rate Code-Excited Linear Prediction Coders (저전송률 코드여기 선형 예측 부호화기를 위한 선택적 대역 하모닉 모델 기반 여기신호 개선 알고리즘)

  • Lee, Mi-Suk;Kim, Hong-Kook;Choi, Seung-Ho;Kim, Do-Young
    • Speech Sciences / v.11 no.2 / pp.259-269 / 2004
  • In this paper, we propose a new excitation enhancement technique to improve the speech quality of low bit-rate code-excited linear prediction (CELP) coders. The proposed technique is based on a harmonic model and it is employed only in the decoding process of speech coders without any additional bits. We develop the procedure of harmonic model parameter estimation and harmonic generation, and apply this technique to a current state-of-the-art low bit rate speech coder, ITU-T G.729 Annex D. Also, its performance is measured by using the ITU-T P.862 PESQ score and compared to those of the phase dispersion filter and the long-term postfilter applied to the decoded excitation. It is shown that the proposed excitation enhancement technique can improve the quality of decoded speech and provide better quality for male speech than other techniques.

Performance Analysis of Noisy Speech Recognition Depending on Parameters for Noise and Signal Power Estimation in MMSE-STSA Based Speech Enhancement (MMSE-STSA 기반의 음성개선 기법에서 잡음 및 신호 전력 추정에 사용되는 파라미터 값의 변화에 따른 잡음음성의 인식성능 분석)

  • Park Chul-Ho;Bae Keun-Sung
    • MALSORI / no.57 / pp.153-164 / 2006
  • The MMSE-STSA based speech enhancement algorithm is widely used as a preprocessing step for noise-robust speech recognition. It weights each spectral bin of the noisy speech with a gain computed from estimates of the noise and signal power spectra. In this paper, we investigate how the parameters used to estimate the speech signal and noise power in MMSE-STSA influence the recognition performance on noisy speech. For the experiments, we use the Aurora2 DB, which contains noisy speech with subway, babble, car, and exhibition noises. An HTK-based continuous HMM system is constructed for the recognition experiments. Experimental results are presented and discussed together with our findings.
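
As a rough illustration of how noise and signal power estimates turn into a per-bin gain, the sketch below uses the decision-directed a priori SNR estimate with a Wiener-type gain. This is a simplified stand-in, not the full MMSE-STSA gain, which additionally involves Bessel functions. The smoothing factor `alpha` and the `gain_floor` are examples of the kind of parameter such a study might vary; their names and default values are assumptions, not taken from the paper.

```python
def decision_directed_gains(noisy_power, noise_power, alpha=0.98, gain_floor=0.1):
    """Return one gain per frame for a single frequency bin.

    noisy_power: list of |Y|^2 values, one per frame.
    noise_power: estimated noise power for this bin (assumed constant here).
    alpha, gain_floor: hypothetical tuning parameters.
    """
    gains = []
    prev_clean_power = 0.0  # |S_hat|^2 of the previous frame
    for y2 in noisy_power:
        post_snr = y2 / noise_power            # a posteriori SNR
        ml_term = max(post_snr - 1.0, 0.0)     # instantaneous SNR estimate
        # Decision-directed a priori SNR: blend previous estimate and current one.
        prior_snr = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
        # Wiener-type gain, limited from below by the floor.
        gain = max(prior_snr / (1.0 + prior_snr), gain_floor)
        gains.append(gain)
        prev_clean_power = (gain ** 2) * y2
    return gains


# Two noise-like frames followed by two strong (speech-like) frames.
example_gains = decision_directed_gains([1.0, 1.0, 100.0, 100.0], noise_power=1.0)
```

In this toy input the gain stays at the floor for the noise-like frames and rises toward 1 once the bin power greatly exceeds the noise estimate; a larger `alpha` makes that rise slower.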

Global Soft Decision Using Probabilistic Outputs of Support Vector Machine for Speech Enhancement (SVM의 확률 출력을 이용한 새로운 Global Soft Decision 기반의 음성 향상 기법)

  • Jo, Q-Haing;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.27 no.2 / pp.75-79 / 2008
  • In this paper, we propose a novel speech enhancement technique using global soft decision (GSD) based on the probabilistic outputs of a support vector machine (SVM). Generally, speech enhancement algorithms that apply soft decisions to gain modification and noise power estimation perform better than those employing hard decisions. In particular, the global speech absence probability (GSAP), known as an effective measure of speech absence in each frame, has been adopted in SD-based speech enhancement methods. For this reason, we introduce a new GSAP estimated from the probabilistic output of the SVM using a sigmoid function. The performance of the proposed algorithm is evaluated by PESQ and MOS tests under various noise environments, and it yields better results than the conventional GSD scheme.
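
The idea of deriving a GSAP from an SVM output through a sigmoid can be sketched as follows. This is a minimal illustration, assuming a Platt-style sigmoid with hypothetical coefficients `a` and `b` (normally fitted to data) and a simple soft-decision noise update with a hypothetical smoothing constant `smooth`; none of these values come from the paper.

```python
import math


def speech_presence_from_svm(decision_value, a=-1.0, b=0.0):
    """Map a raw SVM decision value to P(speech present) with a sigmoid.

    a and b are hypothetical; in Platt-style scaling they are fitted to data.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


def gsap_from_svm(decision_value, a=-1.0, b=0.0):
    """Global speech absence probability for one frame."""
    return 1.0 - speech_presence_from_svm(decision_value, a, b)


def soft_noise_update(noise_prev, noisy_power, gsap, smooth=0.9):
    """Soft-decision noise power update for one bin (assumed form)."""
    # Update that would apply if the frame were certainly noise-only.
    noise_if_absent = smooth * noise_prev + (1.0 - smooth) * noisy_power
    # Blend with the previous estimate according to the absence probability.
    return gsap * noise_if_absent + (1.0 - gsap) * noise_prev
```

With a hard decision, `gsap` would be either 0 or 1; the soft version lets frames that are only probably noise contribute partially to the noise estimate.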

Comparison of Two Speech Estimation Algorithms Based on Generalized-Gamma Distribution Applied to Speech Recognition in Car Noisy Environment (자동차 잡음환경에서의 음성인식에 적용된 두 종류의 일반화된 감마분포 기반의 음성추정 알고리즘 비교)

  • Kim, Hyoung-Gook;Lee, Jin-Ho
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.8 no.4 / pp.28-32 / 2009
  • This paper compares two speech estimators under a generalized Gamma distribution for DFT-based single-microphone speech enhancement. For the speech enhancement, noise estimation based on recursive averaging of spectral values by spectral minimum noise is applied to the two speech estimators, which are based on the generalized Gamma distribution with $\kappa = 1$ or $\kappa = 2$. The performance of the two speech enhancement algorithms is measured by the recognition accuracy of automatic speech recognition (ASR) in a noisy car environment.

Probabilistic Target Speech Detection and Its Application to Multi-Input-Based Speech Enhancement (확률적 목표 음성 검출을 통한 다채널 입력 기반 음성개선)

  • Lee, Young-Jae;Kim, Su-Hwan;Han, Seung-Ho;Han, Min-Soo;Kim, Young-Il;Jeong, Sang-Bae
    • Phonetics and Speech Sciences / v.1 no.3 / pp.95-102 / 2009
  • In this paper, an efficient target speech detection algorithm is proposed for the performance improvement of multi-input speech enhancement. Using the normalized cross correlation value between two selected channels, the proposed algorithm estimates the probabilistic distribution function of the value from the pure noise interval. Then, log-likelihoods are calculated with the function and the normalized cross correlation value to detect the target speech interval precisely. The detection results are applied to the generalized sidelobe canceller-based algorithm. Experimental results show that the proposed algorithm significantly improves the speech recognition performance and the signal-to-noise ratios.
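
The detection steps described above can be sketched as follows. This is only an illustration: the Gaussian model for the noise-interval correlation values, the zero-lag correlation, and the log-likelihood `threshold` are assumptions, since the abstract does not specify the distribution or the decision rule.

```python
import math


def normalized_cross_correlation(x, y):
    """Zero-lag normalized cross-correlation of two equal-length frames."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0.0 else 0.0


def fit_gaussian(values):
    """Fit mean and variance to correlation values from noise-only frames."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-12)  # guard against zero variance


def log_likelihood(value, mean, var):
    """Gaussian log-likelihood of one correlation value."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (value - mean) ** 2 / var)


def is_target_speech(ncc, noise_model, threshold):
    """Flag a frame as target speech when it is unlikely under the noise model."""
    mean, var = noise_model
    return log_likelihood(ncc, mean, var) < threshold


# Hypothetical correlation values observed during a noise-only interval.
noise_model = fit_gaussian([0.0, 0.1, -0.1, 0.05, -0.05])
```

A frame whose two channels are strongly correlated (value near 1) scores a very low likelihood under this noise model and is flagged, while values typical of the noise interval are not.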

Speech Enhancement Using Lip Information and SFM (입술정보 및 SFM을 이용한 음성의 음질향상알고리듬)

  • Baek, Seong-Joon;Kim, Jin-Young
    • Speech Sciences / v.10 no.2 / pp.77-84 / 2003
  • In this research, we locate the beginning of speech and detect the stationary speech region using lip information. By taking a running average of the estimated speech signal in the stationary region, we reduce the effect of musical noise, which is inherent to the conventional MMSE (Minimum Mean Square Error) speech enhancement algorithm. In addition, the SFM (Spectral Flatness Measure) is incorporated to reduce the speech signal estimation error caused by speaking habits and missing lip information. The proposed algorithm with Wiener filtering outperforms the conventional methods according to a MOS (Mean Opinion Score) test.
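
For reference, the spectral flatness measure is commonly defined as the ratio of the geometric mean to the arithmetic mean of a power spectrum. The sketch below computes that common definition; the small constant `eps` is an assumption added for numerical safety, and how the paper combines the SFM with lip information is not shown here.

```python
import math


def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky spectrum. eps avoids log(0) and division by zero.
    """
    n = len(power_spectrum)
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    arithmetic_mean = sum(power_spectrum) / n + eps
    return math.exp(log_mean) / arithmetic_mean


flat = spectral_flatness([1.0, 1.0, 1.0, 1.0])      # flat spectrum
peaky = spectral_flatness([1e-6, 1e-6, 1e-6, 1.0])  # one dominant bin
```

Because voiced speech concentrates energy in a few harmonics, a low SFM in a frame is one simple cue that speech rather than broadband noise is present.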

Complex nested U-Net-based speech enhancement model using a dual-branch decoder (이중 분기 디코더를 사용하는 복소 중첩 U-Net 기반 음성 향상 모델)

  • Seorim Hwang;Sung Wook Park;Youngcheol Park
    • The Journal of the Acoustical Society of Korea / v.43 no.2 / pp.253-259 / 2024
  • This paper proposes a new speech enhancement model based on a complex nested U-Net with a dual-branch decoder. The proposed model uses a complex nested U-Net to simultaneously estimate the magnitude and phase components of the speech signal, and its decoder has a dual-branch structure in which one branch performs spectral mapping and the other performs time-frequency masking. Compared to a single-branch decoder, the dual-branch decoder removes noise effectively while minimizing the loss of speech information. Experiments were conducted on the VoiceBank + DEMAND database, which is commonly used for training speech enhancement models, and the results were evaluated with various objective metrics. The complex nested U-Net-based speech enhancement model using a dual-branch decoder increased the Perceptual Evaluation of Speech Quality (PESQ) score by about 0.13 compared to the baseline and achieved higher objective evaluation scores than recently proposed speech enhancement models.

Nonlinear Speech Enhancement Method for Reducing the Amount of Speech Distortion According to Speech Statistics Model (음성 통계 모형에 따른 음성 왜곡량 감소를 위한 비선형 음성강조법)

  • Choi, Jae-Seung
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.3 / pp.465-470 / 2021
  • When speech recognition is performed in a real environment where speech is mixed with noise, a robust technology is required that degrades neither the recognition performance nor the quality of the speech. As such speech recognition technology develops, applications are needed that achieve a stable and high recognition rate even in noise whose spectrum resembles that of human speech. Therefore, this paper proposes a speech enhancement algorithm that performs noise suppression based on the MMSE-STSA estimation algorithm, a short-time spectral amplitude estimator based on the minimum mean-square error. This is an effective nonlinear speech enhancement algorithm that works on a single-channel input and has high noise suppression performance. Moreover, it reduces the amount of speech distortion by relying on a statistical model of the speech. In the experiment, the effectiveness of the proposed algorithm is verified by comparing the input and output speech waveforms.

Multi-level Skip Connection for Nested U-Net-based Speech Enhancement (중첩 U-Net 기반 음성 향상을 위한 다중 레벨 Skip Connection)

  • Seorim, Hwang;Joon, Byun;Junyeong, Heo;Jaebin, Cha;Youngcheol, Park
    • Journal of Broadcast Engineering / v.27 no.6 / pp.840-847 / 2022
  • In deep neural network (DNN)-based speech enhancement, the use of global and local information in the input speech is closely related to model performance. Recently, a nested U-Net structure that exploits global and local input information at multiple scales has been proposed. This nested U-Net was also applied to speech enhancement and showed outstanding performance. However, the single skip connection used in nested U-Nets must be modified for the nested structure. In this paper, we propose a multi-level skip connection (MLS) to optimize the performance of the nested U-Net-based speech enhancement algorithm. The proposed MLS showed excellent performance improvement in various objective evaluation metrics compared to the standard skip connection, which means that the MLS can optimize the performance of the nested U-Net-based speech enhancement algorithm. In addition, the final proposed model showed superior performance compared to other DNN-based speech enhancement models.