• Title/Summary/Keyword: Speech presence probability

Search Result 19, Processing Time 0.02 seconds

Speech Enhancement based on Minima Controlled Recursive Averaging Technique Incorporating Second-order Conditional Maximum a posteriori Criterion (2차 조건 사후 최대 확률 기반 최소값 제어 재귀평균기법을 이용한 음성향상)

  • Kum, Jong-Mo;Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.46 no.4
    • /
    • pp.132-138
    • /
    • 2009
  • In this paper, we propose a novel approach to improve the performance of minima controlled recursive averaging (MCRA) which is based on the second-order conditional maximum a posteriori (CMAP). From an investigation of the MCRA scheme, it is discovered that the MCRA method cannot take full consideration of the inter-frame correlation of voice activity since the noise power estimate is adjusted by the speech presence probability depending on an observation of the current frame. To avoid this phenomenon, the proposed MCRA approach incorporates the second-order CMAP criterion in which the noise power estimate is obtained using the speech presence probability conditioned on both the current observation and the speech activity decisions in the previous two frames. Experimental results show that the proposed MCRA technique based on second-order conditional MAP yields better results compared to the conventional MCRA method.

Double-Talk Detection Based on Soft Decision for Acoustic Echo Suppression (음향학적 반향 제거를 위한 Soft Decision 기반의 동시통화 검출)

  • Park, Yun-Sik;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.3
    • /
    • pp.285-289
    • /
    • 2009
  • In this paper, we propose a novel double-talk detection (DTD) technique based on soft decision in the frequency domain. In the proposed method, global near-end speech presence probability (GNSPP) considering the statistical model assumption and voice activity detection (VAD) decision of the near-end and far-end signal are applied to the DTD algorithm in the frequency domain instead of the traditional hard decision scheme using cross-correlation coefficients. The performance of the proposed algorithm is evaluated by the objective test under various environments, and yields better results compared with the conventional scheme.

Intelligibility Enhancement of Multimedia Contents Using Spectral Shaping (스펙트럼 성형기법을 이용한 멀티미디어 콘텐츠의 명료도 향상)

  • Ji, Youna;Park, Young-cheol;Hwang, Young-su
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.11
    • /
    • pp.82-88
    • /
    • 2016
  • In this paper, we propose an intelligibility enhancement algorithm for multimedia contents using spectral shaping. The dialogue signals is essential to understand the plot of audio-visual media contents such as movie and TV. However, the non-dialogue components as like sound effects and background music often degrade the dialogue clarity. To overcome this problem, this paper tries to improves the dialogue clarity of audio soundtracks which contain important cues for the visual scenes. In the proposed method, the dialogue components are first detected by soft masker based on speech presence probability (SPP) which is widely used in speech enhancement field. Then, extracted dialogue signals are applied to the spectral shaping method. It reallocate the spectral-temporal energy of speech to enhanced the intelligibility. The total energy is maintained as unchanged via a loudness normalization process to prevent saturation. The algorithm was evaluated using the modeled and real movie soundtracks and it was shown that the proposed algorithm enhances the dialogue clarity while preserving the total audio power.

Voice Activity Detection Based on SNR and Non-Intrusive Speech Intelligibility Estimation

  • An, Soo Jeong;Choi, Seung Ho
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.11 no.4
    • /
    • pp.26-30
    • /
    • 2019
  • This paper proposes a new voice activity detection (VAD) method which is based on SNR and non-intrusive speech intelligibility estimation. In the conventional SNR-based VAD methods, voice activity probability is obtained by estimating frame-wise SNR at each spectral component. However these methods lack performance in various noisy environments. We devise a hybrid VAD method that uses non-intrusive speech intelligibility estimation as well as SNR estimation, where the speech intelligibility score is estimated based on deep neural network. In order to train model parameters of deep neural network, we use MFCC vector and the intrusive speech intelligibility score, STOI (Short-Time Objective Intelligent Measure), as input and output, respectively. We developed speech presence measure to classify each noisy frame as voice or non-voice by calculating the weighted average of the estimated STOI value and the conventional SNR-based VAD value at each frame. Experimental results show that the proposed method has better performance than the conventional VAD method in various noisy environments, especially when the SNR is very low.

Low-Complexity Speech Enhancement Algorithm Based on IMCRA Algorithm for Hearing Aids (보청기를 위한 IMCRA 기반 저연산 음성 향상 알고리즘)

  • Jeon, Yuyong;Lee, Sangmin
    • Journal of rehabilitation welfare engineering & assistive technology
    • /
    • v.11 no.4
    • /
    • pp.363-370
    • /
    • 2017
  • In this paper, we proposed a low-complexity speech enhancement algorithm based on a improved minima controlled recursive averaging (IMCRA) and log minimum mean square error (logMMSE). The IMCRA algorithm track the minima value of input power within buffers in local window and identify the speech presence using ratio between input power and its minima value. In this process, many number of operations are required. To reduce the number of operations of IMCRA algorithm, minima value is tracked using time-varying frequency-dependent smoothing based on speech presence probability. The proposed algorithm enhanced speech quality by 2.778%, 3.481%, 2.980% and 2.162% in 0, 5, 10 and 15dB SNR respectively and reduced computational complexity by average 9.570%.

Speech Enhancement Based on IMCRA Incorporating noise classification algorithm (잡음 환경 분류 알고리즘을 이용한 IMCRA 기반의 음성 향상 기법)

  • Song, Ji-Hyun;Park, Gyu-Seok;An, Hong-Sub;Lee, Sang-Min
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.61 no.12
    • /
    • pp.1920-1925
    • /
    • 2012
  • In this paper, we propose a novel method to improve the performance of the improved minima controlled recursive averaging (IMCRA) in non-stationary noisy environment. The conventional IMCRA algorithm efficiently estimate the noise power by averaging past spectral power values based on a smoothing parameter that is adjusted by the signal presence probability in frequency subbands. Since the minimum of smoothing parameter is defined as 0.85, it is difficult to obtain the robust estimates of the noise power in non-stationary noisy environments that is rapidly changed the spectral characteristics such as babble noise. For this reason, we proposed the modified IMCRA, which adaptively estimate and updata the noise power according to the noise type classified by the Gaussian mixture model (GMM). The performances of the proposed method are evaluated by perceptual evaluation of speech quality (PESQ) and composite measure under various environments and better results compared with the conventional method are obtained.

A Statistical Model-Based Voice Activity Detection Employing the Conditional MAP Criterion with Spectral Deviation (조건 사후 최대 확률과 음성 스펙트럼 변이 조건을 이용한 통계적 모델 기반의 음성 검출기)

  • Kim, Sang-Kyun;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.6
    • /
    • pp.324-329
    • /
    • 2011
  • In this paper, we propose a novel approach to improve the performance of a statistical model-based voice activity detection (VAD) which is based on the conditional maximum a posteriori (CMAP) with deviation. In our approach, the VAD decision rule is expressed as the geometric mean of likelihood ratios (LRs) based on adapted threshold according to the speech presence probability conditioned on both the speech activity decisions and spectral deviation in the pervious frame. Experimental results show that the proposed approach yields better results compared to the CMAP-based VAD using the LR test.

Statistical Model-Based Voice Activity Detection Using the Second-Order Conditional Maximum a Posteriori Criterion with Adapted Threshold (적응형 문턱값을 가지는 2차 조건 사후 최대 확률을 이용한 통계적 모델 기반의 음성 검출기)

  • Kim, Sang-Kyun;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.1
    • /
    • pp.76-81
    • /
    • 2010
  • In this paper, we propose a novel approach to improve the performance of a statistical model-based voice activity detection (VAD) which is based on the second-order conditional maximum a posteriori (CMAP). In our approach, the VAD decision rule is expressed as the geometric mean of likelihood ratios (LRs) based on adapted threshold according to the speech presence probability conditioned on both the current observation and the speech activity decisions in the pervious two frames. Experimental results show that the proposed approach yields better results compared to the statistical model-based and the CMAP-based VAD using the LR test.

Voice Activity Detection Based on SVM Classifier Using Likelihood Ratio Feature Vector (우도비 특징 벡터를 이용한 SVM 기반의 음성 검출기)

  • Jo, Q-Haing;Kang, Sang-Ki;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.8
    • /
    • pp.397-402
    • /
    • 2007
  • In this paper, we apply a support vector machine(SVM) that incorporates an optimized nonlinear decision rule over different sets of feature vectors to improve the performance of statistical model-based voice activity detection(VAD). Conventional method performs VAD through setting up statistical models for each case of speech absence and presence assumption and comparing the geometric mean of the likelihood ratio (LR) for the individual frequency band extracted from input signal with the given threshold. We propose a novel VAD technique based on SVM by treating the LRs computed in each frequency bin as the elements of feature vector to minimize classification error probability instead of the conventional decision rule using geometric mean. As a result of experiments, the performance of SVM-based VAD using the proposed feature has shown better results compared with those of reported VADs in various noise environments.