• Title/Summary/Keyword: Speech activity detection

Voice Packet Processing Scheme for Voice Quality and Bandwidth Efficiency in VoIP (VoIP의 음성품질/대역효율 개선을 위한 음성패킷 처리)

  • Kim, Jae-Won;Sohn, Dong-Chul
    • Journal of Korea Multimedia Society, v.7 no.7, pp.896-904, 2004
  • In this paper, we present an efficient variable-rate speech coder for spectral efficiency and a packet processing technique for packet loss compensation, both for a voice codec with a 10 ms frame in VoIP service. By disconnecting users from the spectral resource during silence intervals, which account for about 60% of a call, a variable-rate voice coder based on voice activity detection (VAD) can achieve a twofold spectral gain. The performance of the method was analyzed in terms of the detected voice activity factor and the ratio of degraded speech frames under various background noise levels, and compared with that of G.729B of the ITU-T 8 kbps standard speech codec. Lost packets are compensated by adding recovery data to the main stream and applying an error concealment scheme for speech quality enhancement; the performance is verified by the reconstructed speech quality. The proposed scheme can achieve a twofold spectral gain or enhance speech quality by 3 dB using the bandwidth freed by VAD. Therefore, the proposed method can enhance the spectral efficiency or speech quality of VoIP.
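As a rough illustration of the bandwidth argument in this abstract, the sketch below computes the idealized gain obtained when a channel is occupied only during active speech. The function name and the assumption that silence frames cost no bandwidth at all are illustrative and are not taken from the paper.

```python
def ideal_spectral_gain(voice_activity_factor: float) -> float:
    """Idealized capacity gain when bandwidth is used only during active speech.

    Assumption (not from the paper): silence frames consume no bandwidth.
    """
    if not 0.0 < voice_activity_factor <= 1.0:
        raise ValueError("voice activity factor must be in (0, 1]")
    return 1.0 / voice_activity_factor


# With roughly 60% silence, the voice activity factor is about 0.4.
gain = ideal_spectral_gain(0.4)
```

This figure is an upper bound. Practical systems keep some frames active after speech ends and may transmit occasional noise updates, so the realized gain is lower.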

Speech Enhancement Based on Modified IMCRA Using Spectral Minima Tracking with Weighted Subband Selection (서브밴드 가중치를 적용한 스펙트럼 최소값 추적을 이용하는 수정된 IMCRA 기반의 음성 향상 기법)

  • Park, Yun-Sik;Park, Gyu-Seok;Lee, Sang-Min
    • Journal of the Institute of Electronics Engineers of Korea SP, v.49 no.3, pp.89-97, 2012
  • In this paper, we propose a novel approach to noise power estimation for speech enhancement in noisy environments. IMCRA (improved minima controlled recursive averaging), which is widely used in speech enhancement, uses a rough VAD (voice activity detection) algorithm to exclude speech components during speech periods; this improves noise power estimation by reducing the speech distortion caused by the conventional algorithm, which relies on the minimum power spectrum derived from the noisy speech. However, because the VAD algorithm cannot adequately distinguish speech from noise under non-stationary noise and at low SNRs (signal-to-noise ratios), the speech distortion resulting from minimum tracking during speech periods remains. In the proposed method, the minimum power estimate obtained by IMCRA is modified by SMT (spectral minima tracking) to reduce the speech distortion caused by the bias of the estimated minimum power. In addition, to estimate the minimum power effectively while accounting for the distribution characteristics of the speech and noise spectra, the method combines the minimum estimates provided by IMCRA and SMT according to a subband-dependent weighting factor. The proposed algorithm is evaluated by subjective and objective quality tests under various environments and yields better results than the conventional method.
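The idea of updating a noise estimate only when a rough, minimum-based VAD says speech is absent can be sketched for a single frequency bin as follows. This is a deliberately simplified minima-controlled recursive averaging tracker, not IMCRA itself and not the paper's SMT modification; the smoothing factors, window length, and threshold are hypothetical values chosen for illustration.

```python
from collections import deque


def track_noise_power(power, alpha_s=0.8, alpha_d=0.95, window=8, delta=5.0):
    """Simplified minima-controlled noise tracker for one frequency bin.

    power   : sequence of noisy-speech power values, one per frame
    alpha_s : smoothing factor for the power sequence
    alpha_d : smoothing factor for the noise estimate
    window  : number of recent frames searched for the minimum
    delta   : smoothed-power-to-minimum ratio above which a frame is
              treated as speech (the rough VAD decision)
    """
    smoothed = power[0]
    noise = power[0]
    recent = deque([smoothed], maxlen=window)
    estimates = []
    for p in power:
        # Recursively smooth the power and track its recent minimum.
        smoothed = alpha_s * smoothed + (1.0 - alpha_s) * p
        recent.append(smoothed)
        minimum = min(recent)
        # Rough VAD: a large ratio to the tracked minimum suggests speech.
        speech_present = smoothed > delta * minimum
        if not speech_present:
            # Update the noise estimate only in frames judged speech-free.
            noise = alpha_d * noise + (1.0 - alpha_d) * p
        estimates.append(noise)
    return estimates


# Stationary noise of power 1.0 with a short, loud burst in the middle.
frames = [1.0] * 20 + [100.0] * 5 + [1.0] * 20
noise_track = track_noise_power(frames)
```

In this toy input the burst is excluded from the update, so the estimate stays at the noise level. A finite search window is also where such trackers can go wrong: once a long speech segment fills the window, the tracked minimum rises and speech can leak into the noise estimate.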

Statistical Voice Activity Detector Based on Signal Subspace Model (신호 준공간 모델에 기반한 통계적 음성 검출기)

  • Ryu, Kwang-Chun;Kim, Dong-Kook
    • The Journal of the Acoustical Society of Korea, v.27 no.7, pp.372-378, 2008
  • Voice activity detectors (VAD) are important in wireless communication and speech signal processing. In conventional VAD methods, an expression for the likelihood ratio test (LRT) based on statistical models is derived in the discrete Fourier transform (DFT) domain. Speech or noise is then decided by comparing the value of the expression with a threshold. This paper presents a new statistical VAD method based on a signal subspace approach. Probabilistic principal component analysis (PPCA) is employed to obtain a signal subspace model that incorporates a probabilistic model of the noisy signal into the signal subspace method. The proposed approach provides a novel decision rule based on the LRT in the signal subspace domain. Experimental results show that the proposed signal subspace model based VAD method outperforms those based on the widely used Gaussian distribution in the DFT domain.
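The conventional DFT-domain decision rule mentioned in this abstract can be sketched as below, assuming the usual complex Gaussian model for speech and noise in each bin. The per-bin log-likelihood ratio is gamma * xi / (1 + xi) - ln(1 + xi), where xi is the a priori SNR and gamma the a posteriori SNR. The function names, the zero threshold, and the assumption that the variances are already known are illustrative; the subspace method of the paper is not shown.

```python
import math


def log_likelihood_ratio(noisy_power, noise_var, speech_var):
    """Log-likelihood ratio of one DFT bin under a complex Gaussian model."""
    xi = speech_var / noise_var        # a priori SNR
    gamma = noisy_power / noise_var    # a posteriori SNR
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)


def lrt_vad(noisy_powers, noise_vars, speech_vars, threshold=0.0):
    """Average the per-bin log LRs (log of their geometric mean) and threshold."""
    llrs = [
        log_likelihood_ratio(p, n, s)
        for p, n, s in zip(noisy_powers, noise_vars, speech_vars)
    ]
    score = sum(llrs) / len(llrs)
    return score > threshold, score


# Three bins with known (hypothetical) variances.
noise_vars = [1.0, 1.0, 1.0]
speech_vars = [4.0, 4.0, 4.0]
is_speech_noise, score_noise = lrt_vad([1.0, 1.0, 1.0], noise_vars, speech_vars)
is_speech_active, score_active = lrt_vad([5.0, 5.0, 5.0], noise_vars, speech_vars)
```

In practice the noise variance and the a priori SNR must be estimated from the signal, and the threshold is tuned to trade missed speech against false alarms.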

VAD By Neural Network Under Wireless Communication Systems (Neural Network을 이용한 무선 통신시스템에서의 VAD)

  • Lee Hosun;Kim Sukyung;Park Sung-Kwon
    • The Journal of Korean Institute of Communications and Information Sciences, v.30 no.12C, pp.1262-1267, 2005
  • An elliptical basis function (EBF) neural network operates stably in environments with high-level background noise and enables nonlinear processing. It can be adapted to real-time VAD with a simple design. This paper introduces a VAD implementation using EBF networks, and the experimental results show that the EBF VAD outperforms G.729 Annex B and RBF neural networks. The best error rates achieved by the EBF networks improved by more than 70% in speech and 50% in silence compared with those achieved by G.729 Annex B and RBF networks, respectively.

Statistical Voice Activity Detection Using Probabilistic Non-Negative Matrix Factorization (확률적 비음수 행렬 인수분해를 사용한 통계적 음성검출기법)

  • Kim, Dong Kook;Shin, Jong Won;Kwon, Kisoo;Kim, Nam Soo
    • The Journal of Korean Institute of Communications and Information Sciences, v.41 no.8, pp.851-858, 2016
  • This paper presents a new statistical voice activity detection (VAD) method based on the probabilistic interpretation of nonnegative matrix factorization (NMF). The objective function of NMF using the Kullback-Leibler divergence coincides with the negative log-likelihood function of the data if the distribution of the data given the basis and encoding matrices is modeled as a Poisson distribution. Based on this probabilistic NMF, the VAD is constructed using the likelihood ratio test, assuming that speech and noise follow Poisson distributions. Experimental results show that the proposed approach outperformed the conventional Gaussian model-based and NMF-based methods under simulated signal-to-noise ratios of 0-15 dB.
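The equivalence stated in this abstract can be checked numerically: the generalized Kullback-Leibler divergence and the Poisson negative log-likelihood differ only by a term that depends on the data and not on the model, so both are minimized by the same model. The sketch below uses small hypothetical data and two arbitrary candidate models; it does not perform NMF or VAD.

```python
import math


def generalized_kl(v, lam):
    """Generalized KL divergence D(v || lam) as used in KL-NMF."""
    total = 0.0
    for x, l in zip(v, lam):
        log_term = x * math.log(x / l) if x > 0 else 0.0
        total += log_term - x + l
    return total


def poisson_nll(v, lam):
    """Negative log-likelihood of counts v under independent Poisson(lam)."""
    return sum(l - x * math.log(l) + math.lgamma(x + 1.0) for x, l in zip(v, lam))


data = [3, 1, 4]                 # hypothetical nonnegative integer data
model_a = [2.0, 1.5, 3.0]        # two arbitrary candidate reconstructions
model_b = [5.0, 0.5, 4.5]

# The gap between the two objectives does not depend on the model.
gap_a = generalized_kl(data, model_a) - poisson_nll(data, model_a)
gap_b = generalized_kl(data, model_b) - poisson_nll(data, model_b)
```

Because the two gaps are equal, ranking candidate models by KL divergence is the same as ranking them by Poisson likelihood.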

Voice Activity Detection Based on SVM Classifier Using Likelihood Ratio Feature Vector (우도비 특징 벡터를 이용한 SVM 기반의 음성 검출기)

  • Jo, Q-Haing;Kang, Sang-Ki;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea, v.26 no.8, pp.397-402, 2007
  • In this paper, we apply a support vector machine (SVM) that incorporates an optimized nonlinear decision rule over different sets of feature vectors to improve the performance of statistical model-based voice activity detection (VAD). The conventional method performs VAD by setting up statistical models for the speech-absence and speech-presence hypotheses and comparing the geometric mean of the likelihood ratios (LRs) of the individual frequency bands extracted from the input signal with a given threshold. We propose a novel SVM-based VAD technique that treats the LRs computed in each frequency bin as the elements of a feature vector, so as to minimize the classification error probability, instead of using the conventional decision rule based on the geometric mean. In experiments, the SVM-based VAD using the proposed feature performed better than previously reported VADs in various noise environments.

A Variable Step-Size Adaptive Feedback Cancellation Algorithm based on GSAP in Digital Hearing Aids (가변 스텝 크기 적응 필터와 음성 검출기를 이용한 보청기용 피드백 제거 알고리즘)

  • An, Hongsub;Park, Gyuseok;Song, Jihyun;Lee, Sangmin
    • The Transactions of The Korean Institute of Electrical Engineers, v.62 no.12, pp.1744-1749, 2013
  • Acoustic feedback is perceived as whistling or howling and is a major complaint of hearing-aid users. Acoustic feedback cancellation is important in hearing aids because acoustic feedback degrades the performance of the device by reducing the maximum insertion gain. Adaptive systems for estimating the acoustic feedback path and feedback suppression algorithms have been proposed to solve this problem. LMS (least mean squares) is a typical feedback cancellation algorithm because of its computational efficiency. However, its convergence performance suffers when the input signal is highly correlated. In this paper, we propose a new variable step-size normalized LMS algorithm using VAD (voice activity detection) to overcome this limitation of the LMS algorithm. The VAD algorithm is GSAP (global speech absence probability), and the feedback cancellation algorithm is normalized LMS. The proposed algorithm uses the VAD to apply different step sizes to voice and non-voice segments, aiming for high stability, fast convergence, and low misalignment with correlated inputs such as speech. In simulations with speech mixed with white noise, the proposed algorithm outperforms the traditional algorithm in terms of stability, convergence speed, and misalignment.
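A minimal sketch of a normalized LMS filter whose step size is switched by a VAD flag is given below. It is written as generic system identification rather than a full hearing-aid feedback loop, the VAD decisions are passed in instead of being computed from GSAP, and the two step sizes are hypothetical (the abstract does not state which segment receives the larger value).

```python
import random


def vss_nlms(x, d, vad, taps=4, mu_speech=0.05, mu_nonspeech=0.5, eps=1e-8):
    """NLMS adaptation with a step size chosen per sample from a VAD flag.

    x   : input (reference) samples
    d   : desired samples, i.e. x filtered by the unknown path
    vad : booleans, True where the sample is judged to be speech
    """
    w = [0.0] * taps          # adaptive filter coefficients
    buf = [0.0] * taps        # most recent inputs, newest first
    errors = []
    for xn, dn, is_speech in zip(x, d, vad):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y
        # Hypothetical choice: adapt cautiously during speech.
        mu = mu_speech if is_speech else mu_nonspeech
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors


# Identify a short hypothetical path from white input, noise-free.
random.seed(0)
true_path = [0.5, -0.3, 0.1, 0.0]
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
d = [
    sum(true_path[k] * x[n - k] for k in range(len(true_path)) if n - k >= 0)
    for n in range(len(x))
]
vad = [False] * 1000 + [True] * 1000
w_est, err = vss_nlms(x, d, vad)
```

With white input the filter converges quickly under either step size; the benefit of switching is expected to appear with correlated inputs such as speech, which this toy example does not reproduce.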

Improved Speech Enhancement Algorithm employing Multi-band Power Subtraction and Wavelet Packets Decomposition (Multi-band Power Subtraction과 Wavelet Packets Decomposition을 이용한 개선된 음성 향상 방법)

  • Lee Yoon-Chang;Kwak Jeong-Hoon;Ahn Sang-Sik
    • The Journal of Korean Institute of Communications and Information Sciences, v.31 no.6C, pp.589-602, 2006
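  • Because noise is a major factor limiting the performance of speech-related systems, research on speech enhancement has continued steadily. Conventional speech enhancement methods do not distinguish unvoiced speech from noise, so unvoiced speech is removed along with the noise; conventional wavelet-based denoising methods use the same threshold in every band, so their performance degrades in time-varying environments. To overcome these drawbacks, we propose an improved wavelet-based speech enhancement method using multi-band power subtraction and perceptual wavelet packet decomposition. As preprocessing, multi-band power subtraction removes wideband noise and reduces the occurrence of musical noise. The signal is then decomposed with perceptual wavelet packets based on a psycho-acoustic model, and the entropy ratio of each wavelet node, together with voice activity detection, is used to classify unvoiced speech, voiced speech, and noise. According to this classification, wavelet shrinkage is applied with a threshold for each wavelet node to remove noise while minimizing the erroneous removal of unvoiced speech or low-power voiced speech. In addition, the forgetting factor is selected adaptively during noise power estimation to minimize the noise power estimation error.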

Design of Voice Activity Detection Algorithm for Variable Rate Speech Coders (가변전송률 음성부호화기 적용을 위한 음성활성도 측정 알고리즘 설계)

  • 김재원
    • The Journal of Korean Institute of Communications and Information Sciences, v.26 no.9A, pp.1451-1458, 2001
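  • The ultimate goal of voice service, the most frequently used service in digital mobile communication systems, is to provide good speech quality and high spectral efficiency. Speech can be represented as a repetition of short, intermittent bursts of speech energy separated by silence intervals; in an actual voice call, active speech occupies about 40% of the time, and the remaining 60% is silence or time spent listening to the other party. Using these silence intervals efficiently yields a spectral gain for the system. In this paper, we design a voice activity detection scheme that operates robustly even in widely varying ambient noise environments, such as those of digital mobile communication systems, and that is applicable to speech coders with a 10 ms frame size. The designed algorithm uses speech energy, spectral distribution, zero-crossing rate, and the peakiness measure of the LPC residual signal.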
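Two of the features named in this abstract, frame energy and zero-crossing rate, can be computed per 10 ms frame as sketched below. The 8 kHz sampling rate (80 samples per frame) and the test signals are assumptions for illustration; the spectral distribution and LPC residual peakiness features, and the way the paper combines the features into a decision, are not shown.

```python
import math

SAMPLE_RATE_HZ = 8000                        # assumed narrowband rate
FRAME_LEN = SAMPLE_RATE_HZ * 10 // 1000      # 10 ms frame = 80 samples


def frame_features(frame):
    """Return (mean energy, zero-crossing rate) of one frame."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = crossings / (len(frame) - 1)
    return energy, zcr


# A 200 Hz tone stands in for voiced speech: high energy, few crossings.
tone = [
    0.5 * math.sin(2.0 * math.pi * 200.0 * n / SAMPLE_RATE_HZ)
    for n in range(FRAME_LEN)
]
# A weak sign-alternating signal stands in for noise: low energy, many crossings.
hiss = [0.01 if n % 2 == 0 else -0.01 for n in range(FRAME_LEN)]

tone_energy, tone_zcr = frame_features(tone)
hiss_energy, hiss_zcr = frame_features(hiss)
```

Energy alone separates these two frames, but the zero-crossing rate helps with low-energy unvoiced sounds, which is one reason detectors of this kind combine several features.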

Generalized cross correlation with phase transform sound source localization combined with steered response power method (조정 응답 파워 방법과 결합된 generalized cross correlation with phase transform 음원 위치 추정)

  • Kim, Young-Joon;Oh, Min-Jae;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea, v.36 no.5, pp.345-352, 2017
  • We propose a method that reduces the direction estimation error of a sound source in reverberant and noisy environments. The proposed algorithm classifies the speech signal into voiced and unvoiced frames using VAD, and the source direction is estimated only when the current frame is voiced. In such a frame, the TDOA (time difference of arrival) between the microphones of the array is estimated using the GCC-PHAT (generalized cross correlation with phase transform) method. Then, to improve the accuracy of source localization, we compare the peak value of the cross-correlation of the two signals at the estimated time delay with the values at the other time delays in the time-delay table. If, within successive voiced frames, the angle of the current frame differs greatly from those of the preceding and following frames, it is replaced with the mean of the angles estimated in those two frames.
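The final smoothing rule in this abstract, replacing an outlying angle by the mean of its neighbours within a run of voiced frames, can be sketched as follows. The 20-degree threshold is a hypothetical value, angle wrap-around at 360 degrees is ignored, and the GCC-PHAT and steered response power stages that produce the angles are not shown.

```python
def smooth_angles(angles, voiced, max_jump_deg=20.0):
    """Replace isolated outlier angles inside runs of voiced frames.

    angles : per-frame direction estimates in degrees
    voiced : per-frame VAD decisions (True for voiced frames)
    """
    out = list(angles)
    for i in range(1, len(angles) - 1):
        # Only act when the frame and both neighbours are voiced.
        if not (voiced[i - 1] and voiced[i] and voiced[i + 1]):
            continue
        prev_a, next_a = angles[i - 1], angles[i + 1]
        far_from_prev = abs(angles[i] - prev_a) > max_jump_deg
        far_from_next = abs(angles[i] - next_a) > max_jump_deg
        if far_from_prev and far_from_next:
            out[i] = 0.5 * (prev_a + next_a)
    return out


estimates = [30.0, 31.0, 80.0, 32.0, 30.0]
smoothed = smooth_angles(estimates, [True] * 5)
unvoiced_gap = smooth_angles(estimates, [True, True, True, False, True])
```

Neighbours are read from the original estimates rather than the already smoothed ones, so a single correction cannot propagate along the sequence.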