• Title/Summary/Keyword: Speech activity detection

A Single Channel Voice Activity Detection for Noisy Environments Using Wavelet Packet Decomposition and Teager Energy (웨이블렛 패킷 변환과 Teager 에너지를 이용한 잡음 환경에서의 단일 채널 음성 판별)

  • Koo, Boneung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.33 no.2
    • /
    • pp.139-145
    • /
    • 2014
  • In this paper, a feature parameter is obtained by applying the Teager energy operator to WPD (Wavelet Packet Decomposition) coefficients. The decision threshold is derived from the means and standard deviations of nonspeech frames. Experiments using the TIMIT speech and NOISEX-92 noise databases show that the proposed algorithm outperforms a typical VAD algorithm. ROC (Receiver Operating Characteristic) curves are used to compare the performance of the VADs for SNR values ranging from 10 to -10 dB.
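
The thresholding idea in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it applies the discrete Teager energy operator to raw samples rather than to WPD coefficients, and the frame feature (mean absolute Teager energy), the factor `k`, the frame length, and the synthetic signal are all assumptions made for the example.

```python
import math
import random
from statistics import mean, stdev


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1] (interior samples)."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_feature(frame):
    """Mean absolute Teager energy of one frame (hypothetical feature definition)."""
    te = teager_energy(frame)
    return sum(abs(v) for v in te) / len(te)


def nonspeech_threshold(noise_features, k=3.0):
    """Threshold = mean + k * std over features of known non-speech frames."""
    return mean(noise_features) + k * stdev(noise_features)


def make_frames(signal, frame_len):
    """Split a signal into non-overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


# Synthetic test signal (assumption): 10 frames of noise, then 5 frames of a noisy tone.
random.seed(0)
frame_len = 160
noise = [random.gauss(0.0, 0.05) for _ in range(frame_len * 10)]
tone = [0.5 * math.sin(2 * math.pi * 0.05 * n) + random.gauss(0.0, 0.05)
        for n in range(frame_len * 5)]
signal = noise + tone

features = [frame_feature(f) for f in make_frames(signal, frame_len)]
threshold = nonspeech_threshold(features[:10])  # first 10 frames assumed non-speech
decisions = [f > threshold for f in features]   # True = frame flagged as active
```

In a fuller version, the same feature would be computed per wavelet packet subband before thresholding.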

Reconstruction Effect of the Spectral Entropy for the Voice Activity Detection (음성 활동 구간 검출을 위한 스펙트랄 엔트로피의 재구성 효과)

  • Kwon HO-Min;Han Hag-Yong;Lee Kwang-Seok;Koh Si-Young;Hur Kang-In
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.25-28
    • /
    • 2002
  • Voice activity detection is an important problem in speech recognition and communication. This paper introduces a feature parameter reconstructed from the spectral entropy of information theory for robust voice activity detection in noisy environments, and analyzes and compares its performance with that of the energy-based method. In experiments, we confirmed that the spectral entropy is a better feature parameter than energy for robust voice activity detection in various noise environments.
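
For orientation, plain spectral entropy can be computed as in the sketch below. The abstract does not describe how the paper reconstructs the entropy feature, so this shows only the basic quantity: the Shannon entropy of the normalized power spectrum. The naive DFT, the frame length, and the test signals are assumptions chosen to keep the example self-contained.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy (in bits) of the power spectrum normalized to sum to 1."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


random.seed(1)
n = 64
white_noise = [random.gauss(0.0, 1.0) for _ in range(n)]
tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]  # energy in one bin

h_noise = spectral_entropy(white_noise)  # flat spectrum: high entropy
h_tone = spectral_entropy(tone)          # peaked spectrum: entropy near zero
```

Structured signals such as voiced speech concentrate energy in a few bins and therefore score lower than broadband noise, which is what makes the measure usable as a detection feature.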

A Simple Speech/Non-speech Classifier Using Adaptive Boosting

  • Kwon, Oh-Wook;Lee, Te-Won
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.3E
    • /
    • pp.124-132
    • /
    • 2003
  • We propose a new speech/non-speech classification method based on the adaptive boosting (AdaBoost) algorithm to detect speech for robust speech recognition. The method combines simple base classifiers through the AdaBoost algorithm and uses a set of optimized speech features together with spectral subtraction. Its key benefits are simple implementation, low computational complexity, and avoidance of the over-fitting problem. We checked the validity of the method by comparing its performance with the speech/non-speech classifier used in a standard voice activity detector. For speech recognition purposes, additional performance improvements were achieved by adopting new features, including speech band energies and MFCC-based spectral distortion. For the same false alarm rate, the method reduced miss errors by 20-50%.
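
How AdaBoost combines simple base classifiers can be sketched with threshold stumps, as below. This is generic discrete AdaBoost, not the paper's classifier: the two toy features (standing in for something like band energy and spectral distortion), their values, and the number of rounds are hypothetical, and spectral subtraction is not included.

```python
import math


def train_stump(X, y, w):
    """Find the (feature, threshold, polarity) stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                pred = [pol if x[j] >= thr else -pol for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def stump_predict(stump, x):
    _, j, thr, pol = stump
    return pol if x[j] >= thr else -pol


def adaboost_train(X, y, rounds=3):
    """Discrete AdaBoost: reweight samples so later stumps focus on earlier mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this stump
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the weighted vote: +1 = speech, -1 = non-speech."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Toy training set (hypothetical values): a frame is speech if either feature is high.
X = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.8, 0.2),
     (0.2, 0.9), (0.9, 0.8), (0.7, 0.7), (0.15, 0.25)]
y = [-1, -1, -1, 1, 1, 1, 1, -1]

model = adaboost_train(X, y, rounds=3)
train_predictions = [adaboost_predict(model, x) for x in X]
```

No single stump separates this toy set, but the weighted vote of three stumps does, which is the property the abstract relies on when it speaks of combining simple base classifiers.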

Dimension Reduction Method of Speech Feature Vector for Real-Time Adaptation of Voice Activity Detection (음성구간 검출기의 실시간 적응화를 위한 음성 특징벡터의 차원 축소 방법)

  • Park Jin-Young;Lee Kwang-Seok;Hur Kang-In
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.7 no.3
    • /
    • pp.116-121
    • /
    • 2006
  • In this paper, we propose a dimension reduction method for multi-dimensional speech feature vectors to support real-time adaptation in various noisy environments. The method reduces dimensions non-linearly by mapping to the likelihoods of the speech feature vector and the noise feature vector. The LRT (Likelihood Ratio Test) is used to classify speech and non-speech. The detection results are similar to those obtained with the multi-dimensional speech feature vector. Speech recognition results on the detected speech data are also similar to those obtained with the multi-dimensional (10th-order MFCC (Mel-Frequency Cepstral Coefficient)) speech feature vector.
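
One way to read the likelihood mapping is sketched below: a multi-dimensional feature vector is reduced to two numbers, its log-likelihood under a speech model and under a noise model, and the LRT compares them. The diagonal-Gaussian models, the 3-dimensional toy vectors (the paper uses 10th-order MFCCs), and the zero threshold are assumptions; the abstract does not specify the actual mapping.

```python
import math


def diag_gaussian_loglik(x, mean, var):
    """Log-likelihood of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def reduce_to_likelihoods(x, speech_model, noise_model):
    """Non-linear map from a D-dimensional vector to (ll_speech, ll_noise)."""
    return (diag_gaussian_loglik(x, *speech_model),
            diag_gaussian_loglik(x, *noise_model))


def lrt_is_speech(reduced, threshold=0.0):
    """Likelihood ratio test in the log domain on the reduced 2-D feature."""
    ll_speech, ll_noise = reduced
    return (ll_speech - ll_noise) > threshold


# Hypothetical models: (mean vector, variance vector).
speech_model = ([2.0, 1.0, 0.5], [1.0, 1.0, 1.0])
noise_model = ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])

speech_like = reduce_to_likelihoods([2.1, 0.9, 0.4], speech_model, noise_model)
noise_like = reduce_to_likelihoods([0.1, -0.2, 0.0], speech_model, noise_model)
```

Because only two values per frame are tracked, adapting the decision to a new noise condition involves far fewer parameters than re-estimating models over the full feature space.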

Double-Talk Detection Based on Soft Decision for Acoustic Echo Suppression (음향학적 반향 제거를 위한 Soft Decision 기반의 동시통화 검출)

  • Park, Yun-Sik;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.3
    • /
    • pp.285-289
    • /
    • 2009
  • In this paper, we propose a novel double-talk detection (DTD) technique based on soft decision in the frequency domain. In the proposed method, the global near-end speech presence probability (GNSPP), which is derived under a statistical model assumption, and the voice activity detection (VAD) decisions for the near-end and far-end signals are applied to the DTD algorithm in the frequency domain, replacing the traditional hard decision scheme based on cross-correlation coefficients. The performance of the proposed algorithm is evaluated by objective tests under various environments, and it yields better results than the conventional scheme.

Development of an Integrated Packet Voice/Data Terminal (패킷 음성/데이터 집적 단말기의 개발)

  • 전홍범;은종관;조동호
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.13 no.2
    • /
    • pp.171-181
    • /
    • 1988
  • In this study, a packet voice/data terminal (PVDT) that services both voice and data in a packet-switched network is implemented. The software structure of the PVDT is designed according to the OSI 7-layer architecture. Voice and data are discriminated in the link layer. Voice packets have priority over data packets in order to minimize transmission delay, and are serviced by a simple protocol so that the overhead arising from packet retransmission is minimized. The hardware structure of the PVDT is divided into five modules: a master control module, a speech processing module, a speech activity detection module, a telephone interface module, and an input/output interface module. In addition to the hardware implementation, the optimal reconstruction delay of voice packets for reducing the influence of delay variance is analyzed.
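
The voice-over-data priority rule can be illustrated with a small priority queue. This is only a sketch of the scheduling idea; the class name, the packet labels, and the use of a heap are assumptions and say nothing about how the PVDT link layer was actually built.

```python
import heapq
import itertools

VOICE, DATA = 0, 1  # lower value = served first


class PacketQueue:
    """Serves all queued voice packets before data, FIFO within each class."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, kind, payload):
        heapq.heappush(self._heap, (kind, next(self._counter), payload))

    def pop(self):
        kind, _, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self):
        return len(self._heap)


queue = PacketQueue()
queue.push(DATA, "d1")
queue.push(VOICE, "v1")
queue.push(DATA, "d2")
queue.push(VOICE, "v2")

# Drain the queue and record the service order.
order = [queue.pop()[1] for _ in range(len(queue))]
```

Here the two voice packets leave the queue first even though a data packet arrived earlier, which is the behavior the abstract describes for minimizing voice transmission delay.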

An Improved VAD Algorithm Employing Speech Enhancement Preprocessing and Threshold Updating (음성 향상 전처리와 문턱값 갱신을 적용한 향상된 음성검출 방법)

  • 이윤창;안상식
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.28 no.11C
    • /
    • pp.1161-1168
    • /
    • 2003
  • In this paper, we propose an improved statistical model-based voice activity detection algorithm and a threshold update method. We first improve the signal-to-noise ratio with a speech enhancement preprocessing algorithm that combines the power subtraction method with a matched filter, and then apply the result to the LLR test optimum decision rule to improve performance even in low-SNR conditions. We also propose an adaptive threshold update method, which has not been addressed in previous papers. Extensive computer simulations demonstrate the performance improvement of the proposed VAD algorithm, with the proposed speech enhancement preprocessing and adaptive threshold update, under various background noise environments. Finally, we verify our results by comparison with ITU-T G.729 Annex B.
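
A rough sketch of the pipeline's first two stages follows: power subtraction to estimate the clean speech power, then a per-bin log-likelihood ratio under the Gaussian statistical model commonly used in model-based VAD, averaged over bins. The matched filter and the paper's adaptive threshold update are not described in the abstract and are omitted; the function names, the fixed threshold, and the toy power spectra are assumptions.

```python
import math


def power_subtraction(noisy_power, noise_power, floor=0.0):
    """Estimate clean speech power per bin as max(|X|^2 - noise power, floor)."""
    return [max(x - n, floor) for x, n in zip(noisy_power, noise_power)]


def mean_llr(noisy_power, noise_power):
    """Average per-bin LLR under a complex Gaussian model for speech and noise.

    Per bin: LLR = gamma * xi / (1 + xi) - ln(1 + xi), where gamma is the
    a posteriori SNR and xi is the a priori SNR (here from power subtraction).
    """
    clean = power_subtraction(noisy_power, noise_power)
    total = 0.0
    for x, n, s in zip(noisy_power, noise_power, clean):
        gamma = x / n
        xi = s / n
        total += gamma * xi / (1.0 + xi) - math.log(1.0 + xi)
    return total / len(noisy_power)


# Toy per-bin power values (hypothetical), with the noise power assumed known.
noise_power = [1.0, 1.0, 1.0, 1.0]
noise_frame = [1.1, 0.9, 1.0, 1.2]
speech_frame = [5.0, 8.0, 1.5, 3.0]

threshold = 0.5  # fixed here; the paper updates its threshold adaptively
llr_noise = mean_llr(noise_frame, noise_power)
llr_speech = mean_llr(speech_frame, noise_power)
is_speech = [llr > threshold for llr in (llr_noise, llr_speech)]
```

With the subtraction-based estimate, each bin's term reduces to gamma - 1 - ln(gamma) whenever gamma exceeds 1, so bins at or below the noise floor contribute nothing to the decision.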

Speech Enhancement Algorithm Based on Teager Energy and Speech Absence Probability in Noisy Environments (잡음환경에서 Teager 에너지와 음성부재확률 기반의 음성향상 알고리즘)

  • Park, Yun-Sik;An, Hong-Sub;Lee, Sang-Min
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.49 no.3
    • /
    • pp.81-88
    • /
    • 2012
  • In this paper, we propose a novel speech enhancement algorithm for effective noise suppression in various noisy environments. To improve the decision between speech and noise segments, the proposed method uses the local speech absence probability (LSAP, local SAP) based on the Teager energy (TE) of the noisy speech, instead of the conventional LSAP, as the feature parameter for voice activity detection (VAD) in each frequency subband. In addition, the presented method uses the global SAP (GSAP) derived in each frame as a weighting parameter that modifies the adopted TE operator to improve its performance. The proposed algorithm is evaluated by objective tests under various environments and yields better results than the conventional methods.

Visual Voice Activity Detection and Adaptive Threshold Estimation for Speech Recognition (음성인식기 성능 향상을 위한 영상기반 음성구간 검출 및 적응적 문턱값 추정)

  • Song, Taeyup;Lee, Kyungsun;Kim, Sung Soo;Lee, Jae-Won;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.34 no.4
    • /
    • pp.321-327
    • /
    • 2015
  • In this paper, we propose an algorithm for robust Visual Voice Activity Detection (VVAD) for enhanced speech recognition. Conventional VVAD algorithms detect visual speech frames by finding the motion of the lip region with optical flow or Chaos-inspired measures. Optical flow-based VVAD is difficult to adopt in driving scenarios because of its computational complexity. The Chaos theory-based VVAD method, while invariant to illumination changes, is sensitive to translations caused by the driver's head movements. The proposed Local Variance Histogram (LVH) is robust to pixel intensity changes caused by both illumination change and translation. Hence, for improved performance under environmental changes, we adopt a novel threshold estimation using the total variance change. In the experimental results, the proposed VVAD algorithm achieves robustness in various driving situations.

Distant-talking of Speech Interface for Humanoid Robots (휴머노이드 로봇을 위한 원거리 음성 인터페이스 기술 연구)

  • Lee, Hyub-Woo;Yook, Dong-Suk
    • Proceedings of the KSPS conference
    • /
    • 2007.05a
    • /
    • pp.39-40
    • /
    • 2007
  • For efficient interaction between humans and robots, the speech interface is a core problem, especially in noisy and reverberant conditions. This paper analyzes the main issues of spoken language interfaces for humanoid robots, such as sound source localization, voice activity detection, and speaker recognition.
