• Title/Summary/Keyword: Speech detection

A Parametric Voice Activity Detection Based on the SPD-TE for Nonstationary Noises (비정체성 잡음을 위한 SPD-TE 기반 계수형 음성 활동 탐지)

  • Koo, Boneung
    • The Journal of the Acoustical Society of Korea / v.34 no.4 / pp.310-315 / 2015
  • A single-channel VAD (Voice Activity Detection) algorithm for nonstationary noise environments is proposed in this paper. The thresholds applied to the feature parameter for the VAD decision are updated adaptively. The updates use estimated means and standard deviations of past non-speech frames. The feature parameter, SPD-TE (Spectral Power Difference-Teager Energy), is obtained by applying the Teager energy to the WPD (Wavelet Packet Decomposition) coefficients. SPD-TE was previously reported to be a noise-robust feature for VAD. Experiments used the TIMIT speech and NOISEX-92 noise databases at SNRs from 10 dB to -10 dB. Under these conditions, the decision accuracy of the proposed algorithm is comparable to that of several typical VAD algorithms, including standardized ones.
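
A minimal sketch of the adaptive decision rule described above, assuming frame-level features are already available. The Teager energy operator psi[n] = x[n]^2 - x[n-1]x[n+1] is standard. The WPD stage is omitted, so `frame_feature` is only a hypothetical stand-in for SPD-TE. Several details are illustrative choices, not the paper's parameters: the threshold form mean + k * std, the values `k` and `history`, and the assumption that the first frames are noise.

```python
import math
import random
from statistics import mean, stdev

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def frame_feature(frame):
    """Hypothetical stand-in for SPD-TE: mean |Teager energy| of the raw frame.
    The paper applies the Teager energy to WPD coefficients; that step is omitted."""
    te = teager_energy(frame)
    return sum(abs(v) for v in te) / len(te)

def adaptive_vad(features, init_frames=5, k=3.0, history=20):
    """Return 1 (speech) or 0 (non-speech) per frame.

    Threshold = mean + k * std of the last `history` frames judged non-speech.
    The first `init_frames` frames are assumed to be non-speech (an assumption).
    """
    noise = list(features[:init_frames])
    labels = [0] * init_frames
    for f in features[init_frames:]:
        recent = noise[-history:]
        threshold = mean(recent) + k * stdev(recent)
        if f > threshold:
            labels.append(1)
        else:
            labels.append(0)
            noise.append(f)  # only non-speech frames update the noise statistics
    return labels

# Synthetic example: 10 noise frames, 5 "speech" (200 Hz tone) frames, 5 noise frames.
rng = random.Random(0)
fs, n_frame = 8000, 160

def make_frame(tone):
    return [(0.5 * math.sin(2 * math.pi * 200 * n / fs) if tone else 0.0)
            + rng.gauss(0.0, 0.01) for n in range(n_frame)]

frames = ([make_frame(False) for _ in range(10)]
          + [make_frame(True) for _ in range(5)]
          + [make_frame(False) for _ in range(5)])
labels = adaptive_vad([frame_feature(fr) for fr in frames])
print(labels)
```

The tone frames should be labeled speech. Because only frames judged non-speech feed the statistics, a speech burst does not inflate the threshold.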

Speech Active Interval Detection Method in Noisy Speech (잡음음성에서의 음성 활성화 구간 검출 방법)

  • Lee, Kwang-Seok;Choo, Yeon-Gyu;Kim, Hyun-Deok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.10a / pp.779-782 / 2008
  • Detecting speech active intervals in noisy speech is important for speech communication and speech recognition. In this research, we propose a characteristic parameter that incorporates spectral entropy for detecting speech active intervals in noisy speech. We compare its performance with energy-based speech active interval detection. The results show that analysis using the proposed characteristic parameter outperforms the other methods in noisy environments.

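The abstract does not state exactly how spectral entropy is combined with other parameters. The sketch below therefore computes only the two ingredients it names: spectral entropy and frame energy. It assumes a plain DFT power spectrum normalized into a probability distribution. A pure tone concentrates its power in one bin, giving entropy near 0 bits, while white noise spreads it across bins, giving high entropy. The function names and the naive O(N^2) DFT are illustrative assumptions.

```python
import cmath
import math
import random

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (fine for short illustrative frames)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def spectral_entropy(frame, eps=1e-12):
    """Shannon entropy (bits) of the frame's normalized power spectrum."""
    spec = power_spectrum(frame)
    total = sum(spec) + eps
    probs = [s / total for s in spec]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def frame_energy(frame):
    """Mean squared amplitude, the conventional energy feature."""
    return sum(x * x for x in frame) / len(frame)

rng = random.Random(1)
n = 128
tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]  # lies exactly on DFT bin 8
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
print(spectral_entropy(tone), spectral_entropy(noise), frame_energy(tone))
```

Unlike energy, entropy does not scale with signal level. This is one reason it is often paired with, or used to normalize, an energy feature.
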
Speech Activity Detection using Lip Movement Image Signals (입술 움직임 영상 신호를 이용한 음성 구간 검출)

  • Kim, Eung-Kyeu
    • Journal of the Institute of Convergence Signal Processing / v.11 no.4 / pp.289-297 / 2010
  • This paper presents a method that prevents external acoustic noise from being misrecognized as a speech recognition target during speech activity detection. In addition to acoustic energy, the method checks lip movement image signals.

    First, successive images are captured with a PC camera, and the system determines whether the lips are moving. Next, the lip movement image signal data is stored in shared memory and shared with the speech recognition process. Speech activity detection is the preprocessing phase of speech recognition. During this phase, the method checks the data stored in shared memory to verify whether the acoustic energy comes from the speaker's utterance.

    Experiments linked the speech recognition processor with the image processor. They confirmed that the recognition result is output normally when the speaker faces the camera and speaks. They also confirmed that no result is output when the speaker speaks without facing the camera.

    The initial feature values obtained off-line are replaced with values obtained on-line. Similarly, the initial template image captured off-line is replaced with a template image captured on-line, which improves the discrimination of lip movement image tracking. An image processing test bed was implemented to visually confirm the lip movement image tracking process and to analyze the related parameters in real time. With the speech and image processing systems linked, the interworking rate was 99.3% across various illumination environments.
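
A minimal sketch of the gating logic, assuming grayscale mouth-region images are already cropped. The shared-memory exchange is replaced here by an ordinary function argument. Lip movement is judged by the mean absolute difference between consecutive frames. The function names and both thresholds are hypothetical, not values from the paper.

```python
def mean_abs_diff(img_a, img_b):
    """Mean absolute pixel difference between two equally sized grayscale images."""
    total = count = 0
    for row_a, row_b in zip(img_a, img_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count

def lip_moving(mouth_frames, diff_threshold=10.0):
    """Hypothetical lip-movement test: some pair of consecutive mouth-region
    images differs by more than diff_threshold gray levels on average."""
    return any(mean_abs_diff(a, b) > diff_threshold
               for a, b in zip(mouth_frames, mouth_frames[1:]))

def accept_speech(audio_frame, mouth_frames, energy_threshold=0.01):
    """Pass audio on to recognition only if it is loud enough AND the lips move."""
    energy = sum(x * x for x in audio_frame) / len(audio_frame)
    return energy > energy_threshold and lip_moving(mouth_frames)

closed = [[50] * 4 for _ in range(4)]
open_ = [[50, 50, 50, 50], [50, 200, 200, 50], [50, 200, 200, 50], [50, 50, 50, 50]]
loud = [0.3, -0.3] * 80
print(accept_speech(loud, [closed, open_, closed]))   # speaker facing the camera
print(accept_speech(loud, [closed, closed, closed]))  # external noise, lips still
```

Requiring both cues is what rejects loud external noise that occurs while the speaker's lips are still.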

Speaker Change Detection by Normalization of Phonetic Characteristics (음소 특성 정규화를 통한 화자 변화 검출)

  • Kim Hyung Soon;Park Hae Young;Park Sun Young
    • MALSORI / no.47 / pp.97-107 / 2003
  • Speaker change detection is the task of automatically detecting the point in time at which the speaker changes. The feature parameters used for this task depend on phonetic characteristics as well as speaker characteristics. As a result, the spoken content reflected in these parameters inevitably degrades detection performance. To alleviate this problem, this paper proposes a method that normalizes phonetic variations in speech feature parameters, emphasizing changes due to speaker characteristics. Experimental results show that the proposed method improves the performance of speaker change detection.

Performance Analysis of A Variable Bit Rate Speech Coder (가변 비트율 음성 부호화기의 성능분석)

  • Iem, Byeong-Gwan
    • The Transactions of The Korean Institute of Electrical Engineers / v.62 no.12 / pp.1750-1754 / 2013
  • A variable bit rate speech coder is presented. The coder is based on the observation that, over a short time period, a speech signal can be viewed as a combination of piecewise linear segments. The encoder detects the sample points where the slope of the signal changes; this paper calls them inflection points. For each detected inflection sample, the coder transmits both the location and the value. For the noninflection samples, it transmits only location information. In the decoder, the noninflection samples are estimated by interpolating the received information. Several factors affecting the performance of the coder were tested through simulation. Simulation results show that linear interpolation yields a 1 to 5 dB improvement over cubic spline interpolation. They also show that μ-law companding provides no benefit when applied before inflection detection. With low threshold values for inflection point detection, the coder outperforms continuously variable slope delta modulation (CVSDM). It achieves a better MOS and an SNR improvement of more than 16 dB.
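
The encoder/decoder idea can be sketched as follows. The abstract defines inflection points only as points where the slope changes. This sketch assumes an inflection sample is one whose second difference exceeds a threshold, and that both frame endpoints are always kept. Quantization and bit packing are omitted. The decoder uses linear interpolation, the variant the abstract reports as better than cubic splines. All function names are hypothetical.

```python
def detect_inflections(x, threshold=0.0):
    """Indices where the slope changes by more than `threshold` (second difference),
    plus both endpoints so the decoder can interpolate the whole frame."""
    idx = [0]
    for n in range(1, len(x) - 1):
        if abs(x[n + 1] - 2 * x[n] + x[n - 1]) > threshold:
            idx.append(n)
    idx.append(len(x) - 1)
    return idx

def encode(x, threshold=0.0):
    """Keep (location, value) pairs only for inflection samples."""
    return [(n, x[n]) for n in detect_inflections(x, threshold)], len(x)

def decode(points, length):
    """Estimate the non-inflection samples by linear interpolation between kept points."""
    y = [0.0] * length
    for (n0, v0), (n1, v1) in zip(points, points[1:]):
        for n in range(n0, n1 + 1):
            y[n] = v0 + (v1 - v0) * (n - n0) / (n1 - n0)
    return y

# A piecewise-linear signal is recovered exactly from its corner samples.
x = [0, 1, 2, 3, 4, 3, 2, 1, 0, 0, 0]
points, length = encode(x)
print([n for n, _ in points])   # kept sample locations
print(decode(points, length))
```

Raising `threshold` keeps fewer samples and lowers the bit rate at the cost of reconstruction accuracy. This is consistent with the abstract's note that low thresholds give better MOS and SNR.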

Voice Activity Detection Algorithm Using Speech Periodicity and QSNR in Noisy Environment (음성의 주기성과 QSNR을 이용한 잡음환경에서의 음성검출 알고리즘)

  • Jeong, Ju-Hyun;Song, Hwa-Jeon;Kim, Hyung-Soon
    • Proceedings of the KSPS conference / 2005.11a / pp.59-62 / 2005
  • Voice activity detection (VAD) is important in many areas of speech processing technology. Speech/non-speech discrimination in noisy environments is difficult because the feature parameters used for VAD are sensitive to the surrounding environment. As a result, VAD performance degrades severely at low signal-to-noise ratios (SNRs). In this paper, a new VAD algorithm is proposed based on the degree of voicing and the Quantile SNR (QSNR). These two feature parameters are more robust in noisy environments than other features such as energy and spectral entropy. The proposed algorithm is evaluated under diverse noisy environments in the Aurora2 DB. In our experiments, the proposed VAD outperforms the ETSI Advanced Frontend VAD.

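The two features can be sketched as below, under two assumptions. Degree of voicing is taken as the peak normalized autocorrelation over a 60 to 400 Hz pitch-lag range. QSNR is taken as each frame's energy relative to a noise level, where the noise level is a low quantile (here 0.2) of all frame energies. The paper's exact definitions may differ, for example by using per-band quantiles. The parameters and names are illustrative.

```python
import math
import random

def degree_of_voicing(frame, fs=8000, f_min=60.0, f_max=400.0):
    """Peak normalized autocorrelation over plausible pitch lags (near 1 for voiced)."""
    best = 0.0
    for lag in range(int(fs / f_max), int(fs / f_min) + 1):
        a, b = frame[:-lag], frame[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b)) or 1.0
        best = max(best, num / den)
    return best

def quantile(values, q):
    """Simple order-statistic quantile."""
    s = sorted(values)
    return s[min(len(s) - 1, int(q * len(s)))]

def qsnr(frames, q=0.2):
    """Per-frame SNR (dB) against a noise level set to the q-quantile of frame energies."""
    energies = [sum(x * x for x in fr) / len(fr) for fr in frames]
    noise = quantile(energies, q)
    return [10 * math.log10(e / noise) for e in energies]

rng = random.Random(2)
fs, n = 8000, 320  # 40 ms frames, longer than the largest pitch lag
voiced = [0.5 * math.sin(2 * math.pi * 125 * t / fs) + rng.gauss(0.0, 0.01) for t in range(n)]
noise_frames = [[rng.gauss(0.0, 0.01) for _ in range(n)] for _ in range(8)]
snrs = qsnr(noise_frames + [voiced, voiced])
print(degree_of_voicing(voiced), degree_of_voicing(noise_frames[0]))
print([round(s, 1) for s in snrs])
```

The quantile estimate works without an explicit noise-only segment, provided enough frames in the analysis window are actually non-speech.
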
Non-Dialog Section Detection for the Descriptive Video Service Contents Authoring (화면해설방송 저작을 위한 비 대사 구간 검출)

  • Jang, Inseon;Ahn, ChungHyun;Jang, Younseon
    • Journal of Broadcast Engineering / v.19 no.3 / pp.296-306 / 2014
  • This paper addresses non-dialog section detection for DVS authoring. The goal is to find meaningful sections of broadcast audio where audio description can be inserted. Because broadcast audio contains a variety of sounds, the method first discriminates between speech and non-speech for each audio frame. The proposed method jointly exploits the inter-channel structure of the stereo broadcast audio and the characteristics of the speech source. Finally, rule-based post-processing detects non-dialog sections long enough for audio description. The proposed method detects these sections more accurately than the conventional method. Experimental results on real broadcast content show its qualitative superiority.
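
The rule-based post-processing step might look like the sketch below. It assumes per-frame speech/non-speech labels are already available. It also assumes the only rule is a minimum duration for a non-speech run. The frame hop, the minimum length, and the function name are hypothetical; the abstract does not give the paper's actual rules.

```python
def non_dialog_sections(labels, hop_ms=10, min_len_ms=2000):
    """Return (start_s, end_s) pairs for runs of non-speech frames (label 0)
    lasting at least min_len_ms. labels: per-frame 1 = speech, 0 = non-speech."""
    sections, start = [], None
    for i, lab in enumerate(list(labels) + [1]):  # trailing sentinel closes a final run
        if lab == 0 and start is None:
            start = i
        elif lab == 1 and start is not None:
            if (i - start) * hop_ms >= min_len_ms:
                sections.append((start * hop_ms / 1000, i * hop_ms / 1000))
            start = None
    return sections

labels = [1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
print(non_dialog_sections(labels, hop_ms=500, min_len_ms=1500))
```

Durations are kept in integer milliseconds so the length comparison is exact. The 1-second gap in the example is discarded as too short for a description.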

A Nonuniform Sampling Technique and Its Application to Speech Coding (비균등 표본화 기법과 음성 부호화로의 응용)

  • Iem, Byeong-Gwan
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.1 / pp.28-32 / 2014
  • Speech shows a piecewise linear shape over a very short time period. For such signals, a nonuniform sampling method based on inflection point detection (IPD) is proposed to reduce the data rate. The method exploits the geometrical characteristics of the signal more fully than the existing sampling method based on local maxima/minima detection (MMD). As a result, the signal reconstructed by interpolating the IPD-based samples more closely resembles the original speech. Computer simulation shows that the proposed IPD-based method yields an improvement of about 9 to 23 dB over the existing MMD method. To show its usefulness, the IPD technique is applied to speech coding and compared with continuously variable slope delta modulation (CVSD). Each nonuniformly sampled value is binary coded together with a one-bit flag set to "1". Noninflection samples are not sent; only a flag bit set to "0" is sent for each of them. Compared with CVSD, the method improves SNR by 0.3 to 9 dB and mean opinion score (MOS) by 0.5 to 1.3.
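
The flag-bit format described above can be sketched as follows. Three details are assumptions for illustration: inflection samples are found by a second-difference test, frame endpoints are always sent, and values use a uniform `bits`-bit quantizer over plus or minus `full_scale`. The function name `encode_bits` is also illustrative.

```python
def encode_bits(x, bits=8, threshold=0.0, full_scale=1.0):
    """Flag-bit stream: '1' followed by a `bits`-bit code for each inflection sample,
    a lone '0' for each non-inflection sample (whose value is not sent)."""
    levels = 2 ** bits
    out = []
    for n, v in enumerate(x):
        is_end = n == 0 or n == len(x) - 1
        # Short-circuit: endpoints never index outside the list.
        if is_end or abs(x[n + 1] - 2 * v + x[n - 1]) > threshold:
            q = round((v + full_scale) / (2 * full_scale) * (levels - 1))  # uniform quantizer
            q = min(max(q, 0), levels - 1)
            out.append("1" + format(q, f"0{bits}b"))
        else:
            out.append("0")
    return "".join(out)

x = [0.0, 0.25, 0.5, 0.75, 1.0, 0.5, 0.0]
stream = encode_bits(x)
print(len(stream), "bits vs", 8 * len(x), "bits for 8-bit PCM")
```

Here 3 of 7 samples are inflections, so the stream needs 3 * 9 + 4 * 1 = 31 bits instead of 56. The savings grow as the signal becomes more linear between inflections.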

Speech Interface with Echo Canceller and Barge-In Functionality for Telematic System (텔레매틱스 시스템을 위한 반향제거 및 Barge-In 기능을 갖는 음성인터페이스)

  • Kim, Jun;Bae, Keun-Sung
    • The Journal of the Acoustical Society of Korea / v.28 no.5 / pp.483-490 / 2009
  • In this paper, we develop a speech interface with acoustic echo cancellation and barge-in functionality for the car environment. The echo canceller uses a DT (Double-Talk) detection algorithm based on the correlation coefficients between the reference and desired signals. This algorithm often makes DT detection errors in the presence of background noise. We reduce these errors by using the average power of the noise and echo estimated from the input signal. In addition, barge-in functionality lets drivers give speech commands by interrupting the speaker output. It is implemented by combining DT detection with appropriate gain control of the speaker output. The interface was tested through computer simulation in an assumed car environment and experiments in a real laboratory environment. It removed acoustic echo signals well in noisy environments, and barge-in operated properly.
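
A minimal sketch of correlation-based double-talk detection with a power guard and barge-in gain control. It assumes the echo path is already time-aligned, so a zero-lag correlation suffices, and that the caller supplies the noise and echo power estimates. The guard rule, thresholds, ducking values, and function names are hypothetical. The abstract states only that estimated noise and echo power are used to reduce DT detection errors.

```python
import math
import random

def corr_coef(x, d):
    """Zero-lag normalized correlation between reference x and microphone signal d."""
    num = sum(a * b for a, b in zip(x, d))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in d)) or 1.0
    return abs(num) / den

def double_talk(x, d, noise_power, echo_power, rho_thr=0.8, margin=2.0):
    """Hypothetical rule: declare double-talk only if the correlation drops AND the
    mic power exceeds the estimated noise + echo power by `margin`."""
    mic_power = sum(b * b for b in d) / len(d)
    return corr_coef(x, d) < rho_thr and mic_power > margin * (noise_power + echo_power)

def speaker_gain(is_double_talk, current_gain, duck=0.1, step=0.2):
    """Barge-in: duck the prompt immediately on double-talk, otherwise recover gradually."""
    return duck if is_double_talk else min(1.0, current_gain + step)

rng = random.Random(3)
n = 400
x = [rng.gauss(0.0, 1.0) for _ in range(n)]                        # loudspeaker prompt
echo = [0.5 * v for v in x]                                         # aligned echo, power ~0.25
near = [math.sin(2 * math.pi * 300 * t / 8000) for t in range(n)]   # driver speech stand-in
quiet = [rng.gauss(0.0, 0.05) for _ in range(n)]
loud = [rng.gauss(0.0, 1.0) for _ in range(n)]

echo_only = [e + q for e, q in zip(echo, quiet)]
talking = [e + s + q for e, s, q in zip(echo, near, quiet)]
noisy_echo = [e + q for e, q in zip(echo, loud)]
print(double_talk(x, echo_only, 0.0025, 0.25))   # echo only
print(double_talk(x, talking, 0.0025, 0.25))     # driver barges in
print(double_talk(x, noisy_echo, 1.0, 0.25))     # heavy noise, no speech
```

In the noisy case, the correlation alone drops below the threshold and would cause a false DT decision. The power guard suppresses that false decision.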