• Title/Summary/Keyword: Speech Detection

Search Result 465, Processing Time 0.038 seconds

Speech detection using the probability of spectral occupancy (주파수 차지확률을 이용한 음성검출기 제안)

  • Hong, Seong-Bong;Ki, Tae-Young;Kim, Nam-Soo;Kim, Taejeong
    • Proceedings of the IEEK Conference
    • /
    • 2000.09a
    • /
    • pp.171-174
    • /
    • 2000
  • In this paper, we improve statistical-model-based speech detector using the probability that a speech occupies a frequency bin. While the previous method assumes speech energy occupies all the frequency components and use them with equal weights in the likelihood ratio test for speech detection, the proposed method assumes speech energy occupies just some frequency componets and use them with different weights in accordance with the probabilities of spectral occupancy in the test. The probability is iteratively up-dated for speech frames to contribute to the likelihood ratio test. The proposed method well reflects the characteristic distribution of speech spectrum, and yields better detection performance.

  • PDF

Detection of Glottal Closure Instant for Voiced Speech Using Wavelet Transform (웨이브렛 변환을 이용한 음성신호의 성문폐쇄시점 검출)

  • Bae, Keun-Sung
    • Speech Sciences
    • /
    • v.7 no.3
    • /
    • pp.153-165
    • /
    • 2000
  • During the phonation of voiced sounds, instants exist where the glottis is opened or closed, due to the periodic vibration of the vocal cord. When closed, this is called the glottal closure instant(GCI) or epoch.. The correct detection of the GCI is one of the important problems in speech processing for pitch detection, pitch synchronous analysis, and so on. Recently, it has been shown that the local maxima points of the wavelet transformed speech signal correspond to the GCIs of speech signal. In this paper, we investigate the accuracy of Gels estimated from this wavelet transformed speech signal. For this purpose we compare them with the negative peak points of the differentiated EGG signal that represents the actual GCIs of speech signal.

  • PDF

Knowledge-driven speech features for detection of Korean-speaking children with autism spectrum disorder

  • Seonwoo Lee;Eun Jung Yeo;Sunhee Kim;Minhwa Chung
    • Phonetics and Speech Sciences
    • /
    • v.15 no.2
    • /
    • pp.53-59
    • /
    • 2023
  • Detection of children with autism spectrum disorder (ASD) based on speech has relied on predefined feature sets due to their ease of use and the capabilities of speech analysis. However, clinical impressions may not be adequately captured due to the broad range and the large number of features included. This paper demonstrates that the knowledge-driven speech features (KDSFs) specifically tailored to the speech traits of ASD are more effective and efficient for detecting speech of ASD children from that of children with typical development (TD) than a predefined feature set, extended Geneva Minimalistic Acoustic Standard Parameter Set (eGeMAPS). The KDSFs encompass various speech characteristics related to frequency, voice quality, speech rate, and spectral features, that have been identified as corresponding to certain of their distinctive attributes of them. The speech dataset used for the experiments consists of 63 ASD children and 9 TD children. To alleviate the imbalance in the number of training utterances, a data augmentation technique was applied to TD children's utterances. The support vector machine (SVM) classifier trained with the KDSFs achieved an accuracy of 91.25%, surpassing the 88.08% obtained using the predefined set. This result underscores the importance of incorporating domain knowledge in the development of speech technologies for individuals with disorders.

Application of Preemphasis FIR Filtering To Speech Detection and Phoneme Segmentation (프리엠퍼시스 FIR 필터링의 음성 검출 및 음소 분할에의 응용)

  • Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.8 no.5
    • /
    • pp.665-670
    • /
    • 2013
  • In this paper, we propose a new method of speech detection and phoneme segmentation. We investigate the effect of applying preemphasis FIR filtering on the speech signal before the usual speech detection that utilizes the energy profile for discriminating signals from background noise. By this procedure, only the speech section of low energy and frequency becomes distinct in energy profile. It is verified experimentally that the silence/speech boundary becomes sharper by applying the filtering compared to the conventional method. By applications of this procedure, phoneme segmentation is also found to be much facilitated.

Voice Activity Detection Based on Entropy in Noisy Car Environment (차량 잡음 환경에서 엔트로피 기반의 음성 구간 검출)

  • Roh, Yong-Wan;Lee, Kue-Bum;Lee, Woo-Seok;Hong, Kwang-Seok
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.9 no.2
    • /
    • pp.121-128
    • /
    • 2008
  • Accurate voice activity detection have a great impact on performance of speech applications including speech recognition, speech coding, and speech communication. In this paper, we propose methods for voice activity detection that can adapt to various car noise situations during driving. Existing voice activity detection used various method such as time energy, frequency energy, zero crossing rate, and spectral entropy that have a weak point of rapid. decline performance in noisy environments. In this paper, the approach is based on existing spectral entropy for VAD that we propose voice activity detection method using MFB(Met-frequency filter banks) spectral entropy, gradient FFT(Fast Fourier Transform) spectral entropy. and gradient MFB spectral entropy. FFT multiplied by Mel-scale is MFB and Mel-scale is non linear scale when human sound perception reflects characteristic of speech. Proposed MFB spectral entropy method clearly improve the ability to discriminate between speech and non-speech for various in noisy car environments that achieves 93.21% accuracy as a result of experiments. Compared to the spectral entropy method, the proposed voice activity detection gives an average improvement in the correct detection rate of more than 3.2%.

  • PDF

A Study on the Endpoint Detection Algorithm (끝점 검출 알고리즘에 관한 연구)

  • 양진우
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1984.12a
    • /
    • pp.66-69
    • /
    • 1984
  • This paper is a study on the Endpoint Detection for Korean Speech Recognition. In speech signal process, analysis parameter was classification from Zero Crossing Rate(Z.C.R), Log Energy(L.E), Energy in the predictive error(Ep) and fundamental Korean Speech digits, /영/-/구/ are selected as date for the Recognition of Speech. The main goal of this paper is to develop techniques and system for Speech input ot machine. In order to detect the Endpoint, this paper makes choice of Log Energy(L.E) from various parameters analysis, and the Log Energy is very effective parameter in classifying speech and nonspeech segments. The error rate of 1.43% result from the analysis.

  • PDF

A Low Bit Rate Speech Coder Based on the Inflection Point Detection

  • Iem, Byeong-Gwan
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.15 no.4
    • /
    • pp.300-304
    • /
    • 2015
  • A low bit rate speech coder based on the non-uniform sampling technique is proposed. The non-uniform sampling technique is based on the detection of inflection points (IP). A speech block is processed by the IP detector, and the detected IP pattern is compared with entries of the IP database. The address of the closest member of the database is transmitted with the energy of the speech block. In the receiver, the decoder reconstructs the speech block using the received address and the energy information of the block. As results, the coder shows fixed data rate contrary to the existing speech coders based on the non-uniform sampling. Through computer simulation, the usefulness of the proposed technique is shown. The SNR performance of the proposed method is approximately 5.27 dB with the data rate of 1.5 kbps.

Robust Feature Extraction for Voice Activity Detection in Nonstationary Noisy Environments (음성구간검출을 위한 비정상성 잡음에 강인한 특징 추출)

  • Hong, Jungpyo;Park, Sangjun;Jeong, Sangbae;Hahn, Minsoo
    • Phonetics and Speech Sciences
    • /
    • v.5 no.1
    • /
    • pp.11-16
    • /
    • 2013
  • This paper proposes robust feature extraction for accurate voice activity detection (VAD). VAD is one of the principal modules for speech signal processing such as speech codec, speech enhancement, and speech recognition. Noisy environments contain nonstationary noises causing the accuracy of the VAD to drastically decline because the fluctuation of features in the noise intervals results in increased false alarm rates. In this paper, in order to improve the VAD performance, harmonic-weighted energy is proposed. This feature extraction method focuses on voiced speech intervals and weighted harmonic-to-noise ratios to determine the amount of the harmonicity to frame energy. For performance evaluation, the receiver operating characteristic curves and equal error rate are measured.

Performance Evaluation of Novel AMDF-Based Pitch Detection Scheme

  • Kumar, Sandeep
    • ETRI Journal
    • /
    • v.38 no.3
    • /
    • pp.425-434
    • /
    • 2016
  • A novel average magnitude difference function (AMDF)-based pitch detection scheme (PDS) is proposed to achieve better performance in speech quality. A performance evaluation of the proposed PDS is carried out through both a simulation and a real-time implementation of a speech analysis-synthesis system. The parameters used to compare the performance of the proposed PDS with that of PDSs that are based on either a cepstrum, an autocorrelation function (ACF), an AMDF, or circular AMDF (CAMDF) methods are as follows: percentage gross pitch error (%GPE); a subjective listening test; an objective speech quality assessment; a speech intelligibility test; a synthesized speech waveform; computation time; and memory consumption. The proposed PDS results in lower %GPE and better synthesized speech quality and intelligibility for different speech signals as compared to the cepstrum-, ACF-, AMDF-, and CAMDF-based PDSs. The computational time of the proposed PDS is also less than that for the cepstrum-, ACF-, and CAMDF-based PDSs. Moreover, the total memory consumed by the proposed PDS is less than that for the ACF- and cepstrum-based PDSs.