• Title/Summary/Keyword: Speech/Non-speech Detection


Speech Recognition in Noisy Environments using the Noise Spectrum Estimation based on the Histogram Technique (히스토그램 처리방법에 의한 잡음 스펙트럼 추정을 이용한 잡음환경에서의 음성인식)

  • Kwon, Young-Uk;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.16 no.5 / pp.68-75 / 1997
  • Spectral subtraction is a widely used preprocessing technique for speech recognition in additive noise environments, but it requires a good estimate of the noise power spectrum. In this paper, we employ the histogram technique to estimate the noise spectrum. This technique has advantages over other noise estimation methods in that it does not require speech/non-speech detection and can estimate slowly varying noise spectra. In speaker-independent isolated word recognition experiments in both colored Gaussian and car noise environments under various SNR conditions, the histogram-technique-based spectral subtraction method yields better performance than spectral subtraction with the conventional noise estimation method, which averages the spectra of the initial frames during the non-speech period.

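The abstract contrasts histogram-based noise estimation with averaging the spectra of the initial non-speech frames. Below is a minimal Python sketch of the general idea, not the authors' implementation: it assumes per-frame power spectra are already available, takes the mode of each frequency bin's dB-level histogram as the noise power, and then applies power spectral subtraction with a floor. The function names and the parameters `bin_width_db`, `alpha`, and `floor` are hypothetical. To follow slowly varying noise, one would build the histogram over a sliding window of recent frames rather than over all frames as done here.

```python
import math
from collections import Counter


def histogram_noise_estimate(frames, bin_width_db=1.0):
    """Estimate a noise power spectrum from a list of power-spectrum frames.

    frames: list of frames, each a list of per-bin power values (> 0).
    For each frequency bin, the dB values over all frames are quantized into
    histogram cells of width bin_width_db; the centre of the most frequent
    cell (the mode) is taken as the noise level. No speech/non-speech
    decision is needed, because noise-only frames dominate the histogram.
    """
    n_bins = len(frames[0])
    noise = []
    for k in range(n_bins):
        cells = Counter(
            math.floor(10 * math.log10(frame[k]) / bin_width_db) for frame in frames
        )
        mode_cell, _ = cells.most_common(1)[0]
        mode_db = (mode_cell + 0.5) * bin_width_db  # centre of the modal cell
        noise.append(10 ** (mode_db / 10))
    return noise


def spectral_subtract(frame, noise, alpha=1.0, floor=0.01):
    """Power spectral subtraction with a spectral floor (alpha, floor are illustrative)."""
    return [max(p - alpha * n, floor * p) for p, n in zip(frame, noise)]
```

Because most frames in each bin contain only noise, the histogram mode stays near the noise level even when some speech frames are present, whereas a plain average over the same frames would be pulled upward by them.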

Correlation analysis of antipsychotic dose and speech characteristics according to extrapyramidal symptoms (추체외로 증상에 따른 항정신병 약물 복용량과 음성 특성의 상관관계 분석)

  • Lee, Subin;Kim, Seoyoung;Kim, Hye Yoon;Kim, Euitae;Yu, Kyung-Sang;Lee, Ho-Young;Lee, Kyogu
    • The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.367-374 / 2022
  • In this paper, a correlation analysis between speech characteristics and the dose of antipsychotic drugs was performed. To investigate the speech characteristic patterns of ExtraPyramidal Symptoms (EPS), a common side effect of antipsychotic drugs that is associated with voice change, a Korean EPS speech corpus was constructed through sentence development. Using this corpus, the speech patterns of the EPS and non-EPS groups were investigated, and a strong correlation of speech features was found in the EPS group in particular. It was also confirmed that the type of speech sentence affects the speech feature pattern. These results suggest the possibility of early detection of antipsychotic-induced EPS based on speech features.
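The abstract does not say which correlation coefficient was used. As an illustration only, the sketch below assumes Pearson's r, computed separately for each group, between participants' antipsychotic doses and one speech feature. The group data are made-up toy values, not results from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy, made-up values: (doses, one hypothetical speech feature) per group
groups = {
    "EPS": ([100, 200, 300, 400], [0.8, 0.7, 0.6, 0.5]),
    "non-EPS": ([100, 200, 300, 400], [0.6, 0.7, 0.6, 0.7]),
}
results = {name: pearson_r(dose, feat) for name, (dose, feat) in groups.items()}
```

In this toy data the EPS group gives r close to -1 and the non-EPS group about 0.45; with real data one would also report significance and correct for the number of speech features tested.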

The Effect of Visual Cues in the Identification of the English Consonants /b/ and /v/ by Native Korean Speakers (한국어 화자의 영어 양순음 /b/와 순치음 /v/ 식별에서 시각 단서의 효과)

  • Kim, Yoon-Hyun;Koh, Sung-Ryong;Valerie, Hazan
    • Phonetics and Speech Sciences / v.4 no.3 / pp.25-30 / 2012
  • This study investigated whether native Korean listeners could use visual cues to identify the English consonants /b/ and /v/. Both auditory and audiovisual tokens of word minimal pairs, with the target phonemes in word-initial or word-medial position, were used. Participants were instructed to decide which consonant they heard in a 2×2 design: cue (audio-only, audiovisual) × location (word-initial, word-medial). Mean identification scores were significantly higher in the audiovisual than in the audio-only condition, and in the word-initial than in the word-medial condition. In addition, following signal detection theory, sensitivity (d') and response bias (c) were calculated from hit rates and false alarm rates. These measures showed that the higher identification rate in the audiovisual condition was associated with an increase in sensitivity. There were no significant differences in response bias across conditions. This result suggests that native Korean speakers can use visual cues when identifying confusable non-native phonemic contrasts, and that visual cues can enhance non-native speech perception.
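In signal detection theory, sensitivity and bias are conventionally computed as d' = z(H) - z(F) and c = -(z(H) + z(F))/2, where z is the inverse of the standard normal CDF, H is the hit rate, and F is the false-alarm rate. The sketch below applies these standard formulas. Treating a /v/ response to a /v/ token as a hit and a /v/ response to a /b/ token as a false alarm is an assumption for illustration, since the abstract does not state how responses were coded.

```python
from statistics import NormalDist


def d_prime_and_c(hit_rate, fa_rate):
    """Return (d', c) from a hit rate and a false-alarm rate, both in (0, 1).

    Rates of exactly 0 or 1 must be corrected first (e.g. with a log-linear
    correction), because the inverse normal CDF is undefined there.
    """
    if not (0 < hit_rate < 1 and 0 < fa_rate < 1):
        raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf          # inverse standard normal CDF
    zh, zf = z(hit_rate), z(fa_rate)
    d_prime = zh - zf                 # sensitivity
    c = -(zh + zf) / 2                # response bias (0 = unbiased)
    return d_prime, c
```

For example, H = 0.8 and F = 0.2 give d' of about 1.68 and c of 0, i.e. moderate sensitivity with no bias toward either response.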

A Parametric Voice Activity Detection Based on the SPD-TE for Nonstationary Noises (비정체성 잡음을 위한 SPD-TE 기반 계수형 음성 활동 탐지)

  • Koo, Boneung
    • The Journal of the Acoustical Society of Korea / v.34 no.4 / pp.310-315 / 2015
  • A single-channel VAD (Voice Activity Detection) algorithm for nonstationary noise environments is proposed in this paper. Threshold values of the feature parameter for the VAD decision are updated adaptively based on estimates of the means and standard deviations of past non-speech frames. The feature parameter, SPD-TE (Spectral Power Difference-Teager Energy), is obtained by applying the Teager energy operator to the WPD (Wavelet Packet Decomposition) coefficients. The SPD-TE was previously reported to be a noise-robust feature for VAD. Experiments using the TIMIT speech and NOISEX-92 noise databases show that the decision accuracy of the proposed algorithm is comparable to that of several typical VAD algorithms, including standardized ones, at SNRs ranging from 10 dB to -10 dB.
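The discrete Teager energy operator is psi[x(n)] = x(n)^2 - x(n-1)x(n+1). The sketch below shows, in simplified form, the two ingredients named in the abstract: a Teager-energy frame feature and a decision threshold set from the mean and standard deviation of past non-speech features. It is not the SPD-TE algorithm: for brevity it applies the operator directly to time-domain samples instead of WPD coefficients and omits the spectral power difference step. The parameters `k`, `history`, and `init_frames`, and the assumption that the first frames are non-speech, are hypothetical.

```python
import math
from collections import deque


def teager_energy(frame):
    """Mean discrete Teager energy of a frame: psi[x(n)] = x(n)^2 - x(n-1) * x(n+1)."""
    vals = [frame[n] ** 2 - frame[n - 1] * frame[n + 1] for n in range(1, len(frame) - 1)]
    return sum(vals) / len(vals)


class AdaptiveThresholdVAD:
    """Frame-level VAD with threshold = mean + k * std of recent non-speech features.

    k, history and init_frames are hypothetical tuning parameters.
    """

    def __init__(self, k=3.0, history=50, init_frames=5):
        self.k = k
        self.history = deque(maxlen=history)  # features of recent non-speech frames
        self.init_frames = init_frames

    def decide(self, feature):
        if len(self.history) < self.init_frames:
            self.history.append(feature)  # assume the leading frames are non-speech
            return False
        n = len(self.history)
        mean = sum(self.history) / n
        std = math.sqrt(sum((f - mean) ** 2 for f in self.history) / n)
        is_speech = feature > mean + self.k * std
        if not is_speech:
            self.history.append(feature)  # update statistics only from non-speech frames
        return is_speech
```

For a sinusoid A·sin(Ωn) the operator gives exactly A²·sin²(Ω) at every sample, so a tone frame yields a feature far above a threshold learned from low-level noise frames.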

A Study on a Non-Voice Section Detection Model among Speech Signals using CNN Algorithm (CNN(Convolutional Neural Network) 알고리즘을 활용한 음성신호 중 비음성 구간 탐지 모델 연구)

  • Lee, Hoo-Young
    • Journal of Convergence for Information Technology / v.11 no.6 / pp.33-39 / 2021
  • Speech recognition technology is being combined with deep learning and is developing at a rapid pace. In particular, voice recognition services are connected to various devices such as AI speakers, in-vehicle voice recognition systems, and smartphones, and the technology is now used across many areas rather than in specific sectors of the industry. Accordingly, research to meet the high expectations for this technology is being actively conducted. In the field of natural language processing (NLP), there is a need for research on removing ambient noise and unnecessary voice signals, which strongly affect the speech recognition rate. Many domestic and foreign companies already apply the latest AI technology to such research, and studies using convolutional neural networks (CNNs) are particularly active. The purpose of this study is to detect non-voice sections within a user's speech using a CNN. Voice files (WAV) from five speakers were collected to generate training data, and a CNN-based classification model was built to discriminate speech sections from non-voice sections. An experiment detecting non-speech sections with this model achieved an accuracy of 94%.

Speech Enhancement Based on Modified IMCRA Using Spectral Minima Tracking with Weighted Subband Selection (서브밴드 가중치를 적용한 스펙트럼 최소값 추적을 이용하는 수정된 IMCRA 기반의 음성 향상 기법)

  • Park, Yun-Sik;Park, Gyu-Seok;Lee, Sang-Min
    • Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.3 / pp.89-97 / 2012
  • In this paper, we propose a novel approach to noise power estimation for speech enhancement in noisy environments. IMCRA (improved minima controlled recursive averaging), which is widely used in speech enhancement, employs a rough VAD (voice activity detection) algorithm that excludes speech components during speech periods. This improves noise power estimation by reducing the speech distortion caused by conventional algorithms based on the minimum power spectrum of the noisy speech. However, because this VAD cannot reliably distinguish speech from noise under non-stationary noise and at low SNRs (signal-to-noise ratios), speech distortion resulting from minimum tracking during speech periods still remains. In the proposed method, the minimum power estimate obtained by IMCRA is modified by SMT (spectral minima tracking) to reduce the speech distortion caused by the bias of the estimated minimum power. In addition, to estimate the minimum power effectively by taking into account the distribution characteristics of the speech and noise spectra, the presented method combines the minimum estimates provided by IMCRA and SMT using subband-based weighting factors. The performance of the proposed algorithm is evaluated by subjective and objective quality tests under various environments, and it yields better results than the conventional method.
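IMCRA itself is too involved for a short example, so the sketch below covers only the pieces the abstract adds on top of it, for a single frequency bin: recursive smoothing of the periodogram, a continuous spectral-minimum tracker, and a subband-weighted blend of two minimum estimates. The tracker follows a commonly used continuous minimum-tracking rule attributed to Doblinger; this is an assumption, since the abstract does not give the paper's SMT rule. The IMCRA minimum is treated as a given input, and `alpha`, `gamma`, `beta`, and the per-subband `weight` are hypothetical values.

```python
def smooth_power(periodogram, alpha=0.7):
    """First-order recursive smoothing P(l) = alpha*P(l-1) + (1-alpha)*|Y(l)|^2 for one bin."""
    out, p = [], periodogram[0]
    for y in periodogram:
        p = alpha * p + (1 - alpha) * y
        out.append(p)
    return out


def track_minimum(smoothed, gamma=0.998, beta=0.96):
    """Continuous spectral minimum tracking for one frequency bin.

    If the previous minimum is below the current smoothed power, the minimum
    rises slowly; otherwise it drops immediately to the current power.
    """
    p_min = [smoothed[0]]
    for l in range(1, len(smoothed)):
        if p_min[-1] < smoothed[l]:
            p_min.append(
                gamma * p_min[-1]
                + (1 - gamma) / (1 - beta) * (smoothed[l] - beta * smoothed[l - 1])
            )
        else:
            p_min.append(smoothed[l])
    return p_min


def combine_minima(min_imcra, min_smt, weight):
    """Blend two minimum estimates; weight in [0, 1] is a hypothetical per-subband factor."""
    return weight * min_imcra + (1 - weight) * min_smt
```

After a jump in power, the tracked minimum rises only slowly, so short speech bursts barely lift the noise-floor estimate; a weight chosen per subband then decides how much each estimate contributes.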

Speech Compression by Non-uniform Sampling at the Maxima and Minima (극대 및 극소점에서의 비균일 표본화에 의한 음성압축)

  • Rheem, Jae-Yeol;Baek, Sung-Joon;Ann, Sou-Guil;Kim, Bum-Hoon
    • The Journal of the Acoustical Society of Korea / v.11 no.4 / pp.36-44 / 1992
  • To reduce the redundancy among samples that results from uniform sampling, nonuniform sampling or nonredundant-sample coding methods can be considered. However, it is well known that when conventional nonuniform sampling methods are applied directly to speech signals, they require as much data as, or more than, uniform sampling methods such as PCM. To overcome this problem, we consider the perceptual properties of speech signals and propose a nonuniform sampling method that samples the speech waveform at its maxima and minima. The compression ratio is used to analyze the performance of the proposed method. We show that the compression ratio can be improved by silence detection, which cannot be applied in conventional methods based on uniform sampling. In experiments on 8 kHz, 8-bit PCM signals, compression ratios of 1.54 without silence detection and 2.88 with silence detection were obtained.

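The idea of keeping only the waveform's turning points can be sketched as follows. This is an illustration under assumptions, not the paper's coder: local extrema (plus both endpoints) are kept together with their positions, the waveform is rebuilt by straight-line interpolation between them, and the compression ratio is estimated with hypothetical bit allocations (`value_bits`, `gap_bits`). The paper's interpolation rule, position coding, and silence detection are not described in the abstract and are not modeled here.

```python
def extrema_samples(x):
    """Return (index, value) pairs at strict local maxima/minima, plus both endpoints."""
    keep = [(0, x[0])]
    for n in range(1, len(x) - 1):
        # The slope changes sign exactly at a strict local maximum or minimum
        if (x[n] - x[n - 1]) * (x[n + 1] - x[n]) < 0:
            keep.append((n, x[n]))
    keep.append((len(x) - 1, x[-1]))
    return keep


def reconstruct_linear(samples, length):
    """Rebuild the waveform by linear interpolation between kept samples (an assumption)."""
    out = [0.0] * length
    for (i0, v0), (i1, v1) in zip(samples, samples[1:]):
        for n in range(i0, i1 + 1):
            out[n] = v0 + (v1 - v0) * (n - i0) / (i1 - i0)
    return out


def compression_ratio(n_samples, n_kept, value_bits=8, gap_bits=4):
    """Uniform PCM bits divided by bits for kept values plus hypothetical position codes."""
    return n_samples * value_bits / (n_kept * (value_bits + gap_bits))
```

For the toy waveform [0, 1, 2, 1, 0, -1, -2, -1, 0], four of nine samples are kept, the linear rebuild is exact, and the ratio is 1.5 under these bit assumptions. Silence detection would additionally replace long low-amplitude runs with a short code, which the abstract reports raises the ratio from 1.54 to 2.88.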

Speech Enhancement Based on Improved Minima Controlled Recursive Averaging Incorporating GSAP (전역 음성 부재 확률 기반의 향상된 최소값 제어 재귀평균기법을 이용한 음성 향상 기법)

  • Song, Ji-Hyun;Bang, Dong-Hyeouck;Lee, Sang-Min
    • Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.1 / pp.104-111 / 2012
  • In this paper, we propose a novel method to improve the performance of the improved minima controlled recursive averaging (IMCRA). An examination under various noise environments shows that IMCRA has a fundamental drawback in estimating the noise power at the offset regions of continuous speech signals. In particular, it is difficult to obtain robust noise power estimates in non-stationary noise environments whose spectral characteristics change rapidly, such as babble noise. To overcome this drawback, we apply the global speech absence probability (GSAP), conditioned on both the a priori SNR and the a posteriori SNR, to the speech detection algorithm of IMCRA. Using the ITU-T P.862 perceptual evaluation of speech quality (PESQ) and a composite measure test as performance criteria, we show that the proposed algorithm yields better results than the conventional IMCRA-based scheme under various noise environments. In particular, for babble noise at 5 dB, the proposed method produced a notable improvement over IMCRA (PESQ: +0.026, composite measure: +0.029).
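Under the common Gaussian statistical model with independent frequency bins, the likelihood ratio of bin k given its a priori SNR xi_k and a posteriori SNR gamma_k is Lambda_k = exp(gamma_k * xi_k / (1 + xi_k)) / (1 + xi_k), and Bayes' rule gives a global speech absence probability P(H0 | Y) = 1 / (1 + (P(H1)/P(H0)) * prod_k Lambda_k). The sketch below computes this quantity in the log domain to avoid overflow. It assumes this standard model and a hypothetical prior `p_speech`; how the paper feeds the GSAP into IMCRA's speech detection is not detailed in the abstract and is not shown.

```python
import math


def global_speech_absence_probability(xi, gamma, p_speech=0.5):
    """GSAP from per-bin a priori SNRs xi_k and a posteriori SNRs gamma_k (linear scale)."""
    # log of prod_k Lambda_k under the Gaussian model, summed over bins
    log_lr = sum(g * x / (1 + x) - math.log1p(x) for x, g in zip(xi, gamma))
    log_odds = math.log(p_speech / (1 - p_speech)) + log_lr
    # P(H0 | Y) = 1 / (1 + exp(log_odds)), evaluated without overflow
    if log_odds > 0:
        e = math.exp(-log_odds)
        return e / (1 + e)
    return 1 / (1 + math.exp(log_odds))
```

With xi = 1 and gamma = 1 in ten bins the GSAP is about 0.87 (speech likely absent); raising gamma to 10 drives it close to zero.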

Statistical Model-Based Voice Activity Detection Using the Second-Order Conditional Maximum a Posteriori Criterion with Adapted Threshold (적응형 문턱값을 가지는 2차 조건 사후 최대 확률을 이용한 통계적 모델 기반의 음성 검출기)

  • Kim, Sang-Kyun;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.29 no.1 / pp.76-81 / 2010
  • In this paper, we propose a novel approach to improve the performance of statistical model-based voice activity detection (VAD) based on the second-order conditional maximum a posteriori (CMAP) criterion. In our approach, the VAD decision rule compares the geometric mean of the likelihood ratios (LRs) with a threshold that is adapted according to the speech presence probability conditioned on both the current observation and the speech activity decisions in the previous two frames. Experimental results show that the proposed approach outperforms both the statistical model-based VAD and the CMAP-based VAD using the LR test.
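The decision structure described in the abstract, the geometric mean of the per-bin likelihood ratios compared with a threshold that depends on the decisions in the two previous frames, can be sketched as below. Assumptions: the Gaussian-model likelihood ratio Lambda_k = exp(gamma_k * xi_k / (1 + xi_k)) / (1 + xi_k), with xi_k and gamma_k supplied per frame, and a fixed, hypothetical threshold table indexed by the two previous decisions. The paper instead adapts its threshold from the conditional speech presence probability, which is not reproduced here.

```python
import math

# Hypothetical thresholds on the mean log-LR, indexed by (decision at l-2, decision at l-1).
# Lower thresholds after recent speech make it easier to stay in the speech state.
THRESHOLDS = {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.3}


def mean_log_lr(xi, gamma):
    """Log of the geometric mean of per-bin Gaussian-model likelihood ratios."""
    return sum(g * x / (1 + x) - math.log1p(x) for x, g in zip(xi, gamma)) / len(xi)


def second_order_vad(frames, thresholds=THRESHOLDS):
    """frames: list of (xi, gamma) per frame. Returns a 0/1 decision per frame."""
    prev2, prev1 = 0, 0
    decisions = []
    for xi, gamma in frames:
        d = 1 if mean_log_lr(xi, gamma) > thresholds[(prev2, prev1)] else 0
        decisions.append(d)
        prev2, prev1 = prev1, d
    return decisions
```

In a toy run with frames (quiet, borderline, loud, borderline), the borderline frame is rejected after silence but accepted right after a speech frame, illustrating how conditioning on the previous two decisions changes the outcome.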

Noise Reduction using Spectral Subtraction in the Discrete Wavelet Transform Domain (이산 웨이브렛 변환영역에서의 스펙트럼 차감법을 이용한 잡음제거)

  • 김현기;이상운;홍재근
    • Journal of Korea Multimedia Society / v.4 no.4 / pp.306-315 / 2001
  • For noise reduction of noisy speech in speech recognition under noisy environments, the conventional spectral subtraction method has the disadvantages that noise and speech are difficult to distinguish and that noise characteristics cannot be estimated accurately. Noise reduction in the wavelet transform domain, in turn, causes signal loss in the high-frequency region. To compensate for these disadvantages, this paper proposes a spectral subtraction method in the discrete wavelet transform domain, in which speech and non-speech intervals are distinguished by the standard deviation of the wavelet coefficients and the signal is divided into three scales. Through endpoint detection and band division, the proposed method accurately extracts the noise characteristics needed for spectral subtraction. Experiments show that the proposed method outperforms noise reduction using conventional spectral subtraction and the wavelet transform in terms of signal-to-noise ratio and Itakura-Saito distance.

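A minimal sketch of spectral subtraction applied to wavelet coefficients, under assumptions: an orthonormal Haar DWT with three decomposition levels stands in for the paper's unspecified wavelet; frames whose coefficient standard deviation is at most `factor` times the smallest frame's are treated as non-speech (a hypothetical rule); each band's noise level is the mean absolute coefficient over those frames; and each coefficient's magnitude is reduced by that level with a zero floor while its sign is kept. Frame lengths must be divisible by 2**levels. This is not the authors' method; their endpoint detection and band division are only approximated.

```python
import math

S = 1 / math.sqrt(2)


def haar_dwt(x):
    """One level of the orthonormal Haar DWT; len(x) must be even."""
    approx = [(x[2 * i] + x[2 * i + 1]) * S for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) * S for i in range(len(x) // 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * S, (a - d) * S])
    return out


def decompose(x, levels):
    """Return bands [d1, d2, ..., dL, aL] (finest detail first)."""
    bands, approx = [], x
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


def recompose(bands):
    approx = bands[-1]
    for detail in reversed(bands[:-1]):
        approx = haar_idwt(approx, detail)
    return approx


def std(v):
    m = sum(v) / len(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))


def wavelet_spectral_subtraction(frames, levels=3, factor=1.5, alpha=1.0):
    """Return (denoised frames, per-frame non-speech flags)."""
    decomposed = [decompose(f, levels) for f in frames]
    stds = [std([c for band in bands for c in band]) for bands in decomposed]
    threshold = factor * min(stds)
    is_noise = [s <= threshold for s in stds]  # low-variance frames -> non-speech
    noise_frames = [b for b, n in zip(decomposed, is_noise) if n]
    noise_level = [  # mean |coefficient| per band over non-speech frames
        sum(abs(c) for bands in noise_frames for c in bands[j])
        / sum(len(bands[j]) for bands in noise_frames)
        for j in range(levels + 1)
    ]
    out = []
    for bands in decomposed:
        cleaned = [
            [math.copysign(max(abs(c) - alpha * noise_level[j], 0.0), c) for c in band]
            for j, band in enumerate(bands)
        ]
        out.append(recompose(cleaned))
    return out, is_noise
```

Subtracting magnitudes per band, rather than thresholding all detail coefficients uniformly, is one way to limit the high-frequency signal loss the abstract attributes to plain wavelet-domain noise reduction.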