Search | Korea Science

Spectral Subtraction Using Spectral Harmonics for Robust Speech Recognition in Car Environments

Beh, Jounghoon;Ko, Hanseok
- The Journal of the Acoustical Society of Korea / v.22 no.2E / pp.62-68 / 2003
This paper presents a novel noise-compensation scheme that addresses the mismatch between training and testing conditions in automatic speech recognition (ASR) systems, specifically in car environments. Conventional spectral subtraction schemes rely on the signal-to-noise ratio (SNR): the parts of the spectrum that appear to have low SNR are attenuated, and the parts with high SNR are accentuated. However, these schemes assume that the noise power spectrum is generally lower in magnitude than that of speech. While this assumption is adequate for high-SNR environments, it is grossly inadequate for low-SNR scenarios such as car environments. This paper proposes an efficient spectral subtraction scheme aimed specifically at low-SNR noisy environments, which distinctively extracts the harmonics in the speech spectrum. Representative experiments, conducted on car-noise-corrupted utterances from the Aurora2 corpus, confirm the superior performance of the proposed method over conventional methods.
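The conventional scheme that this abstract contrasts itself with can be sketched for a single frame as follows. This is a minimal illustration of generic power spectral subtraction with an over-subtraction factor and a spectral floor, not the harmonics-based method proposed in the paper; the function name, the parameters `alpha` and `beta`, and their default values are assumptions chosen for illustration.

```python
import math


def spectral_subtract(noisy_mag, noise_mag, alpha=2.0, beta=0.01):
    """Generic power spectral subtraction for one frame (illustrative sketch).

    noisy_mag: per-bin magnitudes of the noisy speech frame.
    noise_mag: per-bin magnitudes of the noise estimate.
    alpha:     over-subtraction factor (hypothetical default).
    beta:      spectral floor as a fraction of noise power (hypothetical default).
    Returns the per-bin magnitudes of the enhanced frame.
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    enhanced = []
    for y, n in zip(noisy_mag, noise_mag):
        # Subtract the scaled noise power from the noisy power in this bin.
        power = y * y - alpha * n * n
        # Bins where noise dominates would go negative; clamp to a small floor.
        floor = beta * n * n
        enhanced.append(math.sqrt(max(power, floor)))
    return enhanced


# A high-SNR bin is barely changed; a low-SNR bin is pushed down to the floor.
print(spectral_subtract([10.0, 1.0], [1.0, 1.0]))
```

In this sketch the second bin, where the noise estimate is as large as the observation, collapses to the floor. That is the behaviour the abstract describes as inadequate at low SNR, since speech energy in such bins is discarded along with the noise.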

Preprocessing Technique for Improvement of Speech Recognition in a Car (차량에서의 음성인식율 향상을 위한 전처리 기법)

Kim, Hyun-Tae;Park, Jang-Sik
- The Journal of the Korea Contents Association / v.9 no.1 / pp.139-146 / 2009
This paper presents a modified spectral subtraction scheme suited to speech recognition in low signal-to-noise ratio (SNR) noisy environments, such as automatic speech recognition (ASR) systems in cars. Conventional spectral subtraction schemes rely on the SNR: the parts of the spectrum that appear to have low SNR are attenuated, and the parts with high SNR are accentuated. However, while the assumption behind these schemes is adequate for high-SNR environments, it is grossly inadequate for low-SNR scenarios such as car environments. The proposed method targets low-SNR noisy environments by using a weighting function to enhance the speech-dominant regions of the speech spectrum. Experimental results using voice commands for cars show the superior performance of the proposed method over conventional methods.
https://doi.org/10.5392/JKCA.2009.9.1.139

MFSK Signal Individual Identification Algorithm Based on Bi-spectrum and Wavelet Analyses

Ye, Fang;Chen, Jie;Li, Yibing;Ge, Juan
- KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.10 / pp.4808-4824 / 2016
Signal individual reconnaissance and identification is an extremely important research topic in non-cooperative domains such as electronic countermeasures and intelligence reconnaissance. Given the complexity and changeability of the current communication environment, identifying individual radiation-source signals under low-SNR conditions is a focus of research. This paper presents a novel individual emitter identification method that combines bi-spectrum analysis with wavelet features. It fuses bi-spectrum slice characteristics with the energy variance characteristics of the secondary wavelet transform coefficients to identify MFSK signals in low signal-to-noise ratio (SNR) environments. Theoretical analyses and computer simulation results show that the proposed algorithm has good recognition performance, can suppress noise and interference, and reaches a recognition rate of more than 90% when the SNR is -6 dB.
https://doi.org/10.3837/tiis.2016.10.010

Robust Distributed Speech Recognition under noise environment using MESS and EH-VAD (멀티밴드 스펙트럼 차감법과 엔트로피 하모닉을 이용한 잡음환경에 강인한 분산음성인식)

Choi, Gab-Keun;Kim, Soon-Hyob
- Journal of the Institute of Electronics Engineers of Korea CI / v.48 no.1 / pp.101-107 / 2011
Background noise and channel distortion are major factors that hinder the practical use of speech recognition. Noise generally reduces the performance of speech recognition systems, and DSR (Distributed Speech Recognition)-based speech recognition also has difficulty improving performance for this reason. Therefore, to improve DSR-based speech recognition in noisy environments, this paper proposes a method that accurately detects speech regions in order to extract accurate features. The proposed method distinguishes speech from noise by using entropy and by detecting the spectral energy of speech. Speech detection based on the spectral energy of speech performs well at relatively high SNR (15 dB). However, when the noise environment varies, the threshold between speech and noise also varies, and speech detection performance degrades in low-SNR (0 dB) environments. The proposed method uses the spectral entropy and harmonics of speech for better speech detection. The performance of the AFE is also increased by the more precise speech detection. According to the experimental results, the proposed method shows better recognition performance in noisy environments.

Robust Voice Activity Detection in Noisy Environment Using Entropy and Harmonics Detection (엔트로피와 하모닉 검출을 이용한 잡음환경에 강인한 음성검출)

Choi, Gab-Keun;Kim, Soon-Hyob
- Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.1 / pp.169-174 / 2010
This paper describes an end-point detection method for improving speech recognition rates. The proposed method distinguishes speech from non-speech regions using entropy and harmonic detection of speech. End-point detection using the entropy of the speech spectral energy performs well in high-SNR (15 dB) environments. In low-SNR environments (0 dB), however, the threshold level between speech and noise varies, so precise end-point detection is difficult. Therefore, this paper introduces an end-point detection method that uses speech spectral entropy and harmonics. Experiments show better performance than conventional entropy methods.
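The spectral-entropy cue that this abstract builds on can be sketched as follows. This is only the generic entropy feature with a fixed-threshold decision, under the assumption that each frame is already available as a list of per-bin power values; it does not include the harmonic detection that the paper adds, and the names `spectral_entropy`, `is_speech`, and the threshold value are hypothetical.

```python
import math


def spectral_entropy(power_spectrum):
    """Shannon entropy (in bits) of a frame's normalized power spectrum."""
    total = sum(power_spectrum)
    if total <= 0:
        return 0.0  # silent frame: no distribution to measure
    entropy = 0.0
    for p in power_spectrum:
        if p > 0:
            q = p / total  # treat the spectrum as a probability distribution
            entropy -= q * math.log2(q)
    return entropy


def is_speech(power_spectrum, threshold):
    """Flag a frame as speech when its spectrum is peaky (low entropy).

    The threshold is a hypothetical tuning parameter; as the abstract notes,
    a fixed value becomes unreliable when the noise level changes.
    """
    return spectral_entropy(power_spectrum) < threshold


flat = [1.0] * 8               # noise-like frame: energy spread evenly
peaky = [0.0, 0.0, 5.0, 0.0]   # frame dominated by a single spectral peak
print(spectral_entropy(flat), spectral_entropy(peaky))
print(is_speech(flat, threshold=1.0), is_speech(peaky, threshold=1.0))
```

A flat spectrum over 8 bins gives the maximum entropy of 3 bits, while a spectrum concentrated in one bin gives 0 bits, which is why voiced speech with strong harmonic peaks tends to score lower than broadband noise.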

Improvement of Signal-to-Noise Ratio for Speech under Noisy Environment (잡음환경 하에서의 음성의 SNR 개선)

Choi, Jae-Seung
- Journal of the Korea Institute of Information and Communication Engineering / v.17 no.7 / pp.1571-1576 / 2013
This paper proposes an algorithm for improving the signal-to-noise ratio (SNR) of speech signals in noisy environments. The proposed algorithm first estimates the SNR in low-, mid-, and high-SNR areas in order to improve the SNR of speech corrupted by background noise such as white noise and car noise. The algorithm then subtracts the noise signal from the noisy speech signal in each band using a spectrum sharpening method. In the experiments, good SNRs are obtained for white noise and car noise compared with a conventional spectral subtraction method. The maximal improvement in output SNR was approximately 4.2 dB for white noise and 3.7 dB for car noise relative to the results of the spectral subtraction method.
https://doi.org/10.6109/jkiice.2013.17.7.1571

Voice Activity Detection Method Using Psycho-Acoustic Model Based on Speech Energy Maximization in Noisy Environments (잡음 환경에서 심리음향모델 기반 음성 에너지 최대화를 이용한 음성 검출 방법)

Choi, Gab-Keun;Kim, Soon-Hyob
- The Journal of the Acoustical Society of Korea / v.28 no.5 / pp.447-453 / 2009
This paper introduces a method for detecting voice and exact end points at low SNR by maximizing voice energy. Conventional VAD (Voice Activity Detection) algorithms estimate the noise level, so they tend to detect the end point inaccurately. Moreover, because they use a relatively long analysis range to reflect temporal changes in the noise, their computational load is too high for practical application. This paper introduces the SEM-VAD (Speech Energy Maximization-Voice Activity Detection) method, which uses psycho-acoustic Bark-scale filter banks to maximize voice energy within frames. Stable threshold values are obtained in various noise environments (SNR 15 dB, 10 dB, 5 dB, 0 dB). In voice detection tests in a noisy car environment, the PHR (Pause Hit Rate) was 100% in every noise environment, and the FAR (False Alarm Rate) was 0% at SNR 15 dB and 10 dB, 5.6% at SNR 5 dB, and 9.5% at SNR 0 dB.
https://doi.org/10.7776/ASK.2009.28.5.447

Voice Activity Detection Algorithm using Fuzzy Membership Shifted C-means Clustering in Low SNR Environment (낮은 신호 대 잡음비 환경에서의 퍼지 소속도 천이 C-means 클러스터링을 이용한 음성구간 검출 알고리즘)

Lee, G.H.;Lee, Y.J.;Cho, J.H.;Kim, M.N.
- Journal of Korea Multimedia Society / v.17 no.3 / pp.312-323 / 2014
Voice activity detection is a very important process that finds voice activity in a noisy speech signal for noise cancelling and speech enhancement. Although many studies on voice activity detection have been carried out over the past few years, existing methods perform poorly on sentence-form speech signals in low-SNR environments. This paper proposes a new voice activity detection algorithm consisting of an initial VAD process using entropy and a main VAD process using fuzzy membership shifted c-means clustering. We conducted experiments in white-noise environments at various SNRs to evaluate the proposed algorithm and confirmed its good performance.
https://doi.org/10.9717/kmms.2014.17.3.312

Study of Target Tracking Algorithm using iterative Joint Integrated Probabilistic Data Association in Low SNR Multi-Target Environments (낮은 SNR 다중 표적 환경에서의 iterative Joint Integrated Probabilistic Data Association을 이용한 표적추적 알고리즘 연구)

Kim, Hyung-June;Song, Taek-Lyul
- Journal of the Korea Institute of Military Science and Technology / v.23 no.3 / pp.204-212 / 2020
General target tracking works by receiving a set of measurements from a sensor. However, if the SNR (Signal to Noise Ratio) is low due to a small RCS (Radar Cross Section), as with remote small targets, the target's information can be lost during signal processing. TBD (Track Before Detect) is an algorithm that performs target tracking without a detection threshold. That is, all sensor data is sent to the tracking system, which prevents the target's information from being lost through thresholding of the signal intensity. On the other hand, using all sensor data inevitably leads to computational problems that can severely limit the application. In this paper, we propose an iterative Joint Integrated Probabilistic Data Association as a practical target tracking technique suitable for low-SNR multi-target environments with real-time operation capability, and verify its performance through simulation studies.
https://doi.org/10.9766/KIMST.2020.23.3.204

Quantitative Evaluation of the Performance of Monaural FDSI Beamforming Algorithm using a KEMAR Mannequin (KEMAR 마네킹을 이용한 단이 보청기용 FDSI 빔포밍 알고리즘의 정량적 평가)

Cho, Kyeongwon;Nam, Kyoung Won;Han, Jonghee;Lee, Sangmin;Kim, Dongwook;Hong, Sung Hwa;Jang, Dong Pyo;Kim, In Young
- Journal of Biomedical Engineering Research / v.34 no.1 / pp.24-33 / 2013
To enhance the speech perception of hearing aid users in noisy environments, most hearing aid devices adopt beamforming algorithms, such as the first-order differential microphone (DM1) and the two-stage directional microphone (DM2) algorithms, that maintain sounds from the direction of the interlocutor and reduce ambient sounds from other directions. However, these conventional algorithms show poor directionality in the low-frequency range. Therefore, to enhance the speech perception of hearing aid users in the low-frequency range, our group suggested a fractional delay subtraction and integration (FDSI) algorithm and estimated its theoretical performance using computer simulation in a previous article. In this study, we performed a KEMAR test in a non-reverberant room that compares the performance of the DM1, DM2, broadband beamforming (BBF), and proposed FDSI algorithms using several objective indices: signal-to-noise ratio (SNR) improvement, segmental SNR (seg-SNR) improvement, perceptual evaluation of speech quality (PESQ), and the Itakura-Saito measure (IS). Experimental results showed that the performance of the FDSI algorithm was -3.26 to 7.16 dB in SNR improvement, -1.94 to 5.41 dB in seg-SNR improvement, 1.49 to 2.79 in PESQ, and 0.79 to 3.59 in IS, which demonstrated that the FDSI algorithm showed the highest improvement in SNR and seg-SNR and the lowest IS. We believe that the proposed FDSI algorithm has potential as a beamformer for digital hearing aid devices.
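Two of the objective indices named in this abstract, SNR and segmental SNR, can be sketched as follows. This uses the common textbook definitions with a clean reference signal; the abstract does not state the exact formulas, frame length, or any clamping used in the study, so the function names and the `frame_len` parameter are assumptions. An "improvement" figure would then be the value after processing minus the value before processing.

```python
import math


def snr_db(clean, processed):
    """Overall SNR in dB: clean-signal energy over error energy."""
    signal = sum(c * c for c in clean)
    error = sum((c - p) ** 2 for c, p in zip(clean, processed))
    if signal <= 0 or error <= 0:
        raise ValueError("SNR is undefined for zero signal or zero error")
    return 10.0 * math.log10(signal / error)


def seg_snr_db(clean, processed, frame_len):
    """Segmental SNR in dB: mean of per-frame SNRs over non-overlapping frames.

    Frames with zero signal or zero error are skipped in this sketch.
    """
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        signal = sum(x * x for x in c)
        error = sum((x - y) ** 2 for x, y in zip(c, p))
        if signal > 0 and error > 0:
            values.append(10.0 * math.log10(signal / error))
    if not values:
        raise ValueError("no frame with a defined SNR")
    return sum(values) / len(values)


clean = [1.0, 1.0, 1.0, 1.0]
processed = [0.9, 0.9, 0.0, 0.0]  # first frame nearly intact, second frame lost
print(snr_db(clean, processed), seg_snr_db(clean, processed, frame_len=2))
```

Because the segmental measure averages per-frame values in dB, every frame carries equal weight, whereas the overall SNR is dominated by the frames with the largest error energy. The two indices can therefore rank algorithms differently.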
https://doi.org/10.9718/JBER.2013.34.1.24
