Search | Korea Science

Accurate Speech Detection based on Sub-band Selection for Robust Keyword Recognition (강인한 핵심어 인식을 위해 유용한 주파수 대역을 이용한 음성 검출기)

Ji Mikyong;Kim Hoirin
Proceedings of the KSPS conference / 2002.11a / pp.183-186 / 2002
Speech detection is one of the important problems in real-time speech recognition, and accurate detection of speech boundaries is crucial to recognizer performance. In this paper, we propose a speech detector based on Mel-band selection through training. To evaluate the proposed algorithm, we compare it with a conventional one, the so-called EPD-VAA (EndPoint Detector based on Voice Activity Detection). The proposed speech detector is trained to extract keyword speech better than other speech. EPD-VAA usually works well at high SNR but degrades at low SNR. The proposed algorithm, in contrast, pre-selects useful bands through keyword training and decides the speech boundary according to the energy level of the previously selected sub-bands. Experimental results show that the proposed algorithm outperforms EPD-VAA.
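The boundary decision described above can be sketched roughly as follows. This is a minimal illustration, not the authors' detector: the function name `detect_endpoints`, the fixed `threshold`, and the toy band energies are all assumptions, and the training step that selects the bands is not shown because the abstract does not describe it.

```python
def detect_endpoints(band_energies, selected_bands, threshold):
    """Return (first, last) index of frames whose summed energy over the
    selected sub-bands exceeds the threshold, or None if no frame does.

    band_energies: list of frames, each a list of per-band energies.
    selected_bands: indices of the bands assumed to be chosen by training.
    """
    active = [
        i
        for i, frame in enumerate(band_energies)
        if sum(frame[b] for b in selected_bands) > threshold
    ]
    return (active[0], active[-1]) if active else None


# Toy data: band 2 carries strong stationary noise, bands 1 and 3 are
# the (hypothetical) bands selected by keyword training.
frames = [
    [0.1, 0.2, 5.0, 0.1],
    [0.1, 3.0, 5.0, 0.2],
    [0.2, 4.0, 5.0, 0.1],
    [0.1, 0.1, 5.0, 0.1],
]
endpoints = detect_endpoints(frames, selected_bands=[1, 3], threshold=1.0)
print(endpoints)  # (1, 2)
```

Because the noisy band is excluded from the sum, its energy does not push the noise-only frames over the threshold, which is the intuition behind selecting bands before thresholding.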

Statistical Voice Activity Detector Based on Signal Subspace Model (신호 준공간 모델에 기반한 통계적 음성 검출기)

Ryu, Kwang-Chun;Kim, Dong-Kook
The Journal of the Acoustical Society of Korea / v.27 no.7 / pp.372-378 / 2008
Voice activity detectors (VADs) are important in wireless communication and speech signal processing. In conventional VAD methods, an expression for the likelihood ratio test (LRT) based on statistical models is derived in the discrete Fourier transform (DFT) domain. Speech or noise is then decided by comparing the value of the expression with a threshold. This paper presents a new statistical VAD method based on a signal subspace approach. Probabilistic principal component analysis (PPCA) is employed to obtain a signal subspace model that incorporates a probabilistic model of the noisy signal into the signal subspace method. The proposed approach provides a novel decision rule based on the LRT in the signal subspace domain. Experimental results show that the proposed signal-subspace-model-based VAD method outperforms those based on the widely used Gaussian distribution in the DFT domain.
https://doi.org/10.7776/ASK.2008.27.7.372
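For orientation, the conventional DFT-domain baseline mentioned in the abstract can be sketched as below. This shows only the widely used Gaussian LRT, not the PPCA signal-subspace rule the paper proposes. The function names, the per-bin variance inputs, and the zero threshold are assumptions made for illustration; in practice the variances have to be estimated.

```python
import math


def log_likelihood_ratio(power_spectrum, noise_var, speech_var):
    """Mean per-bin log likelihood ratio under a complex Gaussian model.

    For each DFT bin, with a priori SNR xi = speech_var / noise_var and
    a posteriori SNR gamma = |X|^2 / noise_var, the log LR is
    gamma * xi / (1 + xi) - log(1 + xi).
    """
    total = 0.0
    for power, lam_n, lam_s in zip(power_spectrum, noise_var, speech_var):
        xi = lam_s / lam_n
        gamma = power / lam_n
        total += gamma * xi / (1.0 + xi) - math.log(1.0 + xi)
    return total / len(power_spectrum)


def is_speech(power_spectrum, noise_var, speech_var, threshold=0.0):
    """Declare speech when the mean log LR exceeds the (assumed) threshold."""
    return log_likelihood_ratio(power_spectrum, noise_var, speech_var) > threshold


# Toy frames with four bins; unit noise and speech variances are assumed.
noise_var = [1.0] * 4
speech_var = [1.0] * 4
llr_noise = log_likelihood_ratio([1.0] * 4, noise_var, speech_var)
llr_speech = log_likelihood_ratio([10.0] * 4, noise_var, speech_var)
print(llr_noise < 0.0 < llr_speech)  # True
```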

Design of a Statistical Model Based Voice Activity Detector (통계적 모델에 근거한 음성 검출기의 설계)

손종서
Proceedings of the Acoustical Society of Korea Conference / 1998.08a / pp.465-469 / 1998
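A voice activity detector for variable-rate speech coders is designed by applying a statistical model. By estimating the speech parameters in a decision-directed manner, the proposed detector derives, via the LRT, a decision rule with good operating characteristics. In addition, by modeling speech occurrence events as a first-order Markov process, a hangover algorithm is developed that allows past observations to be taken into account when detecting speech in the current frame. In the experimental conditions considered, the developed detector outperformed the voice activity detector of the ITU-T standard G.729 Annex B.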
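To illustrate what a hangover stage does, here is a deliberately simplified sketch. It uses a fixed frame counter rather than the first-order Markov model developed in the paper, so it should be read as a stand-in for the idea (past frames influence the current decision), not as the paper's algorithm. The name `apply_hangover` and the parameter `hang_frames` are hypothetical.

```python
def apply_hangover(raw_decisions, hang_frames):
    """Hold the speech decision for `hang_frames` frames after the last
    frame that the raw detector marked as speech (1 = speech, 0 = noise)."""
    smoothed = []
    remaining = 0
    for decision in raw_decisions:
        if decision:
            remaining = hang_frames  # restart the hold period
            smoothed.append(1)
        elif remaining > 0:
            remaining -= 1  # still inside the hold period
            smoothed.append(1)
        else:
            smoothed.append(0)
    return smoothed


raw = [0, 1, 1, 0, 0, 0, 0, 1, 0]
smoothed = apply_hangover(raw, hang_frames=2)
print(smoothed)  # [0, 1, 1, 1, 1, 0, 0, 1, 1]
```

The hold period protects weak speech tails that a frame-by-frame test would otherwise clip, at the cost of marking a few extra noise frames as active.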

A New Statistical Voice Activity Detector Based on UMP Test (UMP 테스트에 근거한 새로운 통계적 음성검출기)

Jang, Keun-Won;Chang, Joon-Hyuk;Kim, Dong-Kook
The Journal of the Acoustical Society of Korea / v.26 no.1 / pp.16-24 / 2007
Voice activity detectors (VADs) are important in wireless communication and speech signal processing. In conventional VAD methods, an expression for the likelihood ratio test (LRT) based on statistical models is derived. Speech or noise is then decided by comparing the value of the expression with a threshold. We propose a new method with a modified decision rule based on the Gaussian distribution and the uniformly most powerful (UMP) test. This method requires the distribution of the absolute value of the incoming speech signal, and the final decision is then obtained through the relation between the Rayleigh distributions. This VAD method can detect speech without the a priori signal-to-noise ratio (SNR) that conventional VAD algorithms require. In addition, in various VAD performance tests, the proposed method is shown to be more effective than the traditional scheme.
https://doi.org/10.7776/ASK.2007.26.1.016

Dimension Reduction Method of Speech Feature Vector for Real-Time Adaptation of Voice Activity Detection (음성구간 검출기의 실시간 적응화를 위한 음성 특징벡터의 차원 축소 방법)

Park Jin-Young;Lee Kwang-Seok;Hur Kang-In
Journal of the Institute of Convergence Signal Processing / v.7 no.3 / pp.116-121 / 2006
In this paper, we propose a dimension reduction method for multi-dimensional speech feature vectors, intended for real-time adaptation in various noisy environments. The method reduces the dimension non-linearly by mapping onto the likelihoods of the speech feature vector and the noise feature vector. The LRT (Likelihood Ratio Test) is used to classify speech and non-speech. The detection results are similar to those obtained with the multi-dimensional speech feature vector. Speech recognition results on the detected speech data are also similar to those obtained with the multi-dimensional (10th-order MFCC (Mel-Frequency Cepstral Coefficient)) speech feature vector.

Improvement of VAD Performance for the Reduction of the Bit Rate Under the Noise Environment in the G.723.1 (잡음 환경에서의 전송률 감소를 위한 G.723.1 음성활동 검출기 성능 개선에 관한 연구)

김정진;장경아;배명진
The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.42-47 / 2001
This paper improves the performance of the VAD (Voice Activity Detector) in the G.723.1 Annex A 6.3 kbps/5.3 kbps dual-rate speech coder, which was developed for Internet phone and videoconferencing. The VAD decision is based on a three-level energy threshold. We evaluate processing time, speech quality, and bit rate. The processing time is reduced owing to the accuracy of the VAD decision in silence periods. In a subjective quality test there is almost no difference compared with G.723.1. To measure the bit rate we count the active speech frames (VAD=1); the bit rate is reduced further as more silence periods occur.

Adaptive Multi-Rate(AMR) Speech Coding Algorithm (Adaptive Multi-Rate(AMR) 음성부호화 알고리즘)

서정욱;배건성
Proceedings of the IEEK Conference / 2000.06d / pp.92-97 / 2000
An AMR (Adaptive Multi-Rate) speech coding algorithm has been adopted as a standard speech codec for IMT-2000. It is based on algebraic CELP and consists of eight speech coding modes with bit rates from 4.75 kbit/s to 12.2 kbit/s. It also contains a VAD (Voice Activity Detector), SCR (Source Controlled Rate) operation, and an error concealment scheme for robustness in a radio channel. The bit rate of AMR is changed on a frame basis depending on the channel condition. In this paper, we introduce the AMR speech coding algorithm and implement it in real time on the TMS320C6201, a Texas Instruments fixed-point DSP. Starting from the ANSI C source code released by ETSI and 3GPP, we convert and optimize the program to run in real time using the C compiler and assembly language. We verify that the decoded output of the speech codec implemented on the DSP is identical to the PC simulation result of the ANSI C code for the test sequences. An actual sound input/output test using a microphone and speaker also demonstrates proper real-time operation without distortion or delay.
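As a small worked example of the mode structure, the sketch below converts each mode's bit rate into coded speech bits per frame. The abstract states only the mode count and the two endpoint rates; the six intermediate rates and the 20 ms frame length are taken from the 3GPP AMR specification, not from this abstract. The names `AMR_MODES_KBPS` and `bits_per_frame` are illustrative.

```python
# The eight AMR narrowband mode bit rates in kbit/s (per the 3GPP AMR
# specification; only 4.75 and 12.2 appear in the abstract itself).
AMR_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]
FRAME_MS = 20  # AMR frame length in milliseconds


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """Coded speech bits in one frame: kbit/s times ms gives bits."""
    return round(rate_kbps * frame_ms)


frame_bits = {rate: bits_per_frame(rate) for rate in AMR_MODES_KBPS}
for rate, bits in frame_bits.items():
    print(f"{rate:5.2f} kbit/s -> {bits} bits per {FRAME_MS} ms frame")
```

Since the mode can change on a frame basis, the payload size per frame ranges from 95 bits at 4.75 kbit/s to 244 bits at 12.2 kbit/s.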

A Simple Speech/Non-speech Classifier Using Adaptive Boosting

Kwon, Oh-Wook;Lee, Te-Won
The Journal of the Acoustical Society of Korea / v.22 no.3E / pp.124-132 / 2003
We propose a new method for speech/non-speech classification based on the adaptive boosting (AdaBoost) algorithm, in order to detect speech for robust speech recognition. The method uses a combination of simple base classifiers through the AdaBoost algorithm and a set of optimized speech features combined with spectral subtraction. The key benefits of this method are simple implementation, low computational complexity, and avoidance of the over-fitting problem. We checked the validity of the method by comparing its performance with the speech/non-speech classifier used in a standard voice activity detector. For speech recognition purposes, additional performance improvements were achieved by adopting new features, including speech band energies and MFCC-based spectral distortion. At the same false alarm rate, the method reduced miss errors by 20-50%.
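The idea of combining simple base classifiers through AdaBoost can be sketched as follows. This is textbook discrete AdaBoost with threshold stumps on a toy one-dimensional feature, written under the assumption that labels are +1 (speech) and -1 (non-speech). It does not reproduce the paper's features, its spectral subtraction front end, or its base classifiers, and all names are illustrative.

```python
import math


def train_stump(xs, ys, weights):
    """Return (error, feature, threshold, polarity) of the threshold stump
    with the lowest weighted error; a stump predicts `polarity` when
    x[feature] >= threshold and -polarity otherwise."""
    best = None
    for f in range(len(xs[0])):
        for thr in sorted({x[f] for x in xs}):
            for polarity in (1, -1):
                preds = [polarity if x[f] >= thr else -polarity for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def stump_predict(stump, x):
    _, f, thr, polarity = stump
    return polarity if x[f] >= thr else -polarity


def adaboost(xs, ys, rounds):
    """Train `rounds` stumps, reweighting misclassified samples each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(xs, ys, weights)
        err = min(max(stump[0], 1e-10), 1.0 - 1e-10)  # guard log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        weights = [
            w * math.exp(-alpha * y * stump_predict(stump, x))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def classify(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data: one scalar feature per frame; no single threshold separates it.
xs = [[float(i)] for i in range(10)]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [classify(ensemble, x) for x in xs]
print(predictions == ys)  # True
```

On this toy set a single stump misclassifies at least three frames, while the weighted vote of three stumps fits all ten, which is the sense in which weak classifiers are combined into a stronger one.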

Improvement of VAD Performance using the LSP Variation in the G.723.1 (LSP변화도를 이용한 G-723.1 보코더의 VAD 성능향상에 관한 연구)

LEE HeeWon;NA Ducksu;BAE MyungJin
Proceedings of the Acoustical Society of Korea Conference / autumn / pp.19-22 / 2000
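The G.723.1 speech coder, developed by the international standardization body ITU-T for Internet phone and videoconferencing, uses a VAD (Voice Activity Detector) and a CNG (Comfort Noise Generator) to lower the bit rate during noise segments. The VAD makes its final decision on speech activity by comparing the energy level of the current frame. However, to make its decisions more stable, the G.723.1 VAD classifies almost all silence segments inserted between active speech segments as active speech. This paper therefore proposes a method that decides silence segments more accurately and thereby reduces the bit rate further than the existing method. The proposed method detects speech segments using the spacing information of the LSP parameters of the speech and noise signals. In experiments with sentences whose silence intervals were lengthened, the number of frames judged as VAD=1 decreased by about 48.98%, and in the subjective quality evaluation almost no degradation of speech quality occurred.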

Implementation of Hands-Free Phone in a Car Using DSP (DSP를 이용한 차량용 핸즈프리 전화기의 구현)

Hong, Ki-Jun;Roh, Yi-Ju;Jeong, Kyung-Hoon;Kang, Dong-Wook;Yun, Kee-Bang;Kim, Ki-Doo
전자공학회논문지 IE / v.44 no.4 / pp.1-10 / 2007
In this paper, we study the implementation of a hands-free phone for a car that uses an acoustic echo canceller to remove acoustic echo effectively. A conventional acoustic echo canceller that relies only on adaptive filtering has difficulty handling both echo and double talk. To tackle this problem, we propose an acoustic echo canceller consisting of an adaptive filter using a modified NLMS; a VAD that captures the exact voice activity duration using two independent forgetting factors; a double-talk detector that quickly and precisely detects the double-talk duration using the cross-correlation between the microphone signal and the residual echo; and an output controller driven by the VAD and the double-talk detector. The proposed hands-free phone with this acoustic echo canceller removes acoustic echo and guarantees full-duplex operation.
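The adaptive-filter core of such an echo canceller can be sketched as below. This is the standard NLMS update, not the modified NLMS of the paper, and it omits the VAD, the double-talk detector, and the output controller. The function name, step size `mu`, regularizer `eps`, and the synthetic echo path are assumptions chosen for illustration.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=4, mu=0.5, eps=1e-8):
    """Standard NLMS: estimate the echo of `far_end` contained in `mic`
    and return (residual signal, final filter weights)."""
    weights = [0.0] * taps
    buf = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        echo_estimate = sum(w * b for w, b in zip(weights, buf))
        e = d - echo_estimate  # residual after echo removal
        norm = sum(b * b for b in buf) + eps  # input power normalization
        weights = [w + mu * e * b / norm for w, b in zip(weights, buf)]
        residual.append(e)
    return residual, weights


# Synthetic test: the microphone picks up only the far-end signal passed
# through a short, made-up echo path (no near-end speech, no noise).
rng = random.Random(0)
echo_path = [0.6, -0.3, 0.1, 0.0]
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]
residual, weights = nlms_echo_cancel(far_end, mic, taps=4)
print([round(w, 3) for w in weights])
```

With near-end speech present, this plain update would diverge from the echo path, which is why the paper pairs the filter with a double-talk detector that controls adaptation.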
