• Title/Summary/Keyword: 음성 검출기

Search Result 137, Processing Time 0.026 seconds

A Gain Control Algorithm of Low Computational Complexity based on Voice Activity Detection (음성 검출 기반의 저연산 이득 제어 알고리즘)

  • Kim, Sang-Kuyn;Cho, Woo-Hyeong;Jeong, Min-A;Kwon, Jang-Woo;Lee, Sangmin
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.40 no.5
    • /
    • pp.924-930
    • /
    • 2015
  • In this paper, we propose a novel approach of low computational complexity to improve the speech quality of the small acoustic equipment in noisy environment. The conventional gain control algorithm suppresses the noise of input signal, and then the part of wide dynamic range compression (WDRC) amplifies the undesired signal. The proposed algorithm controls the gain of hearing aids according to speech present probability by using the output of a voice activity detection (VAD). The performance of the proposed scheme is evaluated under various noise conditions by using objective measurement and yields superior results compared with the conventional algorithm.

A Study of Keyword Spotting System Based on the Weight of Non-Keyword Model (비핵심어 모델의 가중치 기반 핵심어 검출 성능 향상에 관한 연구)

  • Kim, Hack-Jin;Kim, Soon-Hyub
    • The KIPS Transactions:PartB
    • /
    • v.10B no.4
    • /
    • pp.381-388
    • /
    • 2003
  • This paper presents a method of giving weights to garbage class clustering and Filler model to improve performance of keyword spotting system and a time-saving method of dialogue speech processing system for keyword spotting by calculating keyword transition probability through speech analysis of task domain users. The point of the method is grouping phonemes with phonetic similarities, which is effective in sensing similar phoneme groups rather than individual phonemes, and the paper aims to suggest five groups of phonemes obtained from the analysis of speech sentences in use in Korean morphology and in stock-trading speech processing system. Besides, task-subject Filler model weights are added to the phoneme groups, and keyword transition probability included in consecutive speech sentences is calculated and applied to the system in order to save time for system processing. To evaluate performance of the suggested system, corpus of 4,970 sentences was built to be used in task domains and a test was conducted with subjects of five people in their twenties and thirties. As a result, FOM with the weights on proposed five phoneme groups accounts for 85%, which has better performance than seven phoneme groups of Yapanel [1] with 88.5% and a little bit poorer performance than LVCSR with 89.8%. Even in calculation time, FOM reaches 0.70 seconds than 0.72 of seven phoneme groups. Lastly, it is also confirmed in a time-saving test that time is saved by 0.04 to 0.07 seconds when keyword transition probability is applied.

A Study on the Rejection Capability Based on Anti-phone Modeling (반음소 모델링을 이용한 거절기능에 대한 연구)

  • 김우성;구명완
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.3
    • /
    • pp.3-9
    • /
    • 1999
  • This paper presents the study on the rejection capability based on anti-phone modeling for vocabulary independent speech recognition system. The rejection system detects and rejects out-of-vocabulary words which were not included in candidate words which are defined while the speech recognizer is made. The rejection system can be classified into two categories by their implementation methods, keyword spotting method and utterance verification method. The keyword spotting method uses an extra filler model as a candidate word as well as keyword models. The utterance verification method uses the anti-models for each phoneme for the calculation of confidence score after it has constructed the anti-models for all phonemes. We implemented an utterance verification algorithm which can be used for vocabulary independent speech recognizer. We also compared three kinds of means for the calculation of confidence score, and found out that the geometric mean had shown the best result. For the normalization of confidence score, usually Sigmoid function is used. On using it, we compared the effect of the weight constant for Sigmoid function and determined the optimal value. And we compared the effects of the size of cohort set, the results showed that the larger set gave the better results. And finally we found out optimal confidence score threshold value. In case of using the threshold value, the overall recognition rate including rejection errors was about 76%. This results are going to be adapted for stock information system based on speech recognizer which is currently provided as an experimental service by Korea Telecom.

  • PDF

Target Speech Detection Using Gaussian Mixture Model of Frequency Bandwise Power Ratio for GSC-Based Beamforming (GSC 기반 빔포밍을 위한 주파수 밴드별 전력비 분포의 혼합 가우시안 모델을 이용한 목표 음성신호의 검출)

  • Chang, Hyungwook;Kim, Youngil;Jeong, Sangbae
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.1
    • /
    • pp.61-68
    • /
    • 2015
  • Noise reduction is necessary to compensate for the degradation of recognition performance by various types of noises. Among many noise reduction techniques using microphone array, generalized sidelobe canceller (GSC) has been widely applied to reduce nonstationary noises. The performance of GSC is directly affected by its adaptation mode controller (AMC). That is, accurate target speech detection is essential to guarantee the sufficient noise reduction in pure noise intervals and the less distortion in target speech intervals. Thus, this paper proposes an improved AMC design technique in which the power ratio of the output of fixed beamforming to that of blocking matrix is calculated frequency bandwise and probabilistically modeled by mixture Gaussians for each class. Experimental results show that the proposed algorithm outperforms conventional AMCs in receiver operating curves (ROC) and output SNRs.

Robust Speech Recognition Using Missing Data Theory (손실 데이터 이론을 이용한 강인한 음성 인식)

  • 김락용;조훈영;오영환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.3
    • /
    • pp.56-62
    • /
    • 2001
  • In this paper, we adopt a missing data theory to speech recognition. It can be used in order to maintain high performance of speech recognizer when the missing data occurs. In general, hidden Markov model (HMM) is used as a stochastic classifier for speech recognition task. Acoustic events are represented by continuous probability density function in continuous density HMM(CDHMM). The missing data theory has an advantage that can be easily applicable to this CDHMM. A marginalization method is used for processing missing data because it has small complexity and is easy to apply to automatic speech recognition (ASR). Also, a spectral subtraction is used for detecting missing data. If the difference between the energy of speech and that of background noise is below given threshold value, we determine that missing has occurred. We propose a new method that examines the reliability of detected missing data using voicing probability. The voicing probability is used to find voiced frames. It is used to process the missing data in voiced region that has more redundant information than consonants. The experimental results showed that our method improves performance than baseline system that uses spectral subtraction method only. In 452 words isolated word recognition experiment, the proposed method using the voicing probability reduced the average word error rate by 12% in a typical noise situation.

  • PDF

Study of Speech Recognition System Using the Java (자바를 이용한 음성인식 시스템에 관한 연구)

  • Choi, Kwang-Kook;Kim, Cheol;Choi, Seung-Ho;Kim, Jin-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.6
    • /
    • pp.41-46
    • /
    • 2000
  • In this paper, we implement the speech recognition system based on the continuous distribution HMM and Browser-embedded model using the Java. That is developed for the speech analysis, processing and recognition on the Web. Client sends server through the socket to the speech informations that extracting of end-point detection, MFCC, energy and delta coefficients using the Java Applet. The sewer consists of the HMM recognizer and trained DB which recognizes the speech and display the recognized text back to the client. Because of speech recognition system using the java is high error rate, the platform is independent of system on the network. But the meaning of implemented system is merged into multi-media parts and shows new information and communication service possibility in the future.

  • PDF

Auditory Representations for Robust Speech Recognition in Noisy Environments (잡음 환경에서의 음성 인식을 위한 청각 표현)

  • Kim, Doh-Suk;Lee, Soo-Young;Kil, Rhee-M.
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.5
    • /
    • pp.90-98
    • /
    • 1996
  • An auditory model is proposed for robust speech recognition in noisy environments. The model consists of cochlear bandpass filters and nonlinear stages, and represents frequency and intensity information efficiently even in noisy environments. Frequency information of the signal is obtained by zero-crossing intervals, and intensity information is also incorporated by peak detectors and saturating nonlinearities. Also, the robustness of the zero-crossings in estimating frequency is verified by the developed analytic relationship of the variance of the level-crossing interval perturbations as a function of the crossing level values. The proposed auditory model is computationally efficient and free from many unknown parameters compared with other auditory models. Speaker-independent speech recognition experiments demonstrate the robustness of the proposed method.

  • PDF

Virtual Dialog System Based on Multimedia Signal Processing for Smart Home Environments (멀티미디어 신호처리에 기초한 스마트홈 가상대화 시스템)

  • Kim, Sung-Ill;Oh, Se-Jin
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.15 no.2
    • /
    • pp.173-178
    • /
    • 2005
  • This paper focuses on the use of the virtual dialog system whose aim is to build more convenient living environments. In order to realize this, the main emphasis of the paper lies on the description of the multimedia signal processing on the basis of the technologies such as speech recognition, speech synthesis, video, or sensor signal processing. For essential modules of the dialog system, we incorporated the real-time speech recognizer based on HM-Net(Hidden Markov Network) as well as speech synthesis into the overall system. In addition, we adopted the real-time motion detector based on the changes of brightness in pixels, as well as the touch sensor that was used to start system. In experimental evaluation, the results showed that the proposed system was relatively easy to use for controlling electric appliances while sitting in a sofa, even though the performance of the system was not better than the simulation results owing to the noisy environments.

Voice Activity Detection based on Adaptive Band-Partitioning using the Likelihood Ratio (우도비를 이용한 적응 밴드 분할 기반의 음성 검출기)

  • Kim, Sang-Kyun;Shim, Hyeon-Min;Lee, Sangmin
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.9
    • /
    • pp.1064-1069
    • /
    • 2014
  • In this paper, we propose a novel approach to improve the performance of a voice activity detection(VAD) which is based on the adaptive band-partitioning with the likelihood ratio(LR). The previous method based on the adaptive band-partitioning use the weights that are derived from the variance of the spectral. In our VAD algorithm, the weights are derived from LR, and then the weights are incorporated with the entropy. The proposed algorithm discriminates the voice activity by comparing the weighted entropy with the adaptive threshold. Experimental results show that the proposed algorithm yields better results compared to the conventional VAD algorithms. Especially, the proposed algorithm shows superior improvement in non-stationary noise environments.

On an Improving Performance of Low Bit-Rate Speech Coder (저전송율 보코더의 성능개선에 관한 연구)

  • 박영호;홍성훈;배명진
    • The Journal of the Acoustical Society of Korea
    • /
    • v.17 no.7
    • /
    • pp.101-107
    • /
    • 1998
  • 본 논문에서는 잔차신호를 모델링하기 위해 사용되는 동적희박대수코드북에 대해 분석하고 성능이 향상된 새로운 대수코드북 구조 및 검색과정을 제안하였다. 제안된 알고리 즘은 대수 코드북의 단점을 계산량의 증가 없이 개선시켰다. 먼저 기존에 단순히 부호비트 만을 검색하는 것에 대해 다양한 펄스 진폭의 선택을 가능하게 하였다. 그리고 동일 트랙상 에서 두 펄스를 선택하게 하였으며 추가 계산량이 필요없는 무성음에서 유성음으로의 천이 구간 검출기를 이용하여 LSF 보간 시 발생하는 천이구간에서의 LP지연을 최소화하였다. 제 안된 알고리즘을 이용한 5.6kbps음성부호화기는 전화선상의 음질을 시료로 하여 주관적 음 질면에서 6.3kbps MP-MLQ와 동등하였으며 MNRU Q=15dB에서는 MP-MLQ에 비해 약간 의 음질열하가 발생하였다.

  • PDF