• Title/Summary/Keyword: Speech Signal

Statistical Voice Activity Detection Using Probabilistic Non-Negative Matrix Factorization (확률적 비음수 행렬 인수분해를 사용한 통계적 음성검출기법)

  • Kim, Dong Kook;Shin, Jong Won;Kwon, Kisoo;Kim, Nam Soo
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.8 / pp.851-858 / 2016
  • This paper presents a new statistical voice activity detection (VAD) method based on the probabilistic interpretation of nonnegative matrix factorization (NMF). The NMF objective function based on the Kullback-Leibler divergence coincides with the negative log-likelihood of the data (up to terms that do not depend on the factor matrices) when the data, given the basis and encoding matrices, are modeled as Poisson distributed. Based on this probabilistic NMF, the VAD is constructed as a likelihood ratio test that assumes speech and noise follow Poisson distributions. Experimental results show that the proposed approach outperformed the conventional Gaussian model-based and NMF-based methods in simulations at signal-to-noise ratios of 0-15 dB.
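
The stated link between the KL-divergence objective and the Poisson likelihood can be checked numerically. The following is a minimal sketch, not the paper's code: the data `v` and the two candidate reconstructions `lam_a` and `lam_b` (standing in for flattened entries of the product of the basis and encoding matrices) are made-up values. It shows that the two objectives differ only by a quantity that does not depend on the reconstruction, so minimizing one minimizes the other.

```python
import math

def kl_divergence(v, lam):
    """Generalized KL divergence D(v || lam), summed over all entries."""
    total = 0.0
    for x, l in zip(v, lam):
        # The x * log(x / l) term is taken as 0 when x == 0.
        total += (x * math.log(x / l) if x > 0 else 0.0) - x + l
    return total

def poisson_nll(v, lam):
    """Negative log-likelihood of counts v under independent Poisson(lam)."""
    return sum(l - x * math.log(l) + math.lgamma(x + 1) for x, l in zip(v, lam))

# Hypothetical nonnegative integer data and two different reconstructions.
v = [3, 0, 7, 2]
lam_a = [2.5, 0.4, 6.0, 2.2]
lam_b = [1.0, 1.0, 9.0, 3.0]

# The gap between the two objectives depends on v only, not on lam.
gap_a = kl_divergence(v, lam_a) - poisson_nll(v, lam_a)
gap_b = kl_divergence(v, lam_b) - poisson_nll(v, lam_b)
print(gap_a, gap_b)
```

Both printed gaps should agree to floating-point precision, since the gap equals the sum of `x*log(x) - x - log(x!)` over the data.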

Korean Word Recognition using the Transition Matrix of VQ-Code and DHMM (VQ코드의 천이 행렬과 이산 HMM을 이용한 한국어 단어인식)

  • Chung, Kwang-Woo;Hong, Kwang-Seok;Park, Byung-Chul
    • The Journal of the Acoustical Society of Korea / v.13 no.4 / pp.40-49 / 1994
  • In this paper, we propose two methods for improving the performance of a word recognition system. The key strategy of the first method is to apply inertia to the feature vector sequences of the speech signal to stabilize the transitions between VQ codes. The second method generates new observation probabilities by using the transition matrix of VQ codes to weight the observation probability of the output symbol, so as to take into account the temporal relation between neighboring frames in the DHMM. By applying inertia to the feature vector sequences, we can reduce the overlap between the probability distributions of the response paths for each word and stabilize state transitions in the HMM. By using the transition matrix of VQ codes as weights in the conventional DHMM, we can partition the probability distribution of feature vectors more finely and restrict the feature distribution to a suitable region, so that recognition performance improves. To evaluate the proposed methods, we carried out experiments on 50 DDD area names. The proposed methods improved the recognition rate by $4.2\%$ in the speaker-dependent test and $12.45\%$ in the speaker-independent test, compared with the conventional DHMM.
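
As a rough illustration of what a transition matrix of VQ codes is, the sketch below counts transitions between consecutive codebook indices and normalizes each row. The function name, the toy sequences, and the uniform fallback for unseen codes are assumptions made for illustration; the abstract does not give the exact way the matrix is used to weight the DHMM observation probabilities, so that step is not shown.

```python
def vq_transition_matrix(code_sequences, num_codes):
    """Estimate P(next code | current code) from sequences of VQ code indices."""
    counts = [[0] * num_codes for _ in range(num_codes)]
    for seq in code_sequences:
        # Count each pair of codes observed in neighboring frames.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # Assumption: codes never seen as a predecessor get a uniform row.
            matrix.append([1.0 / num_codes] * num_codes)
    return matrix

# Hypothetical VQ code sequences for two utterances with a 3-entry codebook.
sequences = [[0, 0, 1, 2, 2], [0, 1, 1, 2]]
A = vq_transition_matrix(sequences, 3)
for row in A:
    print([round(p, 3) for p in row])
```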

An Analysis on Phone-Like Units for Korean Continuous Speech Recognition in Noisy Environments (잡음환경하의 연속 음성인식을 위한 유사음소단위 분석)

  • Shen Guang-Hu;Lim Soo-Ho;Seo Jun-Bae;Kim Joo-Gon;Jung Ho-Youl;Chung Hyun-Yeol
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.123-126 / 2004
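  • As a preliminary study on building efficient context-dependent acoustic models for noisy environments, this paper reports a comparison and evaluation of continuous speech recognition performance in noise as a function of the number of phone-like units. Previous studies [1,2] showed that, for continuous speech recognition, context-dependent models using 39 phone-like units that take allophones into account achieve better recognition performance than models using 48 phone-like units. Building on that result, this study carries out preliminary work toward acoustic models that remain efficient in noisy environments. To cover a variety of noise conditions, White, Pink, and LAB noise were added to the speech at signal-to-noise ratios (SNRs) of 5 dB, 10 dB, and 15 dB, and continuous speech recognition experiments were run for each number of phone-like units. With 39 phone-like units, the word and sentence recognition rates improved by about $7\%$$17\%$ over 48 phone-like units in the clean condition, and by $17\%$$28\%$ on average in each noisy condition, confirming that the 39-unit set is better suited to Korean continuous speech recognition and remains effective in noisy environments.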
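
Adding noise to speech at a prescribed SNR, as in the experiment above, amounts to scaling the noise so that the power ratio matches the target. The following is a minimal sketch under stated assumptions: a sine wave stands in for speech, Gaussian samples stand in for white noise, and the helper names are hypothetical rather than taken from the paper.

```python
import math
import random

def mean_power(x):
    """Average power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that speech power / noise power equals snr_db, then add it."""
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled_noise)]
    return noisy, scaled_noise

random.seed(0)
# Stand-ins for real recordings: a 200 Hz tone and white Gaussian noise at 8 kHz.
speech = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
white = [random.gauss(0.0, 1.0) for _ in range(8000)]

for target in (5, 10, 15):
    noisy, scaled = mix_at_snr(speech, white, target)
    measured = 10 * math.log10(mean_power(speech) / mean_power(scaled))
    print(target, round(measured, 6))
```

The measured SNR is computed from the scaled noise alone, so it should match each target up to rounding.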

Drone Location Tracking with Circular Microphone Array by HMM (HMM에 의한 원형 마이크로폰 어레이 적용 드론 위치 추적)

  • Jeong, HyoungChan;Lim, WonHo;Guo, Junfeng;Ahmad, Isitiaq;Chang, KyungHi
    • Journal of Advanced Navigation Technology / v.24 no.5 / pp.393-407 / 2020
  • To reduce the threat posed by illegal unmanned aerial vehicles, a sound-based tracking system was implemented. The drone acoustic tracking method has three main steps. First, it scans the space with variable beamforming to find a sound source and records the sound using a microphone array. Second, it classifies the recording with a hidden Markov model (HMM) to determine whether a sound source is present. Finally, when the sound source is a drone, the sound recorded and stored from it is used as the reference signal for tracking based on an adaptive beam pattern. Simulations were performed both under ideal conditions, without background noise or interfering sound, and under non-ideal conditions with both, and the tracking performance for illegal drones was evaluated. For the drone tracking system, the criteria for deciding whether a drone is present were designed according to the improvement in search distance with microphone array performance and the degree of sound pattern matching, and were reflected in the design of the speech reading circuit.

Design of the Noise Suppressor Using Wavelet Transform (웨이블릿 변환을 이용한 잡음제거기 설계)

  • 원호진;김종학;이인성
    • The Journal of the Acoustical Society of Korea / v.20 no.7 / pp.37-46 / 2001
  • This paper proposes a new noise suppression method based on wavelet transform analysis. A noise suppressor using the wavelet transform is more effective in babble noise than one using the short-time Fourier transform. We designed a new channel structure based on spectral subtraction of wavelet transform coefficients and used a wavelet mask pattern with higher time resolution at high frequencies. It adapted well to babble noise, which is non-stationary. To evaluate the performance of the proposed noise suppressor, informal subjective listening tests (MOS tests) were performed in the background noise environments of mobile communication (car noise, street noise, babble noise). In these informal listening tests, the proposed algorithm improved the MOS by about 0.2 over the noise suppression algorithm of EVRC. The noise reduction achieved by the proposed method is also shown in spectrograms of the speech signal.
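
The core idea of spectral subtraction applied to transform coefficients can be sketched as follows. This is a generic magnitude-subtraction rule with a spectral floor, not the channel structure or mask pattern of the paper; the coefficient values, the per-coefficient noise estimate `noise_mag`, and the `floor` parameter are all assumptions chosen for illustration.

```python
import math

def spectral_subtract(coeffs, noise_mag, floor=0.05):
    """Subtract an estimated noise magnitude from each real-valued coefficient.

    The result keeps the sign of the input and never drops below
    floor * |coefficient|, which limits over-subtraction artifacts.
    """
    cleaned = []
    for c, n in zip(coeffs, noise_mag):
        mag = max(abs(c) - n, floor * abs(c))
        cleaned.append(math.copysign(mag, c))
    return cleaned

# Hypothetical coefficients of one frame and a flat noise magnitude estimate.
coeffs = [1.0, -0.5, 0.1]
noise_mag = [0.2, 0.2, 0.2]
cleaned = spectral_subtract(coeffs, noise_mag)
print(cleaned)
```

Coefficients well above the noise estimate are reduced by that estimate, while coefficients below it are attenuated to the floor rather than zeroed.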

Spectro-Temporal Filtering Based on Soft Decision for Stereophonic Acoustic Echo Suppression (스테레오 음향학적 에코 제거를 위한 Soft Decision 기반 필터 확장 기법)

  • Lee, Chul Min;Bae, Soo Hyun;Kim, Jeung Hun;Kim, Nam Soo
    • The Journal of Korean Institute of Communications and Information Sciences / v.39C no.12 / pp.1346-1351 / 2014
  • We propose a novel approach to stereophonic acoustic echo suppression using spectro-temporal filtering based on soft decision. Unlike conventional approaches that estimate the echo paths directly, the proposed technique can estimate the stereo echo spectra without any double-talk detector. To improve the estimation of the echo spectra, an extended power spectral density matrix and an echo overestimation control matrix are applied. In addition, the echo suppression is based on a soft decision technique that uses the speech absence probability in the STFT domain. Experimental results show that the proposed method outperforms the conventional approaches.

Development of a Bone Conduction Telephone for Conductive Hearing Impaired Persons and its Performance Test (전음성 청각장애인용 골도 전화기 개발 및 성능 평가)

  • Kang, Kyeong-Ok;Kang, Seong-Hoon
    • The Journal of the Acoustical Society of Korea / v.14 no.2 / pp.113-122 / 1995
  • This paper describes the characteristics of a bone conduction telephone, developed so that persons with conductive hearing impairment can make calls without additional devices, and the results of its performance test. Both hearing-impaired and normal-hearing persons can use this telephone, because we developed a bone conduction vibrator through which speech signals can be perceived via air-conduction as well as bone-conduction hearing. It also has a tone control function that compensates for the hearing losses arising from the users' hearing characteristics; combined with the receive volume control, it provides a receive volume range of 20 dB in loudness rating, an effect similar to that of a telephone set with a built-in receive amplifier. Articulation and intelligibility tests with 19 hearing-impaired persons show that users whose bone-conduction hearing loss is 61 dB or less can understand words and sentences and respond well with this telephone.

Design of Programmable SC Filter (프로그램 가능한 SC Filter의 설계)

  • 이병수;이종악
    • The Journal of Korean Institute of Communications and Information Sciences / v.11 no.3 / pp.172-178 / 1986
  • The recent interest in the design of switched-capacitor (SC) filters is motivated by the fact that such filters can be fully integrated using standard metal-oxide-semiconductor processing technology. This is possible because all the resistors in the active RC filter network are replaced by switched capacitors. The voltage gain of an SC filter depends only on ratios of capacitances, and these ratios can be obtained and maintained with high accuracy. A switched capacitor is therefore known to be much better than a resistor in its temperature and linearity characteristics. This paper proposes a programmable SC filter and shows that $\omega_0$, Q, and G of the circuit can be controlled by a digital signal. Experiments show that the SC filter retains low sensitivities but cannot entirely avoid the influence of parasitic capacitance. Because the transfer characteristic of the SC filter varies with the sampling frequency and the resistor array, the SC filtering technique can be applied to digital processing, speech analysis and synthesis, and so on.
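
The dependence on capacitance ratios follows from the standard approximation that a capacitor C switched at clock frequency f_s behaves like a resistor of value 1/(f_s C) for signals well below f_s. The sketch below applies that approximation to a first-order low-pass section; the component values and function names are illustrative assumptions, not the circuit of the paper.

```python
import math

def equivalent_resistance(c_switched, f_clock):
    """Approximate resistance emulated by a switched capacitor: R = 1 / (f_s * C)."""
    return 1.0 / (f_clock * c_switched)

def cutoff_frequency(c_switched, c_integrating, f_clock):
    """Cutoff of a first-order RC low-pass whose resistor is a switched capacitor."""
    r_eq = equivalent_resistance(c_switched, f_clock)
    return 1.0 / (2 * math.pi * r_eq * c_integrating)

# Hypothetical values: 1 pF switched capacitor, 10 pF integrating capacitor, 100 kHz clock.
fc = cutoff_frequency(1e-12, 10e-12, 100e3)
# Scaling both capacitors by the same factor leaves the cutoff unchanged,
# because only the ratio C1 / C2 and the clock frequency matter.
fc_scaled = cutoff_frequency(5e-12, 50e-12, 100e3)
# Doubling the clock frequency doubles the cutoff, which is what makes the filter tunable.
fc_fast = cutoff_frequency(1e-12, 10e-12, 200e3)
print(round(fc, 1), round(fc_scaled, 1), round(fc_fast, 1))
```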

Pitch Estimation Method in an Integrated Time and Frequency Domain by Applying Linear Interpolation (선형 보간법을 이용한 시간과 주파수 조합영역에서의 피치 추정 방법)

  • Kim, Ki-Chul;Park, Sung-Joo;Lee, Seok-Pil;Kim, Moo-Young
    • Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.5 / pp.100-108 / 2010
  • Autocorrelation is widely used for pitch estimation. Autocorrelation values in the time and frequency domains have different characteristics and correspond to the pitch period and the fundamental frequency, respectively. We use an integrated autocorrelation method that combines the time and frequency domains, which can remove pitch doubling and halving errors. The pitch period and the fundamental frequency are reciprocals of each other. In particular, fundamental frequency estimation is prone to error because of the limited resolution of the FFT. To reduce these errors, interpolation is applied in the integrated autocorrelation domain, which decreases pitch errors. Moreover, the frequency-domain autocorrelation values are calculated only for the pitch candidates found in the time domain, which reduces computational complexity. Using linear interpolation, the required number of FFT coefficients can be decreased by a factor of 8. Compared with conventional methods, computational complexity can thus be reduced by a factor of 9.5.
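
The time-domain half of this idea, and the reciprocal relation between pitch period and fundamental frequency, can be illustrated with a basic autocorrelation estimator. This is a minimal sketch only: it does not reproduce the paper's integrated time-frequency method or its interpolation step, and the synthetic test signal, search range, and function names are assumptions.

```python
import math

def autocorrelation(x, lag):
    """Unnormalized time-domain autocorrelation of x at the given lag."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

def estimate_pitch(x, fs, f_min=60.0, f_max=400.0):
    """Pick the lag with the largest autocorrelation inside the allowed pitch range."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda k: autocorrelation(x, k))
    # Pitch period (in samples) and fundamental frequency are reciprocals: f0 = fs / lag.
    return best_lag, fs / best_lag

fs = 8000
# Synthetic voiced frame: 200 Hz fundamental plus a weaker second harmonic, 50 ms long.
frame = [math.sin(2 * math.pi * 200 * n / fs) + 0.5 * math.sin(2 * math.pi * 400 * n / fs)
         for n in range(400)]
lag, f0 = estimate_pitch(frame, fs)
print(lag, f0)
```

Because the lag is restricted to integers here, the frequency resolution is coarse for short periods, which is one reason interpolation between neighboring values helps.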

The Proposal of the Fuzzed Lyapunov Dimension at Speech Signal (음성에 대한 퍼지-리아프노프 차원의 제안)

  • In, Joon-Hawn;Yoo, Byong-Wook;Ryu, Seok-Han;Jung, Myong-Jin;Kim, Chang-Seok
    • Journal of the Korean Institute of Telematics and Electronics T / v.36T no.4 / pp.30-37 / 1999
  • This study proposes the Fuzzy Lyapunov dimension, which evaluates the quantitative variation of an attractor, and uses it to evaluate speaker recognition. The proposed Fuzzy Lyapunov dimension was shown to discriminate well between standard reference pattern attractors, and, with respect to the test pattern attractor, it was verified to be a speaker recognition parameter that absorbs pattern variation. To evaluate the Fuzzy Lyapunov dimension as a speaker recognition parameter, misrecognition due to discrimination error was estimated for each speaker and standard reference pattern, and the validity of the parameter was verified experimentally. The speaker recognition experiment yielded a recognition rate of 97.0%, confirming that the Fuzzy Lyapunov dimension is suitable as a speaker recognition parameter.
