• Title/Summary/Keyword: Speech Signal

Search results: 1,172

Noisy Speech Recognition Based on Noise-Adapted HMMs Using Speech Feature Compensation

  • Chung, Yong-Joo
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.15 no.2
    • /
    • pp.37-41
    • /
    • 2014
  • The vector Taylor series (VTS) based method usually employs clean speech Hidden Markov Models (HMMs) when compensating speech feature vectors or adapting the parameters of trained HMMs. It is well known that noisy speech HMMs trained by Multi-condition TRaining (MTR) and the Multi-Model-based Speech Recognition framework (MMSR) perform better than clean speech HMMs in noisy speech recognition. In this paper, we propose a method to use noise-adapted HMMs in the VTS-based speech feature compensation method. We derive a novel mathematical relation between the training and test noisy speech feature vectors in the log-spectrum domain, and VTS is used to estimate the statistics of the test noisy speech. An iterative EM algorithm is used to estimate the training noisy speech from the test noisy speech along with the noise parameters. The proposed method was applied to noise-adapted HMMs trained by MTR and MMSR, and reduced the relative word error rate significantly in noisy speech recognition experiments on the Aurora 2 database.
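
The core VTS idea, expanding the log-spectral mismatch function to first order around fixed expansion points, can be sketched as follows. This is the generic single-variable form, not the paper's train/test-noise relation; the expansion points and perturbed values are illustrative:

```python
import numpy as np

def mismatch(x, n):
    # log-spectral mismatch function: noisy = log(exp(clean) + exp(noise))
    return np.logaddexp(x, n)

def vts_first_order(x, n, x0, n0):
    # First-order Taylor expansion of the mismatch function around (x0, n0);
    # g is the partial derivative of the mismatch with respect to the noise.
    g = 1.0 / (1.0 + np.exp(x0 - n0))
    return np.logaddexp(x0, n0) + (1.0 - g) * (x - x0) + g * (n - n0)

x0, n0 = np.array([1.0]), np.array([0.0])  # expansion points (clean, noise)
x, n = np.array([1.2]), np.array([0.05])   # perturbed log-spectral values
print(float(mismatch(x, n)), float(vts_first_order(x, n, x0, n0)))
```

For small deviations from the expansion points, the linearized value tracks the exact mismatch closely, which is what makes the EM-style statistics estimation tractable.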

A Study on Phoneme Recognition using Neural Networks and Fuzzy logic (신경망과 퍼지논리를 이용한 음소인식에 관한 연구)

  • Han, Jung-Hyun;Choi, Doo-Il
    • Proceedings of the KIEE Conference
    • /
    • 1998.07g
    • /
    • pp.2265-2267
    • /
    • 1998
  • This paper deals with a study of fast speaker-adaptation-type speech recognition. To analyze the speech signal efficiently in the time domain and the time-frequency domain, it utilizes SCONN [1] with a speech signal process sufficient for fast speaker-adaptation-type speech recognition, and speech recognition is examined to investigate the adaptation of the system, which takes speech data as input after a speaker-dependent recognition test.


Voiced, Unvoiced, and Silence Classification of Human Speech Signals by Emphasis Characteristics of the Spectrum (Spectrum 강조특성을 이용한 음성신호에서 Voiced - Unvoiced - Silence 분류)

  • 배명수;안수길
    • The Journal of the Acoustical Society of Korea
    • /
    • v.4 no.1
    • /
    • pp.9-15
    • /
    • 1985
  • In this paper, we describe a new algorithm for deciding whether a given segment of a speech signal is classified as voiced speech, unvoiced speech, or silence, based on measurements made on the signal. The parameters measured for the voiced-unvoiced classification are the areas of each zero-crossing interval, given by multiplying the magnitude by the inverse zero-crossing rate of the speech signal. The parameter employed for the unvoiced-silence classification is the positive area summation over four-millisecond intervals of the high-frequency-emphasized speech signal.

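
A minimal sketch of this style of frame classification, using short-time energy and zero-crossing rate rather than the paper's zero-crossing-interval areas; the thresholds and test signals below are illustrative, not the paper's values:

```python
import numpy as np

def classify_frame(frame, energy_thresh=0.01, zcr_thresh=0.25):
    """Classify a frame as 'voiced', 'unvoiced', or 'silence'."""
    energy = np.mean(frame ** 2)
    # zero-crossing rate: fraction of sign changes between adjacent samples
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    if energy < energy_thresh:
        return "silence"
    return "voiced" if zcr < zcr_thresh else "unvoiced"

fs = 8000
t = np.arange(0, 0.02, 1 / fs)                  # one 20 ms frame
rng = np.random.default_rng(0)
voiced = 0.5 * np.sin(2 * np.pi * 120 * t)      # low ZCR, high energy
unvoiced = 0.3 * rng.standard_normal(t.size)    # noise-like, high ZCR
silence = 0.001 * rng.standard_normal(t.size)   # near-zero energy
print(classify_frame(voiced), classify_frame(unvoiced), classify_frame(silence))
```

Voiced frames combine high energy with few zero crossings, unvoiced frames cross zero roughly every other sample, and silence is caught by the energy gate alone.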

Voiced/Unvoiced/Silence Classification of Speech Signal Using Wavelet Transform (웨이브렛 변환을 이용한 음성신호의 유성음/무성음/묵음 분류)

  • Son, Young-Ho;Bae, Keun-Sung
    • Speech Sciences
    • /
    • v.4 no.2
    • /
    • pp.41-54
    • /
    • 1998
  • Speech signals are, depending on the characteristics of the waveform, classified as voiced sound, unvoiced sound, and silence. Voiced sound, produced by an air flow generated by the vibration of the vocal cords, is quasi-periodic, while unvoiced sound, produced by a turbulent air flow passed through some constriction in the vocal tract, is noise-like. Silence represents the ambient noise signal during the absence of speech. The need for deciding whether a given segment of a speech waveform should be classified as voiced, unvoiced, or silence has arisen in many speech analysis systems. In this paper, a voiced/unvoiced/silence classification algorithm using spectral change in the wavelet-transformed signal is proposed, and experimental results are presented and discussed.


A Study on the Eavesdropping of the Glass Window Vibration in a Conference Room (회의실내 유리창 진동의 도청에 대한 연구)

  • Kim, Seock-Hyun;Kim, Yoon-Ho;Heo, Wook
    • Journal of Industrial Technology
    • /
    • v.27 no.A
    • /
    • pp.55-60
    • /
    • 2007
  • The possibility of eavesdropping is investigated for a coupled conference room-glass window system. Speech intelligibility analysis is performed on the sound eavesdropped from the glass window. Using an MLS (Maximum Length Sequence) signal as the sound source, acceleration and velocity responses of the glass window are measured by an accelerometer and a laser Doppler vibrometer. The MTF (Modulation Transfer Function) is used to identify the speech transmission characteristics of the room and window system. The STI (Speech Transmission Index) is calculated from the MTF, and the speech intelligibility of the vibration sound is estimated. Speech intelligibilities obtained from the acceleration signal and the velocity signal are compared.

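
The MTF-to-STI step mentioned in the abstract can be sketched roughly as follows. This is a simplified IEC 60268-16-style mapping without octave-band or modulation-frequency weighting, so the constants are generic rather than the paper's:

```python
import numpy as np

def sti_from_mtf(m):
    """Map modulation transfer values m (0 < m < 1) to a single index.

    Each m is converted to an apparent SNR, clipped to the +/-15 dB
    range, rescaled to [0, 1], and averaged.
    """
    m = np.asarray(m, dtype=float)
    snr = 10.0 * np.log10(m / (1.0 - m))  # apparent SNR in dB
    snr = np.clip(snr, -15.0, 15.0)       # limit to the +/-15 dB range
    return float(np.mean((snr + 15.0) / 30.0))

print(sti_from_mtf([0.9, 0.8, 0.7]))   # well-preserved modulation
print(sti_from_mtf([0.2, 0.15, 0.1]))  # heavily degraded modulation
```

Well-preserved modulation depth maps to an index near 1 (good intelligibility of the eavesdropped sound), while heavily degraded modulation maps toward 0.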

Microphone Array Based Speech Enhancement Using Independent Vector Analysis (마이크로폰 배열에서 독립벡터분석 기법을 이용한 잡음음성의 음질 개선)

  • Wang, Xingyang;Quan, Xingri;Bae, Keunsung
    • Phonetics and Speech Sciences
    • /
    • v.4 no.4
    • /
    • pp.87-92
    • /
    • 2012
  • Speech enhancement aims to improve speech quality by removing background noise from noisy speech. Independent vector analysis is a type of frequency-domain independent component analysis that is known to be free from the frequency bin permutation problem in the process of blind source separation from multi-channel inputs. This paper proposes a new method of microphone array based speech enhancement that combines independent vector analysis and beamforming techniques. Independent vector analysis is used to separate speech and noise components from multi-channel noisy speech, and delay-sum beamforming is used to determine the enhanced speech among the separated signals. To verify the effectiveness of the proposed method, experiments on computer-simulated multi-channel noisy speech with various signal-to-noise ratios were carried out, and both PESQ and output signal-to-noise ratio were obtained as objective speech quality measures. Experimental results show that the proposed method is superior to conventional microphone array based noise removal approaches such as GSC beamforming.
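
The delay-sum beamforming stage can be illustrated with a minimal simulation; a zero-delay (broadside) geometry, the four-microphone setup, and the noise level below are assumptions for the sketch, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 16000
t = np.arange(0, 0.1, 1 / fs)
clean = np.sin(2 * np.pi * 440 * t)  # stand-in for the speech source

# Four microphones pick up the same speech plus independent noise.
n_mics = 4
mics = np.stack([clean + 0.5 * rng.standard_normal(t.size)
                 for _ in range(n_mics)])

# Delay-and-sum with zero steering delays reduces to averaging, which
# attenuates uncorrelated noise power by a factor of n_mics.
enhanced = mics.mean(axis=0)

def snr_db(ref, sig):
    noise = sig - ref
    return 10 * np.log10(np.sum(ref ** 2) / np.sum(noise ** 2))

print(snr_db(clean, mics[0]), snr_db(clean, enhanced))
```

With four microphones and uncorrelated noise, averaging should buy roughly 10·log10(4) ≈ 6 dB of SNR over a single channel.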

Speech Enhancement Using the Adaptive Noise Canceling Technique with a Recursive Time Delay Estimator (재귀적 지연추정기를 갖는 적응잡음제거 기법을 이용한 음성개선)

  • 강해동;배근성
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.7
    • /
    • pp.33-41
    • /
    • 1994
  • A single channel adaptive noise canceling (ANC) technique with a recursive time delay estimator (RTDE) is presented for removing effects of additive noise on the speech signal. While the conventional method makes a reference signal for the adaptive filter using the pitch estimated on a frame basis from the input speech, the proposed method makes the reference signal using the delay estimated recursively on a sample-by-sample basis. As the RTDEs, the recursion formulae of autocorrelation function (ACF) and average magnitude difference function (AMDF) are derived. The normalized least mean square (NLMS) and recursive least square (RLS) algorithms are applied for adaptation of filter coefficients. Experimental results with noisy speech demonstrate that the proposed method improves the perceived speech quality as well as the signal-to-noise ratio and cepstral distance when compared with the conventional method.

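
A minimal NLMS-based noise canceller in the spirit of this abstract can be sketched as follows. Unlike the paper, which builds its reference from a recursively estimated delay of the input speech itself, this sketch assumes a separate reference noise channel; the filter order, step size, and noise path are illustrative:

```python
import numpy as np

def nlms(x, d, order=8, mu=0.1, eps=1e-8):
    """Normalized LMS: adaptively predict d from reference x.

    The error e = d - y is the noise-cancelled output in ANC.
    """
    w = np.zeros(order)
    e = np.zeros(len(d))
    for n in range(order, len(d)):
        u = x[n - order + 1 : n + 1][::-1]   # tap vector, most recent first
        y = w @ u
        e[n] = d[n] - y
        w += mu * e[n] * u / (u @ u + eps)   # normalized coefficient update
    return e

rng = np.random.default_rng(2)
fs = 8000
t = np.arange(0, 0.5, 1 / fs)
speech = np.sin(2 * np.pi * 200 * t)                  # stand-in for speech
ref = rng.standard_normal(t.size)                     # reference noise channel
noise = np.convolve(ref, [0.6, -0.3, 0.1])[: t.size]  # noise path to primary mic
noisy = speech + noise

cleaned = nlms(ref, noisy)
tail = slice(t.size // 2, None)  # evaluate after the filter has converged
print(np.mean((noisy[tail] - speech[tail]) ** 2),
      np.mean((cleaned[tail] - speech[tail]) ** 2))
```

Once the filter has learned the noise path, the residual in the error signal is much smaller than the original additive noise, which is the mechanism behind the SNR and cepstral-distance gains reported in the abstract.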

A Speech Coder using the Simplified Multi-mode Method (단순화된 다중 모드 방법을 이용한 음성 부호화기)

  • 강홍구
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1995.06a
    • /
    • pp.146-149
    • /
    • 1995
  • This paper proposes an SM-CELP speech coder that applies different excitation signals according to the characteristics of the speech segment at bit rates below 4 kbps. The speech signal is divided into two modes, stationary voiced and other, using the average energy of the short-time speech and of the residual signal after long-term prediction as parameters. A structured multi-pulse method is used for the excitation of mode A, and a Gaussian or pulse-like codebook for mode B. The 4.8 kbps DoD-CELP coder is used to evaluate the performance of the proposed coder. As a result, the proposed method shows a 1~2 dB higher segmental signal-to-noise ratio and better subjective quality without increasing the computational load.


A Study on the Pitch Alteration Technique by Subband Scaling in Speech Signal (서브밴드 스케일링에 의한 음성신호의 피치변경법에 관한 연구)

  • Kim, Young-Kyu;Bae, Myung-Jin
    • Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.137-147
    • /
    • 2003
  • Speech synthesis can be classified by synthesis method into waveform coding, source coding, and hybrid coding. Waveform coding in particular is suitable for high-quality synthesis. However, it is not well suited to synthesis in syllable or phoneme units, because it does not separate the excitation and formant components. A pitch alteration method applicable to synthesis by rule in waveform coding is therefore needed. This study proposes a pitch alteration method that applies spectrum scaling after flattening the spectrum by subband linear approximation, in order to minimize spectral distortion. The proposed method is evaluated against LPC, cepstrum, and lifter-based methods. The evaluation measures the degree of spectral flattening: the flattened signal is normalized so that its highest point becomes zero, and the distribution of the resulting zero-mean signal is calculated. The spectral distortion rate is also measured to assess the performance of the proposed method; it remains below 2.12% on average, showing that the proposed method is superior to the existing methods.


Analysis of Eigenvalues of Covariance Matrices of Speech Signals in Frequency Domain for Various Bands (음성 신호의 주파수 영역에서의 주파수 대역별 공분산 행렬의 고유값 분석)

  • Kim, Seonil
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2016.05a
    • /
    • pp.293-296
    • /
    • 2016
  • Speech signals consist of consonant and vowel signals, but the duration of vowels is much longer than that of consonants. It can therefore be assumed that the correlations between signal blocks in a speech signal are very high, although the correlations between blocks in various frequency bands can be quite different. Each speech signal is divided into blocks of 128 samples, and the FFT is applied to each block. Various frequency regions of the FFT results are taken, the covariance matrix between blocks in a speech signal is extracted, and finally the eigenvalues of those matrices are obtained. We study which band's eigenvalues can be used to obtain a more reliable result.

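
The block/FFT/covariance pipeline described in the abstract can be sketched as follows. The test signal, the band edges, and the choice of computing the covariance of bin magnitudes across blocks are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(3)
fs = 8000
t = np.arange(0, 1.0, 1 / fs)
# Vowel-like signal: a slowly amplitude-modulated low-frequency harmonic
# plus weak wideband noise, so low-band content varies from block to block.
envelope = 1.0 + 0.8 * np.sin(2 * np.pi * 3 * t)
signal = envelope * np.sin(2 * np.pi * 150 * t) \
         + 0.05 * rng.standard_normal(t.size)

block = 128
n_blocks = t.size // block
X = np.abs(np.fft.rfft(signal[: n_blocks * block].reshape(n_blocks, block),
                       axis=1))

low_band = X[:, 1:9]     # roughly 62-560 Hz at fs=8 kHz, 128-point FFT
high_band = X[:, 33:41]  # roughly 2.06-2.56 kHz

def top_eigenvalue(band):
    # Covariance of the band's bin magnitudes across blocks; the dominant
    # eigenvalue reflects how strongly the blocks co-vary in that band.
    c = np.cov(band, rowvar=False)
    return float(np.max(np.linalg.eigvalsh(c)))

print(top_eigenvalue(low_band), top_eigenvalue(high_band))
```

For a vowel-dominated signal, the low band carries most of the energy and inter-block variation, so its dominant eigenvalue is far larger than that of a high, noise-only band.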