Search | Korea Science

Speech Enhancement Based on Psychoacoustic Model

Lee, Jingeol;Kim, Soowon
- The Journal of the Acoustical Society of Korea / v.19 no.3E / pp.12-18 / 2000
Psychoacoustic-model-based methods have recently been introduced to enhance speech signals corrupted by ambient noise. In particular, the perceptual filter is derived analytically so that the frequency content of the noisy input signal matches that of the estimated clean signal in the auditory domain. However, the analytical derivation must rely on the deconvolution associated with the spreading function in the psychoacoustic model, which is an ill-conditioned problem. To cope with this, we propose a novel psychoacoustic-model-based speech enhancement filter that follows the same principle as the perceptual filter but is derived by a constrained optimization, which provides solutions to the ill-conditioned problem. Experiments with artificially generated signals demonstrate that the proposed filter operates according to this principle. The proposed filter outperforms the perceptual filter provided that the clean speech signal is separable from the noise.

Dimension Reduction Method of Speech Feature Vector for Real-Time Adaptation of Voice Activity Detection (음성구간 검출기의 실시간 적응화를 위한 음성 특징벡터의 차원 축소 방법)

Park Jin-Young;Lee Kwang-Seok;Hur Kang-In
- Journal of the Institute of Convergence Signal Processing / v.7 no.3 / pp.116-121 / 2006
In this paper, we propose a dimension reduction method for multi-dimensional speech feature vectors, intended for real-time adaptation in various noisy environments. The method reduces the dimensions non-linearly by mapping onto the likelihoods of the speech feature vector and the noise feature vector. The LRT (Likelihood Ratio Test) is used to classify speech and non-speech. The detection results are similar to those obtained with the multi-dimensional speech feature vector. Speech recognition results on the detected speech data are also similar to those obtained with the multi-dimensional (10th-order MFCC (Mel-Frequency Cepstral Coefficient)) speech feature vector.
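
To illustrate the likelihood ratio test mentioned in the abstract above, here is a minimal sketch in Python. It assumes a single scalar feature per frame and one 1-D Gaussian model each for speech and noise; the model parameters, function names, and threshold are hypothetical and are not taken from the paper.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_likelihood_ratio(x, speech, noise):
    """log p(x | speech) - log p(x | noise); each model is a (mean, var) pair."""
    return gaussian_log_pdf(x, *speech) - gaussian_log_pdf(x, *noise)

def detect_speech(features, speech, noise, threshold=0.0):
    """Label a frame as speech (True) when its log ratio exceeds the threshold."""
    return [log_likelihood_ratio(x, speech, noise) > threshold for x in features]

# Hypothetical models: noise frames cluster near 0, speech frames near 4.
speech_model = (4.0, 1.0)
noise_model = (0.0, 1.0)
decisions = detect_speech([0.2, 3.9, 1.0, 5.1], speech_model, noise_model)
```

With equal variances the log ratio is linear in the feature, so the decision boundary in this toy setup sits halfway between the two means. Raising `threshold` trades missed speech for fewer false alarms.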

Emotion Recognition using Robust Speech Recognition System (강인한 음성 인식 시스템을 사용한 감정 인식)

Kim, Weon-Goo
- Journal of the Korean Institute of Intelligent Systems / v.18 no.5 / pp.586-591 / 2008
This paper studies an emotion recognition system combined with a robust speech recognition system in order to improve emotion recognition performance. For this purpose, the effect of emotional variation on the speech recognition system and robust feature parameters for the speech recognition system were studied using a speech database containing various emotions. Final emotion recognition is performed using the input utterance and its emotional model according to the result of speech recognition. In the experiment, the robust speech recognition system is an HMM-based speaker-independent word recognizer using RASTA mel-cepstral coefficients and their derivatives, with cepstral mean subtraction (CMS) for signal bias removal. Experimental results showed that the emotion recognizer combined with the speech recognition system performed better than the emotion recognizer alone.
https://doi.org/10.5391/JKIIS.2008.18.5.586
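
The cepstral mean subtraction (CMS) step named in this abstract can be sketched as follows. This is a generic textbook form, not the authors' implementation: it assumes an utterance is given as a list of per-frame cepstral coefficient lists, and the function name and toy data are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean taken over all frames of one utterance.

    frames: list of equal-length lists of cepstral coefficients.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]

# Toy utterance: three frames, two coefficients, plus a constant channel bias.
clean = [[1.0, -2.0], [0.0, 0.5], [-1.0, 1.5]]
bias = [3.0, -4.0]
biased = [[c + b for c, b in zip(frame, bias)] for frame in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_biased = cepstral_mean_subtraction(biased)
```

A fixed convolutional channel appears as an additive constant in the cepstral domain, so subtracting the utterance mean removes it: in this toy case the biased and unbiased utterances normalize to the same values.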

Robust Speech Parameters for the Emotional Speech Recognition (감정 음성 인식을 위한 강인한 음성 파라메터)

Lee, Guehyun;Kim, Weon-Goo
- Journal of the Korean Institute of Intelligent Systems / v.22 no.6 / pp.681-686 / 2012
This paper studies speech parameters that are less affected by human emotion, for the development of a robust emotional speech recognition system. For this purpose, the effect of emotion on the speech recognition system and robust speech parameters for the speech recognition system were studied using a speech database containing various emotions. In this study, mel-cepstral coefficients, delta-cepstral coefficients, RASTA mel-cepstral coefficients, root-cepstral coefficients, PLP coefficients, and frequency-warped mel-cepstral coefficients from the vocal tract length normalization method were used as feature parameters. CMS (Cepstral Mean Subtraction) and SBR (Signal Bias Removal) were used as signal bias removal techniques. Experimental results showed that the HMM-based speaker-independent word recognizer using frequency-warped RASTA mel-cepstral coefficients from the vocal tract length normalization method, their derivatives, and CMS for signal bias removal performed best.
https://doi.org/10.5391/JKIIS.2012.22.6.681

Performance Improvement of Adaptive Noise Cancellation Using a Speech Detector

Park, Jang-Sik
- The Journal of the Acoustical Society of Korea / v.15 no.2E / pp.39-44 / 1996
The performance of a two-channel adaptive noise canceller is often degraded by weight perturbation due to the speech signal. In this paper, we propose an adaptive noise canceller that employs a speech detector and two adaptation algorithms, which are switched according to the speech detector. When a highly correlated speech signal is detected, the tap weights of the adaptive filter are adapted by the sign algorithm. On the other hand, the weights are adapted by the NLMS algorithm when silence is detected or when the characteristics of the noise propagation channel change. The speech detector uses the power ratio of the input and the output of an adaptive linear prediction-error filter. Computer simulations show that the proposed method yields better performance than conventional ones.
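
The switching scheme described above can be sketched roughly as follows. This is an illustrative sketch only: the speech detector is replaced by a precomputed list of boolean flags (the paper's detector, based on a linear prediction-error filter, is not reproduced), and the step sizes, tap count, and the 2-tap noise path are hypothetical values chosen for the toy check.

```python
import random

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def adaptive_noise_canceller(primary, reference, speech_flags,
                             taps=2, mu_nlms=0.5, mu_sign=1e-3, eps=1e-8):
    """Return (enhanced signal, final weights) for a two-channel canceller."""
    w = [0.0] * taps
    buf = [0.0] * taps          # most recent reference samples, newest first
    out = []
    for d, x, is_speech in zip(primary, reference, speech_flags):
        buf = [x] + buf[:-1]
        noise_estimate = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - noise_estimate  # the error doubles as the enhanced output
        if is_speech:
            # Sign algorithm: the step does not scale with the speech-dominated error.
            w = [wi + mu_sign * sign(e) * bi for wi, bi in zip(w, buf)]
        else:
            # NLMS: the step is normalized by the reference power in the buffer.
            power = eps + sum(bi * bi for bi in buf)
            w = [wi + mu_nlms * e * bi / power for wi, bi in zip(w, buf)]
        out.append(e)
    return out, w

# Toy check: silence only; noise reaches the primary input through a hypothetical 2-tap path.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
path = [0.5, -0.3]
primary = [path[0] * reference[n] + (path[1] * reference[n - 1] if n > 0 else 0.0)
           for n in range(len(reference))]
enhanced, weights = adaptive_noise_canceller(primary, reference, [False] * len(reference))
```

In the silence-only toy case, NLMS drives the weights toward the noise path and the output toward zero. During flagged speech frames, the sign update bounds each weight change by `mu_sign` times the reference sample, which limits how far speech can perturb the weights.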

Speech Denoising via Low-Rank and Sparse Matrix Decomposition

Huang, Jianjun;Zhang, Xiongwei;Zhang, Yafei;Zou, Xia;Zeng, Li
- ETRI Journal / v.36 no.1 / pp.167-170 / 2014
In this letter, we propose an unsupervised framework for speech noise reduction based on the recent development of low-rank and sparse matrix decomposition. The proposed framework directly separates the speech signal from noisy speech by decomposing the noisy speech spectrogram into three submatrices: the noise structure matrix, the clean speech structure matrix, and the residual noise matrix. Evaluations on the Noisex-92 dataset show that the proposed method achieves a signal-to-distortion ratio approximately 2.48 dB and 3.23 dB higher than that of the robust principal component analysis method and the non-negative matrix factorization method, respectively, when the input SNR is -5 dB.
https://doi.org/10.4218/etrij.14.0213.0033

Matlab Implementation of Real-time Speech Analysis Tool (실시간 음성분석도구의 MatLab 구현)

Bak Il-suh;Kim Dae-hyun;Jo Cheol-woo
- MALSORI / no.44 / pp.93-104 / 2002
Many speech analysis tools are available. Among them, real-time analysis tools are very useful for interactive experiments. A real-time speech analysis tool was implemented using Matlab. Matlab is a very widely used general-purpose signal processing tool. In general, its computational speed is lower than that of code written in conventional programming languages. In particular, real-time analysis including signal input and result output was not possible in the past. However, owing to improvements in the computing power of PCs and the inclusion of real-time I/O toolboxes in Matlab, real-time analysis is now possible to some extent using Matlab alone. In this experiment, we implemented a real-time speech analysis tool using Matlab. Pitch and spectral information are computed in real time. The results show that such real-time applications can be implemented easily using Matlab.

Abrupt Noise Cancellation and Speech Restoration for Speech Enhancement (음질 개선을 위한 돌발잡음 제거와 음성복원)

Son BeakKwon;Hahn Minsoo
- Proceedings of the KSPS conference / 2003.10a / pp.101-104 / 2003
In this paper, speech quality is improved by removing abrupt noise intervals and then filling the gaps with estimates based on the preceding speech waveform. The abrupt noise detection signal is a prediction error signal computed with the LP coefficients of the previous frame. Abrupt noise intervals are estimated using spectral energy. After removing the estimated noise intervals, we applied several waveform substitution techniques: zero substitution, previous frame repetition, pattern matching, and pitch waveform replication. To validate our algorithm, an LPC spectral distortion test and a recognition test were conducted, and the results show that speech quality is improved fairly well.
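
Of the substitution techniques listed above, previous frame repetition is the simplest to illustrate. The sketch below assumes the noisy frame indices are already known (the detection step is not reproduced), and falls back to zero substitution when the first frame is flagged; the function name and toy signal are hypothetical.

```python
def repeat_previous_frame(signal, frame_len, noisy_frames):
    """Replace each flagged frame with a copy of the frame before it.

    noisy_frames: set of frame indices judged to contain abrupt noise.
    A flagged first frame has no predecessor and is zeroed instead.
    """
    restored = list(signal)
    n_frames = len(signal) // frame_len
    for i in range(n_frames):
        if i not in noisy_frames:
            continue
        start = i * frame_len
        if i == 0:
            restored[start:start + frame_len] = [0.0] * frame_len
        else:
            # Copy from the (possibly already restored) preceding frame.
            restored[start:start + frame_len] = restored[start - frame_len:start]
    return restored

# Toy signal: three frames of four samples; the middle frame holds a noise burst.
signal = [1, 2, 3, 4, 90, -80, 70, -60, 5, 6, 7, 8]
restored = repeat_previous_frame(signal, frame_len=4, noisy_frames={1})
```

Because the copy reads from the already restored signal, a run of consecutive flagged frames repeats the last clean frame. Pattern matching and pitch waveform replication would choose the copied segment more carefully to avoid discontinuities at frame boundaries.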

An Improved Speaker Authentication System Using Lipreading and Speech Recognition (Lipreading과 음성인식에 의한 향상된 화자 인증 시스템)

지승남;이종수
- Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2000.10a / pp.274-274 / 2000
In the future, convenient speech command systems will become a widely used interface in automation systems. However, previous research in speech recognition has not given recognition results satisfactory for practical use in noisy environments. The purpose of this research is to develop a practical system that reliably recognizes the speech commands of registered users, by complementing existing research that used image information together with the speech signal. For lip-reading feature extraction from an image, we used the DWT (Discrete Wavelet Transform), which reduces the size of the original image and yields useful characteristics of it. To enhance robustness to changes in the speakers' environment, we acquired the speech signal in stereo. We designed an economical stand-alone system that uses a Bt829 and an AD1819B with a TMS320C31 DSP-based add-on board.

Secret Data Communication Method using Quantization of Wavelet Coefficients during Speech Communication (음성통신 중 웨이브렛 계수 양자화를 이용한 비밀정보 통신 방법)

Lee, Jong-Kwan
- Proceedings of the Korean Information Science Society Conference / 2006.10d / pp.302-305 / 2006
In this paper, we propose a novel method for secret data communication that uses quantization of wavelet coefficients. First, the speech signal is partitioned into short time frames, and the frames are transformed into the frequency domain using a WT (Wavelet Transform). We quantize the wavelet coefficients and embed the secret data into the quantized wavelet coefficients. The destination regards the quantization errors of the received speech as the secret data. Like most speech watermarking techniques, our method has a trade-off between noise robustness and speech quality. However, we address this problem with partial quantization and a noise-level-dependent threshold. In addition, we improve the speech quality with a wavelet-based de-noising method. Since the signal is already processed in the wavelet domain, wavelet-based de-noising can be adopted easily. Simulation results in various noisy environments show that the proposed method is reliable for secret communication.
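
As a rough illustration of hiding bits by quantizing wavelet coefficients, the sketch below swaps in a simple parity-based quantization scheme on one-level Haar detail coefficients. It is not the paper's method: the partial quantization, the noise-dependent threshold, and the de-noising step are omitted, and the step size, function names, and toy frame are hypothetical.

```python
import math

def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length sequence."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x

def embed_bits(frame, bits, step=0.1):
    """Quantize detail coefficients so that each quantizer index parity encodes one bit.

    Assumes len(bits) <= len(frame) // 2.
    """
    approx, detail = haar_forward(frame)
    for k, bit in enumerate(bits):
        q = round(detail[k] / step)
        if q % 2 != bit:
            q += 1              # move to the neighbouring index with the right parity
        detail[k] = q * step
    return haar_inverse(approx, detail)

def extract_bits(frame, n_bits, step=0.1):
    """Recover the bits from the parity of the re-quantized detail coefficients."""
    _, detail = haar_forward(frame)
    return [round(detail[k] / step) % 2 for k in range(n_bits)]

frame = [0.3, -0.1, 0.25, 0.4, -0.6, 0.2, 0.05, 0.0]
secret = [1, 0, 1, 1]
marked = embed_bits(frame, secret)
recovered = extract_bits(marked, len(secret))
```

A larger `step` tolerates more channel noise before a bit flips but distorts the speech more, which is the robustness versus quality trade-off the abstract refers to.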

