• Title/Summary/Keyword: speech signal

A Speech Enhancement Algorithm based on Human Psychoacoustic Property (심리음향 특성을 이용한 음성 향상 알고리즘)

  • Jeon, Yu-Yong;Lee, Sang-Min
    • The Transactions of The Korean Institute of Electrical Engineers / v.59 no.6 / pp.1120-1125 / 2010
  • In speech systems such as hearing aids and speech communication devices, speech quality is degraded by environmental noise. In this study, we propose an algorithm that reduces noise and reinforces speech in order to enhance speech degraded by environmental noise. The minima-controlled recursive averaging (MCRA) algorithm is used to estimate the noise spectrum, and a spectral weighting factor is used to reduce the noise. The partial masking effect, one of the properties of human hearing, is introduced to reinforce the speech. We then compared the waveform, spectrogram, Perceptual Evaluation of Speech Quality (PESQ), and segmental signal-to-noise ratio (segSNR) of the original speech, the noisy speech, the noise-reduced speech, and the speech enhanced by the proposed method. The enhanced speech is reinforced in the high frequencies degraded by noise, and both PESQ and segSNR improve, indicating enhanced speech quality.
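
The estimate-noise-then-weight pipeline described above can be sketched minimally as follows. This is an illustration only: a plain recursive average stands in for the full MCRA algorithm (which additionally tracks spectral minima and speech-presence probability), and the Wiener-style gain rule and the `alpha` and `floor` parameters are assumptions, not the paper's values.

```python
import numpy as np

def noise_update(noise_psd, frame_psd, alpha=0.95):
    """Recursive averaging of the noise PSD (simplified stand-in for MCRA)."""
    return alpha * noise_psd + (1.0 - alpha) * frame_psd

def spectral_gain(frame_psd, noise_psd, floor=0.05):
    """Wiener-style spectral weighting factor, floored to limit musical noise."""
    snr = np.maximum(frame_psd / np.maximum(noise_psd, 1e-12) - 1.0, 0.0)
    gain = snr / (snr + 1.0)
    return np.maximum(gain, floor)

# Toy frame: a strong 'speech' bin embedded in flat unit-power noise.
noise_psd = noise_update(np.ones(8), np.ones(8))
frame_psd = np.ones(8)
frame_psd[3] = 100.0
g = spectral_gain(frame_psd, noise_psd)   # g[3] near 1, noise-only bins at the floor
```

The enhanced spectrum is then `g * noisy_spectrum`; the paper's additional partial-masking reinforcement step is not shown here.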

Target signal detection using MUSIC spectrum in noise environments (MUSIC 스펙트럼을 이용한 잡음환경에서의 목표 신호 구간 검출)

  • Park, Sang-Jun;Jeong, Sang-Bae
    • Phonetics and Speech Sciences / v.4 no.3 / pp.103-110 / 2012
  • In this paper, a target signal detection method using the multiple signal classification (MUSIC) algorithm is proposed. The MUSIC algorithm is a subspace-based direction-of-arrival (DOA) estimation method. Using the inverse of the eigenvalue-weighted eigenspectra, the algorithm detects the DOAs of multiple sources. To apply the algorithm to target signal detection for GSC-based beamforming, we utilize its spectral response at the DOA of the target source in noisy conditions. The performance of the proposed target signal detection method is compared with those of normalized cross-correlation (NCC), fixed beamforming, and the power ratio method. Experimental results show that the proposed algorithm significantly outperforms the conventional ones in terms of receiver operating characteristic (ROC) curves.
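
The MUSIC pseudospectrum underlying the method above can be sketched for a uniform linear array. This is a generic textbook illustration of MUSIC, not the paper's GSC integration; the array size, element spacing, and toy covariance are assumptions.

```python
import numpy as np

def music_spectrum(R, n_sources, angles_deg, n_mics, spacing=0.5):
    """MUSIC pseudospectrum over candidate DOAs for a uniform linear array
    (spacing in wavelengths). Peaks occur where the steering vector is
    orthogonal to the noise subspace."""
    _, eigvec = np.linalg.eigh(R)               # eigenvalues in ascending order
    En = eigvec[:, : n_mics - n_sources]        # noise-subspace eigenvectors
    spectrum = []
    for theta in np.deg2rad(angles_deg):
        a = np.exp(-2j * np.pi * spacing * np.arange(n_mics) * np.sin(theta))
        denom = np.linalg.norm(En.conj().T @ a) ** 2
        spectrum.append(1.0 / max(denom, 1e-12))
    return np.array(spectrum)

# Toy spatial covariance: one source at +20 degrees plus weak sensor noise.
m = 6
a0 = np.exp(-2j * np.pi * 0.5 * np.arange(m) * np.sin(np.deg2rad(20.0)))
R = np.outer(a0, a0.conj()) + 0.01 * np.eye(m)
angles = np.arange(-90, 91)
peak_angle = int(angles[np.argmax(music_spectrum(R, 1, angles, m))])
```

In practice `R` is estimated from multichannel STFT frames; the paper uses the spectral response at the target DOA as a detection statistic rather than searching for peaks.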

Speech Recognition Using MSVQ/TDRNN (MSVQ/TDRNN을 이용한 음성인식)

  • Kim, Sung-Suk
    • The Journal of the Acoustical Society of Korea / v.33 no.4 / pp.268-272 / 2014
  • This paper presents a method for speech recognition using multi-section vector quantization (MSVQ) and a time-delay recurrent neural network (TDRNN). The MSVQ generates a codebook from normalized uniform sections of the voice signal, and the TDRNN performs speech recognition using the MSVQ codebook. The TDRNN is a time-delay recurrent neural network classifier with two different representations of dynamic context: the time-delayed input nodes represent local dynamic context, while the recursive nodes represent the long-term dynamic context of the voice signal. Cepstral PLP coefficients were used as speech features. In the speech recognition experiments, the MSVQ/TDRNN speech recognizer shows a 97.9 % word recognition rate for speaker-independent recognition.
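
The vector-quantization step at the core of MSVQ can be illustrated with a minimal nearest-codevector encoder. The sectioning scheme and the TDRNN classifier are beyond this sketch, and the toy codebook values are purely illustrative.

```python
import numpy as np

def vq_encode(features, codebook):
    """Map each feature vector to the index of its nearest codevector
    (Euclidean distance) -- the core operation of any VQ front end."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(d, axis=1)

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])   # toy 3-entry codebook
feats = np.array([[0.1, -0.1], [4.8, 5.2], [0.9, 1.1]])     # toy feature vectors
idx = vq_encode(feats, codebook)
```

In MSVQ a separate codebook is built per normalized section of the utterance, and the resulting index sequences feed the TDRNN.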

Emotion recognition from speech using Gammatone auditory filterbank

  • Le, Ba-Vui;Lee, Young-Koo;Lee, Sung-Young
    • Proceedings of the Korean Information Science Society Conference / 2011.06a / pp.255-258 / 2011
  • An application of a Gammatone auditory filterbank to emotion recognition from speech is described in this paper. The Gammatone filterbank is a bank of Gammatone filters used as a preprocessing stage, before feature extraction, to obtain the features most relevant to emotion recognition from speech. In the feature extraction step, the energy of the output signal of each filter is computed and combined with those of all the other filters to produce a feature vector for the learning step. A feature vector is estimated over a short time period of the input speech signal to exploit its time-domain dependence. Finally, in the learning step, a hidden Markov model (HMM) is used to create a model for each emotion class and to recognize a particular input emotional speech. In the experiments, feature extraction based on the Gammatone filterbank (GTF) shows better results than features based on Mel-frequency cepstral coefficients (MFCC), a well-known feature extraction method for speech recognition as well as emotion recognition from speech.
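
The per-channel energy feature described above can be sketched as follows. This assumes the standard ERB-scale Gammatone impulse response with the usual 1.019 bandwidth factor; the paper's exact channel count, centre frequencies, and normalization are not given in the abstract, so those here are illustrative.

```python
import numpy as np

def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Impulse response of a Gammatone filter centred at fc (Hz), with
    bandwidth on the ERB scale; normalized to unit energy."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)      # equivalent rectangular bandwidth
    b = 1.019 * erb
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))

def gtf_energy_features(x, fs, centre_freqs):
    """One log-energy value per Gammatone channel -> a feature vector."""
    feats = []
    for fc in centre_freqs:
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")
        feats.append(np.log(np.sum(y ** 2) + 1e-12))
    return np.array(feats)

fs = 16000
t = np.arange(fs // 10) / fs
x = np.sin(2 * np.pi * 1000 * t)                 # 1 kHz test tone, 100 ms
feats = gtf_energy_features(x, fs, [250, 1000, 4000])
```

As expected, the channel centred on the tone dominates the feature vector; on real speech these per-frame vectors would be fed to the HMMs.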

A Study on Approximation-Synthesis of Transition Segment in Speech Signal (음성신호에서 천이구간의 근사합성에 관한 연구)

  • Lee See-Woo
    • The Journal of the Korea Contents Association / v.5 no.3 / pp.167-173 / 2005
  • In a speech coding system using separate voiced and unvoiced excitation sources, speech quality is distorted when voiced and unvoiced consonants coexist within a frame. Therefore, I propose a TSIUVC (Transition Segment Including Unvoiced Consonant) extraction method based on pitch pulses and the zero-crossing rate, so that voiced and unvoiced consonants do not coexist within a frame. This paper also presents a TSIUVC approximation-synthesis method based on frequency band division. This method obtains a high-quality approximation-synthesis waveform within the TSIUVC by using frequency information below 0.547 kHz and above 2.813 kHz. The TSIUVC extraction rate was 91% for female voices and 96.2% for male voices, respectively. This method can be applied to a new Voiced/Silence/TSIUVC speech coding scheme, speech analysis, and speech synthesis.
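
The zero-crossing rate used above to flag unvoiced-consonant content can be sketched in a few lines; the pitch-pulse detection half of the extraction method is not shown, and the frame length and signals here are toy values.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ: high for
    noise-like unvoiced consonants, low for pitch-dominated voiced speech."""
    s = np.sign(frame)
    s[s == 0] = 1
    return float(np.mean(s[1:] != s[:-1]))

fs = 8000
t = np.arange(160) / fs                                   # one 20 ms frame
voiced = np.sin(2 * np.pi * 150 * t)                      # pitch-like: few crossings
unvoiced = np.random.default_rng(0).standard_normal(160)  # frication-like noise
zcr_v = zero_crossing_rate(voiced)
zcr_u = zero_crossing_rate(unvoiced)
```

A threshold on this rate, combined with pitch-pulse evidence, separates frames that contain a transition segment from purely voiced frames.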

Pseudo-Cepstral Representation of Speech Signal and Its Application to Speech Recognition (음성 신호의 의사 켑스트럼 표현 및 음성 인식에의 응용)

  • Kim, Hong-Kook;Lee, Hwang-Soo
    • The Journal of the Acoustical Society of Korea / v.13 no.1E / pp.71-81 / 1994
  • In this paper, we propose a pseudo-cepstral representation of line spectrum pair (LSP) frequencies and evaluate speech recognition performance with cepstral liftering applied to the pseudo-cepstrum. The pseudo-cepstrum corresponding to the LSP frequencies is derived by approximating the relationship between the LPC cepstrum and LSP frequencies. Three cepstral liftering procedures are applied to the pseudo-cepstrum to improve speech recognition performance: the root-power-sum lifter, the general exponential lifter, and the bandpass lifter. The liftered pseudo-cepstra are then warped onto a mel-frequency scale to obtain feature vectors for speech recognition. Among the three lifters, the general exponential lifter yields the best recognition performance. When the proposed pseudo-cepstral feature vectors are used to recognize noisy speech, a signal-to-noise ratio (SNR) improvement of about 5-10 dB is obtained.
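
Two of the lifters named above can be sketched as quefrency weightings of a cepstral vector. The exponent `s` and lifter length `L` below are common textbook choices, not necessarily the paper's settings, and the flat toy cepstrum is illustrative.

```python
import numpy as np

def exponential_lifter(c, s=0.6):
    """General exponential lifter: weight quefrency n by n**s (zeroes c[0];
    the root-power-sum lifter is the special case s = 1)."""
    return c * np.arange(len(c), dtype=float) ** s

def bandpass_lifter(c, L=22):
    """Sinusoidal band-pass lifter: w[n] = 1 + (L/2) * sin(pi * n / L)."""
    n = np.arange(len(c))
    return c * (1.0 + (L / 2.0) * np.sin(np.pi * n / L))

c = np.ones(13)            # flat toy cepstrum, 13 coefficients
exp_c = exponential_lifter(c)
bp_c = bandpass_lifter(c)
```

Both de-emphasize the low quefrencies that carry channel and spectral-tilt variability, which is why liftering tends to help recognition.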

Hands-free Speech Recognition based on Echo Canceller and MAP Estimation (에코제거기와 MAP 추정에 기초한 핸즈프리 음성 인식)

  • Sung-ill Kim;Wee-jae Shin
    • Journal of the Institute of Convergence Signal Processing / v.4 no.3 / pp.15-20 / 2003
  • In applications such as teleconferencing or telecommunication systems that use a distant-talking hands-free microphone, the near-end speech signal to be transmitted is disturbed by ambient noise and by the echo caused by coupling between the microphone and the loudspeaker. Furthermore, environmental noise, including channel distortion and additive noise, is assumed to affect the original input speech. In this paper, a new approach using an echo canceller and maximum a posteriori (MAP) estimation is introduced to improve the accuracy of hands-free speech recognition. The proposed system was shown to be effective for hands-free speech recognition in ambient noise environments including echo. The experimental results also showed that the combination of the echo canceller and the MAP environmental adaptation technique adapted well to echo and noise environments.
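
The echo-cancellation half of the system can be sketched with an adaptive filter. The abstract does not specify the adaptation rule, so normalized LMS (a common choice) is shown here as an assumption; the echo path and parameters are toy values.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, n_taps=32, mu=0.5, eps=1e-6):
    """Normalized-LMS adaptive filter: estimate the echo path from the
    far-end (loudspeaker) signal and subtract the echo estimate from the
    microphone signal, leaving the near-end residual."""
    w = np.zeros(n_taps)
    out = np.zeros(len(mic))
    for i in range(n_taps - 1, len(mic)):
        x = far_end[i - n_taps + 1 : i + 1][::-1]   # newest sample first
        e = mic[i] - w @ x                           # cancellation error
        w += mu * e * x / (x @ x + eps)              # NLMS weight update
        out[i] = e
    return out

rng = np.random.default_rng(1)
far = rng.standard_normal(4000)
echo_path = np.array([0.4, 0.5, 0.3, -0.1])          # toy coupling response
mic = np.convolve(far, echo_path)[: len(far)]        # echo-only microphone signal
res = nlms_echo_canceller(far, mic)
```

After convergence the residual carries mostly near-end speech, which is then passed to the MAP-adapted recognizer.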

A Selection Method of Reliable Codevectors using Noise Estimation Algorithm (잡음 추정 알고리즘을 이용한 신뢰성 있는 코드벡터 조합의 선정 방법)

  • Jung, Seungmo;Kim, Moo Young
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.7 / pp.119-124 / 2015
  • Speech enhancement is required as a preprocessor for noise-robust speech recognition systems. Codebook-based speech enhancement (CBSE) is highly robust in nonstationary noise environments compared with conventional noise estimation algorithms. However, because CBSE depends on trained codebook information, its performance is severely degraded for codevector combinations that have low correlation with the input signal. To overcome this problem, only reliable codevector combinations are selected, removing those with low correlation with the input signal. The proposed method outperforms conventional CBSE in terms of log-spectral distortion (LSD) and Perceptual Evaluation of Speech Quality (PESQ).
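
The abstract does not state the exact selection criterion, so the sketch below assumes a simple normalized-correlation (cosine-similarity) threshold between each codevector and the observed spectral envelope; the threshold and toy vectors are illustrative only.

```python
import numpy as np

def select_reliable(codevectors, observed, thresh=0.9):
    """Keep indices of codevectors whose normalized correlation (cosine
    similarity) with the observed spectral envelope exceeds a threshold."""
    obs_norm = np.linalg.norm(observed)
    keep = []
    for i, c in enumerate(codevectors):
        corr = float(c @ observed / (np.linalg.norm(c) * obs_norm + 1e-12))
        if corr >= thresh:
            keep.append(i)
    return keep

codebook = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy spectral shapes
observed = np.array([1.0, 1.0])                            # toy observed envelope
reliable = select_reliable(codebook, observed)
```

Restricting the search to such codevectors prevents poorly matching codebook entries from corrupting the speech and noise estimates.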

Robust speech quality enhancement method against background noise and packet loss at voice-over-IP receiver (배경잡음 및 패킷손실에 강인한 voice-over-IP 수신단 기반 음질향상 기법)

  • Kim, Gee Yeun;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.37 no.6 / pp.512-517 / 2018
  • Improving voice quality is a major concern in telecommunications. In this paper, we propose a speech quality enhancement method that is robust against background noise and packet loss at the VoIP (Voice-over-IP) receiver. The proposed method combines network jitter estimation based on a hybrid Markov chain, adaptive playout scheduling using the estimated jitter, and speech enhancement based on restoration of amplitude and phase, to enhance the quality of the speech signal arriving at the VoIP receiver over the IP network. The experimental results show that the proposed method removes the background noise added to the speech signal before encoding at the sender side and provides enhanced speech quality in an unstable network environment.
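
Jitter-driven playout scheduling can be sketched with the classic exponentially weighted delay estimator; the paper's hybrid-Markov-chain estimator is more elaborate, so the rule and the `alpha`/`beta` constants below are the textbook baseline, not the proposed method.

```python
def playout_delay(arrivals, sends, alpha=0.875, beta=4.0):
    """Running mean and variation of network delay; playout delay is set to
    mean + beta * variation so that most late packets still arrive in time."""
    d_hat = arrivals[0] - sends[0]
    v_hat = 0.0
    for a, s in zip(arrivals[1:], sends[1:]):
        d = a - s
        v_hat = alpha * v_hat + (1 - alpha) * abs(d - d_hat)
        d_hat = alpha * d_hat + (1 - alpha) * d
    return d_hat + beta * v_hat

# Packets sent every 20 ms, each arriving with a constant 50 ms network delay:
delay = playout_delay([50, 70, 90, 110], [0, 20, 40, 60])
```

With zero jitter the scheduled playout delay collapses to the network delay itself; as jitter grows, the `beta * v_hat` margin buys time for late packets at the cost of latency.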

A Study on the Impact of Speech Data Quality on Speech Recognition Models

  • Yeong-Jin Kim;Hyun-Jong Cha;Ah Reum Kang
    • Journal of the Korea Society of Computer and Information / v.29 no.1 / pp.41-49 / 2024
  • Speech recognition technology is continuously advancing and is widely used in various fields. In this study, we investigated the impact of speech data quality on speech recognition models by comparing the entire dataset with the top 70% of the data ranked by signal-to-noise ratio (SNR). Using Seamless M4T and Google Cloud Speech-to-Text, we examined the transcription results of each model and evaluated them using the Levenshtein distance. The experimental results revealed that Seamless M4T scored 13.6 on the high-SNR subset, which is lower than the 16.6 obtained on the entire dataset. Google Cloud Speech-to-Text scored 8.3 on the entire dataset, indicating lower performance than on the high-SNR data. This suggests that using high-SNR data when training a new speech recognition model can affect its performance, and that the Levenshtein distance can serve as a metric for evaluating speech recognition models.
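
The Levenshtein-distance metric used above is standard edit distance; a minimal implementation follows. Whether the study computes it over characters or word tokens is not stated in the abstract, so both uses are shown.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions needed to
    turn sequence a into sequence b (dynamic programming, O(len(a)*len(b)))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (0 if equal)
        prev = cur
    return prev[-1]

d = levenshtein("kitten", "sitting")                       # character-level
w = levenshtein("the cat sat".split(), "the cat sat".split())  # word-level
```

Applied to a recognizer's transcript versus the reference text, a lower distance means a more accurate transcription, which is why lower scores in the study indicate better performance.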