• Title/Summary/Keyword: Speech signal processing

Search Result 331, Processing Time 0.02 seconds

Binary Mask Criteria Based on Distortion Constraints Induced by a Gain Function for Speech Enhancement

  • Kim, Gibak
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.2 no.4
    • /
    • pp.197-202
    • /
    • 2013
  • Large gains in speech intelligibility can be obtained using the SNR-based binary mask approach. This approach retains the time-frequency (T-F) units of the mixture signal, where the target signal is stronger than the interference noise (masker) (e.g., SNR > 0 dB), and removes the T-F units, where the interfering noise is dominant. This paper introduces two alternative binary masks based on the distortion constraints to improve the speech intelligibility. The distortion constraints are induced by a gain function for estimating the short-time spectral amplitude. One binary mask is designed to retain the speech underestimated (T-F) units while removing the speech overestimated (T-F)units. The other binary mask is designed to retain the noise overestimated (T-F) units while removing noise underestimated (T-F) units. Listening tests with oracle binary masks were conducted to assess the potential of the two binary masks in improving the intelligibility. The results suggested that the two binary masks based on distortion constraints can provide large gains in intelligibility when applied to noise-corrupted speech.

  • PDF

Speech Spectrum Enhancement Combined with Frequency-weighted Spectrum Shaping Filter and Wiener Filter (주파수가중 스펙트럼성형필터와 위너필터를 결합한 음성 스펙트럼 강조)

  • Choi, Jae-Seung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.10
    • /
    • pp.1867-1872
    • /
    • 2016
  • In the area of digital signal processing, it is necessary to improve the quality of the speech signal after removing the background noise which exists in a various real environments. The important thing to consider when removing the background noise acoustically is that to solve the problem, depending on the information of the human auditory mechanism is mainly the amplitude spectrum of the speech signal. This paper introduces the characteristics of a frequency-weighted spectrum shaping filter for the extraction of the amplitude spectrum of the speech signal with the primary purpose. Therefore, this paper proposes an algorithm using the methods of a Wiener filter and the frequency-weighted spectrum shaping filter according to the acoustic model, after extracted the amplitude spectral information in the noisy speech signal. The spectral distortion (SD) output of the proposed algorithm is experimentally improved more than 5.28 dB compared to a conventional method.

Hands-free Speech Recognition based on Echo Canceller and MAP Estimation (에코제거기와 MAP 추정에 기초한 핸즈프리 음성 인식)

  • Sung-ill Kim;Wee-jae Shin
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.4 no.3
    • /
    • pp.15-20
    • /
    • 2003
  • For some applications such as teleconference or telecommunication systems using a distant-talking hands-free microphone, the near-end speech signals to be transmitted is disturbed by an ambient noise and by an echo which is due to the coupling between the microphone and the loudspeaker. Furthermore, the environmental noise including channel distortion or additive noise is assumed to affect the original input speech. In the present paper, a new approach using echo canceller and maximum a posteriori(MAP) estimation is introduced to improve the accuracy of hands-free speech recognition. In this approach, it was shown that the proposed system was effective for hands-free speech recognition in ambient noise environment including echo. The experimental results also showed that the combination system between echo canceller and MAP environmental adaptation technique were well adapted to echo and noise environment.

  • PDF

A Robust Non-Speech Rejection Algorithm

  • Ahn, Young-Mok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.17 no.1E
    • /
    • pp.10-13
    • /
    • 1998
  • We propose a robust non-speech rejection algorithm using the three types of pitch-related parameters. The robust non-speech rejection algorithm utilizes three kinds of pitch parameters : (1) pitch range, (2) difference of the successive pitch range, and (3) the number of successive pitches satisfying constraints related with the previous two parameters. The acceptance rate of the speech commands was 95% for -2.8dB signal-to-noise ratio (SNR) speech database that consisted of 2440 utterances. The rejection rate of the non-speech sounds was 100% while the acceptance rate of the speech commands was 97% in an office environment.

  • PDF

An Adaptive Microphone Array with Linear Phase Response (선형 위상 특성을 갖는 적응 마이크로폰 어레이)

  • Kang, Hong-Gu;Youn, Dae-Hui;Cha, Il-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.11 no.3
    • /
    • pp.53-60
    • /
    • 1992
  • Many adaptive beamforming methods have been studied for interference cancellation and speech signal enhancement in telephone conference and auditorium. Main aspect of adaptive beamforming methods for speech signal processing is different from radar, sonar and seismic signal processing because desire output signal should be apt to the human ear. Considering that phase of speech is quite insensible to the human ear, Sondhi proposed a nonlinear constrained optimization technique whose constraint was on the magnitude transfer function from the source to the output. In real environment the phase response of the speech signal affects the human auditorium system. So it is desirable to design linear phase system. In this paper, linear phase beamformer is proposed and sample processing algorithm is also proposed for real time consideration Simulation results show that the proposed algorithm yields more consistent beam patterns and deep nulls to the noise direction than Sondhi's.

  • PDF

Elderly Speech Signal Processing: A Systematic Review for Analysis of Gender Innovation (노인음성신호처리: 젠더혁신 분석에 대한 체계적 문헌고찰)

  • Lee, JiYeoun
    • Journal of Convergence for Information Technology
    • /
    • v.9 no.8
    • /
    • pp.148-154
    • /
    • 2019
  • The purpose of this study is to review systematically the literatures on the research of the elderly speech signal processing based on the domestic gender innovation and to introduce the utility and innovation of the gender analysis methods. From 2000 to present, among the 37 research papers published in the Korean Journal, 25 papers were selected according to the inclusion and exclusion criteria. And gender analysis methods were applied to gender research subject and design. Research results show diversity of research field and high gender recognition of R & D team are needed in research and development of engineering of gender innovation perspective. In addition, government-level regulation and research funding should be systematically applied to gender innovation research processes in the elderly voice signal processing and various gender innovation projects. In the future gender innovation in the elderly speech signal processing can contribute to the creation of a new market by developing a voice recognition system and service that reflects the needs of both men and women.

Real-Time Implementation of AMR Speech Codec Using TMS320VC5510 DSP (TMS320VC5510 DSP를 이용한 AMR 음성부호화기의 실시간 구현)

  • Kim, Jun;Bae, Keun-Sung
    • MALSORI
    • /
    • no.65
    • /
    • pp.143-152
    • /
    • 2008
  • This paper focuses on the real time implementation of an adaptive multi-rate (AMR) speech codec, that is a standard speech codec of IMT-2000, using the TMS320VC5510. The series of TMS320VC55x is a 16-bit fixed-point digital signal processor (DSP) having low power consumption for the use of mobile communications by Texas Instruments (TI) corporation. After we analyze the AMR algorithm and source code as well as the structure and I/O of 7MS320VC55x, we carry out optimizing the programs for real time implementation. The implemented AMR speech codec uses 55.2 kbyte for the program memory and 98.3 kbyte for the data memory, and it requires 709,878 clocks, i.e. about 3.5 ms, for processing a frame of 20 ms speech signal.

  • PDF

A Study of Peak Finding Algorithms for the Autocorrelation Function of Speech Signal

  • So, Shin-Ae;Lee, Kang-Hee;You, Kwang-Bock;Lim, Ha-Young;Park, Ji Su
    • Journal of the Korea Society of Computer and Information
    • /
    • v.21 no.12
    • /
    • pp.131-137
    • /
    • 2016
  • In this paper, the peak finding algorithms corresponding to the Autocorrelation Function (ACF), which are widely exploited for detecting the pitch of voiced signal, are proposed. According to various researchers, it is well known fact that the estimation of fundamental frequency (F0) in speech signal is not only very important task but quite difficult mission. The proposed algorithms, presented in this paper, are implemented by using many characteristics - such as monotonic increasing function - of ACF function. Thus, the proposed algorithms may be able to estimate both reliable and correct the fundamental frequency as long as the autocorrelation function of speech signal is accurate. Since the proposed algorithms may reduce the computational complexity it can be applied to the real-time processing. The speech data, is composed of Korean emotion expressed words, is used for evaluation of their performance. The pitches are measured to compare the performance of proposed algorithms.

On a Pitch Extraction of Speech Signal using Residual Signal of the Uniform Quantizer (균일양자화기의 잔여신호를 이용한 음성신호의 피치검출)

  • Bae, Myung-Jin;Han, Ki-Cheon;Cha, Jin-Jong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.2
    • /
    • pp.36-40
    • /
    • 1997
  • In speech signal processing, it is necessary and important to detect exactly the pitch. The algorithms of pitch extraction which have been proposed until now are difficult exactly pitches over wide range speech signals. In this paper, thus, we proposed a new pitch detection algorithm that finds the fundamental period of speech signal in the residual signal quantized by the uniform quantizer as PCM. The proposed method shows little gross error of average 0.25% for clean speech and average 3.39% for SNR of 0dB. It also achieves results of the pitch contours, improving the accuracy of pitch detection in transient phonemes and noise environments.

  • PDF

DNN based Robust Speech Feature Extraction and Signal Noise Removal Method Using Improved Average Prediction LMS Filter for Speech Recognition (음성 인식을 위한 개선된 평균 예측 LMS 필터를 이용한 DNN 기반의 강인한 음성 특징 추출 및 신호 잡음 제거 기법)

  • Oh, SangYeob
    • Journal of Convergence for Information Technology
    • /
    • v.11 no.6
    • /
    • pp.1-6
    • /
    • 2021
  • In the field of speech recognition, as the DNN is applied, the use of speech recognition is increasing, but the amount of calculation for parallel training needs to be larger than that of the conventional GMM, and if the amount of data is small, overfitting occurs. To solve this problem, we propose an efficient method for robust voice feature extraction and voice signal noise removal even when the amount of data is small. Speech feature extraction efficiently extracts speech energy by applying the difference in frame energy for speech and the zero-crossing ratio and level-crossing ratio that are affected by the speech signal. In addition, in order to remove noise, the noise of the speech signal is removed by removing the noise of the speech signal with an average predictive improved LMS filter with little loss of speech information while maintaining the intrinsic characteristics of speech in detection of the speech signal. The improved LMS filter uses a method of processing noise on the input speech signal by adjusting the active parameter threshold for the input signal. As a result of comparing the method proposed in this paper with the conventional frame energy method, it was confirmed that the error rate at the start point of speech is 7% and the error rate at the end point is improved by 11%.