• Title/Summary/Keyword: Log-Spectral Amplitude

Music and Voice Separation Using Log-Spectral Amplitude Estimator Based on Kernel Spectrogram Models Backfitting (커널 스펙트럼 모델 backfitting 기반의 로그 스펙트럼 진폭 추정을 적용한 배경음과 보컬음 분리)

  • Lee, Jun-Yong;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.34 no.3 / pp.227-233 / 2015
  • In this paper, we propose music and voice separation using kernel spectrogram models backfitting based on a log-spectral amplitude estimator. The existing method separates sources by estimating the desired objects with a Wiener filter trained under the MSE (mean square error) criterion. By applying a log-spectral amplitude estimator instead of the MSE criterion used in the existing method, we obtain clearer music and voice signals. Experimental results show that the proposed method outperforms the existing methods.
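
The abstract contrasts an MSE-designed Wiener filter with a log-spectral amplitude (LSA) estimator. As a minimal sketch, assuming the textbook Gaussian-model gains (the Ephraim-Malah MMSE-LSA rule) rather than the paper's kernel-backfitting framework, the two gain functions can be compared per time-frequency bin. Here `xi` is the a priori SNR and `gamma` the a posteriori SNR; the helper `exp1` is a hand-written exponential integral so that the example needs only the standard library.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x: float, eps: float = 1e-12, max_iter: int = 200) -> float:
    """Exponential integral E1(x) for x > 0 (power series for x <= 1, continued fraction otherwise)."""
    if x <= 0.0:
        raise ValueError("E1 is only evaluated here for x > 0")
    if x <= 1.0:
        # E1(x) = -gamma - ln(x) + sum_{k>=1} (-1)^(k+1) x^k / (k * k!)
        total = -math.log(x) - EULER_GAMMA
        fact = 1.0
        for k in range(1, max_iter):
            fact *= -x / k          # fact = (-x)^k / k!
            term = -fact / k
            total += term
            if abs(term) < abs(total) * eps:
                break
        return total
    # Modified Lentz evaluation of the continued fraction for E1(x), x > 1
    tiny = 1e-300
    b = x + 1.0
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(-x)


def wiener_gain(xi: float) -> float:
    """MMSE (Wiener) gain computed from the a priori SNR xi."""
    return xi / (1.0 + xi)


def lsa_gain(xi: float, gamma: float) -> float:
    """MMSE log-spectral amplitude gain: Wiener gain times exp(0.5 * E1(v))."""
    v = xi / (1.0 + xi) * gamma
    return wiener_gain(xi) * math.exp(0.5 * exp1(v))


for xi_db in (-10.0, 0.0, 10.0):
    xi = 10.0 ** (xi_db / 10.0)
    gamma = xi + 1.0  # assumed a posteriori SNR, chosen only for illustration
    print(f"xi={xi_db:+.0f} dB  Wiener={wiener_gain(xi):.3f}  LSA={lsa_gain(xi, gamma):.3f}")
```

Because the factor exp(0.5 E1(v)) is never below 1, the LSA gain is never smaller than the Wiener gain for the same `xi`, so low-SNR bins are attenuated less aggressively.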

Speech Estimators Based on Generalized Gamma Distribution and Spectral Gain Floor Applied to an Automatic Speech Recognition (잡음에 강인한 음성인식을 위한 Generalized Gamma 분포기반과 Spectral Gain Floor를 결합한 음성향상기법)

  • Kim, Hyoung-Gook;Shin, Dong;Lee, Jin-Ho
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.8 no.3 / pp.64-70 / 2009
  • This paper presents a speech enhancement technique based on the generalized Gamma distribution for robust speech recognition. For robust enhancement, a noise estimate obtained by recursively averaging spectral values under the control of a spectral noise floor is used for speech estimation under the generalized Gamma distribution with a spectral gain floor. The technique is applied to three estimators (of the spectral component, the spectral amplitude, and the log-spectral amplitude), and the performance of these three methods is measured by the recognition accuracy of an automatic speech recognition (ASR) system.
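
The abstract combines recursive-averaging noise estimation with a spectral gain floor. The update rule below is a hypothetical stand-in, since the abstract does not give the paper's noise-floor control: a bin's noise estimate is recursively averaged only while its power stays below `ratio` times the current estimate, and `apply_gain_floor` clamps any gain function from below. The names `track_noise`, `apply_gain_floor`, `alpha`, `ratio`, and `floor_db` are illustrative assumptions.

```python
import numpy as np


def track_noise(power_frames, alpha=0.95, ratio=2.0):
    """Recursive-averaging noise PSD estimate, updated only in noise-like bins.

    power_frames: |Y(k, l)|^2 with shape (frames, bins).
    A bin counts as noise-dominated when its power is below `ratio` times the
    current noise estimate (a simple, hypothetical update rule).
    """
    power_frames = np.asarray(power_frames, dtype=float)
    noise = power_frames[0].copy()  # assume the first frame contains noise only
    estimates = np.empty_like(power_frames)
    for l, frame in enumerate(power_frames):
        noise_like = frame < ratio * noise
        updated = alpha * noise + (1.0 - alpha) * frame
        noise = np.where(noise_like, updated, noise)  # freeze the estimate in speech-like bins
        estimates[l] = noise
    return estimates


def apply_gain_floor(gain, floor_db=-20.0):
    """Clamp a spectral amplitude gain so no bin is attenuated by more than floor_db."""
    g_min = 10.0 ** (floor_db / 20.0)
    return np.maximum(gain, g_min)
```

The floor keeps the gain from collapsing to zero in low-SNR bins, which limits musical noise and the spectral holes that tend to hurt recognizers.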

A single-channel speech enhancement method based on restoration of both spectral amplitudes and phases for push-to-talk communication (Push-to-talk 통신을 위한 진폭 및 위상 복원 기반의 단일 채널 음성 향상 방식)

  • Cho, Hye-Seung;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.36 no.1 / pp.64-69 / 2017
  • In this paper, we propose a single-channel speech enhancement method based on restoration of both spectral amplitudes and phases for PTT (Push-To-Talk) communication. Unlike other single-channel speech enhancement methods, which use only spectral amplitudes, the proposed method combines spectral amplitude and phase enhancement to provide high-quality speech. We carried out side-by-side comparison experiments in various non-stationary noise environments to evaluate the performance of the proposed method. The experimental results show that the proposed method provides higher-quality speech than other methods under different noise conditions.

Speech Processing System Using a Noise Reduction Neural Network Based on FFT Spectrums

  • Choi, Jae-Seung
    • Journal of information and communication convergence engineering / v.10 no.2 / pp.162-167 / 2012
  • This paper proposes a speech processing system based on a model of the human auditory system and a noise reduction neural network that uses fast Fourier transform (FFT) amplitude and phase spectra to reduce noise in background-noise environments. The proposed system reduces noise with the proposed neural network based on FFT amplitude and phase spectra, and then performs frame-by-frame auditory processing after detecting voiced and transitional sections in each frame. The results of the proposed system are compared with those of a conventional spectral subtraction method and a minimum mean-square error log-spectral amplitude estimator at different noise levels. The effectiveness of the proposed system is confirmed experimentally by measuring the signal-to-noise ratio (SNR). In this experiment, the maximum improvement in output SNR of the proposed method over a conventional spectral subtraction method is approximately 11.5 dB for car noise and 11.0 dB for street noise.
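
The baseline in this comparison is conventional spectral subtraction, and the figure of merit is output SNR. A minimal single-frame sketch of both, assuming power subtraction with a spectral floor (`floor` is an assumed parameter) and a noise PSD taken from a noise-only frame; the abstract does not give these details.

```python
import numpy as np


def spectral_subtraction(noisy, noise_psd, floor=0.01):
    """Power spectral subtraction on a single frame, keeping the noisy phase."""
    spec = np.fft.rfft(noisy)
    power = np.abs(spec) ** 2
    # Subtract the noise PSD, but never go below a fraction of the noisy power.
    clean_power = np.maximum(power - noise_psd, floor * power)
    enhanced = np.sqrt(clean_power) * np.exp(1j * np.angle(spec))
    return np.fft.irfft(enhanced, n=len(noisy))


def snr_db(clean, estimate):
    """Output SNR in dB: clean-signal energy over residual-error energy."""
    error = estimate - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(error ** 2))
```

In practice the noise PSD would be estimated as `np.abs(np.fft.rfft(noise_frame)) ** 2` from a speech-free frame of the same length, and the enhancer would run over overlapping windowed frames.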

A NMF-Based Speech Enhancement Method Using a Prior Time Varying Information and Gain Function (시간 변화에 따른 사전 정보와 이득 함수를 적용한 NMF 기반 음성 향상 기법)

  • Kwon, Kisoo;Jin, Yu Gwang;Bae, Soo Hyun;Kim, Nam Soo
    • The Journal of Korean Institute of Communications and Information Sciences / v.38C no.6 / pp.503-511 / 2013
  • This paper presents a speech enhancement method using non-negative matrix factorization (NMF). In the training phase, a basis matrix is obtained for speech and for a specific noise type from the respective databases. In the enhancement phase, the noisy signal is separated into speech and noise estimates using these basis matrices. To improve performance, we model the change in the encoding matrix between the training and enhancement phases with independent Gaussian distributions, and we add a constraint to the objective function that closely corresponds to these Gaussian models. We also smooth the encoding matrix over time by taking its previous values into account. Finally, we apply a log-spectral amplitude (LSA)-type algorithm as the gain function.
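
The training and enhancement phases described above follow the usual supervised-NMF pattern. The sketch below assumes Euclidean multiplicative updates, a fixed smoothing factor `beta` for the encoding matrix, and a Wiener-type mask in place of the paper's Gaussian-prior constraint and LSA-type gain, so it illustrates the structure rather than the proposed method. The random arrays only stand in for real magnitude spectrograms.

```python
import numpy as np

EPS = 1e-12  # guards divisions in the multiplicative updates and the mask


def nmf(V, rank=None, n_iter=200, W=None, seed=0):
    """Euclidean NMF (Lee-Seung multiplicative updates); a given W is held fixed."""
    rng = np.random.default_rng(seed)
    fixed_w = W is not None
    if not fixed_w:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if not fixed_w:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def smooth_activations(H, beta=0.5):
    """Recursive smoothing of the encoding matrix across frames (beta is an assumed constant)."""
    Hs = H.copy()
    for t in range(1, H.shape[1]):
        Hs[:, t] = beta * Hs[:, t - 1] + (1.0 - beta) * H[:, t]
    return Hs


def enhance(V_noisy, W_speech, W_noise, n_iter=200, beta=0.5):
    """Fit activations with the trained bases fixed, then apply a Wiener-type mask."""
    W = np.hstack([W_speech, W_noise])
    _, H = nmf(V_noisy, n_iter=n_iter, W=W)
    H = smooth_activations(H, beta)
    k = W_speech.shape[1]
    speech_est = W_speech @ H[:k]
    noise_est = W_noise @ H[k:]
    mask = speech_est / (speech_est + noise_est + EPS)
    return mask * V_noisy, mask


# Training phase: bases from stand-in speech and noise magnitude spectrograms.
rng = np.random.default_rng(1)
W_s, _ = nmf(rng.random((64, 100)), rank=8)
W_n, _ = nmf(rng.random((64, 100)), rank=8)
# Enhancement phase on a stand-in noisy magnitude spectrogram.
V_noisy = rng.random((64, 40))
enhanced, mask = enhance(V_noisy, W_s, W_n)
```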

Improved Single Channel Speech Enhancement Algorithm Using Adaptive Postfiltering

  • Song, Eunwoo;Kang, Hong-Goo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2011.07a / pp.122-125 / 2011
  • In real environments, background noise is present everywhere and degrades system performance. Speech enhancement algorithms can reduce this distortion, and a variety of methods have been proposed. In this paper, we propose a postfilter to improve the performance of the optimally modified log-spectral amplitude (OM-LSA) estimator. The proposed algorithm uses a formant postfilter to minimize the perceptual distortion caused by background noise, and its emphasis parameter is adjusted according to the spectral flatness and the first reflection coefficient. The performance of the proposed algorithm is evaluated by measuring the log-spectral distance (LSD) and the perceptual evaluation of speech quality (PESQ) score. The test results show that the proposed algorithm improves on the conventional OM-LSA estimator.
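
The postfilter's emphasis parameter is driven by two frame-level features. A sketch of how they are commonly computed, assuming an rfft-based power spectrum for the spectral flatness and the sign convention k1 = -r(1)/r(0) for the first reflection coefficient (other texts use the opposite sign). The mapping from these features to the emphasis parameter is not given in the abstract and is not attempted. The log-spectral distance (LSD) used for evaluation is included in its common RMS-of-dB-difference form.

```python
import numpy as np


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum (1 = flat, near 0 = peaky)."""
    power = np.abs(np.fft.rfft(frame)) ** 2 + eps
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))


def first_reflection_coefficient(frame):
    """k1 = -r(1)/r(0) from the autocorrelation (convention A(z) = 1 + a1 z^-1 + ...)."""
    r0 = np.dot(frame, frame)
    r1 = np.dot(frame[:-1], frame[1:])
    return float(-r1 / r0)


def log_spectral_distance(ref_frames, test_frames, eps=1e-12):
    """Mean over frames of the RMS difference (dB) between log power spectra."""
    p_ref = np.abs(np.fft.rfft(ref_frames, axis=-1)) ** 2 + eps
    p_test = np.abs(np.fft.rfft(test_frames, axis=-1)) ** 2 + eps
    diff = 10.0 * np.log10(p_ref) - 10.0 * np.log10(p_test)
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=-1))))
```

Under this convention a strongly low-pass (voiced) frame gives k1 close to -1, and a noise-like frame gives flatness close to 1.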

CASA Based Approach to Estimate Acoustic Transfer Function Ratios (CASA 기반의 마이크간 전달함수 비 추정 알고리즘)

  • Shin, Minkyu;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.33 no.1 / pp.54-59 / 2014
  • Identification of the RTF (relative transfer function) between sensors is essential for multichannel speech enhancement systems. In this paper, we present an approach for estimating the relative transfer function of a speech signal. The method applies a CASA (computational auditory scene analysis) technique to the conventional OM-LSA (optimally modified log-spectral amplitude) based approach. The proposed approach is evaluated under simulated stationary and nonstationary WGN (white Gaussian noise). Experimental results confirm the advantages of the proposed approach.
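
A common baseline for relative transfer function estimation divides the cross-PSD between the two microphones by the auto-PSD of the reference microphone. The sketch below is that generic weighted least-squares estimator, not the paper's method. The optional `mask` is a hypothetical stand-in for the CASA-derived time-frequency selection, and the sketch ignores the noise statistics that a practical estimator must account for.

```python
import numpy as np


def estimate_rtf(X1, X2, mask=None):
    """Weighted least-squares RTF: H(k) = sum_l w X2 conj(X1) / sum_l w |X1|^2.

    X1, X2: STFTs of the reference and second microphone, shape (bins, frames).
    mask:   optional weights in [0, 1] with the same shape, e.g. a CASA-style
            speech-dominance mask (hypothetical) selecting the bins used.
    """
    if mask is None:
        mask = np.ones(X1.shape)
    cross = np.sum(mask * X2 * np.conj(X1), axis=1)   # cross-PSD estimate per bin
    auto = np.sum(mask * np.abs(X1) ** 2, axis=1)     # reference auto-PSD per bin
    return cross / np.maximum(auto, 1e-12)
```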

Detection of formation boundaries and permeable fractures based on frequency-domain Stoneley wave logs

  • Saito Hiroyuki;Hayashi Kazuo;Iikura Yoshikazu
    • Geophysics and Geophysical Exploration / v.7 no.1 / pp.45-50 / 2004
  • This paper describes a method of detecting formation boundaries and permeable fractures from frequency-domain Stoneley wave logs. Field data sets were collected between depths of 330 and 360 m in well EE-4 in the Higashi-Hachimantai geothermal field, using a monopole acoustic logging tool with a source center frequency of 15 kHz. Stoneley wave amplitude spectra were calculated by applying a fast Fourier transform to the waveforms, and the spectra were then assembled into a frequency-depth distribution of Stoneley wave amplitudes. The frequency-domain Stoneley wave log shows four main characteristic peaks, at 6.5, 8.8, 12, and 13.3 kHz. The Stoneley wave magnitudes at these four frequencies are affected by formation properties: at the higher frequencies (12 and 13.3 kHz) the amplitudes are higher in hard formations than in soft formations, while at the lower frequencies (6.5 and 8.8 kHz) they are higher in soft formations than in hard formations. The frequency-domain Stoneley wave log correlates very well with the logs of lithology, degree of welding, and P-wave velocity, all of which show similar discontinuities at the depths of formation boundaries. These observations indicate that the frequency-domain Stoneley wave log provides useful clues for detecting formation boundaries. The frequency-domain Stoneley wave logs can also be used to detect a single permeable fracture. The procedure combines the Stoneley wave spectral amplitude logs at the four frequencies with weighting functions: the optimally weighted sum of the four spectral amplitudes is almost constant at all depths except at the depth of a permeable fracture. The procedure assumes that Stoneley wave energy is conserved in continuous media but that the Stoneley wave may be attenuated at a permeable fracture, and this attenuation may occur at any one of the four characteristic frequencies. We believe our multispectral approach is the only reliable method for detecting permeable fractures.
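
The multispectral fracture-detection procedure can be sketched as follows, assuming the "optimal" weights are the least-squares weights that bring the weighted sum closest to a constant (here 1) over all depths; the abstract does not state how the weights are optimized. `fs` is the logging tool's sampling rate, and all function names are illustrative.

```python
import numpy as np

# Characteristic Stoneley-wave peaks reported in the abstract (Hz).
FREQS_HZ = (6500.0, 8800.0, 12000.0, 13300.0)


def spectral_amplitudes(waveforms, fs, freqs=FREQS_HZ):
    """FFT amplitude at the bins nearest to `freqs`; waveforms has shape (depths, samples)."""
    spec = np.abs(np.fft.rfft(waveforms, axis=1))
    bin_freqs = np.fft.rfftfreq(waveforms.shape[1], d=1.0 / fs)
    idx = [int(np.argmin(np.abs(bin_freqs - f))) for f in freqs]
    return spec[:, idx]


def fit_weights(amplitudes):
    """Least-squares weights making the weighted sum as close to 1 as possible at every depth."""
    target = np.ones(amplitudes.shape[0])
    weights, *_ = np.linalg.lstsq(amplitudes, target, rcond=None)
    return weights


def fracture_score(amplitudes, weights):
    """Deviation of the weighted sum from the constant 1; peaks flag candidate fractures."""
    return np.abs(1.0 - amplitudes @ weights)
```

Typical use is `amps = spectral_amplitudes(waveforms, fs)` followed by `fracture_score(amps, fit_weights(amps))`. Because the fracture depth itself enters this fit, fitting the weights on intervals known to be intact, or with a robust regression, may separate the anomaly more cleanly.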

Discrimination of Local Microearthquakes and Artificial Underground Explosions on the Basis of Time-Frequency Domain (시간-주파수 영역에서의 국지 미소지진과 지하인공폭발의 구별)

  • 김소구;박용철
    • The Journal of Engineering Geology / v.7 no.1 / pp.63-79 / 1997
  • In this study, we develop a technique for discriminating artificial explosions from local microearthquakes in the time-frequency domain. To obtain the spectral features of artificial explosions and microearthquakes, we used 3-D spectrograms (frequency, time, and amplitude), because they are a useful tool for studying the frequency content of entire seismic waveforms observed at local and regional distances (e.g., Kim et al., 1994). P and S waves from quarry blasts have their dominant amplitudes at frequencies above 10 Hz, and Rg phases are observed at near distances. In contrast, P and S waves from microearthquakes have broader frequency content that also extends below 10 Hz. For discrimination, the Pg/Lg spectral ratio is computed below 10 Hz. To select the time windows, we computed group velocities with the multiple filter method (MFM), and we removed free-surface effects from all three-component data to improve data quality. Next, we computed the fast Fourier transform and the log-averaged spectral amplitude over seven frequency bands: 0.5 to 3, 2 to 4, 3 to 5, 4 to 6, 5 to 7, 6 to 8, and 8 to 10 Hz. The best separation is observed in the 6 to 8 Hz band.
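
The spectral-ratio discriminant reduces to the log-averaged FFT amplitude of the Pg and Lg windows in each of the seven bands. A sketch assuming a Hann-tapered FFT, equal-length windows already picked from group velocity, and the ratio expressed as log10(Pg/Lg), i.e. a difference of log averages; the windowing and any instrument correction used in the study are not stated in the abstract.

```python
import numpy as np

# The seven frequency bands listed in the abstract (Hz).
BANDS_HZ = [(0.5, 3), (2, 4), (3, 5), (4, 6), (5, 7), (6, 8), (8, 10)]


def log_avg_amplitude(segment, fs, band):
    """Mean of log10 FFT amplitude over [f_lo, f_hi] for one Hann-tapered phase window."""
    windowed = segment * np.hanning(len(segment))
    amp = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(np.mean(np.log10(amp[in_band] + 1e-20)))


def pg_lg_ratios(pg, lg, fs, bands=BANDS_HZ):
    """log10(Pg/Lg) spectral ratio per band; Pg and Lg windows are assumed equal in length."""
    return [log_avg_amplitude(pg, fs, b) - log_avg_amplitude(lg, fs, b) for b in bands]
```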

Performance Analysis of a Class of Single Channel Speech Enhancement Algorithms for Automatic Speech Recognition (자동 음성 인식기를 위한 단채널 음질 향상 알고리즘의 성능 분석)

  • Song, Myung-Suk;Lee, Chang-Heon;Lee, Seok-Pil;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea / v.29 no.2E / pp.86-99 / 2010
  • This paper analyzes the performance of various single-channel speech enhancement algorithms when they are applied as a preprocessor to automatic speech recognition (ASR) systems. The functional modules of speech enhancement systems are first divided into four major modules: a gain estimator, a noise power spectrum estimator, an a priori signal-to-noise ratio (SNR) estimator, and a speech absence probability (SAP) estimator. We investigate the relationship between speech recognition accuracy and the role of each module. Simulation results show that the Wiener filter outperforms other gain functions, such as the minimum mean square error short-time spectral amplitude (MMSE-STSA) and minimum mean square error log-spectral amplitude (MMSE-LSA) estimators, when a perfect noise estimator is applied. When the performance of the noise estimator degrades, however, MMSE methods that include the decision-directed module for estimating the a priori SNR and the SAP estimation module help improve the performance of the enhancement algorithm for speech recognition systems.
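
The decision-directed module estimates the a priori SNR from the previous frame's enhanced amplitude and the current a posteriori SNR. A minimal sketch, assuming a known stationary noise PSD, the commonly used smoothing constant alpha = 0.98, an assumed -25 dB lower bound on xi, and a Wiener gain (an MMSE-STSA or MMSE-LSA gain could be substituted); speech absence probability is omitted.

```python
import numpy as np


def decision_directed(noisy_power, noise_psd, alpha=0.98, xi_min=10 ** (-25 / 10)):
    """Decision-directed a priori SNR estimate with a Wiener gain, frame by frame.

    noisy_power: |Y(k, l)|^2 with shape (frames, bins); noise_psd: shape (bins,).
    Returns the a priori SNR xi and the gains, both with shape (frames, bins).
    """
    gamma = noisy_power / noise_psd             # a posteriori SNR
    xi = np.empty_like(gamma)
    gain = np.empty_like(gamma)
    prev_amp_snr = np.zeros(gamma.shape[1])     # |A_hat(l-1)|^2 / lambda_d, zero before the first frame
    for l in range(gamma.shape[0]):
        ml = np.maximum(gamma[l] - 1.0, 0.0)    # maximum-likelihood term
        xi[l] = np.maximum(alpha * prev_amp_snr + (1.0 - alpha) * ml, xi_min)
        gain[l] = xi[l] / (1.0 + xi[l])         # Wiener gain
        prev_amp_snr = gain[l] ** 2 * gamma[l]  # |G Y|^2 / lambda_d for the next frame
    return xi, gain
```

The large alpha makes xi change slowly, which suppresses musical noise; the ML term lets it react when the current frame clearly contains speech.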