• Title/Summary/Keyword: Speech Source Separation

Independent Component Analysis Based on Frequency Domain Approach Model for Speech Source Signal Extraction (음원신호 추출을 위한 주파수영역 응용모델에 기초한 독립성분분석)

  • Choi, Jae-Seung
    • The Journal of the Korea institute of electronic communication sciences / v.15 no.5 / pp.807-812 / 2020
  • This paper proposes a blind speech source separation algorithm that uses a microphone to extract only the target speech source signal in an environment where several speech source signals are mixed. The proposed algorithm is a frequency-domain model based on independent component analysis (ICA). To verify the validity of frequency-domain ICA for two speech sources, the algorithm is run on different types of speech sources and the improvement in separation is evaluated. The experimental waveforms show that the two-channel speech source signals are clearly separated when compared with the original waveforms. In addition, measured by the target-signal-to-interference energy ratio, the proposed algorithm improves speech source separation performance over existing algorithms.
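
The abstract above reports results as a target-signal-to-interference energy ratio. The following is a minimal sketch of how such a ratio is commonly computed, assuming the target and interference components of a separated output are available separately (as they are in a simulation). The function name `sir_db` and the toy signals are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sir_db(target_part, interference_part):
    """Ratio of target energy to interference energy, in decibels."""
    target_energy = np.sum(np.square(target_part))
    interference_energy = np.sum(np.square(interference_part))
    return 10.0 * np.log10(target_energy / interference_energy)

# Toy example (assumed data): residual interference at one tenth
# of the target amplitude.
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
interference = 0.1 * rng.standard_normal(16000)
print(f"SIR = {sir_db(target, interference):.1f} dB")
```

A higher value means less residual interference in the separated channel.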

Post-Processing of IVA-Based 2-Channel Blind Source Separation for Solving the Frequency Bin Permutation Problem (IVA 기반의 2채널 암묵적신호분리에서 주파수빈 뒤섞임 문제 해결을 위한 후처리 과정)

  • Chu, Zhihao;Bae, Keunsung
    • Phonetics and Speech Sciences / v.5 no.4 / pp.211-216 / 2013
  • Independent vector analysis (IVA) is a well-known FD-ICA method for addressing the frequency permutation problem. It generally works well for blind source separation, but some frequency bin permutations remain unresolved. This paper proposes a post-processing method that improves IVA source separation performance by fixing the remaining frequency permutation problem. The proposed method uses the correlation coefficient of the power ratio between frequency bins of the signals separated by IVA-based 2-channel source separation. Experimental results verified that the proposed method fixes the remaining frequency permutation problem in IVA and improves the speech quality of the separated signals.
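
The idea of aligning bins by the correlation of power ratios can be sketched as follows. This is an illustrative simplification, not the paper's exact procedure: it assumes the separated signals are given as a complex STFT array of shape (2, bins, frames) and compares each bin only with the previous, already aligned bin. The function names are hypothetical.

```python
import numpy as np

def power_ratio(Y):
    """Share of the total power held by each channel, per bin and frame.

    Y: complex array of shape (2, n_bins, n_frames).
    """
    power = np.abs(Y) ** 2
    return power / (power.sum(axis=0, keepdims=True) + 1e-12)

def align_permutations(Y):
    """Swap the two channels in every bin whose power-ratio sequence
    correlates better with the other channel of the previous bin."""
    Y = Y.copy()
    ratio = power_ratio(Y)
    for f in range(1, Y.shape[1]):
        keep = np.corrcoef(ratio[0, f], ratio[0, f - 1])[0, 1]
        swap = np.corrcoef(ratio[1, f], ratio[0, f - 1])[0, 1]
        if swap > keep:
            # Fancy indexing makes a copy, so the swap is safe in place.
            Y[:, f] = Y[[1, 0], f]
            ratio[:, f] = ratio[[1, 0], f]
    return Y
```

Chaining neighbouring bins like this can propagate a single wrong decision, which is one reason practical methods often compare against a broader reference than just the previous bin.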

Speech Enhancement Using Blind Signal Separation Combined With Null Beamforming

  • Nam Seung-Hyon;Jr. Rodrigo C. Munoz
    • The Journal of the Acoustical Society of Korea / v.25 no.4E / pp.142-147 / 2006
  • Blind signal separation is known as a powerful tool for enhancing noisy speech in many real-world environments. In this paper, it is demonstrated that the performance of blind signal separation can be further improved by combining it with a null beamformer (NBF). Cascading blind source separation with null beamforming is equivalent to decomposing the received signals into direct parts and reverberant parts. Investigation of the beam patterns of the null beamformer and blind signal separation reveals that the directional null of the NBF mainly reduces the direct parts of the unwanted signals, whereas blind signal separation reduces the reverberant parts. Further, it is shown that this decomposition of the received signals can be exploited to solve the local stability problem. Therefore, faster and improved separation can be obtained by first removing the direct parts with null beamforming. Simulation results using real office recordings confirm the expectation.

Multi-channel Speech Enhancement Using Blind Source Separation and Cross-channel Wiener Filtering

  • Jang, Gil-Jin;Choi, Chang-Kyu;Lee, Yong-Beom;Kim, Jeong-Su;Kim, Sang-Ryong
    • The Journal of the Acoustical Society of Korea / v.23 no.2E / pp.56-67 / 2004
  • Despite abundant research on blind source separation (BSS) in many types of simulated environments, its performance is still not satisfactory for real environments. The major obstacles appear to be the finite filter length of the assumed mixing model and nonlinear sensor noise. This paper presents a two-step speech enhancement method with multiple microphone inputs. The first step runs a frequency-domain BSS algorithm to produce multiple outputs without any prior knowledge of the mixed source signals. The second step further removes the remaining cross-channel interference by a spectral cancellation approach that uses a probabilistic source absence/presence detection technique. The desired primary source is detected in every frame of the signal, and the secondary source is estimated in the power spectral domain using the other BSS output as a reference interfering source. The estimated secondary source is then subtracted to reduce the cross-channel interference. Experimental results show good separation and enhancement performance on real recordings of speech and music signals compared with conventional BSS methods.
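
The subtraction in the second step can be illustrated with plain power-spectral subtraction. The sketch below assumes both BSS outputs are available as complex STFT arrays of equal shape, and it omits the probabilistic source absence/presence detection described in the abstract. The parameters `alpha` (over-subtraction factor) and `floor` (spectral floor) are hypothetical choices, not values from the paper.

```python
import numpy as np

def cross_channel_subtract(Y_primary, Y_reference, alpha=1.0, floor=0.01):
    """Subtract the reference channel's power spectrum from the primary
    channel and keep the primary channel's phase.

    alpha and floor are illustrative parameters (assumptions).
    """
    primary_power = np.abs(Y_primary) ** 2
    reference_power = np.abs(Y_reference) ** 2
    # Never let the cleaned power fall below a fraction of the input power.
    cleaned_power = np.maximum(primary_power - alpha * reference_power,
                               floor * primary_power)
    gain = np.sqrt(cleaned_power / (primary_power + 1e-12))
    return gain * Y_primary
```

The floor keeps the gain from reaching zero, which tends to limit the "musical noise" that hard spectral subtraction can produce.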

Iterative Computation of Periodic and Aperiodic Part from Speech Signal (음성 신호로부터 주기, 비주기 성분의 반복적 계산법에 의한 분리 실험)

  • Jo Cheol-Woo;Lee Tao
    • MALSORI / no.48 / pp.117-126 / 2003
  • The source of a speech signal is actually a combination of periodic and aperiodic components, although it is often modeled as only one of the two. This paper describes an experiment that separates the periodic and aperiodic components of the speech source. The linear predictive residual signal was used as an approximation of the vocal source of the original speech to obtain the estimated aperiodic part. An iterative extrapolation method was used to compute the aperiodic part.
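
For readers unfamiliar with the linear predictive residual mentioned above, here is a minimal sketch using the autocorrelation method. It covers only the residual computation, not the iterative extrapolation step, and the function names and the default order of 10 are assumptions made for illustration.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Predictor coefficients from the autocorrelation normal equations."""
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    # Toeplitz matrix of autocorrelation values.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])

def lp_residual(frame, order=10):
    """Prediction error: the frame minus its linear prediction."""
    frame = np.asarray(frame, dtype=float)
    a = lpc_coefficients(frame, order)
    predicted = np.zeros_like(frame)
    for k in range(1, order + 1):
        predicted[k:] += a[k - 1] * frame[:-k]
    return frame - predicted
```

The residual removes most of the vocal-tract (spectral envelope) contribution, which is why it is often treated as an approximation of the excitation source.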

A study on sound source segregation of frequency domain binaural model with reflection (반사음이 존재하는 양귀 모델의 음원분리에 관한 연구)

  • Lee, Chai-Bong
    • Journal of the Institute of Convergence Signal Processing / v.15 no.3 / pp.91-96 / 2014
  • Among methods for sound source localization and separation, the Frequency Domain Binaural Model (FDBM) offers low computational cost and high separation performance. The method localizes and separates sound sources by obtaining the Interaural Phase Difference (IPD) and Interaural Level Difference (ILD) in the frequency domain. In practical environments, however, reflections cause problems. To reduce their effect, this paper presents a method that simulates the localization of the direct sound, detects the first-arriving sound, checks its direction, and then separates the sound. Simulation results show that the estimated direction lies within 10% of the sound source and that, in the presence of reflections, separation improves over the existing FDBM, with higher coherence and PESQ (Perceptual Evaluation of Speech Quality) and lower directional damping. When there was no reflection, the degree of separation was low.
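
The two binaural cues named in the abstract can be computed per frequency bin as sketched below. The sketch assumes the left and right ear signals are already available as complex spectra of equal shape. The sign convention (left relative to right) and the function name `ipd_ild` are assumptions, not details from the paper.

```python
import numpy as np

def ipd_ild(left_spec, right_spec, eps=1e-12):
    """Interaural phase difference (radians) and level difference (dB),
    both taken as left relative to right (assumed convention)."""
    cross = left_spec * np.conj(right_spec)
    ipd = np.angle(cross)  # wrapped to (-pi, pi]
    ild = 20.0 * np.log10((np.abs(left_spec) + eps) /
                          (np.abs(right_spec) + eps))
    return ipd, ild

# Toy example (assumed data): a one-sample delay and 6 dB attenuation
# at the right ear.
rng = np.random.default_rng(0)
left = rng.standard_normal(512)
right = 0.5 * np.roll(left, 1)
ipd, ild = ipd_ild(np.fft.rfft(left), np.fft.rfft(right))
```

Because the phase difference is wrapped, it becomes ambiguous at high frequencies, so localization schemes typically rely on both cues rather than on the IPD alone.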

Target Speaker Speech Restoration via Spectral bases Learning (주파수 특성 기저벡터 학습을 통한 특정화자 음성 복원)

  • Park, Sun-Ho;Yoo, Ji-Ho;Choi, Seung-Jin
    • Journal of KIISE:Software and Applications / v.36 no.3 / pp.179-186 / 2009
  • This paper proposes a target speech extraction method that restores the speech signal of a target speaker from a noisy convolutive mixture of speech and an interference source. We assume that the target speaker is known and that his/her utterances are available at training time. To incorporate the additional information extracted from the training utterances into the separation, we combine convolutive blind source separation (CBSS) with non-negative decomposition techniques, e.g., a probabilistic latent variable model. The non-negative decomposition is used to learn a set of bases from the spectrogram of the training utterances, where the bases represent the spectral information of the target speaker. Based on the learned spectral bases, our method provides two post-processing steps for CBSS. The channel selection step finds the CBSS output channel that dominantly contains the target speech. The reconstruction step recovers the original spectrogram of the target speech from the selected output channel so that the remaining interference source and background noise are suppressed. Experimental results show that our method substantially improves the separation results of CBSS and, as a result, successfully recovers the target speech.

A Frequency-Domain Normalized MBD Algorithm with Unidirectional Filters for Blind Speech Separation

  • Kim Hye-Jin;Nam Seung-Hyon
    • The Journal of the Acoustical Society of Korea / v.24 no.2E / pp.54-60 / 2005
  • A new multichannel blind deconvolution algorithm is proposed for speech mixtures. It employs unidirectional filters and normalization of gradient terms in the frequency domain. The proposed algorithm is shown to be approximately nonholonomic. It therefore provides improved convergence and separation performance without a whitening effect for nonstationary sources such as speech and audio signals. Simulations using real-world recordings confirm its superior performance over existing algorithms and its usefulness for real applications.

Convolutive source separation in noisy environments (잡음 환경하에서의 음성 분리)

  • Jang Inseon;Choi Seungjin
    • Proceedings of the KSPS conference / 2003.10a / pp.97-100 / 2003
  • This paper addresses a method of convolutive source separation based on SEONS (Second Order Nonstationary Source Separation) [1], which was originally developed for blind separation of instantaneous mixtures using nonstationarity. To tackle this problem, we transform the convolutive BSS problem into multiple short-term instantaneous problems in the frequency domain and separate the instantaneous mixtures in every frequency bin. We also employ an H-infinity filtering technique to reduce the effect of sensor noise. Numerical experiments demonstrate the effectiveness of the proposed approach and compare its performance with existing methods.
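
The reason a convolutive mixture can be treated as one instantaneous mixture per frequency bin can be checked numerically. The sketch below uses circular convolution and a single full-length FFT so that the relation holds exactly. With short-time frames, as in the abstract, it holds only approximately. All names, sizes, and the random test data are illustrative assumptions.

```python
import numpy as np

def circular_convolve(h, s):
    """Circular convolution of a short filter h with a signal s."""
    out = np.zeros(len(s))
    for tau, tap in enumerate(h):
        out += tap * np.roll(s, tau)  # roll delays s by tau samples
    return out

rng = np.random.default_rng(0)
n_samples, n_taps = 64, 8
s = rng.standard_normal((2, n_samples))   # two sources
h = rng.standard_normal((2, 2, n_taps))   # h[j, i]: source i -> sensor j

# Time domain: each sensor receives a sum of filtered sources.
x = np.stack([sum(circular_convolve(h[j, i], s[i]) for i in range(2))
              for j in range(2)])

# Frequency domain: one 2x2 instantaneous mixing matrix per bin.
S = np.fft.fft(s, axis=1)
H = np.fft.fft(h, n=n_samples, axis=2)
X = np.fft.fft(x, axis=1)
X_per_bin = np.einsum("jif,if->jf", H, S)

print(np.allclose(X, X_per_bin))
```

Each bin can then be handed to an instantaneous separation algorithm, at the cost of having to resolve the scaling and permutation ambiguity across bins afterwards.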

Remote speech recognition preprocessing system for intelligent robot in noisy environment (지능로봇에 적합한 잡음 환경에서의 원거리 음성인식 전처리 시스템)

  • Gwon, Se-Do;Jeong, Hong
    • Proceedings of the IEEK Conference / 2006.06a / pp.365-366 / 2006
  • This paper describes a pre-processing method that can be applied to the remote speech recognition system of a service robot in a noisy environment. By combining beamforming and blind source separation, we can overcome the weaknesses of beamforming (reverberation) and of blind source separation (distributed noise, permutation ambiguity). Because the method is designed for hardware implementation, real-time execution can be achieved on an FPGA using a systolic array architecture.
