• Title/Summary/Keyword: Sound source separation

A Method of Sound Segmentation in Time-Frequency Domain Using Peaks and Valleys in Spectrogram for Speech Separation (음성 분리를 위한 스펙트로그램의 마루와 골을 이용한 시간-주파수 공간에서 소리 분할 기법)

  • Lim, Sung-Kil;Lee, Hyon-Soo
    • The Journal of the Acoustical Society of Korea / v.27 no.8 / pp.418-426 / 2008
  • In this paper, we propose an algorithm for frequency channel segmentation using peaks and valleys in the spectrogram. Frequency channel segments are local groups of channels in the frequency domain that may arise from the same sound source. The proposed algorithm is based on the smoothed spectrum of the input sound: peaks in the smoothed spectrum determine segment centers, and valleys determine segment boundaries. To evaluate the suitability of the proposed segmentation algorithm before the grouping stage is applied, we compare results synthesized using an ideal mask with those of the proposed algorithm. Simulations are performed on speech mixed with narrowband noise, wideband noise, and other speech signals.
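
The peak/valley rule can be sketched for a single spectral frame as follows. This is a minimal illustration, not the authors' implementation: the moving-average smoother, its width, and the names `smooth_spectrum` and `segment_frame` are assumptions.

```python
import numpy as np

def smooth_spectrum(mag, width=5):
    """Moving-average smoothing across frequency (assumed smoother)."""
    kernel = np.ones(width) / width
    return np.convolve(mag, kernel, mode="same")

def segment_frame(mag, width=5):
    """Return (center, lo, hi) channel segments for one spectral frame.

    Local maxima of the smoothed spectrum mark segment centers and
    local minima mark segment boundaries.
    """
    s = smooth_spectrum(np.asarray(mag, dtype=float), width)
    inner = np.arange(1, len(s) - 1)
    peaks = inner[(s[inner] > s[inner - 1]) & (s[inner] >= s[inner + 1])]
    valleys = inner[(s[inner] < s[inner - 1]) & (s[inner] <= s[inner + 1])]
    # Spectrum edges act as outer boundaries.
    bounds = np.concatenate(([0], valleys, [len(s) - 1]))
    segments = []
    for p in peaks:
        lo = bounds[bounds <= p].max()  # nearest boundary at or below the peak
        hi = bounds[bounds > p].min()   # nearest boundary above the peak
        segments.append((int(p), int(lo), int(hi)))
    return segments
```

Running this per frame of a magnitude spectrogram gives candidate segments that a later grouping stage would assign to sources.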

Stereo Sound Demixing Method in Time-Frequency Domain (시간-주파수 영역에서의 스테레오 사운드 분리기법)

  • Lee, Jae-Eun;Kim, Young-Moon;Lim, Chan;Kang, Hyun-Soo
    • The Journal of the Korea Contents Association / v.7 no.8 / pp.1-12 / 2007
  • This paper presents a new demixing method that separates each source from a stereo sound mixture. The method relies on the W-disjoint orthogonality assumption of the DUET (Degenerate Unmixing Estimation Technique) algorithm and is carried out mainly in the time-frequency domain using the windowed Fourier transform. The paper makes two main contributions: a weighted mask based on panning index distances and a binary mask based on comparing the values of the two channels. The former has a softer demixing characteristic, and the latter a stronger one. Experimental results show that both masks demix more robustly than existing demixing methods.
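
Under W-disjoint orthogonality each time-frequency bin is assumed to be dominated by one source, so masks can be computed bin by bin. The sketch below is a hypothetical illustration of the two mask types on complex STFT matrices `L` and `R`; the panning index |R|/(|L|+|R|), the Gaussian distance weighting, and `sigma` are assumptions, not the paper's definitions.

```python
import numpy as np

def panning_index(L, R, eps=1e-12):
    """Assumed panning measure in [0, 1]: 0 = hard left, 1 = hard right."""
    aL, aR = np.abs(L), np.abs(R)
    return aR / (aL + aR + eps)

def weighted_mask(L, R, target, sigma=0.1):
    """Soft mask that decays with the distance between each bin's panning
    index and the target source's panning index (Gaussian weighting assumed)."""
    d = panning_index(L, R) - target
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def binary_mask(L, R):
    """Hard mask: 1 where the left channel dominates the bin, else 0."""
    return (np.abs(L) > np.abs(R)).astype(float)
```

Multiplying a channel's STFT by a mask and inverting the STFT (for example with `scipy.signal.istft`) yields the demixed estimate; `sigma` controls how soft the weighted mask is.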

CNN based Sound Event Detection Method using NMF Preprocessing in Background Noise Environment

  • Jang, Bumsuk;Lee, Sang-Hyun
    • International Journal of Advanced Smart Convergence / v.9 no.2 / pp.20-27 / 2020
  • Sound event detection in real-world environments suffers from interference by non-stationary, time-varying noise. This paper presents an adaptive noise reduction method for sound event detection based on non-negative matrix factorization (NMF), together with a deep learning model that integrates a convolutional neural network (CNN) with NMF. To improve NMF separation quality, the method includes a noise update technique that learns and adapts to the characteristics of the current noise in real time. This technique analyzes the sparsity and activity of the noise bases at the current time and decides whether to perform update training based on the noise candidate group obtained in each frame of the previous noise reduction stage. Noise basis ranks selected as candidates for update training are updated in real time with discriminative NMF training. The NMF front end was applied to both CNN and hidden Markov model (HMM) detectors to improve sound event detection performance. Because the improvement is more pronounced with the CNN, the method can be widely used in sound-source-based CNN algorithms.
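
The NMF part of such a pipeline can be sketched with Euclidean multiplicative updates (Lee and Seung): a noise basis learned beforehand is held fixed while event bases are learned, and a Wiener-like soft mask keeps the event part. This is a simplified sketch under those assumptions; the paper's sparsity/activity-based noise update, discriminative NMF training, and the CNN stage are not reproduced, and `nmf`, `denoise`, and `event_rank` are hypothetical names.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Euclidean NMF, V ~= W @ H, with multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def denoise(V, W_noise, event_rank=4, n_iter=200, seed=0, eps=1e-9):
    """Keep the noise basis fixed, learn event bases, return the masked event spectrogram."""
    rng = np.random.default_rng(seed)
    k_n = W_noise.shape[1]
    W = np.hstack([W_noise, rng.random((V.shape[0], event_rank)) + eps])
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Update only the event columns; the noise columns stay fixed.
        W_ev = W[:, k_n:]
        W[:, k_n:] = W_ev * (V @ H[k_n:].T) / (W @ H @ H[k_n:].T + eps)
    V_event = W[:, k_n:] @ H[k_n:]
    V_hat = W @ H + eps
    return V * (V_event / V_hat)  # soft mask in [0, 1]
```

In use, `W_noise` would come from `nmf` on a noise-only magnitude spectrogram, and the output of `denoise` would be the feature fed to the detector.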

Blind Rhythmic Source Separation (블라인드 방식의 리듬 음원 분리)

  • Kim, Min-Je;Yoo, Ji-Ho;Kang, Kyeong-Ok;Choi, Seung-Jin
    • The Journal of the Acoustical Society of Korea / v.28 no.8 / pp.697-705 / 2009
  • An unsupervised (blind) method is proposed for extracting rhythmic sources from single-channel commercial polyphonic music. Commercial music signals rarely have more than two channels, yet they often contain multiple instruments, including singing voice. Therefore, instead of conventional modeling of mixing environments or statistical characteristics, other source-specific characteristics must be introduced to separate or extract sources in such underdetermined settings. In this paper, we concentrate on extracting rhythmic sources from a mixture that also contains harmonic sources. An extension of nonnegative matrix factorization (NMF), called nonnegative matrix partial co-factorization (NMPCF), is used to analyze multiple relationships between spectral and temporal properties in the given input matrices. Moreover, the temporal repeatability of rhythmic sound sources is incorporated as a rhythmic property shared among segments of the input mixture signal. The proposed method achieves acceptable, though not superior, separation quality compared with the referenced prior-knowledge-based drum source separation systems, but it is more widely applicable because it separates blindly, for example when no prior information is available or the target rhythmic source is irregular.
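
A simplified partial co-factorization can be sketched as follows, assuming the mixture spectrogram has already been cut into segments (for example, bar-length chunks). Each segment X_k is modeled as `Wr @ Hr[k] + Wh[k] @ Hh[k]`, where only the rhythmic basis `Wr` is shared across segments, which is one way to express repeatability. The ranks, the squared-error cost, and the function name are assumptions; the paper's exact NMPCF formulation and temporal coupling terms are not reproduced.

```python
import numpy as np

def nmpcf(segments, r_rank=2, h_rank=2, n_iter=300, seed=0, eps=1e-9):
    """Partial co-factorization with a shared rhythmic basis Wr.

    X_k ~= Wr @ Hr[k] + Wh[k] @ Hh[k]; Wh, Hr, Hh are per segment.
    """
    rng = np.random.default_rng(seed)
    F = segments[0].shape[0]
    Wr = rng.random((F, r_rank)) + eps
    Hr = [rng.random((r_rank, X.shape[1])) + eps for X in segments]
    Wh = [rng.random((F, h_rank)) + eps for _ in segments]
    Hh = [rng.random((h_rank, X.shape[1])) + eps for X in segments]

    def model(k):
        return Wr @ Hr[k] + Wh[k] @ Hh[k] + eps

    for _ in range(n_iter):
        num = np.zeros_like(Wr)
        den = np.zeros_like(Wr)
        for k, X in enumerate(segments):
            Hr[k] *= (Wr.T @ X) / (Wr.T @ model(k))
            Hh[k] *= (Wh[k].T @ X) / (Wh[k].T @ model(k))
            Wh[k] *= (X @ Hh[k].T) / (model(k) @ Hh[k].T)
            # Accumulate the shared-basis gradient terms over all segments.
            num += X @ Hr[k].T
            den += model(k) @ Hr[k].T
        Wr *= num / (den + eps)
    rhythmic = [Wr @ Hr[k] for k in range(len(segments))]
    return Wr, rhythmic
```

Because `Wr` is updated from all segments jointly, spectral patterns that recur in every segment tend to be absorbed by it, while segment-specific harmonic content goes to `Wh[k]`.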

Vehicle-induced aerodynamic loads on highway sound barriers part 2: numerical and theoretical investigation

  • Wang, Dalei;Wang, Benjin;Chen, Airong
    • Wind and Structures / v.17 no.5 / pp.479-494 / 2013
  • Vehicle-induced aerodynamic loads cause vibration in some highway sound barriers, because the barriers are designed for natural wind loads only. Following up on a previous field experiment, the vehicle-induced aerodynamic loads are investigated numerically and theoretically. The numerical results are compared with the experimental ones and shown to be valid. Analysis of the flow field obtained in the numerical simulation shows that potential flow is the main source of both head and wake impact, which also validates the theoretical model. Both methodologies show that a shorter vehicle produces a larger negative pressure peak because the head impact and wake impact overlap; together with high speed, this leads to a wake without vortex shedding, which makes the potential-flow hypothesis more accurate. The results also confirm the expectation stated in "Vehicle-induced aerodynamic loads on highway sound barriers part 1: field experiment" that the maximum and minimum pressures are proportional to the square of vehicle speed and inversely proportional to the square of the separation distance.
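
The reported scaling law lends itself to quick ratio estimates. A minimal sketch, assuming the proportionality holds over the range of interest; no absolute constant is given here, so only a measured reference value can be rescaled, and `scale_peak_pressure` is a hypothetical helper.

```python
def scale_peak_pressure(p_ref, v_ref, d_ref, v, d):
    """Rescale a reference peak pressure using p ∝ v**2 / d**2.

    p_ref is a measured peak (max or min) pressure at vehicle speed v_ref
    and separation distance d_ref; units only need to be consistent.
    """
    return p_ref * (v / v_ref) ** 2 * (d_ref / d) ** 2
```

For example, doubling the vehicle speed quadruples the peak pressure, while doubling the separation distance reduces it to a quarter.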

A Perception Based Active Matrix Decoder with Virtual Source Location Information (가상 음원 위치 정보를 이용한 능동 메트릭스 디코더)

  • Moon, Han-Gil
    • Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.5 / pp.18-24 / 2010
  • In this paper, a new matrix decoding system using vector-based Virtual Source Location Information (VSLI) is proposed as an alternative to the conventional Dolby Pro Logic II/IIx system for reconstructing multichannel output signals from matrix-encoded two-channel signals, Lt/Rt. The new matrix decoding system consists of a passive decoding part and an active part. The passive part produces coarse multichannel signals as linear combinations of the two encoded signals (Lt/Rt), and the active part enhances each channel according to the virtual source that emerges between each pair of channels. Because the virtual sources correspond to perceived sound images in the virtual sound field, the reconstructed multichannel sound provides good dynamic perception and stable image localization. Moreover, good channel separation is maintained by a nonlinear trigonometric enhancement function.
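
The passive stage can be illustrated with the classic four-channel passive matrix (C = (Lt + Rt)/√2, S = (Lt - Rt)/√2), and the position of a virtual source between two loudspeakers with the tangent panning law. Both are standard textbook forms used here as assumptions; the paper's actual decoding coefficients, VSLI parameters, and trigonometric enhancement function are not reproduced, and the function names are hypothetical.

```python
import math
import numpy as np

def passive_decode(lt, rt):
    """Classic passive matrix: L, R pass through; C and S are sum/difference."""
    lt = np.asarray(lt, dtype=float)
    rt = np.asarray(rt, dtype=float)
    g = 1.0 / math.sqrt(2.0)
    return {"L": lt, "R": rt, "C": g * (lt + rt), "S": g * (lt - rt)}

def virtual_source_angle(g1, g2, half_angle_deg):
    """Tangent law: tan(theta) / tan(theta0) = (g1 - g2) / (g1 + g2).

    theta0 is the half-angle between the loudspeaker pair; positive theta
    points toward the speaker driven with gain g1.
    """
    ratio = (g1 - g2) / (g1 + g2)
    theta = math.atan(ratio * math.tan(math.radians(half_angle_deg)))
    return math.degrees(theta)
```

Equal gains place the virtual source midway between the pair, while a single active speaker places it at that speaker.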

Overlapped Subband-Based Independent Vector Analysis

  • Jang, Gil-Jin;Lee, Te-Won
    • The Journal of the Acoustical Society of Korea / v.27 no.1E / pp.30-34 / 2008
  • This paper improves an existing blind signal separation (BSS) method. The proposed method models the inherent signal dependency observed in acoustic objects to separate real-world convolutive sound mixtures. The frequency-domain approach requires solving the well-known permutation problem, which has been successfully solved by a vector representation of the sources whose multidimensional joint densities have a certain amount of dependency expressed by non-spherical distributions. For speech signals in particular, we observe strong dependencies across neighboring frequency bins that weaken as the bins grow farther apart. The non-spherical joint density model proposed in this paper reflects this property of real-world speech signals. Experimental results show improved performance over spherical joint density representations.
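
The effect of a non-spherical density shows up in the score function (nonlinearity) used by the IVA learning rule. A minimal sketch, assuming a density that factors over overlapping frequency bands, p(y) ∝ Π_b exp(-‖y_b‖): the score of bin k then sums the spherical-style terms of every band that contains k, so neighboring bins are tied more strongly than distant ones. The band layout and names are hypothetical, not the paper's exact model.

```python
import numpy as np

def spherical_score(y, eps=1e-12):
    """Spherical IVA score for one source: y_k / ||y|| over all bins."""
    return y / np.sqrt(np.sum(np.abs(y) ** 2) + eps)

def overlapped_subband_score(y, bands, eps=1e-12):
    """Score for an assumed density that factors over overlapping bands.

    Each band (lo, hi) contributes y_k / ||y[lo:hi]|| to the bins it
    covers, so a bin lying in two bands receives two terms.
    """
    y = np.asarray(y, dtype=complex)
    phi = np.zeros_like(y)
    for lo, hi in bands:
        seg = y[lo:hi]
        phi[lo:hi] += seg / np.sqrt(np.sum(np.abs(seg) ** 2) + eps)
    return phi
```

With a single band spanning all bins the function reduces to the spherical score; half-overlapping bands such as `[(0, 256), (128, 384), ...]` give the locally coupled behavior described above.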

Implementation of Environmental Noise Remover for Speech Signals (배경 잡음을 제거하는 음성 신호 잡음 제거기의 구현)

  • Kim, Seon-Il;Yang, Seong-Ryong
    • Journal of the Institute of Electronics Engineers of Korea IE / v.49 no.2 / pp.24-29 / 2012
  • Automobile exhaust sounds are sound sources that are independent of, and unrelated to, speech. No information is available about either the speech or the exhaust sources. Accordingly, independent component analysis (ICA), a blind source separation method, was used to separate the two source signals from the mixed signals. Maximum likelihood estimation was applied to the signals captured by the stereo microphone to separate the two sources by maximizing their independence. Because nothing indicates which separated output is the speech signal, slope coefficients were calculated from the autocovariances of the signals in the frequency domain. The noise remover for speech signals was implemented by coupling these two algorithms.
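
Maximum-likelihood ICA for two microphones can be sketched with the natural-gradient rule and a tanh score, which corresponds to a super-Gaussian source prior. This sketch assumes an instantaneous (non-convolutive) mixture for simplicity; real stereo recordings of a car and a talker are convolutive, and the paper's autocovariance-based speech selection step is not included. `whiten` and `ml_ica` are hypothetical names.

```python
import numpy as np

def whiten(x):
    """Center and whiten the observations (rows = microphones)."""
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    return e @ np.diag(1.0 / np.sqrt(d)) @ e.T @ x

def ml_ica(x, n_iter=1000, mu=0.1):
    """ML ICA via the natural-gradient rule
    W <- W + mu * (I - phi(y) y^T / T) W, with phi = tanh."""
    z = whiten(x)
    n, t = z.shape
    w = np.eye(n)
    for _ in range(n_iter):
        y = w @ z
        w += mu * (np.eye(n) - np.tanh(y) @ y.T / t) @ w
    return w @ z
```

The outputs come back in arbitrary order and scale, which is why a separate step is needed to decide which output is speech.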

Acoustical Similarity for Small Cooling Fans Revisited (소형 송풍기 소음의 음향학적 상사성에 관한 연구)

  • 김용철;진성훈;이승배
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 1995.04a / pp.196-201 / 1995
  • The broadband and discrete sound sources in small propeller-type and centrifugal-type cooling fans were investigated to understand the turbulent vortex structures of many-bladed fans, using an ANSI test plenum for small air-moving devices (AMDs). The noise measurement method uses the plenum as a test apparatus to determine the acoustic source spectral density function at each operating condition, similar to real engineering applications, based on acoustic similarity laws. Fan characteristics, including head rise versus volumetric flow rate, were measured with a performance test facility. The sound power spectrum is decomposed into two non-dimensional functions: an acoustic source spectral distribution function F(St, φ) and an acoustic system response function G(He, φ), where St, He, and φ are the Strouhal number, the Helmholtz number, and the volumetric flow rate coefficient, respectively. The autospectra of radiated noise measured with the fan operating at several volumetric flow rates φ are analyzed using acoustical similarity. Rotating stall in the small propeller fan with a bell-mouth guide is mainly due to leading-edge separation, which creates blockage in the passage and reduces the flow rate. Sound power levels were measured as a function of rotational speed to reveal the mechanisms of stall and/or surge under different loading conditions and geometries, for example, fans installed with an impinging plate. Lee and Meecham (1993) studied the effect of large-scale motions impinging normally on a flat plate using Large-Eddy Simulation (LES) and Lighthill's analogy [ASME Winter Annual Meeting 1993, 93-WA/NCA-22]. Cross-correlations of the hot-wire and microphone signals show that the dipole and quadrupole sources in the tested fans are closely related to the vortex structures involved.
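
The non-dimensional groups used in the decomposition can be computed as below. The definitions are common conventions assumed for illustration (blade tip speed U = πDN/60 as the velocity scale, St = fD/U, He = fD/c, φ = Q/(U·πD²/4)); the paper may use different reference quantities, and `fan_similarity_numbers` is a hypothetical helper.

```python
import math

def fan_similarity_numbers(f, D, rpm, Q, c=343.0):
    """Return (St, He, phi) under assumed conventions.

    f: frequency [Hz], D: impeller diameter [m], rpm: rotational speed,
    Q: volumetric flow rate [m^3/s], c: speed of sound [m/s].
    """
    u_tip = math.pi * D * rpm / 60.0                  # blade tip speed [m/s]
    st = f * D / u_tip                                # Strouhal number
    he = f * D / c                                    # Helmholtz number
    phi = Q / (math.pi * D ** 2 / 4.0 * u_tip)        # flow rate coefficient
    return st, he, phi
```

Plotting measured spectra against St at fixed φ, and against He for the system response, is what lets data from different speeds collapse onto F and G.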

The Design of Object-based 3D Audio Broadcasting System (객체기반 3차원 오디오 방송 시스템 설계)

  • 강경옥;장대영;서정일;정대권
    • The Journal of the Acoustical Society of Korea / v.22 no.7 / pp.592-602 / 2003
  • This paper describes the basic structure of a novel object-based 3D audio broadcasting system. To overcome current uni-directional audio broadcasting services, the system is designed, based on the MPEG-4 standard, to provide interaction with important audio objects as well as realistic 3D effects. The system is composed of six sub-modules. The audio input module collects the background sound object, recorded with a 3D microphone, and audio objects, which are recorded with monaural microphones or extracted through source separation. The sound scene authoring module edits the 3D information of audio objects, such as acoustical characteristics, location, and directivity. It also defines the final sound scene, including a 3D background sound, that the producer intends to deliver to the receiving terminal. The encoder module encodes scene descriptors and audio objects for effective transmission. The decoder module extracts scene descriptors and audio objects by decoding the received bitstreams. The sound scene composition module reconstructs the 3D sound scene from the scene descriptors and audio objects. The 3D sound renderer module maximizes the 3D sound effects by adapting the final sound to the listener's acoustical environment. It also receives the user's controls on audio objects and sends them to the scene composition module to change the sound scene.
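
The interplay between the scene composition module and user control can be sketched as follows. This is a hypothetical illustration: stereo constant-power panning stands in for the 3D renderer, the `AudioObject` fields act as a toy scene descriptor, and none of it reflects MPEG-4 scene description syntax.

```python
import math
from dataclasses import dataclass
import numpy as np

@dataclass
class AudioObject:
    name: str
    signal: np.ndarray   # mono samples
    azimuth_deg: float   # assumed range: -45 = left, +45 = right
    gain: float = 1.0    # gain set at authoring time

def compose_scene(background, objects, user_gains=None):
    """Mix a stereo background (shape (2, N)) with panned mono objects.

    user_gains maps object names to listener-side gains, mimicking the
    user controls sent back to the composition module.
    """
    user_gains = user_gains or {}
    out = np.array(background, dtype=float, copy=True)
    for obj in objects:
        theta = math.radians(obj.azimuth_deg + 45.0)  # map [-45, 45] to [0, 90] degrees
        g = obj.gain * user_gains.get(obj.name, 1.0)
        out[0] += g * math.cos(theta) * obj.signal     # constant-power pan, left
        out[1] += g * math.sin(theta) * obj.signal     # constant-power pan, right
    return out
```

Setting a user gain of 0 for an object removes it from the rendered scene while leaving the background untouched.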