• Title/Summary/Keyword: audio separation

Audio Source Separation Based on Residual Reprojection

  • Cho, Choongsang; Kim, Je Woo; Lee, Sangkeun
    • ETRI Journal / v.37 no.4 / pp.780-786 / 2015
  • This paper describes an audio source separation method based on nonnegative matrix factorization (NMF) and expectation maximization (EM). For stable, high-performance separation, an auxiliary separation step is proposed that extracts source residuals and reprojects them onto the proper sources, taking into account the ambiguous region among sources and the refinement of each source. Specifically, an additional NMF model is designed for the ambiguous region, whose elements are not easily represented by any of the existing or predefined NMF models of the sources. The residual signal is extracted by inserting this model into the NMF-EM-based audio separation; it is then refined by the weighted parameters of the separation and reprojected onto the separated sources. Experimental results demonstrate that the proposed scheme is more stable than existing algorithms and outperforms them by, on average, 4.4 dB in terms of source-to-distortion ratio.
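
As a rough illustration of the residual idea, the sketch below runs a plain KL-divergence NMF on a magnitude spectrogram with one extra component group reserved for the ambiguous region, then redistributes that residual's energy onto the sources in proportion to their estimated magnitudes. It is a minimal stand-in, not the authors' NMF-EM algorithm; the component counts and the proportional reprojection rule are assumptions.

```python
# Minimal sketch (not the authors' exact NMF-EM algorithm): KL-divergence
# NMF with one extra component group for the "ambiguous" residual region,
# whose energy is reprojected onto the sources in proportion to their
# estimated magnitudes.
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-12, seed=0):
    """Plain KL-divergence NMF with multiplicative updates: V (F x T) ~= W @ H."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components)) + eps
    H = rng.random((n_components, T)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / W.sum(axis=0)[:, None]
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / H.sum(axis=1)[None, :]
    return W, H

def separate_with_residual(V, comps_per_source=(10, 10), resid_comps=5):
    """Per-source magnitude estimates, with the residual group's energy
    reprojected onto the sources weighted by their magnitude share."""
    W, H = nmf(V, sum(comps_per_source) + resid_comps)
    bounds = np.cumsum((0,) + comps_per_source)
    mags = [W[:, bounds[i]:bounds[i + 1]] @ H[bounds[i]:bounds[i + 1]]
            for i in range(len(comps_per_source))]
    resid = W[:, bounds[-1]:] @ H[bounds[-1]:]
    total = sum(mags) + 1e-12
    return [m + resid * (m / total) for m in mags]
```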

Background music monitoring framework and dataset for TV broadcast audio

  • Hyemi Kim; Junghyun Kim; Jihyun Park; Seongwoo Kim; Chanjin Park; Wonyoung Yoo
    • ETRI Journal / v.46 no.4 / pp.697-707 / 2024
  • Music identification is widely regarded as a solved problem for music searching in quiet environments, but its performance tends to degrade in TV broadcast audio owing to the presence of dialogue or sound effects. In addition, constructing an accurate dataset for measuring the performance of background music monitoring in TV broadcast audio is challenging. We propose a framework for monitoring background music by automatic identification and introduce a background music cue sheet. The framework comprises three main components: music identification, music-speech separation, and music detection. In addition, we introduce the Cue-K-Drama dataset, which includes reference songs, audio tracks from 60 episodes of five Korean TV drama series, and corresponding cue sheets that provide the start and end timestamps of background music. Experimental results on the constructed and existing datasets demonstrate that the proposed framework, which incorporates music identification with music-speech separation and music detection, effectively enhances TV broadcast audio monitoring.
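
A hedged skeleton of how the three stages might chain together is sketched below. All three stage functions are toy placeholders (an RMS-threshold detector, a pass-through separator, and an identifier that always misses), not the paper's components; each would be swapped for a real model and a reference fingerprint database.

```python
# Hedged skeleton of the three-stage framework: music detection ->
# music-speech separation -> identification. Every stage is a toy
# placeholder, not the authors' implementation.
import numpy as np

def detect_music_segments(audio, sr, win_s=1.0, thresh=0.01):
    """Toy detector: flags windows whose RMS exceeds a threshold."""
    win = int(win_s * sr)
    segments, start = [], None
    for i in range(0, len(audio) - win, win):
        active = np.sqrt(np.mean(audio[i:i + win] ** 2)) > thresh
        if active and start is None:
            start = i / sr
        elif not active and start is not None:
            segments.append((start, i / sr))
            start = None
    if start is not None:
        segments.append((start, len(audio) / sr))
    return segments

def separate_music(segment, sr):
    """Placeholder for a music-speech separator (pass-through here)."""
    return segment

def identify(segment, sr, reference_db):
    """Placeholder fingerprint lookup; a real system would match
    against the reference songs in `reference_db`."""
    return None

def monitor(audio, sr, reference_db):
    """Assemble a cue sheet of (song, start, end) entries."""
    cue_sheet = []
    for start, end in detect_music_segments(audio, sr):
        music = separate_music(audio[int(start * sr):int(end * sr)], sr)
        song = identify(music, sr, reference_db)
        cue_sheet.append({"song": song, "start_s": start, "end_s": end})
    return cue_sheet
```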

Non-uniform Linear Microphone Array Based Source Separation for Conversion from Channel-based to Object-based Audio Content (채널 기반에서 객체 기반의 오디오 콘텐츠로의 변환을 위한 비균등 선형 마이크로폰 어레이 기반의 음원분리 방법)

  • Chun, Chan Jun; Kim, Hong Kook
    • Journal of Broadcast Engineering / v.21 no.2 / pp.169-179 / 2016
  • Recently, MPEG-H has been under standardization as a multimedia coding framework for UHDTV (Ultra-High-Definition TV). Accordingly, the demand for object-based audio content, in addition to channel-based content, is increasing, which calls for a technique that converts channel-based audio content into object-based content. In this paper, a source separation method based on a non-uniform linear microphone array is proposed for realizing such conversion. The proposed method first analyzes the arrival time differences of the input audio sources at each microphone, and then estimates the spectral magnitude of each sound source in each horizontal direction based on the analyzed time differences. To demonstrate its effectiveness, objective performance measures of the proposed method are compared with those of conventional methods such as an MVDR (Minimum Variance Distortionless Response) beamformer and an ICA (Independent Component Analysis) method. The results show that the proposed method achieves better separation performance than the conventional methods.
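
The delay analysis the method relies on can be made concrete with GCC-PHAT, a standard estimator of the arrival time difference between a microphone pair; the sketch below is a generic estimator, not the authors' full separation pipeline.

```python
# Illustrative sketch of the time-difference analysis step: GCC-PHAT
# between one microphone pair. A generic estimator, not the paper's
# complete method.
import numpy as np

def gcc_phat(x, y, sr, max_tau=None):
    """Estimate the delay of y relative to x (in seconds) via GCC-PHAT."""
    n = 2 * max(len(x), len(y))
    X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
    R = X * np.conj(Y)
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n)  # phase transform weighting
    max_shift = n // 2 if max_tau is None else min(int(sr * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / sr

# Direction of arrival from the delay, assuming a far-field source,
# microphone spacing d (meters), and speed of sound c ~ 343 m/s:
#   theta = arcsin(c * tau / d)
```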

Audio Source Separation Method based on Beamspace-domain Multichannel Non-negative Matrix Factorization, Part II: A Study on the Beamspace Transform Algorithms (빔공간-영역 다채널 비음수 행렬 분해 알고리즘을 이용한 음원 분리 기법 Part II: 빔공간-변환 기법에 대한 고찰)

  • Lee, Seok-Jin; Park, Sang-Ha; Sung, Koeng-Mo
    • The Journal of the Acoustical Society of Korea / v.31 no.5 / pp.332-339 / 2012
  • The beamspace transform converts spatial-domain data, such as data along the x, y, and z dimensions, into incidence-angle-domain data, called beamspace-domain data. It is commonly used in source localization and tracking and in adaptive beamforming. When the beamspace transform is applied to multichannel audio source separation, the inverse beamspace transform is equally important, because the source images must be reconstructed. This paper studies beamspace transform and inverse-transform algorithms for a multichannel audio source separation system, in particular for the beamspace-domain multichannel NMF algorithm.
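
A minimal sketch of a beamspace transform and its inverse is given below, assuming a uniform linear array and far-field sources; the steering-matrix construction and pseudoinverse reconstruction are textbook choices, not necessarily the algorithms studied in the paper.

```python
# Minimal beamspace transform / inverse sketch for a uniform linear
# array under a far-field assumption. Array geometry and beam angles
# are illustrative.
import numpy as np

C = 343.0  # speed of sound, m/s

def steering_matrix(freq_hz, n_mics, spacing_m, angles_rad):
    """B[m, b]: phase of mic m toward beam direction b at one frequency."""
    m = np.arange(n_mics)[:, None]
    delays = spacing_m * m * np.sin(angles_rad)[None, :] / C
    return np.exp(-2j * np.pi * freq_hz * delays) / np.sqrt(n_mics)

def to_beamspace(X, freqs, spacing_m, angles_rad):
    """X: (n_mics, n_freqs, n_frames) STFT -> (n_beams, n_freqs, n_frames)."""
    Y = np.empty((len(angles_rad),) + X.shape[1:], dtype=complex)
    for k, f in enumerate(freqs):
        B = steering_matrix(f, X.shape[0], spacing_m, angles_rad)
        Y[:, k, :] = B.conj().T @ X[:, k, :]
    return Y

def from_beamspace(Y, freqs, n_mics, spacing_m, angles_rad):
    """Inverse transform via the pseudoinverse of each steering matrix,
    reconstructing the microphone-domain source images."""
    X = np.empty((n_mics,) + Y.shape[1:], dtype=complex)
    for k, f in enumerate(freqs):
        B = steering_matrix(f, n_mics, spacing_m, angles_rad)
        X[:, k, :] = np.linalg.pinv(B.conj().T) @ Y[:, k, :]
    return X
```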

Audio Source Separation Method Based on Beamspace-domain Multichannel Non-negative Matrix Factorization, Part I: Beamspace-domain Multichannel Non-negative Matrix Factorization system (빔공간-영역 다채널 비음수 행렬 분해 알고리즘을 이용한 음원 분리 기법 Part I: 빔공간-영역 다채널 비음수 행렬 분해 시스템)

  • Lee, Seok-Jin; Park, Sang-Ha; Sung, Koeng-Mo
    • The Journal of the Acoustical Society of Korea / v.31 no.5 / pp.317-331 / 2012
  • In this paper, we develop a multichannel blind source separation algorithm based on a beamspace transform and the multichannel non-negative matrix factorization (NMF) method. NMF is a well-known algorithm for solving source separation problems. We consider a beamspace-time-frequency-domain data model for the multichannel NMF method and enhance the conventional method using a beamspace transform. The decomposition algorithm is applied to audio source separation and evaluated on a dataset from the Signal Separation Evaluation Campaign 2010 (SiSEC 2010).
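
To make the data model concrete, the sketch below unfolds the beamspace STFT magnitudes (beams x frequencies x frames) into a single matrix, factorizes it with KL-divergence NMF, and masks the beamspace data per component. This is a simplified stand-in for the paper's multichannel NMF model; the unfolding and the Wiener-style masking rule are assumptions made for illustration.

```python
# Rough sketch of a beamspace-domain NMF data layout: unfold the
# beamspace magnitudes so one NMF factorizes all beams jointly, then
# mask the complex beamspace data per component. A simplified stand-in
# for the paper's multichannel NMF model.
import numpy as np

def beamspace_nmf(Ybs, n_components, n_iter=200, eps=1e-12, seed=0):
    """Ybs: complex beamspace STFT, shape (n_beams, n_freqs, n_frames)."""
    n_beams, n_freqs, n_frames = Ybs.shape
    V = np.abs(Ybs).reshape(n_beams * n_freqs, n_frames)  # unfold beams
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):  # KL-divergence multiplicative updates
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / W.sum(axis=0)[:, None]
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / H.sum(axis=1)[None, :]
    WH = W @ H + eps
    # Per-component Wiener-style masks, refolded to (beams, freqs, frames)
    masks = [(np.outer(W[:, k], H[k]) / WH).reshape(n_beams, n_freqs, n_frames)
             for k in range(n_components)]
    return [m * Ybs for m in masks]
```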

The Noise Influence of 4G Mobile Transmitter on Audio Devices (4G 휴대 단말기 송신에 의한 오디오 잡음 영향)

  • Yun, Hye-Ju; Lee, Il-Kyoo
    • Journal of Satellite, Information and Communications / v.8 no.1 / pp.31-34 / 2013
  • This paper deals with the audio noise induced in audio devices by an LTE (Long Term Evolution) UE (User Equipment), a fourth-generation (4G) mobile terminal. First, analysis and measurement showed that the interfering signal of the LTE UE is determined by its transmit power. We then measured the audio noise level while varying the transmit power level and the separation distance between the LTE UE and an audio device. The results indicate that a minimum separation distance of at least 25 cm is required to protect an audio device from the interference noise of an LTE UE transmitting at the maximum power level of 22 dBm.

Audio signal clustering and separation using a stacked autoencoder (복층 자기부호화기를 이용한 음향 신호 군집화 및 분리)

  • Jang, Gil-Jin
    • The Journal of the Acoustical Society of Korea / v.35 no.4 / pp.303-309 / 2016
  • This paper proposes a novel approach to audio signal clustering using a stacked autoencoder. The proposed stacked autoencoder learns an efficient representation of the input signal that enables clustering of constituent signals with similar characteristics, so the original sources can be separated based on the clustering results. An STFT (Short-Time Fourier Transform) is performed to extract the time-frequency spectrum, and rectangular windows at all possible locations are used as input values to the autoencoder. The outputs of the middle (encoding) layer are used to cluster the rectangular windows, and the original sources are separated by Wiener filters derived from the clustering results. Source separation experiments were carried out in comparison with conventional NMF (Non-negative Matrix Factorization), and the sources estimated by the proposed method represent the characteristics of the original sources well, as shown in the time-frequency representation.
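
The clustering-and-masking idea can be sketched compactly: encode spectrogram patches with a small autoencoder, then cluster the codes with k-means. The single hidden layer below stands in for the paper's stacked autoencoder; the layer size, learning rate, and tanh nonlinearity are assumptions.

```python
# Condensed sketch of the clustering-based separation idea: a one-layer
# autoencoder stands in for the paper's stacked autoencoder. The cluster
# labels over spectrogram windows would then drive per-cluster Wiener masks.
import numpy as np

def train_autoencoder(P, n_hidden=32, lr=1e-2, n_iter=500, seed=0):
    """P: (n_patches, dim) spectrogram-patch matrix; returns an encoder."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.1, (P.shape[1], n_hidden))
    W2 = rng.normal(0, 0.1, (n_hidden, P.shape[1]))
    for _ in range(n_iter):
        H = np.tanh(P @ W1)    # encode
        E = H @ W2 - P         # reconstruction error
        gW2 = H.T @ E / len(P)
        gW1 = P.T @ ((E @ W2.T) * (1 - H ** 2)) / len(P)
        W1 -= lr * gW1
        W2 -= lr * gW2
    return lambda X: np.tanh(X @ W1)

def kmeans(Z, k, n_iter=50, seed=0):
    """Cluster encodings Z (n_patches, n_hidden) into k groups."""
    rng = np.random.default_rng(seed)
    C = Z[rng.choice(len(Z), k, replace=False)]
    for _ in range(n_iter):
        lab = np.argmin(((Z[:, None] - C[None]) ** 2).sum(-1), axis=1)
        C = np.stack([Z[lab == j].mean(0) if np.any(lab == j) else C[j]
                      for j in range(k)])
    return lab
```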

Audio Watermarking Using Independent Component Analysis

  • Seok, Jong-Won
    • Journal of Information and Communication Convergence Engineering / v.10 no.2 / pp.175-180 / 2012
  • This paper presents a blind watermark detection scheme for an additive watermark embedding model. The proposed estimation-correlation-based watermark detector first estimates the embedded watermark by exploiting the non-Gaussianity of real-world audio signals and the mutual independence between the host signal and the embedded watermark; a correlation-based detector is then used to determine the presence or absence of the watermark. For watermark estimation, blind source separation (BSS) based on independent component analysis (ICA) is used. A low watermark-to-signal ratio (WSR) is one of the limitations of blind detection with the additive embedding model. The proposed detector uses two-stage processing to improve the WSR at the blind detector: the first stage removes the audio spectrum from the watermarked audio signal using linear predictive (LP) filtering, and the second stage uses the residue from the LP filtering stage to estimate the embedded watermark using BSS based on ICA. Simulation results show that the proposed detector performs significantly better than existing estimation-correlation-based detection schemes.
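
A simplified sketch of the two-stage idea appears below: least-squares LP analysis removes the host spectrum, and a normalized correlation against the known watermark decides presence. The ICA/BSS estimation stage of the paper is omitted for brevity, and the LP order and decision threshold are placeholders.

```python
# Simplified two-stage detection sketch: (1) LP filtering whitens the
# host audio so the residue is dominated by the watermark, (2) a
# normalized correlation against the known watermark decides presence.
# The paper's ICA estimation stage is omitted here.
import numpy as np

def lp_residual(x, order=12):
    """LP analysis by least squares; returns the prediction residue."""
    rows = np.stack([x[i:len(x) - order + i] for i in range(order)], axis=1)
    target = x[order:]
    a, *_ = np.linalg.lstsq(rows, target, rcond=None)
    return target - rows @ a

def detect(watermarked, watermark, order=12, threshold=0.1):
    """Return (present?, correlation score) for a candidate watermark."""
    r = lp_residual(watermarked, order)
    w = watermark[order:len(r) + order]  # align with the residue samples
    rho = (r @ w) / (np.linalg.norm(r) * np.linalg.norm(w) + 1e-12)
    return rho > threshold, rho
```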

A Frequency-Domain Normalized MBD Algorithm with Unidirectional Filters for Blind Speech Separation

  • Kim, Hye-Jin; Nam, Seung-Hyon
    • The Journal of the Acoustical Society of Korea / v.24 no.2E / pp.54-60 / 2005
  • A new multichannel blind deconvolution algorithm is proposed for speech mixtures. It employs unidirectional filters and normalization of the gradient terms in the frequency domain. The proposed algorithm is shown to be approximately nonholonomic; thus, it provides improved convergence and separation performance without a whitening effect for nonstationary sources such as speech and audio signals. Simulations using real-world recordings confirm its superior performance over existing algorithms and its usefulness in real applications.
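
For context, the sketch below shows a generic per-bin frequency-domain natural-gradient ICA update of the kind such algorithms build on; the paper's specific contributions (unidirectional filters and gradient-term normalization) are not reproduced, and the step size and nonlinearity are assumptions.

```python
# Generic frequency-domain natural-gradient ICA update for one bin,
# shown only to make the per-bin adaptation concrete; not the paper's
# normalized MBD algorithm.
import numpy as np

def fd_ica_bin(Xf, n_iter=100, mu=0.1):
    """Xf: (n_channels, n_frames) complex STFT frames of one frequency bin."""
    n = Xf.shape[0]
    W = np.eye(n, dtype=complex)
    for _ in range(n_iter):
        Y = W @ Xf
        phi = Y / (np.abs(Y) + 1e-12)  # score function for super-Gaussian sources
        grad = (np.eye(n) - (phi @ Y.conj().T) / Xf.shape[1]) @ W
        W += mu * grad                 # natural-gradient step
    return W @ Xf, W
```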

Sound Source Separation Using Interaural Intensity Difference in Closely Spaced Stereo Omnidirectional Microphones (인접 배치된 스테레오 무지향성 마이크로폰 환경에서 양이간 강도차를 활용한 음원 분리 기법)

  • Chun, Chan Jun; Jeong, Seok Hee; Kim, Hong Kook
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.12 / pp.191-196 / 2013
  • In this paper, an interaural intensity difference (IID)-based sound source separation method for closely spaced stereo omnidirectional microphones is proposed. First, to improve channel separability, a minimum variance distortionless response (MVDR) beamformer is employed to increase the intensity difference between the stereo channels. The IID-based sound source separation method is then applied. To evaluate the performance of the proposed method, the source-to-distortion ratio (SDR), source-to-interference ratio (SIR), and sources-to-artifacts ratio (SAR), defined as objective evaluation criteria in the Stereo Audio Source Separation Evaluation Campaign (SASSEC), are measured. The objective evaluation shows that the proposed method outperforms sound source separation without a beamformer.
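
The two steps can be sketched as follows: a textbook MVDR weight computation stands in for the beamforming stage, and time-frequency bins are then assigned to sources by thresholding the IID. The covariance estimation, geometry, and 0 dB threshold are illustrative assumptions, not the paper's configuration.

```python
# Illustrative sketch of the IID-masking step: after beamforming widens
# the level difference between channels, each time-frequency bin is
# assigned to a source by thresholding the interaural intensity
# difference. The MVDR stage is reduced to its textbook weight formula.
import numpy as np

def mvdr_weights(R, d):
    """Textbook MVDR: w = R^-1 d / (d^H R^-1 d), for spatial covariance R
    and steering vector d."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)

def iid_masks(L, R, thresh_db=0.0):
    """L, R: stereo STFTs. Bins louder in the left channel go to source 1,
    the rest to source 2 (binary masking)."""
    iid = 20 * np.log10((np.abs(L) + 1e-12) / (np.abs(R) + 1e-12))
    m1 = iid > thresh_db
    return m1 * L, (~m1) * R
```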