Music/Voice Separation Based on Kernel Back-Fitting Using Weighted β-Order MMSE Estimation

Kim, Hyoung-Gook; Kim, Jin Young

doi:10.4218/etrij.16.0115.0256

ETRI Journal

Volume 38, Issue 3, Pages 510-517, 2016
pISSN 1225-6463, eISSN 2233-7326

Electronics and Telecommunications Research Institute (한국전자통신연구원)


Kim, Hyoung-Gook (Department of Electronics Convergence Engineering, Kwangwoon University);
Kim, Jin Young (Department of Electronics and Computer Engineering, Chonnam National University)

Received: 2015.03.16
Accepted: 2015.12.28
Published: 2016.06.01

https://doi.org/10.4218/etrij.16.0115.0256


Abstract

The separation of mixed signals into music and voice components has attracted the attention of many researchers. Iterative kernel back-fitting (KBF), also known as kernel additive modeling, was recently proposed and achieves good results for music/voice separation. To obtain minimum mean square error (MMSE) estimates of the short-time Fourier transforms of the sources, generalized spatial Wiener filtering (GW) is typically used. In this paper, we propose an advanced music/voice separation method that uses a generalized weighted $\beta$-order MMSE estimation (WbE) based on iterative KBF. In the proposed method, WbE performs the separation of the mixed music signal, while KBF permits kernel spectrogram model fitting at each iteration. Experimental results show that the proposed method achieves better separation performance than GW and existing Bayesian estimators.
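The alternation described in the abstract, a separation step followed by a kernel model-fitting step at each iteration, can be illustrated with a minimal single-channel sketch. It is not the authors' implementation. The separation step here is a plain Wiener-style soft mask (the role GW plays in the baseline), not the proposed WbE estimator, whose formula is not given on this page. The function names (`wiener_separate`, `median_smooth`, `kernel_backfit`), the two kernels, and the toy spectrogram are all assumptions made for illustration.

```python
from statistics import median


def wiener_separate(mix, psds, eps=1e-12):
    """Share each mixture bin among sources in proportion to their power estimates."""
    n_f, n_t = len(mix), len(mix[0])
    estimates = [[[0j] * n_t for _ in range(n_f)] for _ in psds]
    for f in range(n_f):
        for t in range(n_t):
            total = sum(p[f][t] for p in psds) + eps  # eps avoids division by zero
            for j, p in enumerate(psds):
                estimates[j][f][t] = (p[f][t] / total) * mix[f][t]
    return estimates


def median_smooth(power, offsets):
    """Median of `power` over a kernel of (df, dt) offsets, clipped at the borders."""
    n_f, n_t = len(power), len(power[0])
    smoothed = [[0.0] * n_t for _ in range(n_f)]
    for f in range(n_f):
        for t in range(n_t):
            neighbours = [power[f + df][t + dt] for df, dt in offsets
                          if 0 <= f + df < n_f and 0 <= t + dt < n_t]
            smoothed[f][t] = median(neighbours)
    return smoothed


def kernel_backfit(mix, kernels, n_iter=3):
    """Alternate soft-mask separation and per-source kernel (median) refitting."""
    n_f, n_t = len(mix), len(mix[0])
    # Start by giving every source an equal share of the mixture power.
    init = [[abs(mix[f][t]) ** 2 / len(kernels) for t in range(n_t)]
            for f in range(n_f)]
    psds = [init for _ in kernels]
    for _ in range(n_iter):
        estimates = wiener_separate(mix, psds)
        # Refit each source's power model with its own kernel.
        psds = [median_smooth([[abs(v) ** 2 for v in row] for row in est], kernel)
                for est, kernel in zip(estimates, kernels)]
    return wiener_separate(mix, psds)


# Toy 5x5 "spectrogram": a sustained tone (row 2) plus a transient (column 2).
mix = [[0.0] * 5 for _ in range(5)]
for i in range(5):
    mix[2][i] += 1.0
    mix[i][2] += 1.0

along_time = [(0, -1), (0, 0), (0, 1)]   # favours energy that persists over time
along_freq = [(-1, 0), (0, 0), (1, 0)]   # favours energy spread across frequency
sustained, transient = kernel_backfit(mix, [along_time, along_freq])
```

In this toy case the time-oriented kernel captures the sustained row and the frequency-oriented kernel captures the transient column, while the shared bin is split between them. Music/voice separation would call for different kernels, for example ones that exploit repetition in the accompaniment, and the proposed method would replace the soft mask with the WbE step.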


