Music and Voice Separation Using Log-Spectral Amplitude Estimator Based on Kernel Spectrogram Models Backfitting

Lee, Jun-Yong; Kim, Hyoung-Gook

doi:10.7776/ASK.2015.34.3.227

The Journal of the Acoustical Society of Korea (한국음향학회지)

Volume 34, Issue 3, pp. 227-233, 2015
pISSN 1225-4428, eISSN 2287-3775

The Acoustical Society of Korea (한국음향학회)

Lee, Jun-Yong (Department of Wireless Communication Engineering, Kwang-Woon University) ;
Kim, Hyoung-Gook (Department of Wireless Communication Engineering, Kwang-Woon University)

Received : 2014.12.15
Accepted : 2015.04.01
Published : 2015.05.31

Abstract

In this paper, we propose a music and voice separation method that applies a log-spectral amplitude estimator within kernel spectrogram model backfitting. The existing kernel spectrogram model approach separates the sources by learning a Wiener-type gain, designed under the MSE (Mean Square Error) criterion, from a model of each target source. We replace this Wiener-type gain with a log-spectral amplitude estimator and obtain clearer music and voice signals than the existing approach. Experimental results show that the proposed method outperforms the existing methods.
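The difference between the two gain rules named in the abstract can be sketched numerically. The block below is a minimal illustration, not the authors' implementation: it compares the standard Wiener gain with the minimum mean-square error log-spectral amplitude gain of Ephraim and Malah for a single time-frequency bin. The names `xi` (ratio of the target source's power to the power of everything else) and `gamma` (ratio of the observed mixture power to the power of everything else) are assumptions made for this example; the abstract does not say how the paper derives these quantities from the kernel backfitting estimates.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def exp1(x: float) -> float:
    """Exponential integral E1(x), the integral of exp(-t)/t from x to infinity, for x > 0."""
    if x <= 0.0:
        raise ValueError("exp1 is only defined here for x > 0")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        series, term = 0.0, 1.0
        for k in range(1, 40):
            term *= -x / k  # term is now (-x)^k / k!
            series -= term / k
        return -EULER_GAMMA - math.log(x) + series
    # Continued fraction evaluated with the modified Lentz method for x > 1.
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-x)


def wiener_gain(xi: float) -> float:
    """Wiener gain: the MSE-optimal linear gain for a given power ratio xi."""
    return xi / (1.0 + xi)


def lsa_gain(xi: float, gamma: float) -> float:
    """Log-spectral amplitude gain: Wiener gain times exp(0.5 * E1(v))."""
    # v is floored to keep E1 finite when the bin is almost empty (illustrative choice).
    v = max(xi / (1.0 + xi) * gamma, 1e-10)
    return wiener_gain(xi) * math.exp(0.5 * exp1(v))


if __name__ == "__main__":
    # Hypothetical (xi, gamma) pairs, chosen only to show the trend.
    for xi, gamma in [(0.1, 0.5), (1.0, 1.0), (10.0, 12.0)]:
        print(f"xi={xi:5.1f} gamma={gamma:5.1f} "
              f"wiener={wiener_gain(xi):.4f} lsa={lsa_gain(xi, gamma):.4f}")
```

Because E1(v) is positive, the log-spectral amplitude gain is never smaller than the Wiener gain for the same `xi`, and the two coincide as `gamma` grows. In a full separation system the gain would be applied to every bin of the mixture spectrogram; this sketch evaluates single bins only.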

Cited by

An Overview of Lead and Accompaniment Separation in Music vol.26, pp.8, 2018, https://doi.org/10.1109/TASLP.2018.2825440
