Multi-band multi-scale DenseNet with dilated convolution for background music separation

  • Woon-Haeng Heo (Department of Control and Robot Engineering, Graduate School, Chungbuk National University) ;
  • Hyemi Kim (Next Generation Content Research Division, Electronics and Telecommunications Research Institute) ;
  • Oh-Wook Kwon (School of Electronics Engineering, Chungbuk National University)
  • Received : 2019.09.10
  • Accepted : 2019.09.20
  • Published : 2019.11.30

Abstract

We propose a multi-band multi-scale DenseNet with dilated convolution that separates background music signals from the mixed signals of broadcast content. Dilated convolution makes it easier to learn the multi-scale context information represented in the spectrogram. Computer simulation experiments show that the proposed architecture improves the Signal-to-Distortion Ratio (SDR) by 0.15 dB and 0.27 dB in 0 dB and -10 dB Signal-to-Noise Ratio (SNR) environments, respectively.
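This page gives only the abstract, not the architecture details. As an illustration only, the following minimal PyTorch sketch (hypothetical layer counts and channel sizes, not the authors' implementation) shows how a DenseNet-style block can use dilated convolutions so that successive layers aggregate spectrogram context at multiple scales:

```python
# Sketch only: a dense block whose composite layers use dilated 3x3 convolutions.
# Hyperparameters (growth_rate, num_layers) are illustrative assumptions.
import torch
import torch.nn as nn

class DilatedDenseBlock(nn.Module):
    """DenseNet-style block: each layer's input is the concatenation of all
    previous feature maps; doubling the dilation per layer grows the
    receptive field over the (frequency, time) plane exponentially."""

    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for i in range(num_layers):
            dilation = 2 ** i  # 1, 2, 4, ...: progressively wider context
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                # padding == dilation preserves the spectrogram size for a 3x3 kernel
                nn.Conv2d(channels, growth_rate, kernel_size=3,
                          padding=dilation, dilation=dilation, bias=False),
            ))
            channels += growth_rate  # dense connectivity widens the next input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

# Toy usage: a batch of magnitude spectrograms shaped (batch, channel, freq, time).
block = DilatedDenseBlock(in_channels=1, growth_rate=4, num_layers=3)
spec = torch.randn(2, 1, 256, 128)
print(block(spec).shape)  # torch.Size([2, 13, 256, 128])
```

Because the padding matches the dilation, each layer keeps the time-frequency resolution while widening its receptive field, which is what lets the block see multi-scale context without additional pooling.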
