Acknowledgement
This work was supported by the Korea Research Institute for defense Technology planning and advancement (KRIT) grant funded by the Korea government (Defense Acquisition Program Administration, DAPA) in 2023 (20-106-00-003).
References
- F. Bahmaninezhad, J. Wu, R. Gu, S.-X. Zhang, Y. Xu, M. Yu, and D. Yu, "A comprehensive study of speech separation: spectrogram vs waveform separation," Proc. Interspeech, 4574-4578 (2019).
- P.-S. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, "Singing-voice separation from monaural recordings using deep recurrent neural networks," Proc. ISMIR, 477-482 (2014).
- B. Gao, W. L. Woo, and S. S. Dlay, "Adaptive sparsity non-negative matrix factorization for single-channel source separation," IEEE J. Sel. Top. Signal Process. 5, 989-1001 (2011). https://doi.org/10.1109/JSTSP.2011.2160840
- N. Mitianoudis and M. E. Davies, "Audio source separation of convolutive mixtures," IEEE Trans. Speech Audio Process. 11, 489-497 (2003). https://doi.org/10.1109/TSA.2003.815820
- D. Stoller, S. Ewert, and S. Dixon, "Wave-U-Net: A multi-scale neural network for end-to-end audio source separation," Proc. ISMIR, 1-7 (2018).
- Y. Luo, Z. Chen, and T. Yoshioka, "Dual-path RNN: Efficient long sequence modeling for time-domain single-channel speech separation," Proc. ICASSP, 46-50 (2020).
- S. Venkataramani, J. Casebeer, and P. Smaragdis, "End-to-end source separation with adaptive front-ends," Proc. 52nd Asilomar Conf. Sig. Sys. Comput., 684-688 (2018).
- F. Lluis, J. Pons, and X. Serra, "End-to-end music source separation: Is it possible in the waveform domain?," Proc. Interspeech, 4619-4623 (2019).
- I. Kavalerov, S. Wisdom, H. Erdogan, B. Patton, K. Wilson, J. Le Roux, and J. R. Hershey, "Universal sound separation," Proc. IEEE WASPAA, 175-179 (2019).
- K. He, X. Zhang, S. Ren, and J. Sun, "Identity mappings in deep residual networks," Proc. ECCV, 630-645 (2016).
- Y. Luo and N. Mesgarani, "TasNet: Time-domain audio separation network for real-time, single-channel speech separation," Proc. ICASSP, 696-700 (2018).
- D. Santos-Dominguez, S. Torres-Guijarro, A. Cardenal-Lopez, and A. Pena-Gimenez, "ShipsEar: An underwater vessel noise database," Appl. Acoust. 113, 64-69 (2016). https://doi.org/10.1016/j.apacoust.2016.06.008
- A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, "PyTorch: An imperative style, high-performance deep learning library," Proc. NeurIPS, 1-12 (2019).
- D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," Proc. ICLR, 1-15 (2015).
- Y. Luo and N. Mesgarani, "Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation," IEEE/ACM Trans. Audio, Speech, and Lang. Process. 27, 1256-1266 (2019). https://doi.org/10.1109/TASLP.2019.2915167
- M. Kolbaek, D. Yu, Z.-H. Tan, and J. Jensen, "Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks," IEEE/ACM Trans. Audio, Speech, and Lang. Process. 25, 1901-1913 (2017). https://doi.org/10.1109/TASLP.2017.2726762