Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (NRF-2021R1F1A1063347).
References
- J. Lim and A. Oppenheim, "All-pole modeling of degraded speech," IEEE Trans. Acoust. Speech Signal Process. 26, 197-210 (1978). https://doi.org/10.1109/TASSP.1978.1163086
- S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust. Speech Signal Process. 27, 113-120 (1979). https://doi.org/10.1109/TASSP.1979.1163209
- K. Tan and D. Wang, "A convolutional recurrent neural network for real-time speech enhancement," Proc. Interspeech, 3229-3233 (2018).
- Y. Hu, Y. Liu, S. Lv, M. Xing, S. Zhang, Y. Fu, J. Wu, B. Zhang, and L. Xie, "DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement," Proc. Interspeech, 2472-2476 (2020).
- H. S. Choi, J. H. Kim, J. Huh, A. Kim, J. W. Ha, and K. Lee, "Phase-aware speech enhancement with deep complex u-net," Proc. ICLR, 1-20 (2019).
- D. Wang and J. Chen, "Supervised speech separation based on deep learning: An overview," IEEE/ACM Trans. Audio, Speech, Language Process. 26, 1702-1726 (2018). https://doi.org/10.1109/TASLP.2018.2842159
- K. Paliwal, K. Wojcicki, and B. Shannon, "The importance of phase in speech enhancement," Speech Communication, 53, 465-494 (2011). https://doi.org/10.1016/j.specom.2010.12.003
- Y. Wang and D. L. Wang, "A deep neural network for time-domain signal reconstruction," Proc. ICASSP, 4390-4394 (2015).
- A. Li, C. Zheng, C. Fan, R. Peng, and X. Li, "A recursive network with dynamic attention for monaural speech enhancement," Proc. Interspeech, 2422-2426 (2020).
- Y. Koizumi, K. Yatabe, M. Delcroix, Y. Masuyama, and D. Takeuchi, "Speech enhancement using self-adaptation and multi-head self-attention," Proc. ICASSP, 181-185 (2020).
- Q. Zhang, Q. Song, Z. Ni, A. Nicolson, and H. Li, "Time-frequency attention for monaural speech enhancement," Proc. ICASSP, 7852-7856 (2022).
- O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz, B. Glocker, and D. Rueckert, "Attention u-net: Learning where to look for the pancreas," Proc. MIDL, 1-10 (2018).
- Y. Luo and N. Mesgarani, "Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation," IEEE/ACM Trans. Audio, Speech, Language Process. 27, 1256-1266 (2019). https://doi.org/10.1109/TASLP.2019.2915167
- J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren, "DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM NIST speech disc 1-1.1," NIST Interagency/Internal Rep., (NISTIR) 4930 (1993).
- E. Vincent, R. Gribonval, and C. Fevotte, "Performance measurement in blind audio source separation," IEEE Trans. Audio, Speech, Language Process. 14, 1462-1469 (2006). https://doi.org/10.1109/TSA.2005.858005
- A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, "Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs," Proc. ICASSP, 749-752 (2001).
- C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "A short-time objective intelligibility measure for time-frequency weighted noisy speech," Proc. ICASSP, 4214-4217 (2010).
- O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," Proc. MICCAI, 234-241 (2015).