Acknowledgement
This work was supported by the Incheon National University Research Grant in 2018.
References
- J. Lim and A. Oppenheim, "All-pole modeling of degraded speech," IEEE Trans. on Acoustics, Speech, and Signal Process. 26, 197-210 (1978). https://doi.org/10.1109/TASSP.1978.1163086
- R. Martin, "Spectral subtraction based on minimum statistics," Proc. EUSIPCO, 1182-1185 (1994).
- D. L. Wang and J. Chen, "Supervised speech separation based on deep learning: An overview," IEEE/ACM Trans. on Audio, Speech, and Lang. Process. 26, 1702-1726 (2018). https://doi.org/10.1109/TASLP.2018.2842159
- H. S. Choi, J. H. Kim, J. Huh, A. Kim, J. W. Ha, and K. Lee, "Phase-aware speech enhancement with deep complex U-Net," Proc. ICLR, 1-20 (2019).
- C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, "Deep complex networks," Proc. ICLR, 1-19 (2018).
- K. Paliwal, K. Wojcicki, and B. Shannon, "The importance of phase in speech enhancement," Speech Commun. 53, 465-494 (2011).
- Y. Wang and D. L. Wang, "A deep neural network for time-domain signal reconstruction," Proc. IEEE ICASSP, 4390-4394 (2015).
- H. Wang, X. Zhang, and D. L. Wang, "Attention-based fusion for bone-conducted and air-conducted speech enhancement in the complex domain," Proc. IEEE ICASSP, 7757-7761 (2022).
- S. Zhao, B. Ma, K. N. Watcharasupat, and W. S. Gan, "FRCRN: Boosting feature representation using frequency recurrence for monaural speech enhancement," Proc. IEEE ICASSP, 9281-9285 (2022).
- V. Kothapally and J. H. Hansen, "Complex-valued time-frequency self-attention for speech dereverberation," Proc. Interspeech, 2543-2547 (2022).
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Proc. NIPS, 6000-6010 (2017).
- C. Tang, C. Luo, Z. Zhao, W. Xie, and W. Zeng, "Joint time-frequency and time domain learning for speech enhancement," Proc. 29th IJCAI, 3816-3822 (2021).
- V. Kothapally, W. Xia, S. Ghorbani, J. H. Hansen, W. Xue, and J. Huang, "SkipConvNet: Skip convolutional neural network for speech dereverberation using optimally smoothed spectral mapping," Proc. Interspeech, 3935-3939 (2020).
- M. Drozdzal, E. Vorontsov, G. Chartrand, S. Kadoury, and C. Pal, "The importance of skip connections in biomedical image segmentation," Proc. DLMIA, 179-187 (2016).
- T. Tong, G. Li, X. Liu, and Q. Gao, "Image super-resolution using dense skip connections," Proc. IEEE ICCV, 4799-4807 (2017).
- Y. Luo and N. Mesgarani, "Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation," IEEE/ACM Trans. on Audio, Speech, and Lang. Process. 27, 1256-1266 (2019). https://doi.org/10.1109/TASLP.2019.2915167
- J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren, "DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM, NIST speech disc 1-1.1," NIST Interagency/Internal Rep. (NISTIR) 4930, 1993.
- E. Vincent, R. Gribonval, and C. Fevotte, "Performance measurement in blind audio source separation," IEEE Trans. on Audio, Speech, and Lang. Process. 14, 1462-1469 (2006). https://doi.org/10.1109/TSA.2005.858005
- A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, "Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs," Proc. IEEE ICASSP, 749-752 (2001).
- C. H. Taal, R. C. Hendriks, and R. Heusdens, "A short-time objective intelligibility measure for time-frequency weighted noisy speech," Proc. IEEE ICASSP, 4214-4217 (2010).