Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (No. 2019R1F1A106299513).
References
- M. Narinen, Active Noise Cancellation of Drone Propeller Noise through Waveform Approximation and Pitch-Shifting (Ph.D. thesis, Georgia State University, 2020).
- J. Lim and A. Oppenheim, "All-pole modeling of degraded speech," IEEE Trans. on Acoustics, Speech, and Signal Process. 26, 197-210 (1978). https://doi.org/10.1109/TASSP.1978.1163086
- Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. on Acoustics, Speech and Signal Process. 32, 1109-1121 (1984). https://doi.org/10.1109/TASSP.1984.1164453
- S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. on Acoustics, Speech and Signal Process. 27, 113-120 (1979). https://doi.org/10.1109/TASSP.1979.1163209
- R. Martin, "Spectral subtraction based on minimum statistics," Proc. EUSIPCO, 1182-1185 (1994).
- P. J. Moreno, B. Raj, and R. M. Stern, "Data-driven environmental compensation for speech recognition: a unified approach," Speech Communication, 24, 267-285 (1998). https://doi.org/10.1016/S0167-6393(98)00025-9
- Q. Wang, H. Muckenhirn, K. Wilson, P. Sridhar, Z. Wu, J. Hershey, R. A. Saurous, R. J. Weiss, Y. Jia, and I. L. Moreno, "Voicefilter: Targeted voice separation by speaker-conditioned spectrogram masking," arXiv preprint arXiv:1810.04826 (2018).
- C. Deng, H. Song, Y. Zhang, Y. Sha, and X. Li, "DNN-based mask estimation integrating spectral and spatial features for robust beamforming," Proc. IEEE ICASSP, 4647-4651 (2020).
- N. Saleem, M. I. Khattak, M. Al-Hasan, and A. B. Qazi, "On learning spectral masking for single channel speech enhancement using feedforward and recurrent neural networks," IEEE Access, 8, 160581-160595 (2020). https://doi.org/10.1109/ACCESS.2020.3021061
- M. Hasannezhad, Z. Ouyang, W. -P. Zhu, and B. Champagne, "Speech enhancement with phase sensitive mask estimation using a novel hybrid neural network," IEEE Open Journal of Signal Processing, 2, 136-150 (2021). https://doi.org/10.1109/OJSP.2021.3067147
- M. Hasannezhad, Z. Ouyang, W. -P. Zhu, and B. Champagne, "An integrated CNN-GRU framework for complex ratio mask estimation in speech enhancement," Proc. APSIPA ASC, 764-768 (2020).
- Y. Koizumi, K. Yatabe, M. Delcroix, Y. Masuyama, and D. Takeuchi, "Speech enhancement using self-adaptation and multi-head self-attention," Proc. IEEE ICASSP, 181-185 (2020).
- X. Hao, C. Shan, Y. Xu, S. Sun, and L. Xie, "An attention-based neural network approach for single channel speech enhancement," Proc. IEEE ICASSP, 6895-6899 (2019).
- S. K. Roy, A. Nicolson, and K. K. Paliwal, "Deep LPC-MHANet: Multi-head self-attention for augmented kalman filter-based speech enhancement," IEEE Access, 9, 70516-70530 (2021). https://doi.org/10.1109/ACCESS.2021.3077281
- A. Pandey and D. Wang, "Dense CNN with self-attention for time-domain speech enhancement," IEEE/ACM Trans Audio Speech Lang Process. 29, 1270-1279 (2021). https://doi.org/10.1109/TASLP.2021.3064421
- J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren, "DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1," NASA STI/Recon Tech. Rep. 1993.
- Syma X5C-1, https://youtube.com/watch?v=aR3NgjOwzAo&feature=share, (Last viewed August 19, 2021).
- E. Vincent, R. Gribonval, and C. Fevotte, "Performance measurement in blind audio source separation," IEEE Trans. on Audio, Speech, and Lang. Process. 14, 1462-1469 (2006). https://doi.org/10.1109/TSA.2005.858005
- A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, "Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs," Proc. IEEE ICASSP, 01CH37221 (2001).
- C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "A short-time objective intelligibility measure for time-frequency weighted noisy speech," Proc. IEEE ICASSP, 4214-4217 (2010).