References
- G. Hinton et al., Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Process. Mag. 29 (2012), no. 6, 82-97. https://doi.org/10.1109/MSP.2012.2205597
- G. E. Dahl et al., Context‐dependent pre‐trained deep neural networks for large‐vocabulary speech recognition, IEEE Trans. Audio Speech Language Process. 20 (2012), no. 1, 30-42. https://doi.org/10.1109/TASL.2011.2134090
- L. Deng et al., Recent advances in deep learning for speech research at Microsoft, in IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), Vancouver, Canada, May 26-31, 2013, pp. 8604-8608.
- J. Pan et al., Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: why DNN surpasses GMMs in acoustic modeling, in IEEE Int. Symp. Chinese Spoken Language Process. (ISCSLP), Kowloon, China, Dec. 2012, pp. 301-305.
- A. L. Maas et al., Building DNN acoustic models for large vocabulary speech recognition, Comput. Speech Lang. 41 (2017), pp. 195-213. https://doi.org/10.1016/j.csl.2016.06.007
- T. N. Sainath et al., Deep convolutional neural networks for LVCSR, in IEEE Int. Conf. Acoustics, Speech Signal Process. (ICASSP), Vancouver, Canada, May 2013, pp. 8614-8618.
- H. Sak, A. Senior, and F. Beaufays, Long short-term memory recurrent neural network architectures for large scale acoustic modeling, in Annu. Conf. Int. Speech Commun. Assoc., Singapore, Sept. 14-18, 2014, pp. 338-342.
- T. N. Sainath et al., Convolutional, long short-term memory, fully connected deep neural networks, in IEEE Int. Conf. Acoustics, Speech Signal Process. (ICASSP), Brisbane, Australia, Apr. 19-24, 2015, pp. 4580-4584.
- Y. Shinohara, Adversarial multi-task learning of deep neural networks for robust speech recognition, in INTERSPEECH, San Francisco, CA, USA, Sept. 8-12, 2016, pp. 2369-2372.
- D. Povey, X. Zhang, and S. Khudanpur, Parallel training of deep neural networks with natural gradient and parameter averaging, arXiv preprint, 2014.
- X. Cui, V. Goel, and B. Kingsbury, Data augmentation for deep neural network acoustic modeling, IEEE/ACM Trans. Audio Speech Language Process. 23 (2015), no. 9, 1469-1477. https://doi.org/10.1109/TASLP.2015.2438544
- V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proc. Int. Conf. Mach. Learn. (ICML-10), Haifa, Israel, June 21-24, 2010, pp. 807-814.
- K. Hermus and P. Wambacq, A review of signal subspace speech enhancement and its application to noise robust speech recognition, EURASIP J. Appl. Signal Process. 2007 (2007), 1-15.
- K. Hermus et al., Fully adaptive SVD-based noise removal for robust speech recognition, in Eur. Conf. Speech Commun. Technol., Budapest, Hungary, Sept. 5-9, 1999, pp. 1-4.
- T. Schanze, Compression and noise reduction of biomedical signals by singular value decomposition, IFAC‐PapersOnLine 51 (2018), no. 2, 361-366. https://doi.org/10.1016/j.ifacol.2018.03.062
- S. Chirtmay and M. Tahernezhadi, Speech enhancement using Wiener filtering, Acoust. Lett. 21 (1997), 110-115.
- J. Chen et al., New insights into the noise reduction Wiener filter, IEEE Trans. Audio Speech Language Process. 14 (2006), no. 4, 1218-1234. https://doi.org/10.1109/TSA.2005.860851
- S. Lee et al., Statistical model‐based noise reduction approach for car interior applications to speech recognition, ETRI J. 32 (2010), no. 5, 801-809. https://doi.org/10.4218/etrij.10.1510.0024
- D. Palaz et al., Analysis of CNN-based speech recognition system using raw speech as input, in INTERSPEECH, Dresden, Germany, Sept. 6-10, 2015, pp. 11-15.
- P. Golik et al., Convolutional neural networks for acoustic modeling of raw time signal in LVCSR, in INTERSPEECH, Dresden, Germany, Sept. 6-10, 2015, pp. 26-30.
- T. N. Sainath et al., Learning the speech front-end with raw waveform CLDNNs, in INTERSPEECH, Dresden, Germany, Sept. 6-10, 2015, pp. 1-5.
- G. H. Golub and C. Reinsch, Singular value decomposition and least squares solutions, Numerische Mathematik 14 (1970), no. 5, 403-420. https://doi.org/10.1007/BF02163027
- D. Povey et al., The Kaldi speech recognition toolkit, in IEEE Workshop Automatic Speech Recogn. Understanding, Waikoloa, HI, USA, Dec. 11-15, 2011, no. EPFL-CONF-192584.
- D. B. Paul and J. M. Baker, The design for the Wall Street Journal-based CSR corpus, in Proc. Workshop Speech Natural Language, Harriman, NY, USA, Feb. 23-26, 1992, pp. 357-362.
- C. Lopes and F. Perdigao, Phoneme recognition on the TIMIT database, in Speech Technologies, InTech, 2011.