[1] G. Hinton et al., Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Process. Mag. 29 (2012), no. 6, 82-97.
[2] G. E. Dahl et al., Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Trans. Audio Speech Language Process. 20 (2012), no. 1, 30-42.
[3] L. Deng et al., Recent advances in deep learning for speech research at Microsoft, in IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), Vancouver, Canada, May 26-31, 2013, pp. 8604-8608.
[4] J. Pan et al., Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: why DNN surpasses GMMs in acoustic modeling, in IEEE Int. Symp. Chinese Spoken Language Process. (ISCSLP), Kowloon, China, Dec. 2012, pp. 301-305.
[5] A. L. Maas et al., Building DNN acoustic models for large vocabulary speech recognition, Comput. Speech Lang. 41 (2017), pp. 195-213.
[6] T. N. Sainath et al., Deep convolutional neural networks for LVCSR, in IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), Vancouver, Canada, May 2013, pp. 8614-8618.
[7] H. Sak, A. Senior, and F. Beaufays, Long short-term memory recurrent neural network architectures for large scale acoustic modeling, in Annu. Conf. Int. Speech Commun. Assoc., Singapore, Sept. 14-18, 2014, pp. 338-342.
[8] T. N. Sainath et al., Convolutional, long short-term memory, fully connected deep neural networks, in IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), Brisbane, Australia, Apr. 19-24, 2015, pp. 4580-4584.
[9] Y. Shinohara, Adversarial multi-task learning of deep neural networks for robust speech recognition, in INTERSPEECH, San Francisco, CA, USA, Sept. 8-12, 2016, pp. 2369-2372.
[10] D. Povey, X. Zhang, and S. Khudanpur, Parallel training of deep neural networks with natural gradient and parameter averaging, arXiv preprint, 2014.
[11] X. Cui, V. Goel, and B. Kingsbury, Data augmentation for deep neural network acoustic modeling, IEEE/ACM Trans. Audio Speech Language Process. 23 (2015), no. 9, 1469-1477.
[12] V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proc. Int. Conf. Mach. Learn. (ICML-10), Haifa, Israel, June 21-24, 2010, pp. 807-814.
[13] K. Hermus and P. Wambacq, A review of signal subspace speech enhancement and its application to noise robust speech recognition, EURASIP J. Appl. Signal Process. 2007 (2007), 1-15.
[14] K. Hermus et al., Fully adaptive SVD-based noise removal for robust speech recognition, in Eur. Conf. Speech Commun. Technol., Budapest, Hungary, Sept. 5-9, 1999, pp. 1-4.
[15] T. Schanze, Compression and noise reduction of biomedical signals by singular value decomposition, IFAC-PapersOnLine 51 (2018), no. 2, 361-366.
[16] S. Chirtmay and M. Tahernezhadi, Speech enhancement using Wiener filtering, Acoustics Lett. 21 (1997), 110-115.
[17] J. Chen et al., New insights into the noise reduction Wiener filter, IEEE Trans. Audio Speech Language Process. 14 (2006), no. 4, 1218-1234.
[18] S. Lee et al., Statistical model-based noise reduction approach for car interior applications to speech recognition, ETRI J. 32 (2010), no. 5, 801-809.
[19] D. Palaz et al., Analysis of CNN-based speech recognition system using raw speech as input, in INTERSPEECH, Dresden, Germany, Sept. 6-10, 2015, pp. 11-15.
[20] P. Golik et al., Convolutional neural networks for acoustic modeling of raw time signal in LVCSR, in INTERSPEECH, Dresden, Germany, Sept. 6-10, 2015, pp. 26-30.
[21] T. N. Sainath et al., Learning the speech front-end with raw waveform CLDNNs, in INTERSPEECH, Dresden, Germany, Sept. 6-10, 2015, pp. 1-5.
[22] G. H. Golub and C. Reinsch, Singular value decomposition and least squares solutions, Numerische Mathematik 14 (1970), no. 5, 403-420.
[23] D. Povey et al., The Kaldi speech recognition toolkit, in IEEE Workshop Automatic Speech Recogn. Understanding (ASRU), Waikoloa, HI, USA, Dec. 11-15, 2011, no. EPFL-CONF-192584.
[24] D. B. Paul and J. M. Baker, The design for the Wall Street Journal-based CSR corpus, in Proc. Workshop Speech Natural Language, Harriman, NY, USA, Feb. 23-26, 1992, pp. 357-362.
[25] C. Lopes and F. Perdigão, Phoneme recognition on the TIMIT database, in Speech Technologies, InTech, 2011.