References
- G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition," IEEE Signal Process. Mag. 29, 82-97 (2012).
- G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process. 20, 33-42 (2012).
- C. Weng, D. Yu, S. Watanabe, and B. H. F. Juang, "Recurrent deep neural networks for robust speech recognition," in Proc. IEEE ICASSP, 5532-5536 (2014).
- Y. Lei, N. Scheffer, L. Ferrer, and M. McLaren, "A novel scheme for speaker recognition using a phonetically-aware deep neural network," in Proc. IEEE ICASSP, 1695-1699 (2014).
- D. Garcia-Romero and A. McCree, "Insights into deep neural networks for speaker recognition," in Proc. Interspeech, 1141-1145 (2015).
- S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng. 22, 1345-1359 (2010). https://doi.org/10.1109/TKDE.2009.191
- L. Deng and X. Li, "Machine learning paradigms for speech recognition: An overview," IEEE Trans. Audio, Speech, Lang. Process. 21, 1060-1089 (2013). https://doi.org/10.1109/TASL.2013.2244083
- A. Das and M. Hasegawa-Johnson, "Cross-lingual transfer learning during supervised training in low resource scenarios," in Proc. Interspeech, 3531-3535 (2015).
- J. T. Huang, J. Li, D. Yu, L. Deng, and Y. Gong, "Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers," in Proc. IEEE ICASSP, 7304-7308 (2013).
- O. Gencoglu, T. Virtanen, and H. Huttunen, "Recognition of acoustic events using deep neural networks," in Proc. European Signal Process. Conf. (EUSIPCO), 506-510 (2014).
- M. Espi, M. Fujimoto, K. Kinoshita, and T. Nakatani, "Feature extraction strategies in deep learning based acoustic event detection," in Proc. Interspeech, 2922-2926 (2015).
- S. Nakamura, K. Hiyane, F. Asano, T. Yamada, and T. Endo, "Data collection in real acoustical environments for sound scene understanding and hands-free speech recognition," in Proc. Eurospeech, 2255-2258 (1999).
- P. Price, W. M. Fisher, J. Bernstein, and D. S. Pallett, "The DARPA 1000-word resource management database for continuous speech recognition," in Proc. IEEE ICASSP, 651-654 (1988).
- G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Trans. Speech Audio Process. 10, 293-302 (2002). https://doi.org/10.1109/TSA.2002.800560
- Y. Miao, "Kaldi+PDNN: building DNN-based ASR systems with Kaldi and PDNN," arXiv:1401.6984 (2014).
- J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Proc. Neural Inform. Process. Syst., 3320-3328 (2014).