References
- Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313.5786 (2006): 504-507.
- Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural computation 18.7 (2006): 1527-1554. https://doi.org/10.1162/neco.2006.18.7.1527
- Bengio, Yoshua, et al. "Greedy layer-wise training of deep networks." Advances in neural information processing systems 19 (2007): 153.
- Hinton, Geoffrey, et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." IEEE Signal Processing Magazine 29.6 (2012): 82-97. https://doi.org/10.1109/MSP.2012.2205597
- Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in neural information processing systems 25 (2012).
- Donahue, Jeffrey, et al. "Long-term recurrent convolutional networks for visual recognition and description." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2015).
- Dahl, George E., et al. "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition." IEEE Transactions on Audio, Speech, and Language Processing 20.1 (2012): 30-42. https://doi.org/10.1109/TASL.2011.2134090
- Understanding LSTM Networks, http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- The Unreasonable Effectiveness of Recurrent Neural Networks, http://karpathy.github.io/2015/05/21/rnn-effectiveness/
- Schuster, Mike, and Kuldip K. Paliwal. "Bidirectional recurrent neural networks." IEEE Transactions on Signal Processing 45.11 (1997): 2673-2681.
- Mikolov, Tomas, et al. "Strategies for training large scale neural network language models." Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on. IEEE, (2011).
- Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. "Speech recognition with deep recurrent neural networks." 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, (2013).
- Hannun, Awni, et al. "Deep speech: Scaling up end-to-end speech recognition." arXiv preprint arXiv:1412.5567 (2014).
- Amodei, Dario, et al. "Deep speech 2: End-to-end speech recognition in English and Mandarin." arXiv preprint arXiv:1512.02595 (2015).
- Black, Alan W., Heiga Zen, and Keiichi Tokuda. "Statistical parametric speech synthesis." 2007 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, (2007).
- Speech Synthesis, http://slideplayer.com/slide/3148265/
- Statistical Parametric Speech Synthesis (Heiga Zen), http://www.slideshare.net/danilosoba1/statistical-parametric-speech-synthesis-heiga-zen
- Zen, Heiga, Andrew Senior, and Mike Schuster. "Statistical parametric speech synthesis using deep neural networks." 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, (2013).
- van den Oord, Aaron, et al. "WaveNet: A generative model for raw audio." arXiv preprint arXiv:1609.03499 (2016).
- Mehri, Soroush, et al. "SampleRNN: An Unconditional End-to-End Neural Audio Generation Model." https://openreview.net/forum?id=SkxKPDv5, under review at ICLR 2017.
- Southall, Carl, Ryan Stables, and Jason Hockman. "Automatic drum transcription using bi-directional recurrent neural networks." Proceedings of the International Society for Music Information Retrieval Conference (ISMIR). (2016).
- Vogl, Richard, Matthias Dorfer, and Peter Knees. "Recurrent neural networks for drum transcription." Proceedings of the International Society for Music Information Retrieval Conference (ISMIR). (2016).
- Choi, Keunwoo, George Fazekas, and Mark Sandler. "Automatic tagging using deep convolutional neural networks." Proceedings of the International Society for Music Information Retrieval Conference (ISMIR). (2016).
- Schlüter, Jan, Karen Ullrich, and Thomas Grill. "Structural segmentation with convolutional neural networks MIREX submission." 10th running of the Music Information Retrieval Evaluation eXchange (MIREX 2014) (2014).
- Schlüter, Jan. "Learning to pinpoint singing voice from weakly labeled examples." Proceedings of the International Society for Music Information Retrieval Conference (ISMIR). (2016).
- Leimeister, Matthias. "Feature learning for classifying drum components from nonnegative matrix factorization." Audio Engineering Society Convention 138. Audio Engineering Society, (2015).