1 |
Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A Neural Probabilistic Language Model," Journal of Machine Learning Research, Vol. 3, pp. 1137-1155, February 2003.
|
2 |
Y. Bengio, P. Simard, and P. Frasconi, "Learning Long-Term Dependencies with Gradient Descent is Difficult," IEEE Transactions on Neural Networks, Vol. 5, No. 2, pp. 157-166, March 1994.
|
3 |
S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, Vol. 9, No. 8, pp. 1735-1780, 1997.
|
4 |
A. Graves, "Supervised Sequence Labelling with Recurrent Neural Networks," Textbook, Studies in Computational Intelligence, Springer, 2012.
|
5 |
T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient Estimation of Word Representations in Vector Space," arXiv preprint, arXiv:1301.3781, 2013.
|