1. Cauchy, A. "Méthode générale pour la résolution des systèmes d'équations simultanées." Comptes Rendus de l'Académie des Sciences, Paris, Vol. 25 (1847), 536-538.

2. Chollet, F. "Keras." Available at https://github.com/fchollet/keras (downloaded 1 December, 2016).

3. Chung, J., Cho, K., and Bengio, Y. "A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation." arXiv:1603.06147 (2016).

4. Olah, C. "Understanding LSTM Networks." Colah's Blog. Available at http://colah.github.io/posts/2015-08-Understanding-LSTMs/ (downloaded 1 December, 2016).

5. Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., Senior, A., Tucker, P., Yang, K., Le, Q. V., et al. "Large Scale Distributed Deep Networks." In Advances in Neural Information Processing Systems (2012), 1223-1231.

6. Dozat, T. "Incorporating Nesterov Momentum into Adam." Technical report, Stanford University. Available at http://cs229.stanford.edu/proj2015/054report.pdf (2015).

7. Duchi, J., Hazan, E., and Singer, Y. "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization." Journal of Machine Learning Research, Vol. 12 (2011), 2121-2159.

8. Gers, F. A. and Schmidhuber, J. "LSTM Recurrent Networks Learn Simple Context-Free and Context-Sensitive Languages." IEEE Transactions on Neural Networks, Vol. 12, No. 6 (2001), 1333-1340.

9. Goodfellow, I., Bengio, Y., and Courville, A. "Deep Learning." MIT Press, Cambridge, MA, 2016.

10. Hinton, G., Srivastava, N., and Swersky, K. "Neural Networks for Machine Learning." Coursera video lectures. Available at https://www.coursera.org/learn/neural-networks (downloaded 1 December, 2016).

11. Hochreiter, S. and Schmidhuber, J. "Long Short-Term Memory." Neural Computation, Vol. 9, No. 8 (1997), 1735-1780.

12. Hutter, M. "The Human Knowledge Compression Prize." Available at http://prize.hutter1.net/ (2006).

13. Jozefowicz, R., Vinyals, O., Schuster, M., Shazeer, N., and Wu, Y. "Exploring the Limits of Language Modeling." arXiv:1602.02410 (2016).

14. Kim, Y., Jernite, Y., Sontag, D., and Rush, A. M. "Character-Aware Neural Language Models." arXiv:1508.06615 (2015).

15. Kingma, D. and Ba, J. "Adam: A Method for Stochastic Optimization." arXiv:1412.6980 (2014).

16. Kim, Y.-h., Hwang, Y.-k., Kang, T.-g., and Jung, K.-m. "LSTM Language Model Based Korean Sentence Generation." The Journal of Korean Institute of Communications and Information Sciences, Vol. 41, No. 5 (2016), 592-601.

17. Ahn, S. "Deep Learning Architectures and Applications." Journal of Intelligence and Information Systems, Vol. 22, No. 2 (2016), 127-142.

18. Bojanowski, P., Joulin, A., and Mikolov, T. "Alternative Structures for Character-Level RNNs." arXiv:1511.06303 (2015).

19. Lankinen, M., Heikinheimo, H., Takala, P., and Raiko, T. "A Character-Word Compositional Neural Language Model for Finnish." arXiv:1612.03266 (2016).

20. Lee, D., Oh, K.-h., and Choi, H.-J. "Measuring the Syntactic Similarity between Korean Sentences Using RNN." In Proceedings of Korea Computer Congress (2016a), 792-794.

21. Lee, J., Cho, K., and Hofmann, T. "Fully Character-Level Neural Machine Translation without Explicit Segmentation." arXiv:1610.03017 (2016b).

22. Ling, W., Luis, T., Marujo, L., Astudillo, R. F., Amir, S., Dyer, C., Black, A. W., and Trancoso, I. "Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation." arXiv:1508.02096 (2015).

23. Mikolov, T., Karafiát, M., Burget, L., Černocký, J., and Khudanpur, S. "Recurrent Neural Network Based Language Model." In Proceedings of Interspeech (2010), 1045-1048.

24. Mikolov, T. and Zweig, G. "Context Dependent Recurrent Neural Network Language Model." In Proceedings of SLT (2012), 234-239.

25. Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." Journal of Machine Learning Research, Vol. 15, No. 1 (2014), 1929-1958.

26. Polyak, B. T. "Some Methods of Speeding Up the Convergence of Iteration Methods." USSR Computational Mathematics and Mathematical Physics, Vol. 4, No. 5 (1964), 1-17.

27. Rissanen, J. and Langdon, G. G. "Arithmetic Coding." IBM Journal of Research and Development, Vol. 23, No. 2 (1979), 149-162.

28. Socher, R. and Mundra, R. S. "CS 224D: Deep Learning for NLP." Available at http://cs224d.stanford.edu/ (downloaded 1 December, 2016).

29. Sundermeyer, M., Schlüter, R., and Ney, H. "LSTM Neural Networks for Language Modeling." In Proceedings of Interspeech (2012), 194-197.

30. Sutskever, I. and Martens, J. "Generating Text with Recurrent Neural Networks." In Proceedings of the 28th International Conference on Machine Learning (2011), 1017-1024.

31. Theano Development Team. "Theano: A Python Framework for Fast Computation of Mathematical Expressions." arXiv:1605.02688 (2016).

32. Ward, D. J., Blackwell, A. F., and MacKay, D. J. "Dasher-a Data Entry Interface Using Continuous Gestures and Language Models." In Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology (2000), 129-137.

33. Zeiler, M. D. "ADADELTA: An Adaptive Learning Rate Method." arXiv:1212.5701 (2012).