[1] Hodosh, Micah, Young, Peter, and Hockenmaier, Julia. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 47:853-899, 2013.

[2] Mao, Junhua, Xu, Wei, Yang, Yi, Wang, Jiang, and Yuille, Alan. Deep captioning with multimodal recurrent neural networks (m-RNN). arXiv:1412.6632, 2014.

[3] Cho, Kyunghyun, van Merriënboer, Bart, Gülçehre, Çağlar, Bougares, Fethi, Schwenk, Holger, and Bengio, Yoshua. Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP, 2014.

[4] Vinyals, Oriol, Toshev, Alexander, Bengio, Samy, and Erhan, Dumitru. Show and tell: A neural image caption generator. arXiv:1411.4555, 2014.

[5] Karpathy, Andrej and Li, Fei-Fei. Deep visual-semantic alignments for generating image descriptions. arXiv:1412.2306, 2014.

[6] Xu, Kelvin, Ba, Jimmy, Kiros, Ryan, Cho, Kyunghyun, Courville, Aaron, Salakhutdinov, Ruslan, Zemel, Richard, and Bengio, Yoshua. Show, attend and tell: Neural image caption generation with visual attention. ICML, 2015.

[7] Bahdanau, Dzmitry, Cho, Kyunghyun, and Bengio, Yoshua. Neural machine translation by jointly learning to align and translate. ICLR, arXiv:1409.0473, 2015.

[8] Simonyan, Karen and Zisserman, Andrew. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.

[9] Young, Peter, Lai, Alice, Hodosh, Micah, and Hockenmaier, Julia. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2:67-78, 2014.

[10] Lin, Tsung-Yi et al. Microsoft COCO: Common objects in context. arXiv:1405.0312, 2014.

[11] Papineni, Kishore, Roukos, Salim, Ward, Todd, and Zhu, Wei-Jing. BLEU: A method for automatic evaluation of machine translation. ACL, 2002.

[12] Bastien, Frédéric et al. Theano: New features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop, 2012.

[13] Collobert, Ronan, Weston, Jason, Bottou, Léon, Karlen, Michael, Kavukcuoglu, Koray, and Kuksa, Pavel. Natural language processing (almost) from scratch. JMLR, 2011.