1. Q. Wu et al., Image captioning and visual question answering based on attributes and external knowledge, IEEE Trans. Pattern Anal. Mach. Intell. 40 (2018), no. 6, 1367-1381.
2. Y. Yu et al., End-to-end concept word detection for video captioning, retrieval, and question answering, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 3261-3269.
3. P. Anderson et al., Bottom-up and top-down attention for image captioning and visual question answering, arXiv preprint arXiv:1707.07998, 2017.
4. J. Lu et al., Knowing when to look: Adaptive attention via a visual sentinel for image captioning, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 3242-3250.
5. T. Yao et al., Boosting image captioning with attributes, in Proc. IEEE Int. Conf. Comput. Vision, Venice, Italy, Oct. 22-29, 2017.
6. C. Wang, H. Yang, and C. Meinel, Image captioning with deep bidirectional LSTMs and multi-task learning, ACM Trans. Multimedia Comput. Commun. Applicat. 14 (2018), no. 2s, 1-20.
7. C. Szegedy et al., Going deeper with convolutions, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Boston, MA, USA, June 2015, pp. 1-9.
8. S. Reed et al., Learning deep representations of fine-grained visual descriptions, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Las Vegas, NV, USA, June 2016, pp. 49-58.
9. L. Zhang et al., Learning a deep embedding model for zero-shot learning, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 3010-3019.
10. X. He and Y. Peng, Fine-grained image classification via combining vision and language, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 7332-7340.
11. R. Kiros, R. Salakhutdinov, and R.S. Zemel, Unifying visual-semantic embeddings with multimodal neural language models, arXiv preprint arXiv:1411.2539, 2014.
12. J. Mao et al., Learning like a child: Fast novel visual concept learning from sentence descriptions of images, in Proc. IEEE Int. Conf. Comput. Vision, Santiago, Chile, Dec. 2015, pp. 2533-2541.
13. R. Vedantam et al., Context-aware captions from context-agnostic supervision, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 1070-1079.
14. A.H. Abdulnabi et al., Multi-task CNN model for attribute prediction, IEEE Trans. Multimedia 17 (2015), no. 11, 1949-1959.
15. T.-H. Chen et al., Show, adapt and tell: Adversarial training of cross-domain image captioner, in Proc. IEEE Int. Conf. Comput. Vision, Venice, Italy, Oct. 2017, pp. 521-530.
16. R.R. Selvaraju et al., Grad-CAM: Visual explanations from deep networks via gradient-based localization, in Proc. IEEE Int. Conf. Comput. Vision, Venice, Italy, Oct. 2017, pp. 618-626.
17. Y.-C. Yoon et al., Fine-grained mobile application clustering model using retrofitted document embedding, ETRI J. 39 (2017), no. 4, 443-454.
18. S. Kong and C. Fowlkes, Low-rank bilinear pooling for fine-grained classification, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 7025-7034.
19. X.-S. Wei et al., Selective convolutional descriptor aggregation for fine-grained image retrieval, IEEE Trans. Image Process. 26 (2017), no. 6, 2868-2881.
20. S. Yu et al., A model for fine-grained vehicle classification based on deep learning, Neurocomput. 257 (2017), 97-103.
21. G.-S. Xie et al., LG-CNN: From local parts to global discrimination for fine-grained recognition, Pattern Recogn. 71 (2017), 118-131.
22. S.H. Lee, HGO-CNN: Hybrid generic-organ convolutional neural network for multi-organ plant classification, in Proc. IEEE Int. Conf. Image Process., Beijing, China, Sept. 2017, pp. 4462-4466.
23. A. Li et al., Zero-shot fine-grained classification by deep feature learning with semantics, arXiv preprint arXiv:1707.00785, 2017.
24. Z. Akata et al., Evaluation of output embeddings for fine-grained image classification, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Boston, MA, USA, June 2015, pp. 2927-2936.
25. R. Ranjan, V.M. Patel, and R. Chellappa, HyperFace: A deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition, IEEE Trans. Pattern Anal. Mach. Intell. 41 (2019), no. 1, 121-135.
26. K. Hashimoto et al., A joint many-task model: Growing a neural network for multiple NLP tasks, arXiv preprint arXiv:1611.01587, 2016.
27. R. Caruana, Multitask learning: A knowledge-based source of inductive bias, in Proc. Int. Conf. Mach. Learn., Amherst, MA, USA, June 1993, pp. 41-48.
28. C. Wah et al., The Caltech-UCSD Birds-200-2011 dataset, Tech. Report CNS-TR-2011-001, California Institute of Technology, 2011.
29. L. Duong et al., Low resource dependency parsing: Cross-lingual parameter sharing in a neural network parser, in Proc. Annu. Meeting Association Computat. Linguistics Int. Joint Conf. Natural Language Process., Beijing, China, July 2015, pp. 845-850.
30. M.-E. Nilsback and A. Zisserman, Automated flower classification over a large number of classes, in Proc. Indian Conf. Comput. Vision, Graphics Image Process., Bhubaneswar, India, Dec. 2008, pp. 722-729.
31. K. Papineni et al., BLEU: A method for automatic evaluation of machine translation, in Proc. Annu. Meeting Association Computat. Linguistics, Philadelphia, PA, USA, July 2002, pp. 311-318.
32. C.-Y. Lin, ROUGE: A package for automatic evaluation of summaries, in Proc. Workshop Text Summarization Branches Out, Post-Conf. Workshop ACL, Barcelona, Spain, July 2004, pp. 74-81.
33. S. Banerjee and A. Lavie, METEOR: An automatic metric for MT evaluation with improved correlation with human judgments, in Proc. ACL Workshop Intrinsic Extrinsic Evaluation Measures Mach. Translation Summarization, Ann Arbor, MI, USA, June 2005, pp. 65-72.
34. R. Vedantam, C.L. Zitnick, and D. Parikh, CIDEr: Consensus-based image description evaluation, arXiv preprint arXiv:1411.5726, 2014.
35. C. Szegedy, S. Ioffe, and V. Vanhoucke, Inception-v4, Inception-ResNet and the impact of residual connections on learning, in Proc. AAAI Conf. Artif. Intell., San Francisco, CA, USA, Feb. 2017, pp. 4278-4284.
36. L.A. Hendricks et al., Generating visual explanations, in Proc. Eur. Conf. Comput. Vision, Amsterdam, The Netherlands, Oct. 2016, pp. 3-19.
37. A. Paszke et al., Automatic differentiation in PyTorch, in Proc. NIPS, Long Beach, CA, USA, Dec. 2017.
38. J. Donahue et al., Long-term recurrent convolutional networks for visual recognition and description, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Boston, MA, USA, June 2015, pp. 2625-2634.
39. O. Vinyals et al., Show and tell: A neural image caption generator, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Boston, MA, USA, June 2015, pp. 3156-3164.
40. Y. Dong et al., Improving interpretability of deep neural networks with semantic information, arXiv preprint arXiv:1703.04096, 2017.
41. L.A. Hendricks et al., Deep compositional captioning: Describing novel object categories without paired training data, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Las Vegas, NV, USA, June 2016, pp. 1-10.
42. Q. You et al., Image captioning with semantic attention, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Las Vegas, NV, USA, June 2016, pp. 4651-4659.
43. S.J. Rennie et al., Self-critical sequence training for image captioning, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 1179-1195.