[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3837/tiis.2019.09.019

Adaptive Attention Annotation Model: Optimizing the Prediction Path through Dependency Fusion

Wang, Fangxin (Institute of Automation, Chinese Academy of Sciences)
Liu, Jie (Institute of Automation, Chinese Academy of Sciences)
Zhang, Shuwu (Institute of Automation, Chinese Academy of Sciences)
Zhang, Guixuan (Institute of Automation, Chinese Academy of Sciences)
Zheng, Yang (Institute of Automation, Chinese Academy of Sciences)
Li, Xiaoqian (Institute of Automation, Chinese Academy of Sciences)
Liang, Wei (Institute of Automation, Chinese Academy of Sciences)
Li, Yuejun (Institute of Automation, Chinese Academy of Sciences)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS) / v.13, no.9, 2019, pp. 4665-4683

Abstract

Previous methods build image annotation models by leveraging three basic dependencies: relations between image and label (image/label), between images (image/image), and between labels (label/label). Although many studies show that multiple dependencies can work jointly to improve annotation performance, the dependencies do not actually "work jointly" in these methods, whose performance depends largely on the result predicted by the image/label component. To address this problem, we propose the adaptive attention annotation model (AAAM), which associates these dependencies with the prediction path, a series of labels (tags) in the order they are detected. In particular, we optimize the prediction path by detecting the relevant labels from the easy-to-detect to the hard-to-detect, which are found using Binary Cross-Entropy (BCE) and Triplet Margin (TM) losses, respectively. In addition, to capture the information of each label, instead of explicitly extracting regional features, we propose a self-attention mechanism that implicitly enhances the relevant regions and suppresses the irrelevant ones. To validate the effectiveness of the model, we conduct experiments on three well-known public datasets, COCO 2014, IAPR TC-12, and NUS-WIDE, and achieve better performance than state-of-the-art methods.
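
As a rough illustration of the two losses named in the abstract, the Python sketch below computes a binary cross-entropy loss over per-label probabilities and a triplet margin loss over embedding vectors. It is not the authors' implementation: the function names, the use of Euclidean distance, the default margin of 1.0, and the example values are all assumptions made for illustration, since this record does not give the paper's exact formulation.

```python
import math


def bce_loss(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over independent per-label probabilities.

    probs and targets are equal-length sequences; targets hold 0 or 1.
    (Hypothetical helper, not taken from the paper.)
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The margin value and distance function are assumptions.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


if __name__ == "__main__":
    # Three labels: predicted probabilities against ground-truth tags.
    print(bce_loss([0.9, 0.2, 0.6], [1, 0, 1]))
    # The negative is closer to the anchor than the positive, so the loss is positive.
    print(triplet_margin_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
```

The triplet loss is zero whenever the positive is closer to the anchor than the negative by at least the margin, so only the harder cases contribute to training.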

Keywords

image annotation; multiple dependencies; self-attention; prediction path; Triplet Margin loss

Citations & Related Records

Times Cited By KSCI: 2

Reference

1	Yunchao Gong, Yangqing Jia, Thomas Leung, et al., "Deep convolutional ranking for multilabel image annotation," arXiv: 1312.4894 [cs], 2014.
2	Matthieu Guillaumin, Thomas Mensink, Jakob Verbeek, et al., "Tagprop: Discriminative metric learning in nearest neighbor models for image auto-annotation," in Proc. of IEEE International Conference on Computer Vision, pp. 309-316, September 29 - October 2, 2009.
3	Jiajun Wu, Yinan Yu, Chang Huang, "Deep multiple instance learning for image classification and auto-annotation," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3460-3469, December 13-16, 2015.
4	Jiren Jin and Hideki Nakayama, "Annotation order matters: Recurrent image annotator for arbitrary length image tagging," in Proc. of the IEEE International Conference on Pattern Recognition, pp. 2452-2457, December 4-8, 2016.
5	Jiang Wang, Yi Yang, Junhua Mao, et al., "CNN-RNN: A unified framework for multi-label image classification," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2285-2294, June 27-30, 2016.
6	Venkatesh N Murthy, Subhransu Maji, and R Manmatha, "Automatic image annotation using deep learning representations," in Proc. of the 5th ACM on International Conference on Multimedia Retrieval, pp. 603-606, June 23-26, 2015.
7	Zhe Wang, Limin Wang, Yali Wang, et al., "Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition," IEEE Transactions on Image Processing, vol. 26, no. 4, pp. 2028-2041, 2017.
8	Nicolas Usunier, David Buffoni, and Patrick Gallinari, "Ranking with ordered weighted pairwise classification," in Proc. of the 26th International Conference on Machine Learning, pp. 1057-1064, June 14-18, 2009.
9	Laurent Itti, Christof Koch, and Ernst Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 20, no. 11, pp. 1254-1259, 1998.
10	Robert Desimone and John Duncan, "Neural mechanisms of selective visual attention," Annual Review of Neuroscience, vol. 18, no. 1, pp. 193-222, 1995.
11	Colin Raffel and Daniel P. W. Ellis, "Feed-forward networks with attention can solve some long-term memory problems," arXiv: 1512.08756 [cs], 2015.
12	A. Vaswani, N. Shazeer, N. Parmar, et al., "Attention Is All You Need," arXiv: 1706.03762 [cs], 2017.
13	Jing Liu, Tongwei Ren, Yuantian Wang, et al., "Object proposal on RGB-D images via elastic edge boxes," Neurocomputing, vol. 236, pp. 134-146, 2017.
14	Tiberio Uricchio, Lamberto Ballan, Lorenzo Seidenari, et al., "Automatic Image Annotation via Label Transfer in the Semantic Space," Pattern Recognition, vol. 71, pp. 144-157, 2017.
15	Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas, "Random k-Labelsets for Multi-Label Classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, pp. 1079-1089, 2011.
16	Yunchao Wei, Wei Xia, Junshi Huang, "CNN: Single-label to multi-label," arXiv: 1406.5726 [cs], 2014.
17	Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, et al., "Show, attend and tell: neural image caption generation with visual attention," in Proc. of the 32nd International Conference on Machine Learning, pp. 2048-2057, July 06-11, 2015.
18	Ning Sun, Feng Jiang, Hengchao Yan, et al., "Proposal generation method for object detection in infrared image," Infrared Physics & Technology, vol. 81, pp. 117-127, 2017.
19	J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, et al., "Selective Search for Object Recognition," International Journal of Computer Vision, vol. 104, no. 2, pp. 154-171, 2013.
20	Karen Simonyan, Andrew Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv: 1409.1556 [cs], 2015.
21	J. Deng, W. Dong, R. Socher, et al., "ImageNet: A large-scale hierarchical image database," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255, June 20-25, 2009.
22	Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, et al., "Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization," in Proc. of the IEEE International Conference on Computer Vision, pp. 618-626, October 22-29, 2017.
23	Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu and Xiaogang Wang, "Learning Spatial Regularization with Image-Level Supervisions for Multi-label Image Classification," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2027-2036, July 21-26, 2017.
24	Baoyuan Wu, Weidong Chen, Peng Sun, Wei Liu, Bernard Ghanem, and Siwei Lyu, "Tagging like Humans: Diverse and Distinct Image Annotation," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7967-7975, June 18-22, 2018.
25	S. Hamid Rezatofighi, Vijay Kumar B G, Anton Milan, Ehsan Abbasnejad, Anthony Dick and Ian Reid, "DeepSetNet: Predicting Sets with Deep Neural Networks," in Proc. of the IEEE International Conference on Computer Vision, pp. 5257-5266, October 22-29, 2017.
26	Feng Liu, Tao Xiang, Timothy M. Hospedales, Wankou Yang and Changyin Sun, "Semantic Regularisation for Recurrent Image Annotation," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4160-4168, July 21-26, 2017.
27	M. Wang, X. Xia, J. Le, et al., "Effective automatic image annotation via integrated discriminative and generative models," Information Sciences, vol. 262, pp. 159-171, 2014.
28	YongHeng Chen, Fuquan Zhang and WanLi Zuo, "Deep Image Annotation and Classification by Fusing Multi-Modal Semantic Topics," KSII Transactions on Internet and Information Systems, vol. 12, no. 1, pp. 392-412, 2018.
29	Minxian Li, Jinhui Tang and Chunxia Zhao, "Active Learning on Sparse Graph for Image Annotation," KSII Transactions on Internet and Information Systems, vol. 6, no. 10, pp. 2650-2662, 2012.
30	Bin Wang and Yuncai Liu, "Collaborative Similarity Metric Learning for Semantic Image Annotation and Retrieval," KSII Transactions on Internet and Information Systems, vol. 7, no. 5, pp. 1252-1271, 2013.
31	Yonghao He, Jian Wang, Cuicui Kang, et al., "Large scale image annotation via deep representation learning and tag embedding learning," in Proc. of the 5th ACM on International Conference on Multimedia Retrieval, pp. 523-526, June 23-26, 2015.
32	Venkatesh N Murthy, Subhransu Maji, and R Manmatha, "Automatic image annotation using deep learning representations," in Proc. of the 5th ACM on International Conference on Multimedia Retrieval, pp. 603-606, June 23-26, 2015.
33	Changhu Wang, Shuicheng Yan, Lei Zhang, et al., "Multi-label sparse coding for automatic image annotation," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1643-1650, June 20-25, 2009.
34	S. Hamid Amiri, Mansour Jamzad, "Automatic image annotation using semi-supervised generative modeling," Pattern Recognition, vol. 48, no. 1, pp. 174-188, 2015.
35	Michael Grubinger, Paul Clough, Henning Muller, et al., "The IAPR TC-12 benchmark: A new evaluation resource for visual information systems," in Proc. of International Workshop OntoImage, pp. 13-23, May 22-23, 2006.
36	Jiang Wang, Yi Yang, Junhua Mao, et al., "CNN-RNN: A unified framework for multi-label image classification," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2285-2294, June 27-30, 2016.
37	Shiliang Zhang, Qi Tian, Guang Hua, et al., "ObjectPatchNet: Towards scalable and semantic image annotation and retrieval," Computer Vision and Image Understanding, vol. 118, pp. 16-29, 2014.
38	Tsung-Yi Lin, Michael Maire, Serge Belongie, et al., "Microsoft COCO: Common objects in context," in Proc. of European Conference on Computer Vision, pp. 740-755, September 6-12, 2014.
39	Tat-Seng Chua, Jinhui Tang, Richang Hong, et al., "NUS-WIDE: A Real-World Web Image Database from National University of Singapore," in Proc. of ACM International Conference on Image and Video Retrieval, pp. 48, July 8-10, 2009.
40	Scott Deerwester, "Improving information retrieval with latent semantic indexing," Information Sciences, vol. 100, no. 1-4, pp. 105-137, 1988.
41	Thomas Hofmann, "Unsupervised learning by probabilistic latent semantic analysis," Machine Learning, vol. 42, no. 1-2, pp. 177-196, 2001.
42	David M Blei, Andrew Y Ng, and Michael I Jordan, "Latent dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.
43	Yin Zheng, Yu-Jin Zhang, and Hugo Larochelle, "Topic modeling of multimodal data: an autoregressive approach," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1370-1377, June 23-28, 2014.
44	Sunho Park and Seungjin Choi, "Max-margin embedding for multi-label learning," Pattern Recognition Letters, vol. 34, no. 3, pp. 292-298, 2013.