Collaborative Similarity Metric Learning for Semantic Image Annotation and Retrieval

  • Wang, Bin (Department of Automation, Shanghai Jiao Tong University) ;
  • Liu, Yuncai (Department of Automation, Shanghai Jiao Tong University)
  • Received : 2012.11.19
  • Accepted : 2013.04.30
  • Published : 2013.05.30

Abstract

Automatic image annotation has become an increasingly important research topic owing to its key role in image retrieval. At the same time, it is highly challenging on large-scale datasets with large variance. Practical approaches generally rely on similarity measures defined over images and on multi-label prediction methods. More specifically, those approaches usually 1) use similarity measures that are predefined or learned by optimizing for either ranking or annotation alone, which may not adapt well to a given dataset; and 2) predict labels separately without taking the correlation between labels into account. In this paper, we propose an image annotation method that collaboratively learns a similarity metric from the dataset and models the label correlation within it. The similarity metric is learned by simultaneously optimizing, with respect to the metric, 1) image ranking using a structural SVM (SSVM) and 2) image annotation using correlated label propagation. The learned similarity metric, which fully exploits the information available in the dataset, is expected to improve both collaborative components, ranking and annotation, and consequently the retrieval system itself. We evaluated the proposed method on the Corel5k, Corel30k and EspGame datasets. The annotation and retrieval results show that the proposed method performs competitively.
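
To make the role of the similarity metric concrete, the following is a minimal sketch of similarity-weighted nearest-neighbor label propagation under a Mahalanobis-style metric. It is not the method proposed in the paper: it omits the SSVM ranking objective and the label-correlation model, and it takes the metric matrix `M` as given rather than learning it. The function names, the toy features, and the exponential weighting are all assumptions made for illustration.

```python
import math

def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y), with M a square matrix as nested lists."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))

def propagate_labels(query, train_feats, train_labels, M, k=2):
    """Score labels for `query` from its k nearest training images under metric M.

    Each neighbor votes for its own labels with weight exp(-distance),
    normalized so that the weights of the k neighbors sum to 1.
    (Assumes distances are small enough that the weights do not underflow.)
    """
    dists = [mahalanobis_sq(query, f, M) for f in train_feats]
    neighbors = sorted(range(len(train_feats)), key=lambda i: dists[i])[:k]
    weights = [math.exp(-dists[i]) for i in neighbors]
    total = sum(weights)
    scores = {}
    for i, w in zip(neighbors, weights):
        for label in train_labels[i]:
            scores[label] = scores.get(label, 0.0) + w / total
    return scores

# Toy data (hypothetical): three annotated images with 2-D features.
train_feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 5.0]]
train_labels = [{"sky", "sea"}, {"sky"}, {"tiger"}]
query = [0.2, 0.0]

# Identity metric: plain squared Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
scores_identity = propagate_labels(query, train_feats, train_labels, identity)

# A different metric that stresses the first feature and nearly ignores the second.
stretched = [[100.0, 0.0], [0.0, 0.0001]]
scores_stretched = propagate_labels(query, train_feats, train_labels, stretched)
```

With the identity metric the two nearest neighbors are the first two images, so "tiger" receives no score; with the second metric the third image becomes a neighbor and "tiger" is scored. This illustrates why the choice of metric affects both neighbor ranking and the propagated annotations, which is the coupling the abstract describes.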
