http://dx.doi.org/10.3837/tiis.2019.03.016

Semi-supervised Cross-media Feature Learning via Efficient L2,q Norm  

Zong, Zhikai (Department of Computer Science and Technology, Shandong University)
Han, Aili (Department of Computer Science and Technology, Shandong University)
Gong, Qing (Department of Computer Science and Technology, Shandong University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.13, no.3, 2019, pp. 1403-1417
Abstract
With the rapid growth of multimedia data, cross-media feature learning has become important for many applications, such as multimedia search and recommendation. Existing methods are sensitive to noise and edge information in multimedia data. In this paper, we propose a semi-supervised cross-media feature learning method based on the $L_{2,q}$ norm that improves the performance of cross-media retrieval and is more robust and efficient than previous methods. In our method, noise and edge information interfere less with the retrieval results, the optimization converges quickly, and the dynamic patch information of multimedia data is exploited to increase retrieval accuracy. Extensive experiments on the XMedia dataset show that our method outperforms state-of-the-art methods.
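The abstract does not define the norm, so for reference: for a matrix $W$ with rows $w^i$, the $L_{2,q}$ norm is commonly written $\|W\|_{2,q} = \big(\sum_i \|w^i\|_2^q\big)^{1/q}$, and with $0 < q \le 1$ it pushes whole rows of $W$ toward zero (row sparsity). The Python sketch below computes this norm and minimizes a least-squares objective with an $L_{2,q}^q$ penalty using the generic iteratively reweighted scheme often applied to $L_{2,1}$-type problems. It is an illustration under stated assumptions, not the authors' algorithm: the objective $\|XW - Y\|_F^2 + \lambda\|W\|_{2,q}^q$, the function names `l2q_norm` and `l2q_regression`, and the defaults `lam=0.1`, `q=0.5` are hypothetical, and the semi-supervised and patch terms of the paper are omitted.

```python
import numpy as np


def l2q_norm(W: np.ndarray, q: float) -> float:
    """||W||_{2,q} = (sum_i ||w^i||_2^q)^(1/q), with w^i the i-th row of W."""
    row_norms = np.linalg.norm(W, axis=1)
    return float(np.sum(row_norms ** q) ** (1.0 / q))


def l2q_regression(X: np.ndarray, Y: np.ndarray, lam: float = 0.1,
                   q: float = 0.5, n_iter: int = 50,
                   eps: float = 1e-8) -> np.ndarray:
    """Hypothetical objective: min_W ||X W - Y||_F^2 + lam * ||W||_{2,q}^q.

    Setting the gradient to zero gives (X^T X + lam * D) W = X^T Y with
    D = diag(q / (2 * ||w^i||_2^(2 - q))). D depends on W, so the loop
    alternates between updating D and re-solving the linear system.
    """
    XtX, XtY = X.T @ X, X.T @ Y
    d = X.shape[1]
    # Ridge solution as a starting point (its rows are generically nonzero).
    W = np.linalg.solve(XtX + lam * np.eye(d), XtY)
    for _ in range(n_iter):
        # eps keeps the row weight finite once a row has shrunk to ~0.
        row_norms = np.maximum(np.linalg.norm(W, axis=1), eps)
        D = np.diag(q / (2.0 * row_norms ** (2.0 - q)))
        W = np.linalg.solve(XtX + lam * D, XtY)
    return W
```

Rows of the returned `W` whose norm collapses to nearly zero correspond to input features the learned projection ignores. Smaller `q` pushes harder toward this row sparsity, but for `q < 1` the problem is nonconvex and the iteration is only expected to reach a stationary point.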
Keywords
Cross-media retrieval; semi-supervised regularization; $L_{2,q}$ norm; sparse regularization