Deep Image Annotation and Classification by Fusing Multi-Modal Semantic Topics

  • Chen, YongHeng (College of Computer Science, Minnan Normal University)
  • Zhang, Fuquan (School of Software, Beijing Institute of Technology)
  • Zuo, WanLi (College of Computer Science and Technology, Jilin University)
  • Received : 2017.05.17
  • Accepted : 2017.09.11
  • Published : 2018.01.31

Abstract

Because of the semantic gap across different modalities, automatic retrieval of multimedia information remains a major challenge. An effective joint model is needed to bridge this gap and organize the relationships between modalities. In this work, we develop a deep image annotation and classification model that fuses multi-modal semantic topics (DAC_mmst). The model finds visual and non-visual topics by jointly modeling an image and its loosely related text for deep image annotation, while simultaneously learning and predicting the class label. More specifically, DAC_mmst relies on a non-parametric Bayesian model to estimate the number of visual topics that best explains the image. To evaluate the effectiveness of the proposed algorithm, we collect a real-world dataset and conduct various experiments. The experimental results show that DAC_mmst performs favorably in perplexity, image annotation, and classification accuracy compared with several state-of-the-art methods.
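
To illustrate the perplexity metric named in the abstract, here is a minimal sketch. It is not the authors' evaluation code. It assumes a fitted topic model has already produced a predictive probability for each held-out word, and it uses the standard definition: perplexity is the exponential of the negative mean log-likelihood per word. The names `perplexity` and `word_probs` are hypothetical and chosen only for this illustration.

```python
import math


def perplexity(word_probs):
    """Return exp(-mean log-probability) over a list of held-out word probabilities.

    word_probs is a hypothetical input: one predictive probability per
    held-out word, as produced by some already-fitted topic model.
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    log_likelihood = sum(math.log(p) for p in word_probs)
    return math.exp(-log_likelihood / len(word_probs))


# A model that spreads probability uniformly over a 4-word vocabulary.
uniform = [0.25, 0.25, 0.25, 0.25]
# A model that is more confident about two of the four held-out words.
sharper = [0.5, 0.5, 0.25, 0.25]

print(perplexity(uniform))
print(perplexity(sharper))
```

Lower perplexity means the model assigns higher probability to the held-out words, so the second model scores better than the uniform one in this toy case.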

Cited by

  1. Adaptive Attention Annotation Model: Optimizing the Prediction Path through Dependency Fusion, vol. 13, pp. 9, 2019. https://doi.org/10.3837/tiis.2019.09.019