Correlation-based Automatic Image Captioning

Correlation-Based Automatic Image Annotation


Abstract

This paper presents correlation-based automatic image captioning. Given a training set of annotated images, we want to discover correlations between visual features and textual features, so that we can automatically generate descriptive textual features for a new, unseen image. We develop models with multiple design alternatives, such as 1) adaptively clustering visual features, 2) weighting visual features and textual features, and 3) reducing dimensionality for noise suppression. We experiment thoroughly on 10 data sets of various content styles from the Corel image database, about 680 MB. The major contributions of this work are: (a) we show that carefully weighting visual and textual features, as well as clustering visual features adaptively, leads to consistent performance improvements, and (b) our proposed methods achieve a relative improvement of up to 45% in annotation accuracy over the state-of-the-art EM approach.

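This paper presents a correlation-based method for automatic image annotation. Annotations for a new test image are generated automatically by using the annotated images in the training data to discover correlations between the visual features and the textual features of images. The proposed correlation-based annotation model is designed with three factors in mind: 1) appropriate clustering of visual features, 2) weighting of visual and textual features, and 3) dimensionality reduction for noise removal. Experiments were carried out on 10 data sets drawn from 680 MB of Corel image data. The results show that weighting visual and textual features and appropriately clustering visual features improve model performance, and that the proposed correlation-based model achieves a relative improvement of 45% over the existing EM-based automatic annotation model.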
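To make the pipeline in the abstract concrete, the following is a minimal sketch, not the paper's actual model. It assumes that each image has already been reduced to counts over a small set of visual clusters ("blobs"); the adaptive clustering step itself is not shown. The smoothed IDF weighting, the function names (`idf`, `correlation_matrix`, `low_rank`, `caption`), and the toy data are all illustrative assumptions, since the abstract does not specify these details. The sketch builds a blob-by-term correlation matrix from annotated training images, optionally truncates it with an SVD for noise suppression, and scores terms for an unseen image.

```python
import numpy as np


def idf(counts):
    """Smoothed inverse document frequency per column (an assumed weighting)."""
    n_images = counts.shape[0]
    doc_freq = (counts > 0).sum(axis=0)
    return np.log((1.0 + n_images) / (1.0 + doc_freq)) + 1.0


def correlation_matrix(blob_counts, term_counts, weighted=True):
    """Return a (n_blobs, n_terms) matrix of blob/term co-occurrence strength."""
    B = np.asarray(blob_counts, dtype=float)  # (n_images, n_blobs)
    T = np.asarray(term_counts, dtype=float)  # (n_images, n_terms)
    if weighted:
        # Down-weight blobs and terms that occur in most training images.
        B = B * idf(B)
        T = T * idf(T)
    return B.T @ T


def low_rank(matrix, rank):
    """Rank-limited SVD reconstruction, a stand-in for dimensionality reduction."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    r = min(rank, s.size)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]


def caption(blob_vector, corr, vocab, top_k=2):
    """Score every term for one image and return the top_k terms."""
    scores = np.asarray(blob_vector, dtype=float) @ corr
    best = np.argsort(-scores, kind="stable")[:top_k]
    return [vocab[i] for i in best]


# Hypothetical toy training set: 4 images, 3 visual clusters, 4 caption terms.
vocab = ["sky", "grass", "tiger", "water"]
blob_counts = [[2, 1, 0],
               [0, 2, 1],
               [2, 0, 0],
               [0, 1, 2]]
term_counts = [[1, 1, 0, 0],
               [0, 1, 1, 0],
               [1, 0, 0, 1],
               [0, 1, 1, 0]]

corr = correlation_matrix(blob_counts, term_counts)
smoothed = low_rank(corr, rank=2)

# Caption a new image made up only of regions from the third visual cluster.
print(caption([0, 0, 3], corr, vocab))
print(caption([0, 0, 3], smoothed, vocab))
```

In this toy setting the weighting matters: without it, "grass" and "tiger" receive identical scores for the third cluster, because "grass" co-occurs with almost every blob. Whether a rank-limited matrix helps or hurts depends on the data and the chosen rank, which is why the abstract treats it as one design alternative among several.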
