
Correlation-based Automatic Image Captioning  

Hyungjeong, Yang
Pinar, Duygulu
Christos, Falout
Abstract
This paper presents correlation-based automatic image captioning. Given a training set of annotated images, we want to discover correlations between visual features and textual features, so that we can automatically generate descriptive textual features for a new, unseen image. We develop models with multiple design alternatives, such as 1) adaptively clustering visual features, 2) weighting visual features and textual features, and 3) reducing dimensionality for noise suppression. We experiment thoroughly on 10 data sets of various content styles from the Corel image database, about 680MB. The major contributions of this work are: (a) we show that carefully weighting visual and textual features, as well as adaptively clustering visual features, leads to consistent performance improvements, and (b) our proposed methods achieve a relative improvement of up to 45% in annotation accuracy over the state-of-the-art EM approach.
Keywords
Image annotation; correlation; Singular Value Decomposition; Clustering;
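The pipeline outlined in the abstract (weight the features, optionally reduce dimensionality with SVD, then correlate visual features with caption terms) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the idf-style weighting, and the way the SVD is applied to the joined matrix are all assumptions made for illustration, and the visual features are assumed to have already been clustered into integer ids.

```python
import numpy as np

def count_matrix(items, vocab_size):
    """Build an images-by-ids count matrix from lists of integer ids."""
    m = np.zeros((len(items), vocab_size))
    for row, ids in enumerate(items):
        for i in ids:
            m[row, i] += 1.0
    return m

def idf_weight(m):
    """Down-weight ids that occur in many images (assumed idf-style scheme)."""
    n_images = m.shape[0]
    doc_freq = np.count_nonzero(m, axis=0)
    return m * (np.log((1.0 + n_images) / (1.0 + doc_freq)) + 1.0)

def low_rank(m, k):
    """Rank-k approximation of m via truncated SVD (noise suppression)."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    k = min(k, len(s))
    return (u[:, :k] * s[:k]) @ vt[:k, :]

def fit_correlation(clusters, terms, n_clusters, n_terms, rank=None):
    """Return a clusters-by-terms correlation matrix from annotated images."""
    b = idf_weight(count_matrix(clusters, n_clusters))
    t = idf_weight(count_matrix(terms, n_terms))
    if rank is not None:
        # Assumption: reduce the joined [visual | textual] matrix, then split it.
        joint = low_rank(np.hstack([b, t]), rank)
        b, t = joint[:, :n_clusters], joint[:, n_clusters:]
    return b.T @ t

def caption(corr, image_clusters, n_terms_out):
    """Score every term for a new image and return the top term ids."""
    query = np.zeros(corr.shape[0])
    for i in image_clusters:
        query[i] += 1.0
    scores = query @ corr
    return [int(i) for i in np.argsort(-scores)[:n_terms_out]]

# Toy data: cluster 0 co-occurs with term 0, cluster 1 with term 1,
# and cluster 2 appears in every image (uninformative background).
train_clusters = [[0, 0, 2], [0, 2], [1, 2], [1, 1, 2]]
train_terms = [[0], [0], [1], [1]]
corr = fit_correlation(train_clusters, train_terms, n_clusters=3, n_terms=2, rank=2)
best_terms = caption(corr, [0, 2], n_terms_out=1)
```

In this toy setup the weighting step reduces the influence of the background cluster, and the rank parameter controls how aggressively the SVD step smooths the co-occurrence data. Both are tunable choices rather than values taken from the paper.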