Correlation-based Automatic Image Captioning

Hyungjeong Yang; Pinar Duygulu; Christos Faloutsos

Journal of KIISE:Software and Applications (한국정보과학회논문지:소프트웨어및응용)

Volume 31, Issue 10 / Pages 1386-1399 / 2004 / pISSN 1229-6848

Korean Institute of Information Scientists and Engineers (한국정보과학회)

상호 관계 기반 자동 이미지 주석 생성

Published : 2004.10.01

Abstract

This paper presents correlation-based automatic image captioning. Given a training set of annotated images, we want to discover correlations between visual features and textual features so that we can automatically generate descriptive textual features for a new, unseen image. We develop models with multiple design alternatives, such as 1) adaptively clustering visual features, 2) weighting visual features and textual features, and 3) reducing dimensionality for noise suppression. We experiment thoroughly on 10 data sets of various content styles from the Corel image database, about 680 MB. The major contributions of this work are: (a) we show that carefully weighting visual and textual features, as well as adaptively clustering visual features, leads to consistent performance improvements, and (b) our proposed methods achieve a relative improvement of up to 45% in annotation accuracy over the state-of-the-art EM approach.

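Korean abstract (English translation): This paper presents a correlation-based method for automatic image captioning. Captions for a new test image are generated by discovering the correlations between the visual features and the textual features of the annotated images in the training data. The proposed correlation-based captioning model is designed around three factors: 1) appropriate clustering of visual features, 2) weighting of visual and textual features, and 3) dimensionality reduction for noise removal. Experiments were run on each of 10 data sets drawn from 680 MB of Corel image data. The results show that weighting the visual and textual features and clustering the visual features appropriately improve the model's performance, and that the proposed correlation-based model achieves a 45% relative improvement over the existing EM-based automatic captioning model.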
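
The abstract describes captioning as finding correlations between visual and textual features and then using them to label an unseen image. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: it assumes the visual features have already been clustered into discrete tokens (the `blob_*` names are hypothetical), it uses an IDF-style weight as a stand-in because the abstract does not say which weighting scheme is used, and it omits the dimensionality-reduction step entirely.

```python
from collections import defaultdict
import math

# Toy training set: (visual tokens, caption words) per image.
# Token names are hypothetical cluster labels, not taken from the paper.
TRAIN = [
    (["blob_sky", "blob_grass"], ["sky", "grass"]),
    (["blob_sky", "blob_water"], ["sky", "water"]),
    (["blob_grass", "blob_tiger"], ["grass", "tiger"]),
    (["blob_water", "blob_sky"], ["water", "sky"]),
]


def idf_weights(docs):
    """Assumed weighting: smoothed inverse document frequency per term."""
    n = len(docs)
    df = defaultdict(int)
    for doc in docs:
        for term in set(doc):
            df[term] += 1
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def build_correlation(train):
    """Accumulate weighted co-occurrence of visual tokens and caption words."""
    blob_w = idf_weights([blobs for blobs, _ in train])
    word_w = idf_weights([words for _, words in train])
    corr = defaultdict(lambda: defaultdict(float))
    for blobs, words in train:
        for b in blobs:
            for w in words:
                corr[b][w] += blob_w[b] * word_w[w]
    return corr


def caption(blobs, corr, k=2):
    """Score each word by summed correlation with the image's visual tokens."""
    scores = defaultdict(float)
    for b in blobs:
        for w, s in corr.get(b, {}).items():
            scores[w] += s
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


corr = build_correlation(TRAIN)
print(caption(["blob_tiger"], corr))
```

In this toy setup a rare word such as "tiger" outranks a common one such as "grass" for the same visual token, which is the kind of effect a weighting step is meant to produce. A visual token never seen in training contributes nothing to the scores.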
