Meme Analysis using Image Captioning Model and GPT-4

  • Marvin John Ignacio;Thanh Tin Nguyen;Jia Wang;Yong-Guk Kim
    • Annual Conference of KIPS / 2023.11a / pp.628-631 / 2023
  • We present a new approach to evaluating texts generated by Large Language Models (LLMs) for meme classification. Analyzing an image with embedded text, i.e., a meme, is challenging even for existing state-of-the-art computer vision models. By leveraging large image-to-text models, we can extract image descriptions that can be used in other tasks, such as classification. In our methodology, we first generate image captions using BLIP-2 models. We then use GPT-4 to evaluate the relationship between each caption and the meme text. The results show that OPT6.7B provides a better rating than other LLMs, suggesting that the proposed method has potential for meme classification.

Injection of Cultural-based Subjects into Stable Diffusion Image Generative Model

  • Amirah Alharbi;Reem Alluhibi;Maryam Saif;Nada Altalhi;Yara Alharthi
    • International Journal of Computer Science & Network Security / v.24 no.2 / pp.1-14 / 2024
  • While text-to-image models have made remarkable progress in image synthesis, certain models, particularly generative diffusion models, have exhibited a noticeable bias towards generating images related to the culture of some developing countries. This paper presents an empirical investigation aimed at mitigating the bias of image generative models. We achieve this by incorporating symbols representing Saudi culture into a Stable Diffusion model using the DreamBooth technique. The CLIP score metric is used to assess the outcomes of this study. This paper also explores the impact of varying parameters, for instance the quantity of training images and the learning rate. The findings reveal a substantial reduction in bias-related concerns and propose an innovative metric for evaluating cultural relevance.

Image Based Text Matching Using Local Crowdedness and Hausdorff Distance (지역 밀집도 및 Hausdorff 거리를 이용한 영상기반 텍스트 매칭)

  • Son, Hwa-Jeong;Kim, Ji-Soo;Park, Mi-Seon;Yoo, Jae-Myeong;Kim, Soo-Hyung
    • The Journal of the Korea Contents Association / v.6 no.10 / pp.134-142 / 2006
  • In this paper, we investigate whether the Hausdorff distance, which is used to measure image similarity, is also effective for document retrieval. The proposed method uses local crowdedness and the Hausdorff distance to locate text images by determining whether a pair of images scanned at different times comes from the same text. To reduce the processing time, which is one of the disadvantages of the Hausdorff distance algorithm, we adopt local crowdedness for feature point extraction. We apply the proposed method to 190 pairs of the same class and 190 pairs of different classes collected from postal envelope images. The results show that the modified Hausdorff distance proposed in this paper performs well in locating the text region and calculating the degree of similarity between two images. Accuracy improves by 2.7% and 9.0% compared to a binary correlation method and the original Hausdorff distance method, respectively.
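
    The abstract above builds on the Hausdorff distance between two sets of feature points. As a rough illustration only, the following minimal Python sketch computes the classic (original) symmetric Hausdorff distance; it is not the modified variant or the local-crowdedness feature extraction proposed in the paper, and the function names `directed_hausdorff` and `hausdorff` are hypothetical.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Classic symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical feature points extracted from two scanned text images.
points_a = [(0, 0), (1, 0)]
points_b = [(0, 0), (4, 0)]
distance = hausdorff(points_a, points_b)
```

    A smaller distance suggests that the two point sets, and hence the two images, are more alike. Each call compares every point with every other point, which is the kind of cost the abstract says it reduces by selecting fewer feature points.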

A Study on the Improvement of Retrieval Efficiency Based on the CRFMD (공통기술표현포맷에 기반한 다매체자료의 검색효율 향상에 관한 연구)

  • Park, Il-Jong;Jeong, Ki-Tai
    • Journal of the Korean Society for information Management / v.23 no.3 s.61 / pp.5-21 / 2006
  • In recent years, theories of image and sound analysis have been proposed to work with text retrieval systems and have progressed quickly with the rapid progress in data processing speeds. This study proposes a common representation format for multimedia documents (CRFMD) composed of both images and text to form a single data structure. It also shows that image classification of a given test set is dramatically improved when text features are encoded together with image features. CRFMD might be applicable to other areas of multimedia document retrieval and processing, such as medical image retrieval, World Wide Web searching, and museum collection retrieval.

The sound analysis of <Tale of Tales> (<이야기 속의 이야기> 사운드 분석)

  • Mok, Hae-Jung
    • Cartoon and Animation Studies / s.20 / pp.87-104 / 2010
  • Animation, like film, creates meaning and affection by combining image and sound. <Tale of Tales>, directed by Yuri Norstein, is a good text for analyzing animation sound in that it combines image with various music and sound effects well. This study focuses on analyzing the way that sound functions to make meaning in this text. Generally, sound is categorized into dialogue, music, and sound effects, and animation has its own characteristics in each category. The voice for dialogue is created to correspond to the image of the character, and rhythm is very important in animation. In addition, sound effects in animation can be said to mimic not just sound but also movement. This study analyzes sound based on these three sound factors and the concepts of the point of listening, subjective sound, and the sound bridge. Subjective sound using the point of listening of the wolf and the baby bestows a special position on the main characters in the text. The overall characteristic of sound use in this text is that the repetitive combination of sound and image, the linguistic and annotative function of sound effects, and the comparatively conventional use of music and sound effects enhance affection and readability.

Using similarity based image caption to aid visual question answering (유사도 기반 이미지 캡션을 이용한 시각질의응답 연구)

  • Kang, Joonseo;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.191-204 / 2021
  • Visual Question Answering (VQA) and image captioning are tasks that require understanding both the features of images and the linguistic features of text. Co-attention, which can connect image and text, may therefore be the key to both tasks. In this paper, we propose a model that achieves high performance on VQA by using image captions generated with a standard transformer model pretrained on the MSCOCO dataset. Because captions unrelated to the question can interfere with answering, only captions similar to the question are selected for use. In addition, because stopwords in the caption either do not affect answering or interfere with it, the experiments were conducted after removing stopwords. Experiments were conducted on VQA-v2 data to compare the proposed model with the deep modular co-attention network (MCAN) model, which showed good performance by using co-attention between images and text. As a result, the proposed model outperformed the MCAN model.
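
    The caption-selection step described above (remove stopwords, then keep the captions most similar to the question) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which similarity measure is used, so Jaccard overlap of word sets stands in for it, and the stopword list and the names `tokens`, `jaccard`, and `select_captions` are hypothetical.

```python
import re

# Tiny illustrative stopword list (an assumption, not the paper's list).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "on",
             "what", "which", "to", "and"}

def tokens(text):
    """Lowercase word set of `text` with stopwords removed."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}

def jaccard(a, b):
    """Jaccard overlap of two word sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def select_captions(question, captions, k=2):
    """Return the k captions whose word sets overlap most with the question."""
    q = tokens(question)
    ranked = sorted(captions, key=lambda c: jaccard(q, tokens(c)), reverse=True)
    return ranked[:k]

question = "What color is the dog?"
captions = [
    "a brown dog on the grass",
    "a man riding a bike",
    "the dog is brown in color",
]
selected = select_captions(question, captions, k=2)
```

    Here the caption about the bike shares no content words with the question and is dropped, which is the filtering effect the abstract motivates.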

Character Region Detection in Natural Image Using Edge and Connected Component by Morphological Reconstruction (에지 및 형태학적 재구성에 의한 연결요소를 이용한 자연영상의 문자영역 검출)

  • Gwon, Gyo-Hyeon;Park, Jong-Cheon;Jun, Byoung-Min
    • Journal of Korea Entertainment Industry Association / v.5 no.1 / pp.127-133 / 2011
  • Characters in natural images carry important information in various contexts. Previous character region detection algorithms fail to detect character regions when the image is complex, the surrounding lighting varies, or the background is similar to the characters, so this paper proposes a method of character region detection in natural images using edges and connected components obtained by morphological reconstruction. First, we detect edges using the Canny edge detector and connected components with local min/max values by a morphological reconstruction operation on the gray-scale image, and label each detected connected component. Lastly, the detected candidate text regions are merged to generate a single candidate text region, and the final text region is detected by checking the similarity and adjacency of neighboring individual character candidates. Experimental results show that the proposed algorithm improves the accuracy of character region detection using edges and connected components.

A Categorization Scheme of Tag-based Folksonomy Images for Efficient Image Retrieval (효과적인 이미지 검색을 위한 태그 기반의 폭소노미 이미지 카테고리화 기법)

  • Ha, Eunji;Kim, Yongsung;Hwang, Eenjun
    • KIISE Transactions on Computing Practices / v.22 no.6 / pp.290-295 / 2016
  • Recently, folksonomy-based image-sharing sites, where users cooperatively create and use tags for image annotation, have been gaining popularity. Typically, these sites retrieve images for a user request using simple text-based matching and display the retrieved images in the form of a photo stream. However, these tags are personal and subjective and the images are not categorized, which results in poor retrieval accuracy and low user satisfaction. In this paper, we propose a categorization scheme for folksonomy images that can improve retrieval accuracy in tag-based image retrieval systems. In this scheme, images are classified by semantic similarity using the text information and image information generated on the folksonomy. To evaluate the performance of the proposed scheme, we collect folksonomy images and categorize them using text features and image features. We then compare its retrieval accuracy with that of existing systems.

Text Verification Based on Sub-Image Matching (부분 영상 매칭에 기반한 텍스트 검증)

  • Son Hwa Jeong;Jeong Seon Hwa;Kim Soo Hyung
    • The KIPS Transactions:PartB / v.12B no.2 s.98 / pp.115-122 / 2005
  • The sub-image matching problem, in which one image contains some part of another image, has mostly been investigated on natural images. In this paper, we propose two sub-image matching techniques, a mesh-based method and a correlation-based method, that can be used efficiently to match text images. The mesh-based method consists of two stages: box alignment, and similarity measurement by extracting mesh features from the two images. The correlation-based method determines similarity using the correlation of the two images computed with the FFT. We applied the two methods to text verification in a postal automation system and observed that the accuracy of the correlation-based method is 92.7% while that of the mesh-based method is 90.1%.
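
    The abstract states only that the correlation of the two images is computed with the FFT. One common way to do that is the cross-correlation theorem, sketched below in plain Python as an illustration and not as the paper's implementation. The sketch assumes equal-sized grayscale images whose height and width are powers of two, uses circular (wrap-around) correlation, and all function names (`fft`, `fft2`, `correlation_peak`) are hypothetical.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2(img, inverse=False):
    """2-D transform: apply the 1-D FFT to every row, then to every column."""
    rows = [fft(r, inverse) for r in img]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def correlation_peak(a, b):
    """Return (score, shift): the peak normalized circular cross-correlation
    of two equal-sized images and the (row, col) shift where it occurs."""
    h, w = len(a), len(a[0])

    def centered(img):
        mean = sum(map(sum, img)) / (h * w)
        return [[v - mean for v in row] for row in img]

    a0, b0 = centered(a), centered(b)
    fa, fb = fft2(a0), fft2(b0)
    # Cross-correlation theorem: corr = IFFT(conj(FFT(a)) * FFT(b)).
    prod = [[fa[i][j].conjugate() * fb[i][j] for j in range(w)]
            for i in range(h)]
    corr = fft2(prod, inverse=True)
    energy = (sum(v * v for r in a0 for v in r)
              * sum(v * v for r in b0 for v in r))
    norm = math.sqrt(energy)
    # Divide by h*w because the inverse transform above is unnormalized.
    best, shift = max((corr[i][j].real / (h * w), (i, j))
                      for i in range(h) for j in range(w))
    return (best / norm if norm else 0.0), shift

# Toy 4x4 example: image_b is image_a shifted down 1 row and right 2 columns.
image_a = [[0.0] * 4 for _ in range(4)]
image_a[0][0] = 1.0
image_b = [[0.0] * 4 for _ in range(4)]
image_b[1][2] = 1.0
score, shift = correlation_peak(image_a, image_b)
```

    A score near 1 indicates that one image is close to a shifted copy of the other, and `shift` gives the offset at which they align best. A real system would pad or crop images to a suitable size and choose a decision threshold, both of which are outside what the abstract specifies.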

A Development design Image DataBase (디자인 이미지데이터베이스 구축사례 연구)

  • 정지홍
    • Archives of design research / v.13 no.3 / pp.313-320 / 2000
  • Currently, the new wave of information technology has enormously influenced every field. In the field of design, it is time to make every possible effort to accumulate design-related knowledge by maintaining, managing, and controlling design information in a systematic manner, moving beyond the old stage of merely using the data itself. Owing to remarkable progress in communication media and speed, and in file compression technology, text-centric data has been shifting to multimedia data such as images and motion pictures. Methodologies therefore need to be developed to utilize the related information effectively. With respect to the processing of image data, an optimal method should be devised that reflects the unique characteristics and uses of image data, apart from the traditional way of processing and storing legacy text-based data. This study suggests a system for indexing and implementing design image information through a case of analyzing design image data, abstracting the data elements of the images themselves, and finally applying them to build an image-oriented database.
