• Title/Summary/Keyword: Image Caption

Search Results: 51

Automated Story Generation with Image Captions and Recursive Calls (이미지 캡션 및 재귀호출을 통한 스토리 생성 방법)

  • Isle Jeon;Dongha Jo;Mikyeong Moon
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.24 no.1
    • /
    • pp.42-50
    • /
    • 2023
  • The development of technology has driven digital innovation throughout the media industry, including production techniques and editing technologies, and the OTT and streaming era has diversified the forms in which consumers view content. The convergence of big data and deep learning networks has enabled automatic generation of text in formats such as news articles, novels, and scripts, but few studies have reflected the author's intention and generated contextually smooth stories. In this paper, we describe the flow of pictures in a storyboard with image caption generation techniques and automatically generate story-tailored scenarios through a language model. Using image captioning with a CNN and an attention mechanism, we generate sentences describing the pictures on the storyboard, and we feed the generated sentences into the Korean natural language processing model KoGPT-2 to automatically generate scenarios that meet the planning intention. Through this approach, scenarios customized to the author's intention and story can be created in large quantities to ease the burden of content creation, and artificial intelligence can participate in the overall process of digital content production to advance media intelligence.
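As a rough illustration of the two-stage pipeline this abstract describes, the sketch below chains a public image captioner into KoGPT-2 for scenario generation. The captioning checkpoint is a stand-in (the paper's own CNN + attention captioner is not public), and the recursive re-feeding step is only indicated in a comment.

```python
# Minimal sketch: caption storyboard frames, then extend the captions into a
# story with KoGPT-2. Model names are assumptions, not the paper's models.
from transformers import pipeline

# Hypothetical stand-in for the paper's CNN + attention captioner.
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
# skt/kogpt2-base-v2 is the public KoGPT-2 checkpoint.
generator = pipeline("text-generation", model="skt/kogpt2-base-v2")

def storyboard_to_story(image_paths, max_len=128):
    # 1) Describe each storyboard frame with one caption sentence.
    captions = [captioner(p)[0]["generated_text"] for p in image_paths]
    # 2) Concatenate the captions as a prompt and let KoGPT-2 extend it into a
    #    scenario; a recursive variant would re-feed the output as a new prompt.
    prompt = " ".join(captions)
    story = generator(prompt, max_length=max_len, do_sample=True)[0]["generated_text"]
    return captions, story
```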

Meme Analysis using Image Captioning Model and GPT-4

  • Marvin John Ignacio;Thanh Tin Nguyen;Jia Wang;Yong-Guk Kim
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.11a
    • /
    • pp.628-631
    • /
    • 2023
  • We present a new approach to evaluating texts generated by Large Language Models (LLMs) for meme classification. Analyzing an image with embedded text, i.e. a meme, is challenging even for existing state-of-the-art computer vision models. By leveraging large image-to-text models, we can extract image descriptions that can be used in other tasks, such as classification. In our methodology, we first generate image captions using BLIP-2 models. Using these captions, we then use GPT-4 to evaluate the relationship between the caption and the meme text. The results show that OPT-6.7B provides a better rating than other LLMs, suggesting that the proposed method has potential for meme classification.
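A hedged sketch of the caption-then-judge flow follows: BLIP-2 describes the meme image, and GPT-4 is asked to rate how the caption relates to the embedded text. The checkpoint names and the prompt wording are assumptions, not the authors' exact configuration.

```python
# Sketch: BLIP-2 caption generation followed by a GPT-4 relatedness rating.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration
from openai import OpenAI

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rate_meme(image_path: str, meme_text: str) -> str:
    inputs = processor(images=Image.open(image_path), return_tensors="pt")
    with torch.no_grad():
        ids = model.generate(**inputs)
    caption = processor.decode(ids[0], skip_special_tokens=True)
    # Ask GPT-4 to judge the caption/meme-text relationship (prompt is illustrative).
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content":
                   f"Image caption: {caption}\nMeme text: {meme_text}\n"
                   "Rate from 1 to 5 how strongly the text relates to the image."}],
    )
    return resp.choices[0].message.content
```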

Membership Inference Attack against Text-to-Image Model Based on Generating Adversarial Prompt Using Textual Inversion (Textual Inversion을 활용한 Adversarial Prompt 생성 기반 Text-to-Image 모델에 대한 멤버십 추론 공격)

  • Yoonju Oh;Sohee Park;Daeseon Choi
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.33 no.6
    • /
    • pp.1111-1123
    • /
    • 2023
  • In recent years, as generative models have developed, research that threatens them has also been actively conducted. We propose a new membership inference attack against text-to-image models. Existing membership inference attacks on text-to-image models generated a single image from the caption of a query image. In contrast, this paper obtains a personalized embedding of the query image through Textual Inversion and proposes a membership inference attack that effectively generates multiple images by generating adversarial prompts. In addition, the membership inference attack is tested for the first time on the Stable Diffusion model, which is attracting attention among text-to-image models, achieving an accuracy of up to 1.00.
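The inference step might look roughly like the sketch below, assuming a textual-inversion embedding for the query image has already been trained and saved (e.g. with the diffusers textual-inversion training script). The similarity test, the feature encoder, and the threshold are illustrative stand-ins, not the paper's actual attack criterion.

```python
# Loose sketch: generate several images around a learned pseudo-token and
# treat high feature similarity to the query image as evidence of membership.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# Load the learned pseudo-token (here called "<query>") into the pipeline.
pipe.load_textual_inversion("learned_embeds.bin", token="<query>")

def infer_membership(query_features, encode_fn, n_samples=8, threshold=0.85):
    # Generate several images from prompts built around the learned concept.
    images = pipe(["a photo of <query>"] * n_samples).images
    # Compare each generation to the query image in some feature space
    # (encode_fn is a placeholder, e.g. a CLIP image encoder).
    sims = torch.stack([torch.cosine_similarity(encode_fn(im), query_features, dim=-1)
                        for im in images])
    # Consistently high similarity suggests the image was in the training set.
    return sims.mean().item() > threshold
```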

Image Caption Area extraction using Saliency Map and Max Filter (중요도 맵과 최댓값 필터를 이용한 영상 자막 영역 추출)

  • Kim, Youngjin;Kim, Manbae
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2014.11a
    • /
    • pp.63-64
    • /
    • 2014
  • In this paper, we extract the caption region of a video using a saliency map and a max filter. A saliency map highlights conspicuous regions, that is, regions whose brightness differs sharply from their surroundings and regions with strong edge features. A max filter replaces the center pixel with the maximum value in its window; it is effective at removing extreme impulse noise and is particularly useful for removing dark spikes. Using these two characteristics, we extract the caption region of the video.
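For readers who want to try the two operations, a minimal sketch using OpenCV's spectral-residual saliency (from opencv-contrib) and SciPy's max filter follows. The final fusion rule and thresholds are assumptions, since the abstract does not specify how the two maps are combined.

```python
# Sketch: saliency map + max filter for caption (subtitle) region extraction.
import cv2
import numpy as np
from scipy.ndimage import maximum_filter

def caption_region_mask(frame_bgr: np.ndarray) -> np.ndarray:
    # Spectral-residual saliency highlights bright, high-contrast regions.
    saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, sal_map = saliency.computeSaliency(frame_bgr)   # float map in [0, 1]
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Max filter: each pixel takes the window maximum, suppressing dark spikes.
    filtered = maximum_filter(gray, size=5)
    # Keep pixels that are both salient and bright after filtering
    # (the 0.5 / 180 cutoffs are illustrative, not from the paper).
    mask = (sal_map > 0.5) & (filtered > 180)
    return mask.astype(np.uint8) * 255
```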


Image Processing Algorithm for Crack Detection of Sewer with low resolution (저해상도 하수관거의 균열 탐지를 위한 영상처리 알고리즘)

  • Son, Byung Jik;Jeon, Joon Ryong;Heo, Gwang Hee
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.18 no.2
    • /
    • pp.590-599
    • /
    • 2017
  • In South Korea, sewer pipeline inspection devices with high-resolution digital cameras of 2 megapixels or more have been developed. Nevertheless, most devices in use capture less than 300 kilopixels, and because 100-kilopixel devices are widely used, the environment for image processing is very poor. Reflecting this, the study adopted very low-resolution images (240×320 = 76,800 pixels), in which cracks are difficult to detect. An automatic crack detection technique was studied using digital image processing for low-resolution images of sewer pipelines. The authors developed a program that automatically detects cracks in six steps based on MATLAB functions. The second step covers an algorithm developed to find the optimal threshold value, and the fifth step deals with an algorithm to determine cracks. In step 2, Otsu's threshold for images with a white caption was higher than that for images without a caption, so the optimal threshold was found by decreasing the Otsu threshold in steps of 0.01. Step 5 presents an algorithm that detects a crack by judging whether its length is 10 mm (40 pixels) or more and its width is 1 mm (4 pixels) or more. As a result, crack detection performance was good despite the very low-resolution images.
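Steps 2 and 5 as described translate roughly into the following scikit-image sketch; the number of 0.01 decrements and the dark-crack assumption are mine, since the abstract gives only the step size and the size criteria.

```python
# Sketch of steps 2 and 5: adjusted Otsu threshold, then size-based crack test.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

def detect_cracks(gray: np.ndarray, has_caption: bool, steps: int = 5):
    t = threshold_otsu(gray / 255.0)
    if has_caption:
        t -= 0.01 * steps            # white captions inflate Otsu's value, so back off
    binary = (gray / 255.0) < t      # assume cracks are darker than the pipe wall
    cracks = []
    for region in regionprops(label(binary)):
        h = region.bbox[2] - region.bbox[0]
        w = region.bbox[3] - region.bbox[1]
        # >= 40 px long and >= 4 px wide, i.e. >= 10 mm by >= 1 mm at this resolution
        if max(h, w) >= 40 and min(h, w) >= 4:
            cracks.append(region.bbox)
    return cracks
```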

A Novel Image Captioning based Risk Assessment Model (이미지 캡셔닝 기반의 새로운 위험도 측정 모델)

  • Jeon, Min Seong;Ko, Jae Pil;Cheoi, Kyung Joo
    • The Journal of Information Systems
    • /
    • v.32 no.4
    • /
    • pp.119-136
    • /
    • 2023
  • Purpose: We introduce a surveillance system explicitly designed to overcome the limitations of conventional surveillance systems, which often focus primarily on object-centric behavior analysis. Design/methodology/approach: The study introduces an approach to risk assessment in surveillance that employs image captioning to generate descriptive captions encapsulating the interactions among objects, actions, and spatial elements within observed scenes. To support our methodology, we developed a distinctive dataset of [image, caption, danger score] triples for training. We fine-tuned the BLIP-2 model using this dataset and utilized BERT to decipher the semantic content of the generated captions for assessing risk levels. Findings: In a series of experiments conducted with our self-constructed datasets, we show that these datasets offer a wealth of information for risk assessment and deliver outstanding performance in this area. Compared to models pre-trained on established datasets, our generated captions thoroughly cover the object attributes, behaviors, and spatial context crucial for the surveillance system. Additionally, they adapt to novel sentence structures, ensuring versatility across a range of contexts.
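The second stage, scoring captions with BERT, could be set up as below. Treating BertForSequenceClassification with a single output as a regressor for the danger score is an assumption about the setup, and the fine-tuned BLIP-2 captioner is represented here by a plain caption string.

```python
# Sketch: a BERT encoder maps a generated caption to a scalar danger score.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# num_labels=1 turns the classification head into a scalar regression head.
scorer = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                       num_labels=1)

def danger_score(caption: str) -> float:
    inputs = tokenizer(caption, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = scorer(**inputs).logits   # shape (1, 1)
    return logits.item()

# Usage: danger_score("two people fighting near a parked car")
# The head would be fine-tuned on [image, caption, danger score] triples first.
```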

Efficient Content-Based Image Retrieval Method using Shape and Color feature (형태와 칼러성분을 이용한 효율적인 내용 기반의 이미지 검색 방법)

  • Youm, Sung-Ju;Kim, Woo-Saeng
    • The Transactions of the Korea Information Processing Society
    • /
    • v.3 no.4
    • /
    • pp.733-744
    • /
    • 1996
  • Content-based image retrieval (CBIR) is an image retrieval methodology that uses characteristic values generated automatically by the system, without any caption or text information. In this paper, we propose a content-based image retrieval method that uses the shape and color features of image data as characteristic values. For this, we present image processing techniques used for feature extraction and indexing techniques based on a trie and an R-tree for fast image retrieval. In our approach, query results are more reliable because both shape and color features are considered. We also show an image database implemented according to our approach, sample retrieval results selected by our system from 200 sample images, and an analysis of the results considering the effect of the shape and color characteristic values.
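In modern terms, the two feature channels might be computed as in the sketch below (an HSV color histogram plus Hu-moment shape descriptors). The trie/R-tree index is omitted, and a weighted distance stands in for the paper's matching scheme; both choices are mine, not the authors'.

```python
# Sketch: shape + color features for CBIR, with a combined distance.
import cv2
import numpy as np

def extract_features(image_bgr: np.ndarray):
    # Color channel: a coarse HSV histogram, L2-normalized.
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [8, 8], [0, 180, 0, 256])
    hist = cv2.normalize(hist, hist).flatten()
    # Shape channel: Hu moments of the grayscale image, log-scaled for stability.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    hu = cv2.HuMoments(cv2.moments(gray)).flatten()
    hu = -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)
    return hist, hu

def distance(f1, f2, w_color=0.5, w_shape=0.5):
    # Weighted combination of both channels, reflecting the abstract's claim
    # that considering shape and color together makes results more reliable.
    return (w_color * np.linalg.norm(f1[0] - f2[0])
            + w_shape * np.linalg.norm(f1[1] - f2[1]))
```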


The Examination of Reliability of Lower Limb Joint Angles with Free Software ImageJ

  • Kim, Heung Youl
    • Journal of the Ergonomics Society of Korea
    • /
    • v.34 no.6
    • /
    • pp.583-595
    • /
    • 2015
  • Objective: The purpose of this study was to determine the reliability of lower limb joint angles computed with the software ImageJ during jumping movements. Background: Kinematics is the study of bodies in motion without regard to the forces or torques that may produce the motion. The most common method for collecting motion data uses an imaging and motion-capture system to record the 2D or 3D coordinates of markers attached to a moving object, followed by manual or automatic digitizing software. Above all, passive optical motion capture systems (e.g. the Vicon system) have been regarded as the gold standard for collecting motion data. On the other hand, ImageJ is widely used as free software for image analysis and can collect the 2D coordinates of markers. Although much research has been carried out on uses of the ImageJ software, little is known about its reliability. Method: Seven healthy female students participated as subjects in this study. Seventeen reflective markers were attached to the right and left lower limbs to measure two- and three-dimensional joint angular motions. Jump performance was recorded by a ten-camera Vicon system (250 Hz) and one digital video camera (240 Hz). The joint angles of the ankle and knee joints were calculated using 2D (ImageJ) and 3D (Vicon-MX) motion data, respectively. Results: Pearson's correlation coefficients between the two methods were calculated, and significance tests were conducted (α = 1%). Correlation coefficients between the two were over 0.98. Examination of validity using the Bland-Altman method showed no systematic error between Vicon-MX and ImageJ, and all data fell within the 95% limits of agreement. Conclusion: In this study, correlation coefficients are generally high, and the regression line is near the identity line. Therefore, motion analysis using ImageJ is considered a useful tool for the evaluation of human movements in various research areas. Application: This result can be utilized as a practical tool to analyze human performance in various fields.
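The 2D computation that ImageJ marker coordinates feed into reduces to an angle between marker vectors; a small sketch follows, together with the reliability check the study uses (Pearson's r). Marker names are illustrative.

```python
# Sketch: a 2D joint angle from three markers, plus the reliability test.
import numpy as np
from scipy.stats import pearsonr

def joint_angle(hip, knee, ankle):
    """Knee angle (degrees) from three 2D marker coordinates."""
    a = np.asarray(hip) - np.asarray(knee)
    b = np.asarray(ankle) - np.asarray(knee)
    cos_t = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))

def reliability(angles_imagej, angles_vicon):
    # Correlate ImageJ (2D) angles with Vicon (3D) angles across frames;
    # the study reports r > 0.98 between the two methods.
    r, p = pearsonr(angles_imagej, angles_vicon)
    return r, p
```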

The Evaluation Structure of Auditory Images on the Streetscapes - The Semantic Issues of Soundscape based on the Students' Fieldwork - (거리경관에 대한 청각적 이미지의 평가구조 - 대학생들의 음풍경 체험을 통한 의미론적 고찰 -)

  • Han Myung-Ho
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.8
    • /
    • pp.481-491
    • /
    • 2005
  • The purpose of this study is to interpret the evaluation structure of auditory images of streetscapes in an urban area from the semantic view of soundscapes. Using the caption evaluation method, a new method, a total of 45 college students participated in fieldwork from 2001 to 2005 to identify the images of sounds while walking on the main streets of Namwon city. Various data were obtained, including elements, features, impressions, and preferences of the auditory scene. In Namwon city, the elements forming auditory images are classified into natural sounds and artificial sounds, the latter including machinery sounds, community sounds, and signal sounds. The features of the auditory scene are classified by kind of sound, behavior, condition, character, relationship to surroundings, and image. Finally, the impressions of the auditory scene fall into three categories: the emotions of humans, the atmosphere of the streets, and the characteristics of the sound itself. From the relationship between auditory scene and evaluation, the elements, features, and impressions of the auditory scene comprise items with positive, neutral, and negative images. The characteristics of the auditory image of a place or space could also be grasped through the evaluation model of streetscapes in Namwon city.

Learning and Transferring Deep Neural Network Models for Image Caption Generation (이미지 캡션 생성을 위한 심층 신경망 모델 학습과 전이)

  • Kim, Dong-Ha;Kim, Incheol
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2016.10a
    • /
    • pp.617-620
    • /
    • 2016
  • In this paper, we present a deep neural network model that is effective for image caption generation and model transfer. The model is a multimodal recurrent neural network consisting of five layers in total, including a convolutional neural network layer that extracts visual information from the image, an embedding layer that converts each word into a low-dimensional feature, a recurrent neural network layer that learns the structure of caption sentences, and a multimodal layer that combines visual and linguistic information. In particular, the recurrent layer is built from LSTM units, which excel at sequence pattern learning and model transfer, and the output of the convolutional layer is connected not only to the embedding layer but also to the multimodal layer, so that the visual information of the image is available at every step of caption generation. Comparative experiments on public datasets such as Flickr8k, Flickr30k, and MSCOCO demonstrate the superiority of the proposed multimodal recurrent neural network model in caption accuracy and in the effectiveness of model transfer.
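A compact PyTorch sketch of the described architecture follows: CNN features are projected and fed into both the LSTM input and the multimodal fusion layer, so the image is visible at every generation step. The dimensions and the linear projection are assumptions; the paper does not give exact sizes.

```python
# Sketch of the multimodal RNN captioner: CNN features enter both the
# embedding path and the multimodal fusion layer.
import torch
import torch.nn as nn

class MultimodalCaptioner(nn.Module):
    def __init__(self, vocab_size, img_dim=2048, emb_dim=256, hid_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)      # CNN feature projection
        self.embed = nn.Embedding(vocab_size, emb_dim)   # word embedding layer
        self.lstm = nn.LSTM(emb_dim * 2, hid_dim, batch_first=True)
        self.multimodal = nn.Linear(hid_dim + emb_dim, hid_dim)  # fusion layer
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, img_feats, captions):
        v = self.img_proj(img_feats)                       # (B, emb_dim)
        w = self.embed(captions)                           # (B, T, emb_dim)
        v_seq = v.unsqueeze(1).expand(-1, w.size(1), -1)   # image at every step
        h, _ = self.lstm(torch.cat([w, v_seq], dim=-1))    # language + vision
        m = torch.tanh(self.multimodal(torch.cat([h, v_seq], dim=-1)))
        return self.out(m)                                 # per-step vocab logits
```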