• Title/Summary/Keyword: Captions


Indexing and Retrieving of Video Data (비디오 데이터의 색인과 검색)

  • Heo, Jin-Yong;Park, Dong-Won;An, Syung-Og
    • The Journal of Engineering Research / v.3 no.1 / pp.107-116 / 1998
  • Video data are stored and retrieved in various compressed forms according to their characteristics. In this paper, we present a generic data model that captures the structure of a video document and provides a means for indexing a video stream. Using this model, we design and implement CVIMS (the MPEG-2 Compressed Video Information Management System) to store and retrieve video documents. CVIMS extracts I-frames from MPEG-2 TS files, selects key-frames from the I-frames, and stores in a database the index information of the key-frames, such as thumbnails, captions, and picture descriptors. CVIMS then retrieves MPEG-2 video data using the thumbnails of key-frames and various query labels, and the system is accessible through a web interface.

  • PDF
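
The key-frame indexing pipeline described above can be sketched roughly as follows. This is an illustrative outline, not CVIMS itself: the feature vectors, the distance threshold, and the index record fields are all assumptions, since the abstract does not specify them.

```python
# Sketch of a CVIMS-style key-frame index (hypothetical names and schema).
# I-frames are assumed to arrive as (frame_number, feature_vector) pairs;
# a new key-frame is kept whenever its features differ enough from the
# previously selected key-frame.

def _l1_distance(a, b):
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def select_key_frames(i_frames, threshold=0.3):
    """Pick key-frames from I-frames by feature-vector distance."""
    key_frames = []
    last = None
    for frame_no, features in i_frames:
        if last is None or _l1_distance(features, last) > threshold:
            key_frames.append(frame_no)
            last = features
    return key_frames

def build_index(key_frames, captions):
    """Associate each key-frame with its caption and a thumbnail reference."""
    return [
        {"frame": f, "caption": captions.get(f, ""), "thumbnail": f"thumb_{f}.jpg"}
        for f in key_frames
    ]
```

In a real system the index records would go into a database table keyed by frame number, and queries would match against the caption and descriptor fields.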

3D stereoscopic representation of title in broadcasting, the distance standardize for the study of parallax (입체영상 방송텍스트에서 입체감을 위한 패럴렉스 데이터 표준화에 관한 연구)

  • Oh, Moon Seok;Lee, Yun Sang
    • Journal of Korea Society of Digital Industry and Information Management / v.7 no.4 / pp.111-118 / 2011
  • A notable recent development in media is the 3D stereoscopic image, which began in film and has now moved to broadcasting. 3D production, however, still lacks standardization, and this is especially true of broadcast subtitles: because there is no standardized production system, producing them demands considerable time and effort. This research proposes a standardized method for compositing text-based objects such as subtitles and titles into 3D images using rig-based imaging, so that the result is as stable as possible. Captions and titles must remain readable and comprehensible to the human eye, so excessive camera parallax (the gap between the left and right images), which causes eye strain, must not be allowed to harm readability. The experiment was conducted with 100 adult men and women.
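
To make the parallax/depth relationship concrete, the standard similar-triangles formula for positive (uncrossed) parallax can be sketched as below. This is general stereoscopy geometry, not a value or formula taken from the paper; the 65 mm interocular distance is an assumed adult average.

```python
# Perceived depth behind the screen for positive (uncrossed) parallax p,
# viewing distance V, and eye separation e, from similar triangles:
#     z = p * V / (e - p)
# Assumption: e = 65 mm average interocular distance (not from the paper).

def perceived_depth_mm(parallax_mm, viewing_distance_mm, eye_separation_mm=65.0):
    if parallax_mm >= eye_separation_mm:
        # Parallax equal to or beyond eye separation forces the eyes to
        # diverge, which viewers cannot fuse comfortably.
        raise ValueError("parallax at or beyond eye separation causes divergence")
    return parallax_mm * viewing_distance_mm / (eye_separation_mm - parallax_mm)
```

For example, a 13 mm screen parallax viewed from 2 m places the text 500 mm behind the screen plane; standardizing such limits per screen size is the kind of rule the study is after.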

Design and Implementation of MPEG-2 Compressed Video Information Management System (MPEG-2 압축 동영상 정보 관리 시스템의 설계 및 구현)

  • Heo, Jin-Yong;Kim, In-Hong;Bae, Jong-Min;Kang, Hyun-Syug
    • The Transactions of the Korea Information Processing Society / v.5 no.6 / pp.1431-1440 / 1998
  • Video data are stored and retrieved in various compressed forms according to their characteristics. In this paper, we present a generic data model that captures the structure of a video document and provides a means for indexing a video stream. Using this model, we design and implement CVIMS (the MPEG-2 Compressed Video Information Management System) to store and retrieve video documents. CVIMS extracts I-frames from MPEG-2 files, selects key-frames from the I-frames, and stores in a database the index information of the key-frames, such as thumbnails, captions, and picture descriptors. CVIMS also retrieves MPEG-2 video data using the thumbnails of key-frames and various query labels.

  • PDF

A Research of Character Graphic Design on Larger Television Screens -Based on Analysis of its Visual Perception- (TV화면 대형화에 따른 문자그래픽 표현 연구 -시각인지도 분석 기반-)

  • Lee, Kook-Se;Moon, Nam-Mee
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.9 no.4 / pp.129-138 / 2009
  • Character graphic design, a major visual element of the TV screen, has become greatly important in helping viewers understand visual information and in enhancing program quality. This research seeks a way to adapt the attributes of TV captions and graphics, such as font, size, and caption speed, to larger, higher-quality TV screens. Based on two Delphi surveys of graphics experts along with theoretical studies, this article analyzes how visual perception relates to the various visual elements of the TV screen, and proposes an improved plan for visual effects across various media under OSMU (One Source Multi Use).

  • PDF

An Image Retrieving Scheme Using Salient Features and Annotation Watermarking

  • Wang, Jenq-Haur;Liu, Chuan-Ming;Syu, Jhih-Siang;Chen, Yen-Lin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.1 / pp.213-231 / 2014
  • Existing image search systems allow users to search for images by keywords, or by example images through content-based image retrieval (CBIR). On the other hand, users might learn more relevant textual information about an image from its text captions or the surrounding context within documents or Web pages. Without such context, it is difficult to extract a semantic description directly from the image content. In this paper, we propose an annotation watermarking system that lets users embed text descriptions and retrieve relevant textual information from similar images. First, tags associated with an image are converted into a two-dimensional code and embedded into the image by the discrete wavelet transform (DWT). Next, for images without annotations, similar images can be obtained by CBIR techniques and their embedded annotations extracted. Specifically, we use global features such as color ratios and dominant sub-image colors for preliminary filtering, and then extract local features such as Scale-Invariant Feature Transform (SIFT) descriptors for similarity matching. This design achieves good effectiveness with reasonable processing time in practical systems. Our experimental results showed good accuracy in retrieving similar images and extracting relevant tags from them.
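
The embed-in-DWT-coefficients idea can be illustrated with a toy scheme. This is not the paper's exact method: it uses a one-level 1-D Haar transform per row and quantization index modulation on one detail coefficient, with pixel clipping and robustness concerns omitted.

```python
# Toy sketch of DWT annotation embedding (illustrative, not the paper's
# scheme). One bit is embedded per image row into the first Haar detail
# coefficient via quantization index modulation.

STEP = 8.0  # quantization step; larger is more robust but more visible

def haar_1d(row):
    """One-level Haar transform of an even-length row: (averages, details)."""
    avg = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    diff = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return avg, diff

def inv_haar_1d(avg, diff):
    """Inverse of haar_1d."""
    row = []
    for a, d in zip(avg, diff):
        row += [a + d, a - d]
    return row

def embed_bits(image, bits):
    """Embed one bit per row: bit 0 -> coefficient at a multiple of STEP,
    bit 1 -> offset by STEP/2."""
    out = []
    for row, bit in zip(image, bits):
        avg, diff = haar_1d(row)
        q = round(diff[0] / STEP) * STEP
        diff[0] = q + (STEP / 2 if bit else 0.0)
        out.append(inv_haar_1d(avg, diff))
    return out

def extract_bits(image, n):
    """Recover the first n embedded bits from the marked image."""
    bits = []
    for row in image[:n]:
        _, diff = haar_1d(row)
        frac = abs(diff[0]) % STEP
        bits.append(1 if abs(frac - STEP / 2) < STEP / 4 else 0)
    return bits
```

A practical system would embed a full 2-D code redundantly across 2-D DWT sub-bands rather than one bit per row.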

AI photo storyteller based on deep encoder-decoder architecture (딥인코더-디코더 기반의 인공지능 포토 스토리텔러)

  • Min, Kyungbok;Dang, L. Minh;Lee, Sujin;Moon, Hyeonjoon
    • Proceedings of the Korea Information Processing Society Conference / 2019.10a / pp.931-934 / 2019
  • Using artificial intelligence to generate captions for an image has been studied extensively. However, such systems cannot create stories of more than one sentence based on image content, even though stories are how humans foster social cooperation and develop social norms. This paper proposes a framework that generates a relatively short story describing the context of an image. The main contributions of this paper are (1) an unsupervised framework that uses a recurrent neural network structure and an encoder-decoder model to construct a short story for an image, and (2) a large English novel dataset, covering horror and romance themes, that was manually collected and validated. An investigation of the generated short stories shows that the proposed model produces more creative content than existing intelligent systems, which produce only one concise sentence. The framework demonstrated in this work should therefore spur research on more robust AI story writers and encourage applying the proposed model to help story writers find new ideas.

Membership Inference Attack against Text-to-Image Model Based on Generating Adversarial Prompt Using Textual Inversion (Textual Inversion을 활용한 Adversarial Prompt 생성 기반 Text-to-Image 모델에 대한 멤버십 추론 공격)

  • Yoonju Oh;Sohee Park;Daeseon Choi
    • Journal of the Korea Institute of Information Security & Cryptology / v.33 no.6 / pp.1111-1123 / 2023
  • In recent years, as generative models have developed, research on attacking them has also been active. We propose a new membership inference attack against text-to-image models. Existing membership inference attacks on text-to-image models generate a single image from the caption of each query image. In contrast, this paper applies personalized embedding to query images through Textual Inversion and proposes a membership inference attack that effectively generates multiple images by producing adversarial prompts. In addition, the membership inference attack is tested for the first time on the Stable Diffusion model, which is attracting attention among text-to-image models, and achieves an accuracy of up to 1.00.
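
The core decision rule in such an attack can be sketched schematically. This is an assumption-laden illustration, not the paper's method: the similarity function, the feature vectors, and the threshold are all placeholders, and the generation step (Stable Diffusion with an adversarial prompt) is abstracted away.

```python
# Schematic membership inference decision (illustrative only). Images
# generated from an adversarial prompt are compared to the query image;
# if any generated image is close enough, the query is classified as a
# member of the training set.

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def infer_membership(query_features, generated_features_list, threshold=0.9):
    """Member iff the best-matching generated image exceeds the threshold."""
    best = max(cosine_similarity(query_features, g)
               for g in generated_features_list)
    return best >= threshold
```

Generating multiple images per query, as the paper proposes, raises the chance that at least one generation reproduces a memorized training image.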

A Study on Image Indexing Method based on Content (내용에 기반한 이미지 인덱싱 방법에 관한 연구)

  • Yu, Won-Gyeong;Jeong, Eul-Yun
    • The Transactions of the Korea Information Processing Society / v.2 no.6 / pp.903-917 / 1995
  • In most database systems, images have been indexed indirectly using related text such as captions, annotations, and image attributes. But there is an increasing need for image database systems that support storing and retrieving images directly by content, using the information contained in the images themselves. A few content-based indexing methods exist. Among them, Pertains proposed an image indexing method that considers the spatial relationships and properties of the objects forming an image, an expansion of earlier studies based on the 2-D string. But this method needs too much storage space and lacks flexibility. In this paper, we propose a more flexible index structure based on the kd-tree using paging techniques. We show an example of extracting keys from the raw image using normalization. Simulation results show that our method improves flexibility and needs much less storage space.

  • PDF
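
A kd-tree over image keys can be sketched minimally as below. This is generic kd-tree logic in the spirit of the proposed index, with the paging layer omitted; the 2-D keys stand in for whatever normalized feature keys the paper extracts.

```python
# Minimal kd-tree sketch for content-based image keys (paging omitted).
# Each node splits on one coordinate axis, cycling through axes by depth.

def kd_insert(node, point, depth=0):
    """Insert a point, returning the (possibly new) subtree root."""
    if node is None:
        return {"point": point, "left": None, "right": None}
    axis = depth % len(point)
    side = "left" if point[axis] < node["point"][axis] else "right"
    node[side] = kd_insert(node[side], point, depth + 1)
    return node

def kd_search(node, point, depth=0):
    """Exact-match search following the same axis-cycling rule as insert."""
    if node is None:
        return False
    if node["point"] == point:
        return True
    axis = depth % len(point)
    side = "left" if point[axis] < node["point"][axis] else "right"
    return kd_search(node[side], point, depth + 1)
```

A paged variant would group subtrees into fixed-size disk pages so that each search touches few pages, which is where the storage savings over 2-D-string indexes would come from.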

A Generalized Method for Extracting Characters and Video Captions (일반화된 문자 및 비디오 자막 영역 추출 방법)

  • Chun, Byung-Tae;Bae, Young-Lae;Kim, Tai-Yun
    • Journal of KIISE: Software and Applications / v.27 no.6 / pp.632-641 / 2000
  • Conventional character extraction methods extract character regions from the whole image using techniques such as color reduction, region split-and-merge, and texture analysis. Because these methods rely on many heuristic variables and threshold values derived from a priori knowledge, they are difficult to generalize algorithmically. In this paper, we propose a method that extracts character regions using a topographical feature extraction method and a point-line-region extension method. The proposed method also addresses the problems of conventional methods by reducing heuristic variables and generalizing threshold values. Character regions can thus be extracted with generalized variables and threshold values, without a priori knowledge of the character region. Experimental results show a candidate region extraction rate of 100% and a character region extraction rate of over 98%.

  • PDF
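
The point-line-region extension idea can be caricatured as follows. This is a heavily simplified sketch under stated assumptions: the input is already a binary map of candidate pixels (the paper's topographical features are not reproduced), seed points grow into horizontal runs, and vertically adjacent runs merge into candidate regions.

```python
# Simplified point -> line -> region extension sketch (assumes a binary
# candidate-pixel map; not the paper's topographical feature extraction).

def find_runs(binary_row):
    """Horizontal runs of 1-pixels in one row, as (start, end) inclusive."""
    runs, start = [], None
    for i, v in enumerate(binary_row + [0]):  # sentinel 0 closes a trailing run
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i - 1))
            start = None
    return runs

def extract_regions(binary_image):
    """Merge vertically overlapping runs into bounding-box regions
    [top, bottom, left, right]."""
    regions = []
    for y, row in enumerate(binary_image):
        for s, e in find_runs(row):
            for reg in regions:
                # extend a region that ended on the previous row and
                # horizontally overlaps this run
                if reg[1] == y - 1 and not (e < reg[2] or s > reg[3]):
                    reg[1] = y
                    reg[2] = min(reg[2], s)
                    reg[3] = max(reg[3], e)
                    break
            else:
                regions.append([y, y, s, e])
    return [tuple(r) for r in regions]
```

Real caption extraction would then filter these boxes by size, aspect ratio, and position before handing them to OCR.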

Web Image Caption Extraction using Positional Relation and Lexical Similarity (위치적 연관성과 어휘적 유사성을 이용한 웹 이미지 캡션 추출)

  • Lee, Hyoung-Gyu;Kim, Min-Jeong;Hong, Gum-Won;Rim, Hae-Chang
    • Journal of KIISE: Software and Applications / v.36 no.4 / pp.335-345 / 2009
  • In this paper, we propose a new web image caption extraction method that considers the positional relation between a caption and an image and the lexical similarity between a caption and the main text containing it. The positional relation represents where the caption is located relative to the corresponding image, in terms of distance and direction. The lexical similarity indicates how likely the main text is to generate the caption of the image. Compared with previous image caption extraction approaches, which utilize only independent features of images and captions, the proposed approach improves caption extraction recall and precision, and improves F-measure by 28%, by including the additional features of positional relation and lexical similarity.
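
The combination of positional and lexical evidence can be sketched as a simple scoring function. This is an illustration, not the paper's model: the Jaccard word overlap, the linear weighting, and the pixel-distance normalization are all assumptions.

```python
# Sketch of scoring caption candidates by position and lexical overlap
# (illustrative weights and similarity measure, not the paper's model).

def lexical_similarity(candidate, main_text):
    """Jaccard overlap between the word sets of candidate and main text."""
    a, b = set(candidate.lower().split()), set(main_text.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def caption_score(distance_px, candidate, main_text, alpha=0.5, scale=200.0):
    """Linear mix of a positional score (closer to the image is better)
    and a lexical score (overlap with the main text)."""
    positional = max(0.0, 1.0 - distance_px / scale)
    return alpha * positional + (1 - alpha) * lexical_similarity(candidate, main_text)

def best_caption(candidates, main_text):
    """candidates: list of (distance_to_image_px, text) pairs."""
    return max(candidates, key=lambda c: caption_score(c[0], c[1], main_text))[1]
```

Direction (above vs. below the image) and supervised weight learning, as the paper suggests, would refine this beyond a fixed linear mix.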