• Title/Summary/Keyword: image caption (이미지 캡션)


A study on the Problems of Overcomputation in Deep Networks (심층 네트워크의 과계산 문제에 대한 고찰)

  • Park, Da-Sol;Son, Jeong-Woo;Kim, Sun-Joong;Cha, Jeong-Won
    • Annual Conference on Human and Language Technology / 2019.10a / pp.120-124 / 2019
  • Deep learning has shown excellent performance in natural language processing, image processing, speech recognition, and other areas. However, what actually happens inside these complex artificial neural networks has not been verifiable. In this paper, we examine what operations take place inside an artificial neural network in the video captioning domain. To this end, we add an output layer at each stage and inspect the decoded results to verify whether the network is operating correctly. We evaluate our method by applying it to the Korean MSR-VTT dataset. We expect this approach to help in understanding the behavior of artificial neural networks.

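The probing approach described in the abstract above, adding an output layer at each stage of the captioning network and checking what it decodes, can be sketched roughly as follows. This is a minimal, hypothetical PyTorch illustration: the class name ProbedCaptionDecoder, the dimensions, and the single auxiliary probe head are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ProbedCaptionDecoder(nn.Module):
    """GRU caption decoder with an auxiliary output head on every step's
    hidden state, so intermediate behaviour can be decoded and inspected."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, feat_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.init_h = nn.Linear(feat_dim, hidden_dim)    # video feature -> initial hidden state
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)     # main output layer
        self.probe = nn.Linear(hidden_dim, vocab_size)   # auxiliary per-step probe head

    def forward(self, feats, captions):
        # feats: (B, feat_dim) video feature, captions: (B, T) token ids
        h0 = torch.tanh(self.init_h(feats)).unsqueeze(0)      # (1, B, H)
        emb = self.embed(captions)                            # (B, T, E)
        hidden_seq, _ = self.gru(emb, h0)                     # (B, T, H)
        return self.out(hidden_seq), self.probe(hidden_seq)   # final and probe logits
```

Comparing the probe's per-step predictions against the final output layer is one way to spot stages where the hidden state no longer carries caption-relevant information.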

Image Retrieval: Access and Use in Information Overload (이미지 검색: 정보과다 환경에서의 접근과 이용)

  • Park, Minsoo
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.703-708 / 2022
  • Tables and figures in academic literature contain important and valuable information. They represent the essence of a refined study and are the closest to the raw dataset. But can researchers easily access and utilize these image data through search systems? In this study, we identify user perceptions of and needs for image data through user and case studies. Through this study, we also explore the expected effects and uses of image search systems. It was found that the majority of researchers prefer a system that combines table and figure indexing functions with traditional search functions. They valued the provision of an advanced search function that would allow them to limit their searches to specific object types (pictures and tables). Overall, researchers discovered many potential uses of a system that indexes tables and figures. It has been shown to be helpful in finding special types of information for teaching, presentation, research, and learning. It should also be noted that the usefulness of such systems is highest when their features are integrated into existing systems, link seamlessly to full texts, and include high-quality images with full captions. Expected effects and uses of user-centered image search systems are also discussed.

Design and Implementation of MPEG-2 Compressed Video Information Management System (MPEG-2 압축 동영상 정보 관리 시스템의 설계 및 구현)

  • Heo, Jin-Yong;Kim, In-Hong;Bae, Jong-Min;Kang, Hyun-Syug
    • The Transactions of the Korea Information Processing Society / v.5 no.6 / pp.1431-1440 / 1998
  • Video data are retrieved and stored in various compressed forms according to their characteristics. In this paper, we present a generic data model that captures the structure of a video document and provides a means for indexing a video stream. Using this model, we design and implement CVIMS (the MPEG-2 Compressed Video Information Management System) to store and retrieve video documents. CVIMS extracts I-frames from MPEG-2 files, selects key-frames from the I-frames, and stores in a database index information such as thumbnails, captions, and picture descriptors of the key-frames. CVIMS also retrieves MPEG-2 video data using the thumbnails of key-frames and the various labels of queries.

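As a rough illustration of the key-frame indexing pipeline the abstract describes (I-frame extraction, thumbnail generation, and database storage of index information), the following Python sketch uses ffmpeg and SQLite. The function names, table schema, thumbnail size, and ffmpeg options are assumptions for illustration; this is not the CVIMS implementation.

```python
import sqlite3
import subprocess
from pathlib import Path

def extract_i_frames(video_path: str, out_dir: str) -> list[Path]:
    """Extract I-frames from a video file as thumbnail images using ffmpeg."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    pattern = str(Path(out_dir) / "iframe_%05d.jpg")
    subprocess.run(
        ["ffmpeg", "-i", video_path,
         # keep only I-frames, then scale to thumbnail width
         "-vf", "select='eq(pict_type,PICT_TYPE_I)',scale=160:-1",
         "-vsync", "vfr", pattern],
        check=True,
    )
    return sorted(Path(out_dir).glob("iframe_*.jpg"))

def index_key_frames(db_path: str, video_path: str, thumbs: list[Path]) -> None:
    """Store one index record per key-frame: thumbnail path, caption, descriptor."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS key_frames (
                     video TEXT, thumb TEXT, caption TEXT, descriptor TEXT)""")
    con.executemany(
        "INSERT INTO key_frames VALUES (?, ?, ?, ?)",
        [(video_path, str(t), "", "") for t in thumbs],  # captions/descriptors filled in later
    )
    con.commit()
    con.close()
```

Retrieval would then query this table by caption text or descriptor labels and present the stored thumbnails, in the spirit of the system described above.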

Parallel Injection Method for Improving Descriptive Performance of Bi-GRU Image Captions (Bi-GRU 이미지 캡션의 서술 성능 향상을 위한 Parallel Injection 기법 연구)

  • Lee, Jun Hee;Lee, Soo Hwan;Tae, Soo Ho;Seo, Dong Hoan
    • Journal of Korea Multimedia Society / v.22 no.11 / pp.1223-1232 / 2019
  • Injection is the method by which the image feature vector is fed from the encoder into the decoder. Since the image feature vector contains object details such as color and texture, it is essential for generating image captions. However, a bidirectional decoder using the existing injection method receives the image feature vector only at the first step, so the image feature information vanishes along the backward sequence. This makes it difficult to describe the context in detail. In this paper, we therefore propose a parallel injection method to improve the descriptive performance of image captions. The proposed injection method fuses the image vector with every word embedding to preserve the context. We also build the caption model on a Bidirectional Gated Recurrent Unit (Bi-GRU) to reduce the decoder's computation. To validate the proposed model, experiments were conducted on a standard image caption dataset, and the model was compared with recent models using BLEU and METEOR scores. The proposed model improved the BLEU score by up to 20.2 points and the METEOR score by up to 3.65 points compared with the existing caption model.
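
The parallel-injection idea described above, fusing the image vector with the word embedding at every decoding step of a Bi-GRU, can be sketched as follows. This is a minimal, hypothetical PyTorch illustration; the class name, layer dimensions, and the assumed 2048-dimensional CNN feature are not the authors' exact model.

```python
import torch
import torch.nn as nn

class ParallelInjectionBiGRU(nn.Module):
    """Caption decoder sketch: the image feature vector is concatenated with
    the word embedding at every time step (parallel injection), then fed to a Bi-GRU."""

    def __init__(self, vocab_size, embed_dim=256, feat_dim=512, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.proj_feat = nn.Linear(2048, feat_dim)            # CNN feature -> smaller image vector
        self.bigru = nn.GRU(embed_dim + feat_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)      # forward+backward states -> vocab

    def forward(self, img_feat, captions):
        # img_feat: (B, 2048) image feature, captions: (B, T) token ids
        emb = self.embed(captions)                                 # (B, T, E)
        feat = torch.relu(self.proj_feat(img_feat))                # (B, F)
        feat_seq = feat.unsqueeze(1).expand(-1, emb.size(1), -1)   # repeat image vector per step
        fused = torch.cat([emb, feat_seq], dim=-1)                 # parallel injection
        hidden_seq, _ = self.bigru(fused)                          # (B, T, 2H)
        return self.out(hidden_seq)                                # per-step word logits
```

Injecting the image vector at every step keeps it visible to both the forward and the backward direction of the Bi-GRU, which addresses the vanishing image information that the paper attributes to first-step-only injection.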