• Title/Summary/Keyword: 비디오 캡션 (video captioning)


Sports Highlight Abstraction (스포츠 하이라이트 생성)

  • Kim, Mi-Ho;Shin, Seong-Yoon;Jeon, Keun-Hwan;Rhee, Yang-Weon
    • Proceedings of the Korea Information Processing Society Conference / 2001.04b / pp.1233-1236 / 2001
  • Generating highlights from video data of various genres is important for producers and users of multimedia content who want to create short, summarized highlight video scenes. This paper presents a new method for generating video highlights together with a content-based, i.e., event-based, video indexing method. The target sports are soccer, basketball, and handball, in which points are scored by shooting goals, and event rules are used to extract the highlight shots in which a goal is scored. For video indexing, both the visual information of the video data itself and caption information are used.

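The abstract above describes hand-written event rules that combine visual cues with score-caption information to pick out goal shots. A minimal Python sketch of that general idea follows; the shot attributes, cue names, and the rule itself are illustrative assumptions, not the authors' actual rules.

```python
# Illustrative sketch only: a hand-written event rule that flags a "goal"
# highlight shot when the on-screen score caption changes and nearby shots
# show a typical goal pattern (close-up or replay). All fields are assumed.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Shot:
    index: int
    score_caption: Optional[str]   # e.g. "2-1", parsed from the caption region
    is_close_up: bool              # visual cue: player close-up shot
    is_replay: bool                # visual cue: slow-motion replay shot

def goal_highlight_shots(shots: List[Shot]) -> List[int]:
    """Return indices of shots judged to contain a goal event."""
    highlights = []
    prev_score = None
    for i, shot in enumerate(shots):
        score_changed = (
            shot.score_caption is not None
            and prev_score is not None
            and shot.score_caption != prev_score
        )
        # Hypothetical event rule: the score caption changed AND one of the
        # next two shots is a close-up or a replay -> treat as a goal highlight.
        follow_up = shots[i + 1:i + 3]
        if score_changed and any(s.is_close_up or s.is_replay for s in follow_up):
            highlights.append(shot.index)
        if shot.score_caption is not None:
            prev_score = shot.score_caption
    return highlights
```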

A Method for Recovering Text Regions in Video using Extended Block Matching and Region Compensation (확장적 블록 정합 방법과 영역 보상법을 이용한 비디오 문자 영역 복원 방법)

  • 전병태;배영래
    • Journal of KIISE:Software and Applications / v.29 no.11 / pp.767-774 / 2002
  • Conventional research on image restoration has focused on restoring degraded images resulting from image formation, storage, and communication, mainly in the signal processing field. Related research on recovering the original image information of caption regions includes a method using the BMA (block matching algorithm). That method suffers from frequent incorrect matches and propagates the errors they cause. Moreover, it cannot recover the frames between two scene changes when scene changes occur more than twice. In this paper, we propose a method for recovering original images using an EBMA (Extended Block Matching Algorithm) and a region compensation method. To support the recovery, the method first extracts a priori knowledge such as scene-change, camera-motion, and caption-region information. It then decides the direction of recovery using the extracted caption information (the start and end frames of a caption) and the scene-change information. Following that direction, recovery is performed in units of character components using the EBMA and the region compensation method. Experimental results show that the EBMA achieves good recovery regardless of the speed of moving objects and the complexity of the background in the video, and that the region compensation method successfully recovers original images when there is no original image information to refer to.
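
The abstract above builds on block matching between frames. The sketch below shows only plain block matching with a caption mask (a SAD search over a reference frame), as a rough illustration; the paper's EBMA extensions and region compensation method are not reproduced, and the array layout and parameters are assumptions.

```python
# Minimal block-matching sketch for recovering caption-occluded pixels.
# Frames are grayscale numpy arrays; `mask` is True where the caption covers
# the current frame. Block size and search range are illustrative values.
import numpy as np

def match_block(ref, cur, mask, y, x, size=8, search=16):
    """Find the block in `ref` best matching the block at (y, x) in `cur`,
    comparing only pixels not covered by the caption (sum of absolute diffs)."""
    h, w = cur.shape
    block = cur[y:y + size, x:x + size].astype(int)
    valid = ~mask[y:y + size, x:x + size]          # pixels usable for matching
    best, best_pos = None, (y, x)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + size > h or xx + size > w:
                continue
            cand = ref[yy:yy + size, xx:xx + size].astype(int)
            sad = np.abs((block - cand)[valid]).sum()
            if best is None or sad < best:
                best, best_pos = sad, (yy, xx)
    return best_pos

def recover_block(ref, cur, mask, y, x, size=8):
    """Fill the caption-covered pixels of one block from the matched reference block."""
    yy, xx = match_block(ref, cur, mask, y, x, size)
    patch = cur[y:y + size, x:x + size].copy()
    hole = mask[y:y + size, x:x + size]
    patch[hole] = ref[yy:yy + size, xx:xx + size][hole]
    return patch
```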

Analysis of Research Trends in Deep Learning-Based Video Captioning (딥러닝 기반 비디오 캡셔닝의 연구동향 분석)

  • Lyu Zhi;Eunju Lee;Youngsoo Kim
    • KIPS Transactions on Software and Data Engineering / v.13 no.1 / pp.35-49 / 2024
  • Video captioning technology, as a significant outcome of the integration between computer vision and natural language processing, has emerged as a key research direction in the field of artificial intelligence. It aims at automatic understanding and linguistic expression of video content, enabling computers to transform the visual information in videos into textual form. This paper analyzes the research trends in deep learning-based video captioning and categorizes them into four main groups: CNN-RNN-based, RNN-RNN-based, multimodal-based, and Transformer-based models, explaining the concept of each type of model and discussing its features, pros, and cons. The paper also lists the datasets and performance evaluation methods commonly used in the video captioning field. The datasets cover diverse domains and scenarios, offering extensive resources for training and validating video captioning models, and the discussion of evaluation methods covers the major evaluation metrics, giving researchers practical references for assessing model performance from various angles. Finally, the paper presents open challenges that require continued improvement, such as maintaining temporal consistency and accurately describing dynamic scenes, which add complexity in real-world applications, as well as new tasks to be studied, such as temporal relationship modeling and multimodal data integration.
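
For reference, the first of the four categories (CNN-RNN-based models) can be illustrated with a minimal PyTorch encoder-decoder sketch; the layer sizes, vocabulary size, and the assumption of precomputed CNN frame features are placeholders, not details of any specific surveyed model.

```python
# Minimal CNN-RNN video captioning sketch (PyTorch). Per-frame features are
# assumed to come from a pretrained CNN; all dimensions are placeholders.
import torch
import torch.nn as nn

class CnnRnnCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, hidden=512, vocab_size=10000, emb=300):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)   # over frames
        self.embed = nn.Embedding(vocab_size, emb)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)        # over words
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, frame_feats, captions):
        # frame_feats: (batch, n_frames, feat_dim) CNN features per frame
        # captions:    (batch, n_words) token ids of the target caption
        _, (h, c) = self.encoder(frame_feats)        # summarize the video
        words = self.embed(captions)
        dec_out, _ = self.decoder(words, (h, c))     # decode conditioned on video
        return self.out(dec_out)                     # (batch, n_words, vocab_size)

# Usage sketch (teacher forcing):
# feats = torch.randn(2, 16, 2048)          # 2 videos, 16 frames each
# caps  = torch.randint(0, 10000, (2, 12))  # target caption tokens
# logits = CnnRnnCaptioner()(feats, caps)
```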

Data Model, Query Language, and Indexing Scheme for Structured Video Documents (구조화된 비디오 문서의 데이터 모델 및 질의어와 색인 기법)

  • 류은숙;이규철
    • Journal of Korea Multimedia Society / v.1 no.1 / pp.1-17 / 1998
  • Video information is an important component of multimedia systems such as digital libraries, the World-Wide Web (WWW), and Video-On-Demand (VOD) service systems. Video information inherently has a hierarchical document structure, so it is called a "structured video document" in this paper. This paper proposes a data model, a query language, and an indexing scheme for structured video documents in order to store, retrieve, and share them efficiently. Structured video documents are represented with object-oriented data modeling, since the hierarchical structure information can be modeled as complex objects, and object types are defined for the structure information. The query language supports not only content-based retrieval but also queries based on the structure of video documents and on their spatial/temporal relations. To perform structure queries efficiently while reducing the storage overhead of indices, an optimized inverted index structure is proposed.

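The abstract above proposes an optimized inverted index for structure queries. The toy sketch below shows only the generic inverted-index idea it builds on (keywords mapped to the structural elements that contain them); the element ids and fields are illustrative assumptions, not the paper's object types or optimized index layout.

```python
# Toy inverted index over structural elements of a video document
# (document -> scene -> shot). Element ids are illustrative assumptions.
from collections import defaultdict

index = defaultdict(set)   # keyword -> set of element ids like "doc1/scene2/shot5"

def add_element(element_id: str, keywords: list[str]) -> None:
    for kw in keywords:
        index[kw.lower()].add(element_id)

def query(*keywords: str) -> set[str]:
    """Return element ids that contain all the given keywords."""
    sets = [index.get(kw.lower(), set()) for kw in keywords]
    return set.intersection(*sets) if sets else set()

# Usage sketch:
add_element("doc1/scene1/shot3", ["goal", "soccer"])
add_element("doc1/scene2/shot1", ["interview"])
print(query("goal"))   # {'doc1/scene1/shot3'}
```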

A Study on the Alternative Method of Video Characteristics Using Captioning in Text-Video Retrieval Model (텍스트-비디오 검색 모델에서의 캡션을 활용한 비디오 특성 대체 방안 연구)

  • Dong-hun, Lee;Chan, Hur;Hyeyoung, Park;Sang-hyo, Park
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.6 / pp.347-353 / 2022
  • In this paper, we propose a text-video retrieval method that replaces video features with captions. In general, existing embedding-based models involve both joint embedding space construction and CNN-based video encoding, which require a large amount of computation during training as well as inference. To overcome this problem, we introduce a video-captioning module and replace the visual features of a video with the captions it generates. Specifically, we adopt a caption generator that converts candidate videos into captions at inference time, enabling direct comparison between the query text and the candidate videos without a joint embedding space. Experiments on two benchmark datasets, MSR-VTT and VATEX, show that the proposed model reduces computation and inference time by skipping visual processing and joint embedding space construction.
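
A rough sketch of the retrieval flow described above: captions are generated once for the candidate videos, and at query time the query text is compared directly against those captions, with no joint embedding space. The `generate_caption` placeholder and the bag-of-words similarity are assumptions for illustration; the paper's actual caption generator and matching method are not reproduced.

```python
# Sketch of caption-based text-video retrieval: compare the query text against
# captions precomputed for each candidate video, instead of encoding the videos
# into a joint embedding space at query time.
import math
from collections import Counter

def generate_caption(video_path: str) -> str:
    """Placeholder for a video-captioning model (assumed; run once, offline,
    over each candidate video to build caption_db)."""
    raise NotImplementedError

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, caption_db: dict[str, str], top_k: int = 5):
    """caption_db maps video id -> caption generated offline."""
    ranked = sorted(caption_db.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return ranked[:top_k]
```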

A study on the Problems of Overcomputation in Deep Networks (심층 네트워크의 과계산 문제에 대한 고찰)

  • Park, Da-Sol;Son, Jeong-Woo;Kim, Sun-Joong;Cha, Jeong-Won
    • Annual Conference on Human and Language Technology / 2019.10a / pp.120-124 / 2019
  • Deep learning shows excellent performance in natural language processing, image processing, and speech recognition, but what actually happens inside a complex artificial neural network has not been verified. In this paper, we examine what operations take place inside an artificial neural network in the video captioning domain. To do so, we add an output layer at each stage and inspect the produced outputs to verify whether the network is operating correctly. We evaluated our method by applying it to the Korean MSR-VTT dataset. We expect this approach to help in understanding the behavior of artificial neural networks.

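The abstract above inspects a video-captioning network by attaching an output at each stage. A generic way to peek at intermediate activations in PyTorch is a forward hook, sketched below with a placeholder model; the layers and names are illustrative and are not the authors' network.

```python
# Sketch: inspect intermediate outputs of a network with forward hooks.
# The model and layer names here are placeholders for illustration only.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2048, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10000),
)

captured = {}

def save_output(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

# Register a hook on every stage so each intermediate result can be examined.
for i, layer in enumerate(model):
    layer.register_forward_hook(save_output(f"stage_{i}"))

with torch.no_grad():
    model(torch.randn(1, 2048))

for name, out in captured.items():
    print(name, tuple(out.shape), float(out.abs().mean()))
```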