Search | Korea Science

Analysis of Research Trends in Deep Learning-Based Video Captioning (딥러닝 기반 비디오 캡셔닝의 연구동향 분석)

Lyu Zhi;Eunju Lee;Youngsoo Kim
- KIPS Transactions on Software and Data Engineering / v.13 no.1 / pp.35-49 / 2024
Video captioning, a significant outcome of the integration of computer vision and natural language processing, has emerged as a key research direction in artificial intelligence. The technology aims to automatically understand video content and express it in language, enabling computers to transform the visual information in videos into text. This paper provides an initial analysis of research trends in deep learning-based video captioning, categorizes the models into four main groups (CNN-RNN-based, RNN-RNN-based, multimodal-based, and Transformer-based), and explains the concept of each group together with its features, advantages, and disadvantages. The paper also lists the datasets and performance evaluation methods commonly used in the field. The datasets cover diverse domains and scenarios, offering extensive resources for training and validating video captioning models. The discussion of performance evaluation covers the major evaluation metrics and gives researchers practical references for evaluating model performance from various angles. Finally, the paper identifies major challenges that require continued improvement, such as maintaining temporal consistency and accurately describing dynamic scenes, both of which increase complexity in real-world applications, and presents new tasks for future research, such as temporal relationship modeling and multimodal data integration.
https://doi.org/10.3745/KTSDE.2024.13.1.35

Content analysis of real-time simulation video observation records about cases of patients with chronic obstructive pulmonary disease-focusing on nursing skills performance (만성폐쇄성폐질환 환자 사례에 대한 실시간 시뮬레이션 동영상 관찰기록 내용분석-간호술 수행을 중심으로)

Hong, Ji-Yeon;Park, Jin-Ah
- Journal of Convergence for Information Technology / v.12 no.5 / pp.40-50 / 2022
This qualitative study analyzed the nursing skills content that nursing students recorded on a structured video observation record sheet while observing a real-time video of a colleague team implementing a chronic obstructive pulmonary disease scenario during simulation practice. Using the content analysis method, categories and topics of effective and ineffective aspects were derived in the following areas of the observation record: accuracy of procedures, adherence to aseptic technique, consideration of safety, explanation and education, and purpose explanation and method education. The study is meaningful in that it presents factors that can increase the efficiency of nursing education through simulation-based practice.
https://doi.org/10.22156/CS4SMB.2022.12.05.040

Design and Implementation of a Realistic Multi-View Scalable Video Coding Scheme (실감형 다시점 스케일러블 비디오 코딩 방법의 설계 및 구현)

Park, Min-Woo;Park, Gwang-Hoon
- Journal of Broadcast Engineering / v.14 no.6 / pp.703-720 / 2009
This paper proposes a realistic multi-view scalable video coding scheme designed to meet users' interest in 3D content services and to suit future computing environments. Future video coding schemes should support realistic services that give users a sense of 3D presence through stereoscopic or multi-view video, and should also provide so-called one-source multi-use services in order to comprehensively support diverse transmission environments and terminals. Unlike most video coding methods, which support only two-dimensional display, the proposed scheme can support such realistic services. The paper designs and implements the scheme by integrating the Multi-view Video Coding scheme with the Scalable Video Coding scheme, and demonstrates through simulation that 3D services can be realized. The simulation results show that the proposed structure remarkably improves random access performance with almost the same coding efficiency.
https://doi.org/10.5909/JBE.2009.14.6.703

Cache-Friendly Adaptive Video Streaming Framework Exploiting Regular Expression in Content Centric Networks (콘텐트 중심 네트워크에서 정규표현식을 활용한 캐시친화적인 적응형 스트리밍 프레임워크)

Son, Donghyun;Choi, Daejin;Choi, Nakjung;Song, Junghwan;Kwon, Ted Taekyoung
- The Journal of Korean Institute of Communications and Information Sciences / v.40 no.9 / pp.1776-1785 / 2015
Content Centric Network (CCN) has been introduced as a new paradigm in response to the shift in users' perspective on Internet use from host-centric to content-centric. Meanwhile, the demand for video streaming has been increasing, and adaptive streaming has been introduced and studied as a way to achieve higher user satisfaction. If the Internet architecture is replaced with the CCN architecture, adaptive video streaming in CCN must be considered in order to meet user demand. However, if the same rate decision algorithm used in the Internet is deployed in CCN, the content store (CS) in a CCN router can be used only to a limited extent, and dynamic requirements are difficult to reflect. This paper therefore presents a framework suited to the CCN protocol and to cache utilization, which applies a content naming method that exploits regular expressions to the rate decision algorithm of existing adaptive streaming. The framework also improves the quality of video streaming through dynamic expression strategies and an algorithm for selecting among those strategies, and its performance is verified.
https://doi.org/10.7840/kics.2015.40.9.1776
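As a rough illustration of the naming idea in the abstract above, the following Python sketch shows how a request expressed as a regular expression could be satisfied by any of several cached representations in a content store. The name layout (/movie/seg7/1500kbps), the function lookup_content_store, and the rule of preferring the highest cached bitrate are all assumptions made for illustration; none of them is taken from the paper.

```python
import re


def lookup_content_store(content_store, interest_pattern):
    """Return the cached name that matches the pattern and has the highest
    bitrate, or None when nothing in the store matches.

    The pattern must contain a named group "rate" holding the bitrate digits
    (a hypothetical convention used only in this sketch).
    """
    regex = re.compile(interest_pattern)
    best = None  # (rate, name) of the best match seen so far
    for name in content_store:
        match = regex.fullmatch(name)
        if match is None:
            continue
        rate = int(match.group("rate"))
        if best is None or rate > best[0]:
            best = (rate, name)
    return best[1] if best else None


# Hypothetical cache contents of one router.
cached = {
    "/movie/seg7/500kbps",
    "/movie/seg7/1500kbps",
    "/movie/seg8/500kbps",
}

# One request that accepts any of three bitrates for segment 7.
interest = r"/movie/seg7/(?P<rate>500|1500|3000)kbps"
print(lookup_content_store(cached, interest))  # /movie/seg7/1500kbps
```

A single pattern lets one request hit whichever acceptable representation happens to be cached, which is the cache-friendliness the abstract refers to; how the actual framework chooses its expressions is not described in this listing.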

A Study on Online Sharing Platforms and Sub-Contents in the Field of the Performing Arts - Focusing on the Case of 『Cirque du Soleil Entertainment』 (공연예술분야 온라인 공유 플랫폼 및 서브 콘텐츠 연구 - 『태양의 서커스 엔터테인먼트』 사례를 중심으로)

Kim, Ga-Eun;Park, Jin-Won
- The Journal of the Korea Contents Association / v.22 no.2 / pp.22-34 / 2022
This study examines the forms and current status of online performance content production in the performing arts across diversified video media platforms. To this end, it studies the leading case of Cirque du Soleil Entertainment and analyzes the brand value innovation elements unique to Cirque du Soleil, the background and current status of its digital hub platform "Cirque Connect", and the various sub-contents that diversify its original content. Digital platform applications and sub-content production in the performing arts require an understanding of the needs of a public that is accustomed to consuming media content, as well as strategic planning that covers everything from the initial stages of performance planning to the creation of varied sub-contents. Such planning will improve sub-content quality and increase the product value of digital content in the performing arts by distinguishing it from other forms of cultural and artistic content. Environments in which information about performance works can be accessed easily and from various perspectives through online platforms will enhance the popularity of the performing arts and allow the industry to expand its base amid rapidly changing modes of cultural enjoyment. For the performing arts to remain competitive within diversifying cultural trends, the most important tasks are to achieve brand value innovation that enhances the artistic and cultural value of performance works and, on that basis, to produce various sub-contents.
https://doi.org/10.5392/JKCA.2022.22.02.022

Clustering-based Hierarchical Scene Structure Construction for Movie Videos (영화 비디오를 위한 클러스터링 기반의 계층적 장면 구조 구축)

Choi, Ick-Won;Byun, Hye-Ran
- Journal of KIISE:Software and Applications / v.27 no.5 / pp.529-542 / 2000
In recent years, the use of multimedia information has been increasing rapidly, and video is growing faster than any other medium because it integrates all media into a single data stream. Although digital video has become far more widely available, its length and unstructured format make effective access very difficult for users. Minimal user interaction and an explicit definition of video structure are therefore key requirements for recently developed image and video management systems. This paper defines the relevant terms and a hierarchical video structure, and presents a system that constructs a clustering-based video hierarchy, which lets users browse a summary and randomly access the video content. Instead of a single feature and domain-specific thresholds, we use multiple features that complement one another, together with clustering-based methods that use normalization, so that user interaction is kept to a minimum. The shot boundary detection stage extracts multiple features, applies adaptive filtering to each feature to improve performance by eliminating false factors, and performs k-means clustering with two classes. The resulting shot list is then organized into a video hierarchy by an intelligent unsupervised clustering technique. We ran experiments on static and dynamic movie videos that represent the characteristics of various video types. Shot boundary detection achieved performance of about 95% or higher, and the video hierarchy results were also good.
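The two-class k-means step mentioned in the abstract above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it clusters a single one-dimensional frame-difference score per frame, whereas the paper uses multiple filtered features; the function name two_class_kmeans, the sample scores, and the choice to initialize the two centroids at the minimum and maximum score are all hypothetical.

```python
def two_class_kmeans(scores, max_iter=100):
    """Split non-empty 1-D scores into a low and a high cluster.

    Returns one label per score: 0 for the low cluster ("no boundary"),
    1 for the high cluster ("candidate shot boundary").
    """
    # Assumed initialization: start the centroids at the extreme values.
    lo, hi = min(scores), max(scores)
    labels = [0] * len(scores)
    for _ in range(max_iter):
        # Assignment step: each score joins the nearer centroid.
        labels = [1 if abs(s - hi) < abs(s - lo) else 0 for s in scores]
        low = [s for s, label in zip(scores, labels) if label == 0]
        high = [s for s, label in zip(scores, labels) if label == 1]
        # Update step: recompute each centroid as its cluster mean.
        new_lo = sum(low) / len(low) if low else lo
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels


# Hypothetical per-frame difference scores; large values suggest a cut.
diffs = [0.02, 0.03, 0.91, 0.01, 0.04, 0.88, 0.02]
labels = two_class_kmeans(diffs)
boundaries = [i for i, label in enumerate(labels) if label == 1]
print(boundaries)  # [2, 5]
```

Clustering into two classes avoids a hand-tuned, domain-specific threshold: the split between "boundary" and "non-boundary" frames is derived from the data of each video.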

Improvement of Retrieval Performance Using Adaptive Weighting of Key Frame Features (키 프레임 특징들에 적응적 가중치 부여를 이용한 검색 성능 개선)

Kim, Kang-Wook
- Journal of Korea Multimedia Society / v.17 no.1 / pp.26-33 / 2014
Video retrieval and indexing are performed by detecting scene changes, extracting key frames from each shot, and comparing feature similarities between the key frames. Typical image features such as color, shape, and texture are used in content-based video and image retrieval, and many approaches for integrating these features have been studied. However, the open issue with these approaches is how to assign appropriate weights to key frame features at query time. We therefore propose a new video retrieval method that uses adaptively weighted image features. We ran computer simulations on test databases consisting of various kinds of key frames. The experimental results show that the proposed method outperforms previous work on several performance measures, such as precision vs. recall, retrieval efficiency, and ranking measure.
https://doi.org/10.9717/kmms.2014.17.1.026

Fake News Detection on YouTube Using Related Video Information (관련 동영상 정보를 활용한 YouTube 가짜뉴스 탐지 기법)

Junho Kim;Yongjun Shin;Hyunchul Ahn
- Journal of Intelligence and Information Systems / v.29 no.3 / pp.19-36 / 2023
As advances in information and communication technology have made it easier for anyone to produce and disseminate information, a new problem has emerged: fake news, which is false information intentionally shared to mislead people. Initially spread mainly through text, fake news has gradually evolved and is now distributed in multimedia formats. Since its founding in 2005, YouTube has become the world's leading video platform and is used by most people worldwide. However, it has also become a primary source of fake news, causing social problems. Various researchers have been working on detecting fake news on YouTube. There are content-based and background information-based approaches to fake news detection. Still, content-based approaches are dominant when looking at conventional fake news research and YouTube fake news detection research. This study proposes a fake news detection method based on background information rather than content-based fake news detection. In detail, we suggest detecting fake news by utilizing related video information from YouTube. Specifically, the method detects fake news through CNN, a deep learning network, from the vectorized information obtained from related videos and the original video using Doc2vec, an embedding technique. The empirical analysis shows that the proposed method has better prediction performance than the existing content-based approach to detecting fake news on YouTube. The proposed method in this study contributes to making our society safer and more reliable by preventing the spread of fake news on YouTube, which is highly contagious.
https://doi.org/10.13088/jiis.2023.29.3.019

A Personal Video Event Classification Method based on Multi-Modalities by DNN-Learning (DNN 학습을 이용한 퍼스널 비디오 시퀀스의 멀티 모달 기반 이벤트 분류 방법)

Lee, Yu Jin;Nang, Jongho
- Journal of KIISE / v.43 no.11 / pp.1281-1297 / 2016
In recent years, personal videos have seen tremendous growth due to the substantial increase in the use of smart devices and networking services, with which users create and share video content easily and without many restrictions. However, taking both into account would significantly improve event detection performance because videos generally have multiple modalities and the frame data in video varies at different time points. This paper proposes an event detection method. In this method, high-level features are first extracted from multiple modalities in the videos, and the features are rearranged according to time sequence. The association between the modalities is then learned by means of a DNN to produce a personal video event detector. In the proposed method, audio and image data are first synchronized and then extracted. The result is input into GoogLeNet as well as a Multi-Layer Perceptron (MLP) to extract high-level features. The results are then rearranged in time sequence, and every video is processed to extract one feature each for training by means of the DNN.
https://doi.org/10.5626/JOK.2016.43.11.1281

Video Summarization Using Importance-based Fuzzy One-Class Support Vector Machine (중요도 기반 퍼지 원 클래스 서포트 벡터 머신을 이용한 비디오 요약 기술)

Kim, Ki-Joo;Choi, Young-Sik
- Journal of Internet Computing and Services / v.12 no.5 / pp.87-100 / 2011
In this paper, we treat video summarization as the task of generating video segments that are both visually salient and semantically important. To find salient data points, one can use the OC-SVM (One-class Support Vector Machine), which is well known for novelty detection problems. It is, however, hard to incorporate into the OC-SVM process the importance measure of data points, which is crucial for video summarization. To integrate the importance of each point into the OC-SVM process, we propose a fuzzy version of OC-SVM. The Importance-based Fuzzy OC-SVM weights data points according to the importance measure of the video segments and then estimates the support of the distribution of the weighted feature vectors. The estimated support vectors form the descriptive segments that best delineate the underlying video content in terms of the importance and salience of video segments. We demonstrate the efficacy of the proposed algorithm on several synthesized data sets and on different types of videos. Experimental results showed that our approach outperformed the well-known traditional method.
