• Title/Summary/Keyword: Video Features

Video Captioning with Visual and Semantic Features

  • Lee, Sujin;Kim, Incheol
    • Journal of Information Processing Systems / v.14 no.6 / pp.1318-1330 / 2018
  • Video captioning refers to the process of extracting features from a video and generating captions from the extracted features. This paper introduces a deep neural network model and its learning method for effective video captioning. In this study, semantic features that effectively express the video are used together with visual features. The visual features are extracted using convolutional neural networks such as C3D and ResNet, while the semantic features are extracted using a semantic feature extraction network proposed in this paper. Further, an attention-based caption generation network is proposed to generate video captions effectively from the extracted features. The performance and effectiveness of the proposed model are verified through various experiments on two large-scale video benchmarks, the Microsoft Video Description (MSVD) and Microsoft Research Video-to-Text (MSR-VTT) datasets.
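
As a rough illustration of the kind of attention-based decoder this abstract describes, the sketch below (PyTorch) attends over per-frame visual features and conditions each step on a video-level semantic vector. It is a minimal sketch, not the authors' model: the layer sizes, the additive attention, and the single-vector semantic feature are all assumptions.

```python
# Minimal sketch (PyTorch >= 1.11) of an attention-based caption decoder that
# conditions on per-frame visual features and a video-level semantic vector.
import torch
import torch.nn as nn

class AttnCaptionDecoder(nn.Module):
    def __init__(self, vocab_size, feat_dim=2048, sem_dim=300, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hid_dim)
        self.attn = nn.Linear(hid_dim + feat_dim, 1)   # scores each frame feature
        self.gru = nn.GRUCell(hid_dim + feat_dim + sem_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, visual_feats, semantic_feats, tokens):
        # visual_feats: (T, feat_dim) per-frame CNN features (e.g. C3D/ResNet)
        # semantic_feats: (sem_dim,) semantic feature vector for the video
        # tokens: (L,) caption token ids for teacher forcing
        T = visual_feats.size(0)
        h = torch.zeros(self.gru.hidden_size)
        logits = []
        for tok in tokens:
            e = self.embed(tok)
            # Attention weights over the T frame features, conditioned on h.
            scores = self.attn(torch.cat([h.expand(T, -1), visual_feats], dim=1))
            alpha = torch.softmax(scores.squeeze(1), dim=0)
            context = (alpha.unsqueeze(1) * visual_feats).sum(0)
            h = self.gru(torch.cat([e, context, semantic_feats]), h)
            logits.append(self.out(h))
        return torch.stack(logits)   # (L, vocab_size) next-token scores

dec = AttnCaptionDecoder(vocab_size=1000)
out = dec(torch.randn(20, 2048), torch.randn(300), torch.tensor([1, 5, 9]))
```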

A Multiple Features Video Copy Detection Algorithm Based on a SURF Descriptor

  • Hou, Yanyan;Wang, Xiuzhen;Liu, Sanrong
    • Journal of Information Processing Systems / v.12 no.3 / pp.502-510 / 2016
  • Considering the diversity of video copy transforms, a multi-feature video copy detection algorithm based on the Speeded-Up Robust Features (SURF) local descriptor is proposed in this paper. After the video is preprocessed, coarse copy detection is performed with an ordinal measure (OM) algorithm. If the matching result is greater than a specified threshold, fine copy detection is performed with a SURF descriptor, using box filters over the integral video. To improve detection speed, the trace of the Hessian matrix in the SURF descriptor is used for pre-matching, and the dimensionality of the traditional SURF feature vector is reduced for video matching. Our experimental results indicate that video copy detection precision and recall are greatly improved compared with traditional algorithms, that the proposed multiple-features algorithm has good robustness and discrimination accuracy, and that detection speed is also improved.
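
The ordinal measure (OM) coarse-detection step can be illustrated with a minimal NumPy sketch: rank the mean intensities of a block grid and compare the rank signatures of two frames. The 3x3 grid, the normalized L1 rank distance, and the 0.2 threshold are assumptions; the SURF fine-matching stage is not reproduced here.

```python
# Minimal sketch of ordinal-measure (OM) coarse matching between two frames:
# rank the mean intensities of an NxN block grid and compare the rankings.
import numpy as np

def ordinal_signature(frame, grid=3):
    """Rank permutation of the mean block intensities of a grayscale frame."""
    h, w = frame.shape
    means = [frame[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid].mean()
             for i in range(grid) for j in range(grid)]
    return np.argsort(np.argsort(means))        # rank of each block

def om_distance(sig_a, sig_b):
    """L1 distance between rank signatures, normalized to [0, 1]."""
    n = len(sig_a)
    return np.abs(sig_a - sig_b).sum() / ((n * n) // 2)

# Coarse detection flags a candidate copy when the OM distance is small;
# the paper then runs SURF-based fine matching on such candidates.
a = np.random.rand(120, 160)
b = 0.8 * a + 0.1                               # brightness-scaled "copy"
print(om_distance(ordinal_signature(a), ordinal_signature(b)) < 0.2)  # True
```

Rank signatures are invariant to monotonic intensity changes, which is why the brightness-scaled copy above still matches exactly.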

Novel Intent based Dimension Reduction and Visual Features Semi-Supervised Learning for Automatic Visual Media Retrieval

  • Kunisetti, Subramanyam;Ravichandran, Suban
    • International Journal of Computer Science & Network Security / v.22 no.6 / pp.230-240 / 2022
  • Sharing videos online is an emerging and important element of applications such as surveillance and mobile video search. A personalized web video retrieval system is therefore needed to explore relevant videos and help users searching for specific content in large collections. To this end, features are computed from videos, with dimensionality reduction, to capture the discriminative aspects of a scene based on shape, histogram, texture, object annotation, coordinates, color, and contour data. Dimensionality reduction depends mainly on feature extraction and feature selection in multi-labeled retrieval from multimedia data. Many researchers have implemented techniques to reduce dimensionality based on the visual features of video data, but every technique has advantages and disadvantages for video retrieval with advanced features. In this research, we present a Novel Intent based Dimension Reduction Semi-Supervised Learning Approach (NIDRSLA) that combines dimensionality reduction with exact and fast video retrieval based on different visual features. For dimensionality reduction, NIDRSLA learns the projection matrix by increasing the dependence between the enlarged data and the projected-space features. The proposed approach also addresses video segmentation with frame selection using low-level and high-level features, together with efficient object annotation for video representation. Experiments performed on a synthetic dataset demonstrate the efficiency of the proposed approach compared with traditional state-of-the-art video retrieval methodologies.
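
NIDRSLA itself is not specified in enough detail here to reproduce, but the dependence-maximizing projection the abstract alludes to can be sketched generically: choose a projection P that maximizes an HSIC-style dependence between the projected features and a label kernel, which reduces to a top-k eigenvector problem. Everything below (the label kernel, the closed-form solution) is a generic illustration, not the paper's algorithm.

```python
# Generic sketch of learning a projection matrix P that maximizes an HSIC-style
# dependence tr(P^T X^T H K H X P) between projected features XP and a label
# kernel K. Illustration of the general idea only, not NIDRSLA itself.
import numpy as np

def dependence_maximizing_projection(X, y, k):
    """X: (n, d) visual features; y: (n,) labels; returns P: (d, k)."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    K = (y[:, None] == y[None, :]).astype(float)  # simple same-label kernel
    M = X.T @ H @ K @ H @ X                        # symmetric (d, d)
    w, V = np.linalg.eigh(M)                       # ascending eigenvalues
    return V[:, np.argsort(w)[::-1][:k]]           # top-k eigenvectors

X = np.random.randn(100, 50)
y = np.random.randint(0, 5, size=100)
P = dependence_maximizing_projection(X, y, k=10)
Z = X @ P                                          # reduced 10-D retrieval features
```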

Video Quality Assessment based on Deep Neural Network

  • Zhiming Shi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.8 / pp.2053-2067 / 2023
  • This paper proposes two video quality assessment methods based on deep neural networks. (i) The first method uses IQF-CNN (a convolutional neural network based on image quality features) to build an image quality assessment model. The LIVE image database is used to test this method, and the experiments show that it is effective; the method is therefore extended to video quality assessment. First, a quality score is predicted for every frame of the video; then the relationships between frames are analyzed with a hysteresis function and different window functions to improve the accuracy of the video quality assessment. (ii) The second method is based on a convolutional neural network (CNN) and a gated recurrent unit (GRU) network. First, the spatial features of the video frames are extracted with the CNN; then the temporal features are extracted with the GRU network. Finally, the extracted temporal and spatial features are combined by a fully connected layer to obtain the video quality score. The proposed methods are verified on video quality databases and compared with other methods.
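
The structure of the second method can be sketched as follows in PyTorch: a small CNN extracts per-frame spatial features, a GRU models their temporal evolution, and a linear head regresses a quality score. The layer sizes, pooling, and use of the final hidden state are assumptions; the paper's exact architecture is not reproduced.

```python
# Minimal sketch of a CNN + GRU video quality model: per-frame spatial features
# from a small CNN, temporal modeling with a GRU, and a linear score head.
import torch
import torch.nn as nn

class CnnGruVQA(nn.Module):
    def __init__(self, hid_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(                   # spatial feature extractor
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())  # -> (N, 32) per frame
        self.gru = nn.GRU(32, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, 1)           # quality score

    def forward(self, video):
        # video: (B, T, 3, H, W) -> fold time into the batch for the CNN
        B, T = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1)).view(B, T, -1)
        _, h = self.gru(feats)                      # h: (1, B, hid_dim)
        return self.head(h[-1]).squeeze(-1)         # (B,) predicted scores

scores = CnnGruVQA()(torch.randn(2, 8, 3, 64, 64))  # two 8-frame clips
```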

Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion

  • Zhou, Xuan
    • Journal of Information Processing Systems / v.17 no.2 / pp.337-351 / 2021
  • Automatically recognizing facial expressions in video sequences is a challenging task because there is little direct correlation between facial features and subjective emotions in video. To overcome this problem, a video facial expression recognition method using a spatiotemporal recurrent neural network and feature fusion is proposed. Firstly, the video is preprocessed, and a double-layer cascade structure is used to detect the face in each video frame. Two deep convolutional neural networks are then used to extract the temporal and spatial facial features from the video: a spatial convolutional neural network extracts spatial information features from each static expression frame, while a temporal convolutional neural network extracts dynamic information features from the optical flow computed over multiple frames. The spatiotemporal features learned by the two networks are combined by multiplicative fusion. Finally, the fused features are input to a support vector machine to perform the facial expression classification task. Experimental results on the eNTERFACE, RML, and AFEW6.0 datasets show that the recognition rates obtained by the proposed method reach 88.67%, 70.32%, and 63.84%, respectively, and comparative experiments show that it obtains higher recognition accuracy than other recently reported methods.
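
The final fusion-and-classification step can be sketched in a few lines: two feature vectors per video (standing in for the spatial and temporal CNN outputs) are fused elementwise and fed to an SVM. The feature dimensions and random placeholder data are assumptions; scikit-learn's SVC is used as a stand-in classifier.

```python
# Minimal sketch of multiplicative fusion of two feature streams followed by
# SVM classification (placeholder features; scikit-learn SVC as classifier).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 200, 256
spatial = rng.normal(size=(n, d))     # per-video spatial CNN features (assumed)
temporal = rng.normal(size=(n, d))    # per-video optical-flow CNN features (assumed)
labels = rng.integers(0, 6, size=n)   # six expression classes

fused = spatial * temporal            # elementwise multiplicative fusion
clf = SVC(kernel="rbf").fit(fused[:150], labels[:150])
print("accuracy:", clf.score(fused[150:], labels[150:]))
```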

Design and Implementation of the Video Query Processing Engine for Content-Based Query Processing (내용기반 질의 처리를 위한 동영상 질의 처리기의 설계 및 구현)

  • Jo, Eun-Hui;Kim, Yong-Geol;Lee, Hun-Sun;Jeong, Yeong-Eun;Jin, Seong-Il
    • The Transactions of the Korea Information Processing Society / v.6 no.3 / pp.603-614 / 1999
  • As multimedia application services on high-speed information networks have developed rapidly, the need for video information management systems that provide users an efficient way to retrieve video data is growing. In this paper, we propose a video data model that integrates free annotations, image features, and spatial-temporal features for the purpose of improving content-based retrieval of video data. The proposed model can act as a generic video data model for multimedia applications, supporting free annotations, image features, spatial-temporal features, and the structure information of video data within the same framework. We also propose a video query language that efficiently provides query specifications for accessing video clips; it can formalize various kinds of queries based on video content. Finally, we design and implement a query processing engine for efficient video data retrieval on the proposed metadata model and query language.
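
A hypothetical sketch of the kind of integrated data model and query facility described above: each clip carries free annotations, an image feature, and a temporal extent, and a query filters clips on any combination. The class and function names are invented for illustration and do not come from the paper.

```python
# Hypothetical sketch (Python 3.9+) of an integrated video data model: clips
# carry free annotations, an image feature, and a temporal extent, and queries
# may combine annotation and spatio-temporal conditions.
from dataclasses import dataclass, field

@dataclass
class VideoClip:
    video_id: str
    start: float                                        # seconds
    end: float
    annotations: set[str] = field(default_factory=set)  # free-text keywords
    color_hist: list[float] = field(default_factory=list)  # image feature

def query(clips, keyword=None, within=None):
    """Yield clips matching a free annotation and/or a temporal window."""
    for c in clips:
        if keyword is not None and keyword not in c.annotations:
            continue
        if within is not None and not (within[0] <= c.start and c.end <= within[1]):
            continue
        yield c

clips = [VideoClip("v1", 0, 12, {"goal", "replay"}),
         VideoClip("v1", 30, 41, {"goal"})]
print([c.start for c in query(clips, keyword="goal", within=(0, 20))])  # [0]
```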

Action Recognition Method in Sports Video Shear Based on Fish Swarm Algorithm

  • Jie Sun;Lin Lu
    • Journal of Information Processing Systems / v.19 no.4 / pp.554-562 / 2023
  • This research offers a sports video action recognition approach based on the fish swarm algorithm, motivated by the low accuracy of existing sports video action recognition methods. A modified fish swarm algorithm is proposed to construct invariant features and reduce their dimensionality; based on this algorithm, local features and global features can be classified. Experimental findings on a typical sports action dataset demonstrate that the dimensionality-reduced fused invariant features successfully retain the key details of sports actions. According to this research, the average recognition time of the proposed method for walking, running, squatting, sitting, and bending is less than 326 seconds, and the average recognition rate is higher than 94%, which shows that the method can significantly improve the performance and efficiency of online sports video motion recognition.
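
A simplified sketch of the artificial-fish-swarm idea (reduced here to the follow and prey behaviors) is shown below, maximizing a toy objective. The step size, visual range, and objective are assumptions, and the paper's modifications for invariant-feature construction are not reproduced.

```python
# Simplified sketch of the artificial fish swarm idea: each "fish" is a
# candidate solution that follows the best neighbor in visual range, or preys
# (random local probing) when no better neighbor exists.
import numpy as np

def fitness(x):                           # toy objective, peak at the origin
    return -np.sum(x ** 2)

def fish_swarm(dim=5, n_fish=20, visual=3.0, step=0.3, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    fish = rng.uniform(-5, 5, size=(n_fish, dim))
    for _ in range(iters):
        for i in range(n_fish):
            dists = np.linalg.norm(fish - fish[i], axis=1)
            nbrs = np.where((dists < visual) & (dists > 0))[0]
            moved = False
            if len(nbrs) > 0:             # follow: move toward the best neighbor
                best = max(nbrs, key=lambda j: fitness(fish[j]))
                if fitness(fish[best]) > fitness(fish[i]):
                    fish[i] = fish[i] + step * (fish[best] - fish[i])
                    moved = True
            if not moved:                 # prey: keep a random probe if better
                trial = fish[i] + rng.uniform(-visual, visual, dim)
                if fitness(trial) > fitness(fish[i]):
                    fish[i] = trial
    return fish[np.argmax([fitness(f) for f in fish])]

print(fish_swarm().round(2))              # approaches the zero vector
```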

An Efficient Video Retrieval Algorithm Using Color and Edge Features

  • Kim Sang-Hyun
    • Journal of the Institute of Convergence Signal Processing / v.7 no.1 / pp.11-16 / 2006
  • To manipulate large video databases, effective video indexing and retrieval are required. A large number of video indexing and retrieval algorithms have been presented for frame-wise user queries or video content queries, whereas relatively few video sequence matching algorithms have been proposed for video sequence queries. In this paper, we propose an efficient algorithm that extracts key frames using color histograms and matches video sequences using edge features. To match video sequences effectively with a low computational load, we use the key frames extracted by a cumulative measure together with the distance between key frames, and compare two sets of key frames using the modified Hausdorff distance. Experimental results on several real sequences show that the proposed video retrieval algorithm using color and edge features yields higher accuracy and better performance than conventional methods such as histogram difference, Euclidean metric, Bhattacharyya distance, and directed divergence.
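
The two stages named above can be sketched directly: key frames are selected when a cumulative color-histogram difference exceeds a threshold, and two key-frame sets are compared with the modified Hausdorff distance (the symmetrized mean nearest-neighbor distance). The histogram size and threshold below are assumptions.

```python
# Minimal sketch: key frames via a cumulative histogram-difference measure,
# and sequence matching via the modified Hausdorff distance.
import numpy as np

def key_frames(hists, threshold=0.5):
    """hists: (T, bins) per-frame color histograms. A frame becomes a key
    frame when the accumulated histogram difference since the last key frame
    exceeds the threshold."""
    keys, acc = [0], 0.0
    for t in range(1, len(hists)):
        acc += np.abs(hists[t] - hists[t - 1]).sum()
        if acc > threshold:
            keys.append(t)
            acc = 0.0
    return keys

def modified_hausdorff(A, B):
    """Symmetrized mean nearest-neighbor distance between feature sets."""
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(dists.min(axis=1).mean(), dists.min(axis=0).mean())

hists = np.random.rand(100, 16)
hists /= hists.sum(1, keepdims=True)
k = key_frames(hists)
print(modified_hausdorff(hists[k], hists[k]))   # 0.0 for identical sets
```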

Video Indexing using Motion vector and brightness features (움직임 벡터와 빛의 특징을 이용한 비디오 인덱스)

  • 이재현;조진선
    • Journal of the Korea Society of Computer and Information / v.3 no.4 / pp.27-34 / 1998
  • In this paper we present a method for automatic video indexing and retrieval based on motion vectors and brightness. We extract a representative frame (R-frame) from each shot and compute motion-vector and brightness based features. For each R-frame we compute the optical flow field, from which the motion vector features are derived; a block matching algorithm (BMA) is used to find the motion vectors, and the brightness features are derived from the brightness histogram used for cut detection. A video database provides content-based access to video, which is achieved by organizing or indexing the video data on a set of features. In this paper the feature index is based on a B+ search tree consisting of internal and leaf nodes stored on a direct-access storage device. The paper defines the problem of video indexing based on video data models.
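
The block matching algorithm (BMA) mentioned above can be sketched as an exhaustive search minimizing the sum of absolute differences; the block size and search range below are assumptions, not the paper's settings.

```python
# Minimal sketch of block matching: exhaustive search for the motion vector of
# one block between two grayscale frames, minimizing the SAD.
import numpy as np

def block_motion_vector(prev, curr, y, x, block=16, search=8):
    """Return (dy, dx) minimizing the sum of absolute differences for the
    block of `curr` at (y, x) against `prev` within +/-`search` pixels."""
    tgt = curr[y:y + block, x:x + block]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > prev.shape[0] or xx + block > prev.shape[1]:
                continue
            sad = np.abs(prev[yy:yy + block, xx:xx + block] - tgt).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best

prev = np.random.rand(64, 64)
curr = np.roll(prev, shift=(2, 3), axis=(0, 1))   # content moved down-right
print(block_motion_vector(prev, curr, 24, 24))    # (-2, -3): offset back to prev
```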

Creation of Soccer Video Highlight Using The Structural Features of Caption (자막의 구조적 특징을 이용한 축구 비디오 하이라이트 생성)

  • Huh, Moon-Haeng;Shin, Seong-Yoon;Lee, Yang-Weon;Ryu, Keun-Ho
    • The KIPS Transactions:PartD / v.10D no.4 / pp.671-678 / 2003
  • A digital video is usually very long temporally and requires large storage capacity, so users want to watch a pre-summarized video before they watch the whole video. Especially for sports video, they want to watch a highlight video; consequently, a highlight video lets viewers decide whether the full video is worth watching. This paper proposes how to create soccer video highlights using the structural features of the caption, such as its temporal and spatial features. Caption frame intervals and caption key frames are extracted using those structural features, and the highlight video is then created using scene relocation, logical indexing, and highlight creation rules. Finally, retrieval and browsing of highlights and video segments is performed by selecting items in a browser.
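
Given per-frame caption-detection flags, extracting caption frame intervals and caption key frames can be sketched as below; the minimum interval length and the middle-frame key-frame choice are assumptions, and the caption detector and highlight-creation rules are outside the sketch.

```python
# Minimal sketch of extracting caption frame intervals and caption key frames
# from per-frame caption-detection flags.
def caption_intervals(flags, min_len=5):
    """flags: sequence of booleans, True when a caption is on screen.
    Returns (start, end) frame intervals and one key frame per interval."""
    intervals, start = [], None
    for t, on in enumerate(flags):
        if on and start is None:
            start = t
        elif not on and start is not None:
            if t - start >= min_len:
                intervals.append((start, t - 1))
            start = None
    if start is not None and len(flags) - start >= min_len:
        intervals.append((start, len(flags) - 1))
    key_frames = [(s + e) // 2 for s, e in intervals]   # middle frame as key
    return intervals, key_frames

flags = [False]*10 + [True]*30 + [False]*20 + [True]*3  # short blip ignored
print(caption_intervals(flags))   # ([(10, 39)], [24])
```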