• Title/Summary/Keyword: video fusion


Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion

  • Zhou, Xuan
    • Journal of Information Processing Systems, v.17 no.2, pp.337-351, 2021
  • Automatically recognizing facial expressions in video sequences is challenging because facial features correlate only weakly with subjective emotions in video. To overcome this problem, a video facial expression recognition method using a spatiotemporal recurrent neural network and feature fusion is proposed. First, the video is preprocessed, and a double-layer cascade structure is used to detect the face in each video frame. Two deep convolutional neural networks are then used to extract the temporal and spatial facial features from the video: a spatial convolutional neural network extracts spatial features from each static expression frame, while a temporal convolutional neural network extracts dynamic features from the optical flow computed over multiple expression frames. The spatiotemporal features learned by the two networks are combined by multiplicative fusion. Finally, the fused features are fed into a support vector machine to perform facial expression classification. Experimental results on the eNTERFACE, RML, and AFEW6.0 datasets show recognition rates of 88.67%, 70.32%, and 63.84%, respectively, and comparative experiments show that the proposed method achieves higher recognition accuracy than other recently reported methods.
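
A minimal sketch of the multiplicative fusion and SVM classification described above, assuming the spatial and temporal CNN feature vectors have already been extracted; the feature dimension, SVM settings, and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np
from sklearn.svm import SVC

def fuse_and_classify(spatial_feats, temporal_feats, labels):
    """Element-wise (multiplicative) fusion of per-clip CNN features,
    followed by SVM training. Both feature arrays: (n_clips, feat_dim)."""
    fused = spatial_feats * temporal_feats  # multiplicative fusion
    clf = SVC(kernel="rbf", C=1.0)          # hyperparameters are illustrative
    clf.fit(fused, labels)
    return clf

# Toy usage with random stand-in features (real features would come from the two CNNs).
rng = np.random.default_rng(0)
spatial = rng.normal(size=(100, 256))
temporal = rng.normal(size=(100, 256))
labels = rng.integers(0, 6, size=100)       # six basic expression classes
model = fuse_and_classify(spatial, temporal, labels)
print(model.predict(spatial[:5] * temporal[:5]))
```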

Gradient Fusion Method for Night Video Enhancement

  • Rao, Yunbo; Zhang, Yuhong; Gou, Jianping
    • ETRI Journal, v.35 no.5, pp.923-926, 2013
  • To address the problem of night video enhancement, a gradient-domain fusion method is proposed in which gradient-domain frames of the background from a daytime video are fused with nighttime video frames. To verify the effectiveness of the proposed method, it is compared with conventional techniques, and its output is shown to offer enhanced visual quality.
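
A simplified sketch of gradient-domain fusion in the spirit of this abstract, assuming single-channel frames and using a plain Jacobi iteration to solve the Poisson reconstruction; the blending weight, iteration count, and function names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def laplacian(img):
    """Discrete 5-point Laplacian with replicated borders."""
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * img

def gradient_domain_fuse(night, day_bg, w=0.5, iters=500):
    """Blend the gradient field (via the Laplacian) of a daytime background
    with a nighttime frame, then reconstruct an image by Jacobi iterations
    on the Poisson equation, starting from the nighttime frame."""
    target = w * laplacian(day_bg) + (1.0 - w) * laplacian(night)
    f = night.astype(np.float64).copy()
    for _ in range(iters):
        p = np.pad(f, 1, mode="edge")
        f = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - target) / 4.0
    return np.clip(f, 0, 255)

# Toy usage on random grayscale frames standing in for real video frames.
rng = np.random.default_rng(1)
night = rng.uniform(0, 60, size=(64, 64))   # dark nighttime frame
day_bg = rng.uniform(0, 255, size=(64, 64)) # bright daytime background
enhanced = gradient_domain_fuse(night, day_bg)
```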

Open Standard Based 3D Urban Visualization and Video Fusion

  • Enkhbaatar, Lkhagva; Kim, Seong-Sam; Sohn, Hong-Gyoo
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, v.28 no.4, pp.403-411, 2010
  • This research demonstrates 3D virtual visualization of an urban environment and video fusion for an effective damage-prevention and surveillance system using an open standard. We present a visualization and interaction simulation method that increases situational awareness and improves environmental monitoring by combining CCTV video with a 3D virtual environment. A new camera prototype was designed based on the camera frustum view model to project recorded video perspectively onto the virtual 3D environment. The demonstration was developed with X3D, a royalty-free open standard and run-time architecture that makes it possible to represent, control, and share 3D spatial information through web browsers.
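
The abstract describes projecting recorded video onto the virtual scene through a camera frustum model. The sketch below shows the underlying projective mapping for a pinhole camera with known intrinsics and pose; the matrices, values, and function names are illustrative assumptions (the paper itself works with X3D rather than Python).

```python
import numpy as np

def video_texture_coords(points_3d, K, R, t, width, height):
    """Project 3D scene points through a pinhole camera (frustum) model to
    obtain normalized video texture coordinates, i.e. where each scene point
    lands in the CCTV frame that is being fused with the 3D environment."""
    cam = R @ points_3d.T + t.reshape(3, 1)   # world -> camera coordinates
    uvw = K @ cam                             # camera -> image plane (homogeneous)
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
    in_front = uvw[2] > 0                     # points behind the camera get no video texel
    return np.stack([u / width, v / height], axis=1), in_front

# Toy usage: a camera at the origin looking down +Z at two scene points.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
pts = np.array([[0.0, 0.0, 5.0], [1.0, -0.5, 8.0]])
uv, valid = video_texture_coords(pts, K, R, t, 640, 480)
```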

Anterior Cervical Discectomy and Fusion YouTube Videos as a Source of Patient Education

  • Ovenden, Christopher Dillon; Brooks, Francis Michael
    • Asian Spine Journal, v.12 no.6, pp.987-991, 2018
  • Study Design: Cross-sectional study. Purpose: To assess the quality of anterior cervical discectomy and fusion (ACDF) videos available on YouTube and identify factors associated with video quality. Overview of Literature: Patients commonly use the internet as a source of information regarding their surgeries. However, there is currently limited information regarding the quality of online videos about ACDF. Methods: A search was performed on YouTube using the phrase 'anterior cervical discectomy and fusion.' The Journal of the American Medical Association (JAMA), DISCERN, and Health on the Net (HON) systems were used to rate the first 50 videos obtained. Information about each video was collected, including the number of views, time since the video was posted, percentage positivity (defined as the number of likes the video received divided by the total number of likes and dislikes of that video), number of comments, and the author of the video. Relationships between video quality and these factors were investigated. Results: The average number of views per video was 96,239. The most common videos were those published by surgeons and those containing patient testimonies. Overall, video quality was poor, with mean scores of 1.78/5 using the DISCERN criteria, 1.63/4 using the JAMA criteria, and 1.96/8 using the HON criteria. Videos by surgeon authors scored higher than patient testimony videos when reviewed using the HON or JAMA systems. No other factors were found to be associated with video quality. Conclusions: The quality of ACDF videos on YouTube is low, with the majority of videos produced by unreliable sources. Therefore, these YouTube videos should not be recommended as patient education tools for ACDF.

Dual-Stream Fusion and Graph Convolutional Network for Skeleton-Based Action Recognition

  • Hu, Zeyuan; Feng, Yiran; Lee, Eung-Joo
    • Journal of Korea Multimedia Society, v.24 no.3, pp.423-430, 2021
  • Graph convolutional networks (GCNs) have achieved outstanding performance on skeleton-based action recognition. However, several problems remain in existing GCN-based methods; in particular, the low recognition rate caused by relying on a single input modality has not been effectively addressed. In this article, we propose a dual-stream fusion method that combines video data and skeleton data: one network processes the skeleton data and the other the video data, and the output probabilities of the two streams are fused to achieve information fusion. Experiments on two large datasets, Kinetics and the NTU-RGB+D Human Action Dataset, show that the proposed method achieves state-of-the-art performance and improves recognition accuracy over the traditional single-stream approach.
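
A minimal sketch of the probability-level (late) fusion described above, assuming the per-class probabilities from the skeleton stream and the video stream are already available; the weighted-average rule and the weight value are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def late_fuse(probs_skeleton, probs_video, alpha=0.5):
    """Dual-stream late fusion: combine per-class probabilities from a
    skeleton-stream model and a video-stream model, then renormalize.
    'alpha' weights the skeleton stream; its value is illustrative."""
    fused = alpha * probs_skeleton + (1.0 - alpha) * probs_video
    return fused / fused.sum(axis=1, keepdims=True)

# Toy usage with random softmax-like outputs for 4 clips and 60 action classes.
rng = np.random.default_rng(2)
p_skel = rng.dirichlet(np.ones(60), size=4)
p_vid = rng.dirichlet(np.ones(60), size=4)
predicted_classes = late_fuse(p_skel, p_vid).argmax(axis=1)
```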

The Sensory-Motor Fusion System for Object Tracking (이동 물체를 추적하기 위한 감각 운동 융합 시스템 설계)

  • Lee, Sang-Hee; Wee, Jae-Woo; Lee, Chong-Ho
    • The Transactions of the Korean Institute of Electrical Engineers D, v.52 no.3, pp.181-187, 2003
  • For systems that track moving objects with environmental sensors, such as a mobile robot tracking objects with audio and video sensors, the information acquired from the sensors keeps changing as the objects move. In such cases, conventional control schemes show limited performance due to their lack of adaptability and their complexity, so sensory-motor systems that respond directly to various types of environmental information are desirable. To improve robustness, it is also desirable to fuse two or more types of sensory information simultaneously. In this paper, based on Braitenberg's model, we propose a sensory-motor fusion system that can track moving objects adaptively under environmental changes. Owing to its directly connected structure, the sensory-motor fusion system can drive each motor simultaneously, and neural networks are used to fuse information from the various sensors. Even if the system receives noisy information from one sensor, it continues to work robustly because the information from the other sensors compensates for the noise through sensor fusion. To examine its performance, the sensory-motor fusion model is applied to an object-tracking four-legged robot equipped with audio and video sensors. The experimental results show that the sensory-motor fusion system can track moving objects robustly with a simpler control mechanism than model-based control approaches.
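
A toy sketch of a Braitenberg-style sensory-motor controller that fuses left/right audio and video sensor readings directly into motor commands; the weight matrix, activation, and sensor values are illustrative assumptions, and the paper's neural-network fusion is reduced here to a single linear layer.

```python
import numpy as np

def sensory_motor_step(audio_lr, video_lr, W):
    """One Braitenberg-style control step: left/right audio and video sensor
    readings are fused by a small linear mapping and drive the two motors
    directly, with no intermediate world model."""
    sensors = np.concatenate([audio_lr, video_lr])  # [audio_L, audio_R, video_L, video_R]
    left_speed, right_speed = np.tanh(W @ sensors)
    return left_speed, right_speed

# Crossed connections: each motor is driven mostly by the opposite-side sensors,
# so the robot turns toward the detected source.
W = np.array([[0.2, 0.8, 0.2, 0.8],   # left motor
              [0.8, 0.2, 0.8, 0.2]])  # right motor

# Toy usage: a source slightly to the robot's right (stronger right-side readings).
left, right = sensory_motor_step(np.array([0.3, 0.7]), np.array([0.4, 0.9]), W)
print(left > right)  # True: left wheel spins faster, turning the robot to the right
```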

A Programmable Multi-Format Video Decoder (프로그래머블 멀티 포맷 비디오 디코더)

  • Kim, Jaehyun; Park, Goo-man
    • Journal of Broadcast Engineering, v.20 no.6, pp.963-966, 2015
  • This paper introduces a programmable multi-format video decoder (MFD) that supports the HEVC (High Efficiency Video Coding) standard as well as other video coding standards. The goal of the proposed MFD is a high-end FHD (Full High Definition) video decoder for a DTV (Digital Television) SoC (System on Chip). The proposed platform uses a hybrid architecture comprising reconfigurable processors and flexible hardware accelerators to handle the massive computational load and the variety of video coding standards. Experimental results show that the proposed architecture operates at a 300 MHz clock and can decode an FHD HEVC bit-stream at 30 frames per second.

Using the fusion of spatial and temporal features for malicious video classification (공간과 시간적 특징 융합 기반 유해 비디오 분류에 관한 연구)

  • Jeon, Jae-Hyun; Kim, Se-Min; Han, Seung-Wan; Ro, Yong-Man
    • The KIPS Transactions: Part B, v.18B no.6, pp.365-374, 2011
  • Recently, malicious video classification and filtering techniques have become of practical interest, as users can easily access malicious multimedia content through the Internet, IPTV, online social networks, and so on. Considerable research effort has been made toward developing malicious video classification and filtering systems. However, malicious video classification and filtering is still far from mature in terms of reliable classification/filtering performance. In particular, most conventional approaches have been limited to using only spatial features (such as the ratio of skin regions and bag-of-visual-words descriptors) for malicious image classification, which has restricted the achievable classification and filtering performance. To overcome this limitation, we propose a new malicious video classification framework that uses both spatial and temporal features readily extracted from a sequence of video frames. In particular, we develop effective temporal features based on motion periodicity and temporal correlation. In addition, to identify the best way to combine the spatial and temporal features, representative data fusion approaches are applied within the proposed framework. To demonstrate the effectiveness of our method, we collected 200 sexual intercourse videos and 200 non-sexual intercourse videos. Experimental results show that the proposed method improves the classification accuracy for sexual intercourse video by 3.75 percentage points (from 92.25% to 96%). Furthermore, the experimental results indicate that feature-level fusion of the spatial and temporal features achieves the best classification accuracy.
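
A minimal sketch of feature-level fusion with a simple motion-periodicity descriptor, in the spirit of the temporal features described above; the FFT-based descriptor, dimensions, and variable names are illustrative assumptions, not the authors' exact features.

```python
import numpy as np

def temporal_periodicity_feature(motion_energy, k=8):
    """Toy motion-periodicity descriptor: the k largest FFT magnitudes of the
    per-frame motion energy signal (a stand-in for the paper's temporal features)."""
    spectrum = np.abs(np.fft.rfft(motion_energy - motion_energy.mean()))
    return np.sort(spectrum)[::-1][:k]

def feature_level_fusion(spatial_feat, temporal_feat):
    """Feature-level fusion: concatenate spatial and temporal descriptors into
    one vector before classification (the strategy reported as best above)."""
    return np.concatenate([spatial_feat, temporal_feat])

# Toy usage with stand-ins for one video's descriptors.
rng = np.random.default_rng(3)
spatial = rng.uniform(size=32)  # e.g., skin-region ratio / visual-word histogram
motion = np.sin(np.linspace(0, 20 * np.pi, 200)) + rng.normal(0, 0.1, 200)
fused = feature_level_fusion(spatial, temporal_periodicity_feature(motion))
# Fused vectors from many videos would then train a classifier such as an SVM.
```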

CNN-based Visual/Auditory Feature Fusion Method with Frame Selection for Classifying Video Events

  • Choe, Giseok; Lee, Seungbin; Nang, Jongho
    • KSII Transactions on Internet and Information Systems (TIIS), v.13 no.3, pp.1689-1701, 2019
  • In recent years, personal videos have been widely shared online owing to the popularity of portable devices such as smartphones and action cameras. A recent report predicted that 80% of Internet traffic would be video content by 2021. Several studies have addressed the detection of main video events in order to manage videos at large scale, and they show fairly good performance in certain genres. However, the methods used in previous studies have difficulty detecting events in personal videos, because the characteristics and genres of personal videos vary widely. In our research, we found that adding a dataset with the right perspective improved performance, and that performance also depends on how keyframes are extracted from the video. We therefore selected frame segments that can represent the video, considering the characteristics of personal videos. From each frame segment, object, location, food, and audio features were extracted, and representative vectors were generated through a CNN-based recurrent model and a fusion module. The proposed method achieved 78.4% mAP in experiments on the LSVC dataset.
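
A minimal sketch of a recurrent model with a simple fusion module over per-segment features, assuming the per-segment visual features and a clip-level audio feature are already extracted; the GRU, dimensions, and class count are illustrative assumptions, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class SegmentFusionClassifier(nn.Module):
    """A GRU summarizes the sequence of per-segment visual features, the
    summary is concatenated with a clip-level audio feature (the fusion
    module), and a linear layer predicts per-class event scores."""
    def __init__(self, visual_dim=2048, audio_dim=128, hidden=512, n_classes=500):
        super().__init__()
        self.rnn = nn.GRU(visual_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden + audio_dim, n_classes)

    def forward(self, visual_seq, audio_vec):
        # visual_seq: (batch, n_segments, visual_dim); audio_vec: (batch, audio_dim)
        _, h = self.rnn(visual_seq)            # h: (1, batch, hidden)
        fused = torch.cat([h.squeeze(0), audio_vec], dim=1)
        return self.classifier(fused)

# Toy usage with random stand-in features for 2 clips of 10 segments each.
model = SegmentFusionClassifier()
scores = model(torch.randn(2, 10, 2048), torch.randn(2, 128))
```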

A Study for Improved Human Action Recognition using Multi-classifiers (비디오 행동 인식을 위하여 다중 판별 결과 융합을 통한 성능 개선에 관한 연구)

  • Kim, Semin; Ro, Yong Man
    • Journal of Broadcast Engineering, v.19 no.2, pp.166-173, 2014
  • Recently, human action recognition has been developed for various broadcasting and video-processing applications. Since a video can contain many different scenes, keypoint approaches have attracted more attention than template-based methods for real applications. Keypoint approaches find regions with motion in the video and form three-dimensional patches, from which histogram-based descriptors are computed; a machine-learning classifier is then applied to detect actions in the video. However, a single classifier has difficulty handling the variety of human actions. To address this problem, multi-classifier approaches have been used for detection and recognition. We therefore propose a new human action recognition method using decision-level fusion of a support vector machine and a sparse-representation classifier. The proposed method extracts keypoint-based descriptors from a video, obtains a recognition result from each classifier, and then fuses the two results using weights learned during the training stage. The experimental results in this paper are better than those of a previous fusion method.
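
A minimal sketch of decision-level fusion of two classifiers' scores with learned weights, assuming per-class scores from the SVM and the sparse-representation classifier are already available; the normalization scheme and weight values are illustrative assumptions rather than those learned in the paper.

```python
import numpy as np

def normalize_scores(scores):
    """Min-max normalize each sample's score vector so both classifiers
    contribute on a comparable scale before fusion."""
    lo = scores.min(axis=1, keepdims=True)
    hi = scores.max(axis=1, keepdims=True)
    return (scores - lo) / (hi - lo + 1e-8)

def decision_level_fusion(scores_svm, scores_src, w_svm=0.6, w_src=0.4):
    """Decision-level fusion: weighted sum of normalized per-class scores from
    an SVM and a sparse-representation classifier (SRC); the weights would be
    learned during training, and the values here are only placeholders."""
    fused = w_svm * normalize_scores(scores_svm) + w_src * normalize_scores(scores_src)
    return fused.argmax(axis=1)

# Toy usage: scores for 3 test clips over 6 action classes.
rng = np.random.default_rng(4)
svm_scores = rng.normal(size=(3, 6))     # e.g., SVM decision_function outputs
src_scores = -rng.uniform(size=(3, 6))   # e.g., negative SRC reconstruction residuals
labels = decision_level_fusion(svm_scores, src_scores)
```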