• Title/Abstract/Keyword: video action recognition

64 search results

Action Recognition Method in Sports Video Shear Based on Fish Swarm Algorithm

  • Jie Sun;Lin Lu
    • Journal of Information Processing Systems / Vol. 19, No. 4 / pp.554-562 / 2023
  • This research presents a sports video action recognition approach based on the fish swarm algorithm, motivated by the low accuracy of existing methods. A modified fish swarm algorithm is proposed to construct invariant features and reduce their dimensionality, on the basis of which local and global features can be classified. Experimental results on a standard sports action dataset show that the dimensionality-reduced, fused invariant features retain the key details of sports actions. The average recognition time of the proposed method for walking, running, squatting, sitting, and bending is under 326 seconds, and the average recognition rate exceeds 94%, indicating that the method can significantly improve the performance and efficiency of online sports video motion recognition.
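
A rough illustration of how a fish swarm search can drive feature dimensionality reduction follows. This is a generic artificial fish swarm algorithm (AFSA) sketch, not the paper's modified variant: the binary feature masks, the toy class-separation objective, and all parameter values are assumptions for demonstration only.

```python
# Generic AFSA sketch (NOT the paper's algorithm): fish are binary masks that
# select a low-dimensional feature subset maximizing a toy separation score.
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Toy objective: between-class distance of the selected features,
    lightly penalized by the number of features kept."""
    if mask.sum() == 0:
        return -np.inf
    Xs = X[:, mask.astype(bool)]
    mu0, mu1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
    return np.linalg.norm(mu0 - mu1) - 0.01 * mask.sum()

def afsa_select(X, y, n_fish=20, steps=50, visual=3):
    d = X.shape[1]
    fish = rng.integers(0, 2, size=(n_fish, d))          # binary masks
    for _ in range(steps):
        for i in range(n_fish):
            # "prey" behavior: try a random neighbor within the visual range
            trial = fish[i].copy()
            flips = rng.choice(d, size=visual, replace=False)
            trial[flips] ^= 1
            if fitness(trial, X, y) > fitness(fish[i], X, y):
                fish[i] = trial                          # move toward better food
            else:
                # "swarm" behavior: drift toward the population's consensus mask
                center = (fish.mean(0) > 0.5).astype(int)
                if fitness(center, X, y) > fitness(fish[i], X, y):
                    fish[i] = center
    best = max(fish, key=lambda m: fitness(m, X, y))
    return best.astype(bool)

# Usage with synthetic 2-class features (e.g., pooled video descriptors)
X = rng.normal(size=(100, 32)); y = rng.integers(0, 2, 100)
X[y == 1, :4] += 2.0                                     # make 4 dims informative
print("kept dims:", np.flatnonzero(afsa_select(X, y)))
```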

ADD-Net: Attention Based 3D Dense Network for Action Recognition

  • Man, Qiaoyue;Cho, Young Im
    • 한국컴퓨터정보학회논문지 / Vol. 24, No. 6 / pp.21-28 / 2019
  • In recent years, with the development of artificial intelligence and the success of deep models, deep learning has been deployed across all fields of computer vision. Action recognition, an important branch of human perception and computer vision research, has attracted growing attention. It is a challenging task because of the particular complexity of human movement: the same action can vary across individuals, and because human actions exist as continuous image frames in video, recognition requires more computational power than processing static images, so a plain CNN cannot achieve the desired results. Recently, attention models have achieved good results in computer vision and natural language processing. For video action classification in particular, adding an attention model makes it more effective to focus on motion features and improves performance; it also intuitively explains which part of the input the model attends to when making a decision, which is very helpful in real applications. In this paper, we propose ADD-Net, a 3D dense convolutional network based on an attention mechanism, for recognizing human actions in video.
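
The abstract does not specify the architecture, so the following is only a minimal PyTorch sketch of the general idea of combining 3D dense connectivity with channel attention; the layer sizes, growth rate, and SE-style gate are illustrative assumptions, not the actual ADD-Net.

```python
# Sketch: a 3D dense block whose concatenated features are reweighted by a
# squeeze-and-excitation-style channel attention gate (all sizes assumed).
import torch
import torch.nn as nn

class ChannelAttention3D(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)          # squeeze T, H, W away
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                            # x: (N, C, T, H, W)
        w = self.fc(self.pool(x).flatten(1))         # per-channel weights
        return x * w.view(*w.shape, 1, 1, 1)

class DenseBlock3D(nn.Module):
    def __init__(self, in_ch, growth=16, layers=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv3d(in_ch + i * growth, growth, kernel_size=3, padding=1)
            for i in range(layers))
        self.attn = ChannelAttention3D(in_ch + layers * growth)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:                      # dense connectivity:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return self.attn(torch.cat(feats, dim=1))    # attend over all features

clip = torch.randn(2, 8, 16, 32, 32)                 # (batch, C, frames, H, W)
print(DenseBlock3D(8)(clip).shape)                   # torch.Size([2, 56, 16, 32, 32])
```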

Two-Stream Convolutional Neural Network for Video Action Recognition

  • Qiao, Han;Liu, Shuang;Xu, Qingzhen;Liu, Shouqiang;Yang, Wanggan
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 10 / pp.3668-3684 / 2021
  • Video action recognition is widely used in video surveillance, behavior detection, human-computer interaction, medically assisted diagnosis, and motion analysis. It can, however, be disturbed by many factors, such as background and illumination. A two-stream convolutional neural network trains separate spatial and temporal models on the video and fuses them at the output. The multi-segment two-stream model presented here extracts temporal and spatial features from the video, fuses them, and then determines the category of the action. The Google Xception model and transfer learning are adopted in this paper, with the Xception weights trained on ImageNet used for initialization. This largely overcomes the underfitting caused by insufficient video behavior data, effectively reduces the influence of confounding factors in the video, improves accuracy, and shortens training time. Moreover, to compensate for the shortage of data, the Kinetics-400 dataset was used for pre-training, which further improved the accuracy of the model. Through this applied research the expected goal is essentially achieved, and the design of the original two-stream model is improved.
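
Below is a minimal sketch of the two-stream late-fusion design described above, with a tiny stand-in CNN where the paper uses an ImageNet-pretrained Xception backbone; the fusion by averaged softmax scores and all shapes are illustrative assumptions.

```python
# Sketch: spatial stream sees an RGB frame, temporal stream sees a stack of
# optical-flow fields, and their softmax scores are averaged at the output.
import torch
import torch.nn as nn

def tiny_backbone(in_ch, n_classes):
    """Stand-in for the pretrained Xception backbone used by the paper."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_classes))

class TwoStream(nn.Module):
    def __init__(self, n_classes=10, flow_len=10):
        super().__init__()
        self.spatial = tiny_backbone(3, n_classes)               # one RGB frame
        self.temporal = tiny_backbone(2 * flow_len, n_classes)   # stacked (x, y) flow

    def forward(self, rgb, flow):
        p_s = self.spatial(rgb).softmax(dim=1)
        p_t = self.temporal(flow).softmax(dim=1)
        return (p_s + p_t) / 2                                   # late score fusion

model = TwoStream()
rgb = torch.randn(4, 3, 112, 112)
flow = torch.randn(4, 20, 112, 112)                              # 10 flow pairs
print(model(rgb, flow).argmax(dim=1))                            # predicted classes
```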

A New Residual Attention Network based on Attention Models for Human Action Recognition in Video

  • Kim, Jee-Hyun;Cho, Young-Im
    • 한국컴퓨터정보학회논문지 / Vol. 25, No. 1 / pp.55-61 / 2020
  • Owing to advances in deep learning and improvements in computing power, video-based research has recently attracted considerable attention. The biggest difference between video data and image data is that video contains large amounts of temporal and spatial information. Because of this volume of data, action recognition is one of the important research topics in computer vision, yet recognizing human actions in a dynamic environment such as video is a highly complex and challenging task. Drawing on studies of human perception, artificial intelligence research has found that a human-like attention mechanism makes for an efficient recognition model, well suited to processing both image information and complex sequential video information. Against this background, to recognize human actions in video efficiently, this paper first attends to the human actor and then introduces an attention mechanism into video action recognition. Its main contribution is a new 3D residual attention network, built on convolutional neural networks with two attention mechanisms, for identifying human actions in video. In evaluation, the proposed model achieved an accuracy of up to about 90.7%.
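
The canonical residual attention formulation, H(x) = (1 + M(x)) * F(x), where a soft mask M gates a trunk branch F, can be sketched in a few lines of PyTorch. This is a generic illustration of that idea, not the proposed network; the 3D shapes and the 1x1x1 mask branch are assumptions.

```python
# Sketch: residual attention block. The (1 + mask) form keeps the trunk
# signal flowing even where the soft mask saturates near zero.
import torch
import torch.nn as nn

class ResidualAttention3D(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.trunk = nn.Sequential(                 # F(x): feature branch
            nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv3d(channels, channels, 3, padding=1))
        self.mask = nn.Sequential(                  # M(x): soft attention in [0, 1]
            nn.Conv3d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        return (1 + self.mask(x)) * self.trunk(x)   # H(x) = (1 + M(x)) * F(x)

x = torch.randn(2, 16, 8, 28, 28)                   # (N, C, T, H, W) video features
print(ResidualAttention3D(16)(x).shape)
```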

Dual-Stream Fusion and Graph Convolutional Network for Skeleton-Based Action Recognition

  • Hu, Zeyuan;Feng, Yiran;Lee, Eung-Joo
    • 한국멀티미디어학회논문지 / Vol. 24, No. 3 / pp.423-430 / 2021
  • Graph convolutional networks (GCNs) have achieved outstanding performance on skeleton-based action recognition. However, several problems remain in existing GCN-based methods; in particular, the low recognition rate caused by relying on a single input modality has not been effectively addressed. In this article, we propose a dual-stream fusion method that combines video data and skeleton data: two networks recognize the skeleton data and the video data respectively, and the output probabilities of the two streams are fused to achieve information fusion, as sketched below. Experiments on two large datasets, Kinetics and the NTU RGB+D Human Action Dataset, show that the proposed method achieves state-of-the-art results, improving recognition accuracy over traditional methods.
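
A minimal sketch of the score-level fusion step named above: each stream emits class logits and their softmax probabilities are combined per clip. The fusion weight, shapes, and class count are illustrative assumptions (NTU RGB+D happens to have 60 action classes).

```python
# Sketch: weighted average of the two streams' class probabilities.
import torch

def fuse_streams(skeleton_logits, video_logits, alpha=0.5):
    p_skel = skeleton_logits.softmax(dim=1)
    p_video = video_logits.softmax(dim=1)
    return alpha * p_skel + (1 - alpha) * p_video

skel = torch.randn(4, 60)      # from a GCN over joint coordinates (4 clips)
vid = torch.randn(4, 60)       # from a CNN over RGB frames
print(fuse_streams(skel, vid).argmax(dim=1))   # fused class predictions
```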

Real-Time Cattle Action Recognition for Estrus Detection

  • Heo, Eui-Ju;Ahn, Sung-Jin;Choi, Kang-Sun
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 4 / pp.2148-2161 / 2019
  • In this paper, we present a real-time cattle action recognition algorithm that detects the estrus phase of cattle from a live video stream. To classify cattle movement, specifically to detect the mounting action, the most observable sign of the estrus phase, a simple yet effective feature description exploiting motion history images (MHI) is designed. By learning the proposed features within a support vector machine framework, representative cattle actions such as mounting, walking, tail wagging, and foot stamping can be recognized robustly in complex scenes. Thanks to the low complexity of the proposed action recognition algorithm, multiple cattle in three enclosures can be monitored simultaneously using a single fisheye camera. Extensive experiments on real video streams confirm that the proposed algorithm outperforms a conventional human action recognition algorithm by 18% in recognition accuracy, even with a much lower-dimensional feature description.
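
A minimal motion-history-image sketch follows, assuming a simple frame-differencing update rather than the paper's exact descriptor: each pixel stores how recently it moved, so an action leaves a bright, fading silhouette whose summary statistics could feed an SVM. The threshold and duration values are arbitrary.

```python
# Sketch: manual MHI update. Pixels that changed are stamped to full "age"
# tau; everything else decays by one each frame.
import numpy as np

def update_mhi(mhi, prev_gray, gray, tau=15, thresh=25):
    motion = np.abs(gray.astype(np.int16) - prev_gray.astype(np.int16)) > thresh
    mhi = np.maximum(mhi - 1, 0)        # decay old motion
    mhi[motion] = tau                   # refresh moving pixels
    return mhi

# Usage on a synthetic "moving blob" sequence
frames = [np.zeros((64, 64), np.uint8) for _ in range(5)]
for t, f in enumerate(frames):
    f[20:30, 10 + 5 * t:20 + 5 * t] = 255   # blob sliding right

mhi = np.zeros((64, 64), np.int16)
for prev, cur in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, prev, cur)
print("pixels with recent motion:", int((mhi > 0).sum()))
# A fixed-length histogram or projection of `mhi` could then feed an SVM.
```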

Video augmentation technique for human action recognition using genetic algorithm

  • Nida, Nudrat;Yousaf, Muhammad Haroon;Irtaza, Aun;Velastin, Sergio A.
    • ETRI Journal / Vol. 44, No. 2 / pp.327-338 / 2022
  • Classification models for human action recognition require robust features and large training sets for good generalization. Data augmentation methods are often employed for imbalanced training sets to achieve higher accuracy, but samples generated this way only reflect existing samples within the training set; their feature representations are less diverse and therefore yield less precise classification. This paper presents new data augmentation and action representation approaches for growing training sets. The proposed approach rests on two fundamental ideas: virtual video generation for augmentation, and representation of action videos through robust features. Virtual videos are generated from the motion history templates of action videos and convolved by a convolutional neural network to produce deep features. Furthermore, guided by the objective function of a genetic algorithm, the spatiotemporal features of different samples are combined to generate the representations of the virtual videos, which are then classified with an extreme learning machine on the MuHAVi-Uncut, IXMAS, and IAVID-1 datasets.
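
A toy sketch of the augmentation idea under stated assumptions: a genetic algorithm with convex-combination individuals blends deep features of real videos into a "virtual" feature vector. The fitness function and genetic operators here are illustrative, not the paper's objective.

```python
# Sketch GA: individuals are convex-combination weights over real samples of
# one action class; the fittest blend becomes a virtual training sample.
import numpy as np

rng = np.random.default_rng(0)

def evolve_virtual_features(X, generations=30, pop=20):
    """X: (n, d) deep features of one action class. Returns one virtual sample."""
    W = rng.random((pop, X.shape[0]))
    W /= W.sum(1, keepdims=True)

    def fit(w):
        v = w @ X
        # stay near the class mean but away from any single real sample
        return (-np.linalg.norm(v - X.mean(0))
                + 0.1 * np.min(np.linalg.norm(X - v, axis=1)))

    for _ in range(generations):
        scores = np.array([fit(w) for w in W])
        parents = W[np.argsort(scores)[-pop // 2:]]                    # selection
        kids = (parents[rng.permutation(len(parents))] + parents) / 2  # crossover
        kids += rng.normal(0, 0.01, kids.shape)                        # mutation
        kids = np.abs(kids); kids /= kids.sum(1, keepdims=True)
        W = np.vstack([parents, kids])
    return max(W, key=fit) @ X

X = rng.normal(size=(12, 64))            # CNN features of 12 real videos
print(evolve_virtual_features(X).shape)  # one virtual feature vector: (64,)
```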

Optimization of Action Recognition based on Slowfast Deep Learning Model using RGB Video Data

  • 정재혁;김민석
    • 한국멀티미디어학회논문지 / Vol. 25, No. 8 / pp.1049-1058 / 2022
  • Human action recognition (HAR), including anomaly and object detection, has become a trend in research fields that use artificial intelligence (AI) methods to analyze patterns of human action in crime-prone areas, media services, and industrial facilities. In real-time systems that use streaming video in particular, HAR has become an even more important AI-based research field for application development, and many research directions built on it are being developed and improved. In this paper, we propose and analyze a deep-learning-based HAR scheme that can be applied to media services using RGB video streams without a feature-extraction preprocessing step. For this purpose, we adopt the SlowFast deep neural network model and evaluate it on an open dataset (HMDB-51 or UCF101) to improve prediction accuracy.
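
Pretrained SlowFast variants are distributed, for example, through PyTorchVideo's torch.hub entry points, but the sketch below illustrates only the model's defining input design: a fast pathway that samples frames densely and a slow pathway that samples sparsely, later fused by lateral connections. The rate alpha=4 follows a common SlowFast configuration and is an assumption here.

```python
# Sketch: SlowFast's dual-rate frame sampling (not the trained model itself).
import torch

def slowfast_inputs(video, alpha=4):
    """video: (C, T, H, W). Returns (slow, fast) pathway clips."""
    fast = video                                  # every frame
    slow = video[:, ::alpha]                      # 1 of every alpha frames
    return slow, fast

video = torch.randn(3, 32, 224, 224)              # 32 RGB frames
slow, fast = slowfast_inputs(video)
print(slow.shape, fast.shape)                     # (3, 8, ...) and (3, 32, ...)
```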

Extensible Hierarchical Method of Detecting Interactive Actions for Video Understanding

  • Moon, Jinyoung;Jin, Junho;Kwon, Yongjin;Kang, Kyuchang;Park, Jongyoul;Park, Kyoung
    • ETRI Journal / Vol. 39, No. 4 / pp.502-513 / 2017
  • For video understanding, namely analyzing who did what in a video, actions and objects are the primary elements. Most studies on actions have handled recognition for well-trimmed videos and focused on improving classification performance. However, action detection, which includes localization as well as recognition, is required because actions generally intersect in time and space. In addition, most studies have not considered extensibility to a newly added action that has not been previously trained. Therefore, proposed in this paper is an extensible hierarchical method for detecting generic actions, which combine object movements and spatial relations between two objects, and inherited actions, which are determined from the related objects through an ontology- and rule-based methodology. The hierarchical design enables the method to detect any interactive action based on the spatial relations between two objects. Using object information, the method achieves an F-measure of 90.27%. Moreover, this paper describes the extensibility of the method to a new action contained in a video from a domain different from that of the dataset used.
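
A toy sketch of the rule-based layer such a hierarchy might use, under stated assumptions: generic spatial relations (IoU overlap, horizontal proximity) between two tracked boxes are mapped by hypothetical rules to an inherited interactive action. The relations, thresholds, and rule table are illustrative, not the paper's ontology.

```python
# Sketch: spatial relations between a person box and an object box, plus a
# hypothetical rule table mapping relation transitions to actions.
def iou(a, b):
    ax1, ay1, ax2, ay2 = a; bx1, by1, bx2, by2 = b
    ix = max(0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def spatial_relation(person, obj):
    if iou(person, obj) > 0.3:
        return "overlapping"
    px = (person[0] + person[2]) / 2
    ox = (obj[0] + obj[2]) / 2
    return "near" if abs(px - ox) < 50 else "far"

# Hypothetical rules: (relation now, relation before) -> inherited action
RULES = {("overlapping", "near"): "pick_up", ("near", "overlapping"): "put_down"}

obj = (140, 120, 200, 200)                           # a tracked object box
prev = spatial_relation((100, 50, 160, 200), obj)    # person approaching: "near"
curr = spatial_relation((140, 50, 200, 200), obj)    # person reaching: "overlapping"
print(RULES.get((curr, prev), "no_interaction"))     # -> "pick_up"
```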

Preprocessing Technique for Improving Action Recognition Performance in ERP Video with Multiple Objects

  • 박은수;김승환;류은석
    • 방송공학회논문지 / Vol. 25, No. 3 / pp.374-385 / 2020
  • This paper proposes a preprocessing technique that resolves the problems encountered when performing action recognition on equirectangular projection (ERP) video. The proposed technique assumes that the human object is the subject of the action, i.e., the object of interest (OOI), and that the area surrounding the OOI is the region of interest (ROI). It consists of three modules: (i) an object detection model detects human objects in the video; (ii) a saliency map is generated from the input frame; (iii) the subject of the action is selected using the detected human objects and the saliency map. The bounding box of the selected subject is then fed to the action recognition model to improve recognition performance. Compared with feeding the original ERP video directly, feeding data produced by the proposed preprocessing improves performance by up to 99.6%, and extracting only the frames in which an OOI is detected also yields an action-oriented video summary.
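
A minimal sketch of module III under stated assumptions: among detected person boxes (module I), pick the one whose region has the highest mean saliency (module II), then crop it for the action recognizer. The synthetic detector output and saliency map below are placeholders, not the paper's models.

```python
# Sketch: choose the object of interest (OOI) as the most salient person box.
import numpy as np

def select_ooi(person_boxes, saliency):
    """person_boxes: list of (x1, y1, x2, y2) on the ERP frame.
    saliency: (H, W) map in [0, 1]. Returns the most salient box."""
    def mean_saliency(box):
        x1, y1, x2, y2 = box
        return saliency[y1:y2, x1:x2].mean()
    return max(person_boxes, key=mean_saliency)

# Synthetic stand-ins for module I (detector) and module II (saliency model)
saliency = np.zeros((256, 512)); saliency[100:180, 300:400] = 1.0
boxes = [(50, 60, 120, 200), (290, 90, 410, 190)]   # two detected people
ooi = select_ooi(boxes, saliency)
x1, y1, x2, y2 = ooi
print("OOI:", ooi, "crop for the action model:", (y2 - y1, x2 - x1))
```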