• Title/Summary/Keyword: video action recognition

Search Result 65, Processing Time 0.023 seconds

Spatial-Temporal Scale-Invariant Human Action Recognition using Motion Gradient Histogram (모션 그래디언트 히스토그램 기반의 시공간 크기 변화에 강인한 동작 인식)

  • Kim, Kwang-Soo;Kim, Tae-Hyoung;Kwak, Soo-Yeong;Byun, Hye-Ran
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.12
    • /
    • pp.1075-1082
    • /
    • 2007
  • In this paper, we propose the method of multiple human action recognition on video clip. For being invariant to the change of speed or size of actions, Spatial-Temporal Pyramid method is applied. Proposed method can minimize the complexity of the procedures owing to select Motion Gradient Histogram (MGH) based on statistical approach for action representation feature. For multiple action detection, Motion Energy Image (MEI) of binary frame difference accumulations is adapted and then we detect each action of which area is represented by MGH. The action MGH should be compared with pre-learning MGH having pyramid method. As a result, recognition can be done by the analyze between action MGH and pre-learning MGH. Ten video clips are used for evaluating the proposed method. We have various experiments such as mono action, multiple action, speed and site scale-changes, comparison with previous method. As a result, we can see that proposed method is simple and efficient to recognize multiple human action with stale variations.

Human Motion Recognition Based on Spatio-temporal Convolutional Neural Network

  • Hu, Zeyuan;Park, Sange-yun;Lee, Eung-Joo
    • Journal of Korea Multimedia Society
    • /
    • v.23 no.8
    • /
    • pp.977-985
    • /
    • 2020
  • Aiming at the problem of complex feature extraction and low accuracy in human action recognition, this paper proposed a network structure combining batch normalization algorithm with GoogLeNet network model. Applying Batch Normalization idea in the field of image classification to action recognition field, it improved the algorithm by normalizing the network input training sample by mini-batch. For convolutional network, RGB image was the spatial input, and stacked optical flows was the temporal input. Then, it fused the spatio-temporal networks to get the final action recognition result. It trained and evaluated the architecture on the standard video actions benchmarks of UCF101 and HMDB51, which achieved the accuracy of 93.42% and 67.82%. The results show that the improved convolutional neural network has a significant improvement in improving the recognition rate and has obvious advantages in action recognition.

Improved DT Algorithm Based Human Action Features Detection

  • Hu, Zeyuan;Lee, Suk-Hwan;Lee, Eung-Joo
    • Journal of Korea Multimedia Society
    • /
    • v.21 no.4
    • /
    • pp.478-484
    • /
    • 2018
  • The choice of the motion features influences the result of the human action recognition method directly. Many factors often influence the single feature differently, such as appearance of the human body, environment and video camera. So the accuracy of action recognition is restricted. On the bases of studying the representation and recognition of human actions, and giving fully consideration to the advantages and disadvantages of different features, the Dense Trajectories(DT) algorithm is a very classic algorithm in the field of behavior recognition feature extraction, but there are some defects in the use of optical flow images. In this paper, we will use the improved Dense Trajectories(iDT) algorithm to optimize and extract the optical flow features in the movement of human action, then we will combined with Support Vector Machine methods to identify human behavior, and use the image in the KTH database for training and testing.

Binary Hashing CNN Features for Action Recognition

  • Li, Weisheng;Feng, Chen;Xiao, Bin;Chen, Yanquan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.9
    • /
    • pp.4412-4428
    • /
    • 2018
  • The purpose of this work is to solve the problem of representing an entire video using Convolutional Neural Network (CNN) features for human action recognition. Recently, due to insufficient GPU memory, it has been difficult to take the whole video as the input of the CNN for end-to-end learning. A typical method is to use sampled video frames as inputs and corresponding labels as supervision. One major issue of this popular approach is that the local samples may not contain the information indicated by the global labels and sufficient motion information. To address this issue, we propose a binary hashing method to enhance the local feature extractors. First, we extract the local features and aggregate them into global features using maximum/minimum pooling. Second, we use the binary hashing method to capture the motion features. Finally, we concatenate the hashing features with global features using different normalization methods to train the classifier. Experimental results on the JHMDB and MPII-Cooking datasets show that, for these new local features, binary hashing mapping on the sparsely sampled features led to significant performance improvements.

Sign language translation using video captioning and sign language recognition using action recognition (비디오 캡셔닝을 적용한 수어 번역 및 행동 인식을 적용한 수어 인식)

  • Gi-Duk Kim;Geun-Hoo Lee
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2024.01a
    • /
    • pp.317-319
    • /
    • 2024
  • 본 논문에서는 비디오 캡셔닝 알고리즘을 적용한 수어 번역 및 행동 인식 알고리즘을 적용한 수어 인식 알고리즘을 제안한다. 본 논문에 사용된 비디오 캡셔닝 알고리즘으로 40개의 연속된 입력 데이터 프레임을 CNN 네트워크를 통해 임베딩 하고 트랜스포머의 입력으로 하여 문장을 출력하였다. 행동 인식 알고리즘은 랜덤 샘플링을 하여 한 영상에 40개의 인덱스에서 40개의 연속된 데이터에 CNN 네트워크를 통해 임베딩하고 GRU, 트랜스포머를 결합한 RNN 모델을 통해 인식 결과를 출력하였다. 수어 번역에서 BLEU-4의 경우 7.85, CIDEr는 53.12를 얻었고 수어 인식으로 96.26%의 인식 정확도를 얻었다.

  • PDF

Action Recognition with deep network features and dimension reduction

  • Li, Lijun;Dai, Shuling
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.2
    • /
    • pp.832-854
    • /
    • 2019
  • Action recognition has been studied in computer vision field for years. We present an effective approach to recognize actions using a dimension reduction method, which is applied as a crucial step to reduce the dimensionality of feature descriptors after extracting features. We propose to use sparse matrix and randomized kd-tree to modify it and then propose modified Local Fisher Discriminant Analysis (mLFDA) method which greatly reduces the required memory and accelerate the standard Local Fisher Discriminant Analysis. For feature encoding, we propose a useful encoding method called mix encoding which combines Fisher vector encoding and locality-constrained linear coding to get the final video representations. In order to add more meaningful features to the process of action recognition, the convolutional neural network is utilized and combined with mix encoding to produce the deep network feature. Experimental results show that our algorithm is a competitive method on KTH dataset, HMDB51 dataset and UCF101 dataset when combining all these methods.

Spatial-temporal Ensemble Method for Action Recognition (행동 인식을 위한 시공간 앙상블 기법)

  • Seo, Minseok;Lee, Sangwoo;Choi, Dong-Geol
    • The Journal of Korea Robotics Society
    • /
    • v.15 no.4
    • /
    • pp.385-391
    • /
    • 2020
  • As deep learning technology has been developed and applied to various fields, it is gradually changing from an existing single image based application to a video based application having a time base in order to recognize human behavior. However, unlike 2D CNN in a single image, 3D CNN in a video has a very high amount of computation and parameter increase due to the addition of a time axis, so improving accuracy in action recognition technology is more difficult than in a single image. To solve this problem, we investigate and analyze various techniques to improve performance in 3D CNN-based image recognition without additional training time and parameter increase. We propose a time base ensemble using the time axis that exists only in the videos and an ensemble in the input frame. We have achieved an accuracy improvement of up to 7.1% compared to the existing performance with a combination of techniques. It also revealed the trade-off relationship between computational and accuracy.

Trends in Online Action Detection in Streaming Videos (온라인 행동 탐지 기술 동향)

  • Moon, J.Y.;Kim, H.I.;Lee, Y.J.
    • Electronics and Telecommunications Trends
    • /
    • v.36 no.2
    • /
    • pp.75-82
    • /
    • 2021
  • Online action detection (OAD) in a streaming video is an attractive research area that has aroused interest lately. Although most studies for action understanding have considered action recognition in well-trimmed videos and offline temporal action detection in untrimmed videos, online action detection methods are required to monitor action occurrences in streaming videos. OAD predicts action probabilities for a current frame or frame sequence using a fixed-sized video segment, including past and current frames. In this article, we discuss deep learning-based OAD models. In addition, we investigated OAD evaluation methodologies, including benchmark datasets and performance measures, and compared the performances of the presented OAD models.

A Deep Learning Algorithm for Fusing Action Recognition and Psychological Characteristics of Wrestlers

  • Yuan Yuan;Yuan Yuan;Jun Liu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.3
    • /
    • pp.754-774
    • /
    • 2023
  • Wrestling is one of the popular events for modern sports. It is difficult to quantitatively describe a wrestling game between athletes. And deep learning can help wrestling training by human recognition techniques. Based on the characteristics of latest wrestling competition rules and human recognition technologies, a set of wrestling competition video analysis and retrieval system is proposed. This system uses a combination of literature method, observation method, interview method and mathematical statistics to conduct statistics, analysis, research and discussion on the application of technology. Combined the system application in targeted movement technology. A deep learning-based facial recognition psychological feature analysis method for the training and competition of classical wrestling after the implementation of the new rules is proposed. The experimental results of this paper showed that the proportion of natural emotions of male and female wrestlers was about 50%, indicating that the wrestler's mentality was relatively stable before the intense physical confrontation, and the test of the system also proved the stability of the system.

Human Action Recognition Using Deep Data: A Fine-Grained Study

  • Rao, D. Surendra;Potturu, Sudharsana Rao;Bhagyaraju, V
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.6
    • /
    • pp.97-108
    • /
    • 2022
  • The video-assisted human action recognition [1] field is one of the most active ones in computer vision research. Since the depth data [2] obtained by Kinect cameras has more benefits than traditional RGB data, research on human action detection has recently increased because of the Kinect camera. We conducted a systematic study of strategies for recognizing human activity based on deep data in this article. All methods are grouped into deep map tactics and skeleton tactics. A comparison of some of the more traditional strategies is also covered. We then examined the specifics of different depth behavior databases and provided a straightforward distinction between them. We address the advantages and disadvantages of depth and skeleton-based techniques in this discussion.