• Title/Summary/Keyword: Video Classification


A Study on Efficient Learning Units for Behavior-Recognition of People in Video (비디오에서 동체의 행위인지를 위한 효율적 학습 단위에 관한 연구)

  • Kwon, Ick-Hwan;Hadjer, Boubenna;Lee, Dohoon
    • Journal of Korea Multimedia Society
    • /
    • v.20 no.2
    • /
    • pp.196-204
    • /
    • 2017
  • An intelligent video surveillance system recognizes behavior by analyzing the movement patterns of objects of interest in the frames of video input from a camera. Detecting specific behaviors of objects in a crowd has become a critical problem in the context of terrorist attacks, and remains an important but difficult problem in computer vision. With the spread of big data, machine learning, and data mining techniques, the amount of video from CCTV, smartphones, and drones has increased dramatically. In this paper, we propose a multiple-sliding-window method that recognizes a cumulative change as one unit in order to improve recognition accuracy. The experimental results demonstrate that the method yields robust and efficient learning units for classifying specific behaviors.
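The abstract does not give implementation details, but the core idea of multiple sliding windows can be sketched as grouping per-frame features into overlapping windows of several lengths, so a cumulative change is seen as a single learning unit. The window sizes, stride, and dummy features below are illustrative assumptions, not the paper's parameters:

```python
def multi_sliding_windows(frames, window_sizes=(4, 8, 16), stride=2):
    """Group per-frame features into overlapping windows of several
    lengths, so cumulative changes form single learning units."""
    units = []
    for w in window_sizes:
        for start in range(0, len(frames) - w + 1, stride):
            units.append(frames[start:start + w])
    return units

# 20 dummy per-frame feature vectors
frames = [[float(i)] for i in range(20)]
units = multi_sliding_windows(frames)
```

Each unit can then be fed to a classifier as one sample; the multiple window lengths let both short and long behaviors fit inside some unit.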

CNN-based Fast Split Mode Decision Algorithm for Versatile Video Coding (VVC) Inter Prediction

  • Yeo, Woon-Ha;Kim, Byung-Gyu
    • Journal of Multimedia Information System
    • /
    • v.8 no.3
    • /
    • pp.147-158
    • /
    • 2021
  • Versatile Video Coding (VVC) is the latest video coding standard developed by the Joint Video Exploration Team (JVET). In VVC, the quadtree plus multi-type tree (QT+MTT) structure of coding unit (CU) partitioning is adopted, and its computational complexity is considerably high due to the brute-force search for recursive rate-distortion (RD) optimization. In this paper, we aim to reduce the time complexity of inter-picture prediction, since inter prediction accounts for a large portion of the total encoding time. The problem can be defined as classifying the split mode of each CU. To classify the split mode effectively, a novel convolutional neural network (CNN) architecture called multi-level tree CNN (MLT-CNN) is introduced. To boost classification performance, we utilize additional information, including inter-picture information, while training the CNN. The overall algorithm, including the MLT-CNN inference process, is implemented on VVC Test Model (VTM) 11.0. CUs of size 128×128 serve as the inputs of the CNN. The sequences are encoded in the random access (RA) configuration with five QP values {22, 27, 32, 37, 42}. The experimental results show that the proposed algorithm can reduce the computational complexity by 11.53% on average, and by up to 26.14%, with an average increase of 1.01% in Bjøntegaard delta bit rate (BDBR). In particular, the proposed method performs better on class A and B sequences, reducing encoding time by 9.81%~26.14% with a BDBR increase of 0.95%~3.28%.
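The MLT-CNN itself is not reproduced here, but the classification target it predicts can be sketched concretely: a QT+MTT classifier must choose among six split modes, each producing a fixed set of child CU sizes (the 1/4-1/2-1/4 ternary ratios follow the VVC partitioning rules; the function below is only an illustration of the label space, not the paper's decision algorithm):

```python
# The six split modes a QT+MTT partition classifier must distinguish
SPLIT_MODES = ["NO_SPLIT", "QT", "BT_HOR", "BT_VER", "TT_HOR", "TT_VER"]

def child_blocks(width, height, mode):
    """Child CU sizes produced by each VVC QT+MTT split mode."""
    if mode == "NO_SPLIT":
        return [(width, height)]
    if mode == "QT":                  # quadtree: four quarters
        return [(width // 2, height // 2)] * 4
    if mode == "BT_HOR":              # binary split along a horizontal boundary
        return [(width, height // 2)] * 2
    if mode == "BT_VER":              # binary split along a vertical boundary
        return [(width // 2, height)] * 2
    if mode == "TT_HOR":              # ternary split: 1/4, 1/2, 1/4 heights
        return [(width, height // 4), (width, height // 2), (width, height // 4)]
    if mode == "TT_VER":              # ternary split: 1/4, 1/2, 1/4 widths
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    raise ValueError(mode)
```

Skipping the brute-force RD search means predicting one of these labels per CU (starting from the 128×128 inputs the paper uses) instead of encoding every candidate partition.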

Human Activity Classification Using Deep Transfer Learning (딥 전이 학습을 이용한 인간 행동 분류)

  • Nindam, Somsawut;Manmai, Thong-oon;Sung, Thaileang;Wu, Jiahua;Lee, Hyo Jong
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2022.11a
    • /
    • pp.478-480
    • /
    • 2022
  • This paper studies human activity image classification using deep transfer learning techniques focused on the Inception convolutional neural network (InceptionV3) model. For this, we used the UCF-101 public dataset together with our own dataset of students' behaviors in mathematics classrooms at a school in Thailand. The video data covers the Play Sitar, Tai Chi, Walking with Dog, and Student Study (our dataset) classes. The experiment was conducted in three phases. First, image frames are extracted from the video and labeled. Second, the dataset is loaded into InceptionV3 with transfer learning for four-class image classification. Lastly, we evaluate the model's accuracy using precision, recall, F1-score, and the confusion matrix. The classification outcomes for the public classes and our dataset are 1) Play Sitar (precision = 1.0, recall = 1.0, F1 = 1.0), 2) Tai Chi (precision = 1.0, recall = 1.0, F1 = 1.0), 3) Walking with Dog (precision = 1.0, recall = 1.0, F1 = 1.0), and 4) Student Study (precision = 1.0, recall = 1.0, F1 = 1.0), respectively. The results show an overall classification accuracy of 100%, which indicates the model is effective at learning the UCF-101 classes and our dataset with high accuracy.
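The evaluation step above (precision, recall, F1 per class from a confusion matrix) is standard and can be computed directly; this sketch assumes the usual convention that row i of the matrix holds counts for true class i:

```python
def per_class_scores(confusion):
    """confusion[i][j] = count of samples with true class i predicted as j.
    Returns (precision, recall, F1) per class."""
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp   # column minus diagonal
        fn = sum(confusion[c]) - tp                        # row minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append((precision, recall, f1))
    return scores
```

A perfectly diagonal confusion matrix, as reported in the abstract, yields (1.0, 1.0, 1.0) for every class.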

Automatic Summarization of Basketball Video Using the Score Information (스코어 정보를 이용한 농구 비디오의 자동요약)

  • Jung, Cheol-Kon;Kim, Eui-Jin;Lee, Gwang-Gook;Kim, Whoi-Yul
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.9C
    • /
    • pp.881-887
    • /
    • 2007
  • In this paper, we propose a method for content-based automatic summarization of basketball game videos. For a meaningful summary, we use the score information in basketball videos, obtained by recognizing the digits on the score caption and analyzing the variation of the score. Important events in basketball generally include 3-point shots, one-sided runs, and lead changes. We detect these events using the score information and generate summaries and highlights of basketball games.
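Once the score caption has been read per frame, the three event types named in the abstract can be flagged from the score timeline alone. The thresholds below (e.g. 8 unanswered points as a "run") are illustrative assumptions, not the paper's values:

```python
def detect_events(score_timeline, run_threshold=8):
    """score_timeline: list of (team_a, team_b) scores read from the caption.
    Flags 3-point shots, lead changes, and one-sided scoring runs."""
    events = []
    run_team, run_points = None, 0
    for t in range(1, len(score_timeline)):
        (pa, pb), (qa, qb) = score_timeline[t - 1], score_timeline[t]
        da, db = qa - pa, qb - pb
        if da == 3 or db == 3:
            events.append((t, "3-point shot"))
        # lead change: the sign of the score difference flips
        if (pa - pb) * (qa - qb) < 0:
            events.append((t, "lead change"))
        # track an unanswered scoring run by one team
        team = "A" if da > 0 else ("B" if db > 0 else None)
        if team == run_team:
            run_points += da + db
        else:
            run_team, run_points = team, da + db
        if run_points >= run_threshold:
            events.append((t, f"run by team {run_team}"))
            run_points = 0
    return events
```

The timestamps of flagged events then index the shots that go into the highlight summary.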

Automatic Summarization of Basketball Video Using the Score Information (스코어 정보를 이용한 농구 비디오의 자동요약)

  • Jung, Cheol-Kon;Kim, Eui-Jin;Lee, Gwang-Gook;Kim, Whoi-Yul
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.8C
    • /
    • pp.738-744
    • /
    • 2007
  • In this paper, we propose a method for content-based automatic summarization of basketball game videos. For a meaningful summary, we use the score information in basketball videos, obtained by recognizing the digits on the score caption and analyzing the variation of the score. Important events in basketball generally include 3-point shots, one-sided runs, and lead changes. We detect these events using the score information and generate summaries and highlights of basketball games.

An Efficient Motion Compensation Algorithm for Video Sequences with Brightness Variations (밝기 변화가 심한 비디오 시퀀스에 대한 효율적인 움직임 보상 알고리즘)

  • 김상현;박래홍
    • Journal of Broadcast Engineering
    • /
    • v.7 no.4
    • /
    • pp.291-299
    • /
    • 2002
  • This paper proposes an efficient motion compensation algorithm for video sequences with brightness variations. In the proposed algorithm, the brightness variation parameters are estimated and local motions are compensated. To detect frames with large brightness variations, we employ frame classification based on the cross entropy between the histograms of two successive frames, which reduces computational redundancy. Simulation results show that the proposed method yields a higher peak signal-to-noise ratio (PSNR) than conventional methods, with a low computational load, when the video scene contains large brightness changes.
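The frame-classification step can be sketched directly: compute the cross entropy between the normalized intensity histograms of two successive frames and compare it to a threshold. The threshold value is an illustrative assumption; the paper's decision rule may differ in detail:

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """Cross entropy H(p, q) between two normalized intensity histograms."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

def has_brightness_change(hist_prev, hist_cur, threshold=5.0):
    """Classify a frame pair: a large cross entropy between successive
    histograms suggests a global brightness variation."""
    p = [v / sum(hist_prev) for v in hist_prev]
    q = [v / sum(hist_cur) for v in hist_cur]
    return cross_entropy(p, q) > threshold
```

Only frames flagged this way need the extra brightness-parameter estimation, which is where the computational saving comes from.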

An Efficient Video Coding Algorithm Applying Brightness Variation Compensation (밝기변화 보상을 적용한 효율적인 비디오 코딩 알고리즘)

  • Kim Sang-Hyun
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.5 no.4
    • /
    • pp.287-293
    • /
    • 2004
  • This paper proposes an efficient motion compensation algorithm for video sequences with brightness variations. In the proposed algorithm, the brightness variation parameters are estimated and local motions are compensated. To detect frames with large brightness variations, we employ frame classification based on the cross entropy between the histograms of two successive frames, which reduces computational redundancy. Simulation results show that the proposed method yields a higher peak signal-to-noise ratio (PSNR) than conventional methods, with a low computational load, when the video scene contains large brightness changes.


Decomposed "Spatial and Temporal" Convolution for Human Action Recognition in Videos

  • Sediqi, Khwaja Monib;Lee, Hyo Jong
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.05a
    • /
    • pp.455-457
    • /
    • 2019
  • In this paper we study the effect of decomposed spatiotemporal convolutions for action recognition in videos. Our motivation emerges from the empirical observation that spatial convolution applied to individual frames of a video provides good performance in action recognition. In this research we empirically show the accuracy of factorized convolution on individual frames of video for action classification. We take 3D ResNet-18 as the baseline model for our experiment and factorize its 3D convolutions into 2D (spatial) and 1D (temporal) convolutions. We train the model from scratch on the Kinetics video dataset, then fine-tune it on the UCF-101 dataset and evaluate its performance. Our results show accuracy comparable to that of state-of-the-art algorithms on the Kinetics and UCF-101 datasets.
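One concrete way to see the effect of the factorization is to count parameters: a full t×k×k 3D kernel is replaced by a 1×k×k spatial convolution followed by a t×1×1 temporal one. The intermediate channel width `mid` below is an assumption (papers on (2+1)D factorization choose it in various ways); biases are ignored for simplicity:

```python
def conv3d_params(cin, cout, t, k):
    """Parameters of a full t x k x k 3D convolution (bias ignored)."""
    return cin * cout * t * k * k

def decomposed_params(cin, cout, t, k, mid):
    """Parameters after factorizing into a 1 x k x k spatial convolution
    (cin -> mid channels) followed by a t x 1 x 1 temporal one (mid -> cout)."""
    return cin * mid * k * k + mid * cout * t

# Example: a 3x3x3 kernel with 64 input/output channels
full = conv3d_params(64, 64, 3, 3)          # 110592
factored = decomposed_params(64, 64, 3, 3, 64)  # 49152
```

With equal channel widths the factorized block is markedly smaller, while keeping separate spatial and temporal receptive fields.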

ADD-Net: Attention Based 3D Dense Network for Action Recognition

  • Man, Qiaoyue;Cho, Young Im
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.6
    • /
    • pp.21-28
    • /
    • 2019
  • In recent years, with the development of artificial intelligence and the success of deep models, they have been deployed in all fields of computer vision. Action recognition, an important branch of human perception and computer vision research, has attracted increasing attention. It is a challenging task due to the particular complexity of human movement: the same movement may vary across individuals. Human actions exist as continuous image frames in video, so action recognition requires more computational power than processing static images, and simply using a CNN cannot achieve the desired results. Recently, attention models have achieved good results in computer vision and natural language processing. In particular, for video action classification, adding an attention model makes it more effective to focus on motion features and improves performance. It also intuitively explains which part the model attends to when making a particular decision, which is very helpful in real applications. In this paper, we propose a 3D dense convolutional network based on an attention mechanism (ADD-Net) for recognizing human motion behavior in video.
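The core attention operation the abstract relies on can be sketched minimally: scores over feature positions are normalized with a softmax and used to reweight the features, so positions with strong motion cues dominate. This is a generic attention sketch, not ADD-Net's actual architecture:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(features, scores):
    """Reweight per-position features by softmax attention weights, so the
    network focuses on positions with high scores (e.g. strong motion)."""
    weights = softmax(scores)
    return [w * f for w, f in zip(weights, features)]
```

Inspecting the softmax weights is also what makes attention interpretable: they show which positions contributed to a decision.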

A Study on Sentiment Pattern Analysis of Video Viewers and Predicting Interest in Video using Facial Emotion Recognition (얼굴 감정을 이용한 시청자 감정 패턴 분석 및 흥미도 예측 연구)

  • Jo, In Gu;Kong, Younwoo;Jeon, Soyi;Cho, Seoyeong;Lee, DoHoon
    • Journal of Korea Multimedia Society
    • /
    • v.25 no.2
    • /
    • pp.215-220
    • /
    • 2022
  • Emotion recognition is one of the most important and challenging areas of computer vision. Many studies on emotion recognition have been conducted and model performance is improving, but more research is needed on emotion recognition and sentiment analysis of video viewers. In this paper, we propose an emotion analysis system that includes a sentiment analysis model and an interest prediction model. We analyzed the emotional patterns of people watching popular and unpopular videos and predicted their level of interest using the system. Experimental results showed that certain emotions were strongly related to the popularity of videos and that the interest prediction model predicted the level of interest with high accuracy.
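The notion of an "emotion pattern" can be sketched as the normalized frequency of each recognized facial emotion over a viewing session, which a predictor then maps to an interest score. The weighted-sum predictor and its weights below are toy assumptions; the paper's interest model is learned, not hand-weighted:

```python
from collections import Counter

def emotion_pattern(frame_emotions):
    """Normalized frequency of each recognized facial emotion over a video."""
    counts = Counter(frame_emotions)
    total = len(frame_emotions)
    return {e: c / total for e, c in counts.items()}

def interest_score(pattern, weights):
    """Toy interest predictor: weighted sum of emotion frequencies.
    Weights are hypothetical stand-ins for a trained model."""
    return sum(weights.get(e, 0.0) * f for e, f in pattern.items())

pattern = emotion_pattern(["happy", "happy", "neutral", "sad"])
score = interest_score(pattern, {"happy": 1.0, "sad": -1.0})
```

Comparing such patterns between popular and unpopular videos is what lets certain emotions surface as popularity signals.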