• Title/Abstract/Keyword: video recognition

680 search results (processing time: 0.024 s)

Two person Interaction Recognition Based on Effective Hybrid Learning

  • Ahmed, Minhaz Uddin;Kim, Yeong Hyeon;Kim, Jin Woo;Bashar, Md Rezaul;Rhee, Phill Kyu
    • KSII Transactions on Internet and Information Systems (TIIS), Vol. 13, No. 2, pp. 751-770, 2019
  • Action recognition is an essential task in computer vision due to the variety of prospective applications, such as security surveillance, machine learning, and human-computer interaction. The availability of more video data than ever before, together with the strong performance of deep convolutional neural networks, also makes action recognition in video essential. Unfortunately, the limitations of hand-crafted video features and the scarcity of benchmark datasets make the multi-person action recognition task in video data challenging. In this work, we propose a deep convolutional neural network-based Effective Hybrid Learning (EHL) framework for two-person interaction classification in video data. Our approach exploits a pre-trained network model (VGG16 from the University of Oxford Visual Geometry Group) and extends Faster R-CNN (a state-of-the-art region-based convolutional neural network detector). We combine a semi-supervised learning method with an active learning method to improve overall performance. Numerous types of two-person interactions exist in the real world, which makes this a challenging task. In our experiments, we consider a limited number of actions, such as hugging, fighting, linking arms, talking, and kidnapping, in two environments: simple and complex. We show that our trained model with an active semi-supervised learning architecture gradually improves in performance. In a simple environment, using the Intelligent Technology Laboratory (ITLab) dataset from Inha University, accuracy increased to 95.6%, and in a complex environment it reached 81%. Compared to supervised learning methods, our method reduces data-labeling time on the ITLab dataset. We also conduct extensive experiments on Human Action Recognition benchmarks, such as the UT-Interaction and HMDB51 datasets, and obtain better performance than state-of-the-art approaches.
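
A minimal sketch of the active-learning step described in the abstract above: from a pool of unlabeled clips, select those on which the current classifier is least confident, so that an annotator labels only the most informative clips. The clip names and confidence scores below are hypothetical.

```python
def least_confident(pool_scores, budget):
    """pool_scores: {clip_id: top-class softmax probability}.
    Returns the `budget` clip ids the model is least sure about."""
    ranked = sorted(pool_scores, key=pool_scores.get)  # ascending confidence
    return ranked[:budget]

scores = {"clip_a": 0.97, "clip_b": 0.55, "clip_c": 0.62, "clip_d": 0.99}
print(least_confident(scores, 2))  # ['clip_b', 'clip_c']
```

In an active semi-supervised loop of the kind described, the selected clips would be labeled by hand, the remaining confident ones pseudo-labeled, and the model retrained, which is what reduces the overall labeling time.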

손과 팔 재활 훈련 지원 시스템에서의 사용자 인터페이스 설계와 재활 훈련 방법 (User Interface Design and Rehabilitation Training Methods in Hand or Arm Rehabilitation Support System)

  • 하진영;이준호;최선화
    • 산업기술연구, Vol. 31, No. A, pp. 63-69, 2011
  • A home-based rehabilitation system for patients with impaired hands or arms was developed. By using this system, patients can save the time and cost of visiting a hospital, and the system's interface is easy to manipulate. In this paper, we discuss a rehabilitation system that uses video recognition; the focus is on designing a convenient user interface and rehabilitation training methods. The system consists of two screens: one for recording the user's information and the other for training. A first-time user inputs his/her information; the system chooses a training method based on that information and records the training process automatically using video recognition. On the training screen, video clips of the training method and help messages are displayed for the user.


RGB 비디오 데이터를 이용한 Slowfast 모델 기반 이상 행동 인식 최적화 (Optimization of Action Recognition based on Slowfast Deep Learning Model using RGB Video Data)

  • 정재혁;김민석
    • 한국멀티미디어학회논문지, Vol. 25, No. 8, pp. 1049-1058, 2022
  • Human Action Recognition (HAR), including anomaly and object detection, has become a prominent research area that applies Artificial Intelligence (AI) methods to analyze patterns of human action in crime-prone areas, media services, and industrial facilities. In real-time systems using video streaming data in particular, HAR has become an important AI-based research field for application development, and many research directions based on HAR are currently being developed and improved. In this paper, we propose and analyze a deep-learning-based HAR scheme that works efficiently on RGB video streaming data for media services, without a feature-extraction pre-processing step. For the method, we adopt the SlowFast deep neural network (DNN) model and train it on open datasets (HMDB-51 and UCF101) to improve prediction accuracy.
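
A rough illustration of the SlowFast idea referenced above: the same clip feeds two pathways sampled at different frame rates, a slow pathway capturing spatial semantics and a fast one capturing motion. The strides below are illustrative defaults, not the values used in the paper.

```python
def slowfast_sample(num_frames, slow_stride=16, fast_stride=2):
    """Return the frame indices each pathway would consume."""
    slow = list(range(0, num_frames, slow_stride))  # sparse frames, semantics
    fast = list(range(0, num_frames, fast_stride))  # dense frames, motion
    return slow, fast

slow, fast = slowfast_sample(64)
print(len(slow), len(fast))  # 4 32
```

Working directly on such raw RGB frame streams is what lets the pipeline skip a separate feature-extraction pre-processing stage.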

스포츠 중계를 위한 자막 인식 시스템 개발 (Development of a Video Caption Recognition System for Sport Event Broadcasting)

  • 오주현
    • 한국HCI학회:학술대회논문집, 한국HCI학회 2009년도 학술대회, pp. 94-98, 2009
  • One of the problems to be solved in producing overseas sports broadcasts, such as Major League Baseball relays, is converting captions displayed in imperial units, such as MPH (miles per hour), into units appropriate for domestic viewers, such as km/h. To this end, we developed a sports caption recognition system that detects when a caption appears from changes in the caption region of the broadcast frame, recognizes the numeric information, and converts it into the appropriate SI unit. The converted caption is passed to a downstream character generator (CG) system and finally displayed on the TV screen. Neural-network-based approaches, which are commonly used for character recognition, require a prior training process on similar data and cannot cope when a caption whose shape differs from the training data is used without notice. Considering the live-broadcast environment, we instead used template matching, which can quickly adapt to captions rendered in a new font. Tests on various experimental videos yielded recognition accuracy above 97%; given the accuracy demands of live broadcasting, when the matching confidence is not high, an operator makes the judgment and outputs the correct caption using a hotkey.

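
A minimal sketch of the template-matching-with-confidence idea from the caption-recognition abstract above: compare a glyph against registered digit templates and report the best match with a confidence score, so low-confidence results can be deferred to an operator. The 3x3 "glyphs" are toy stand-ins for real caption fonts.

```python
def match_score(glyph, template):
    # fraction of matching pixels (both are same-size 0/1 grids)
    total = sum(len(row) for row in glyph)
    same = sum(1 for grow, trow in zip(glyph, template)
               for g, t in zip(grow, trow) if g == t)
    return same / total

def recognize(glyph, templates, threshold=0.8):
    best = max(templates, key=lambda k: match_score(glyph, templates[k]))
    conf = match_score(glyph, templates[best])
    # below threshold, return None so a human can confirm via hotkey
    return (best, conf) if conf >= threshold else (None, conf)

templates = {"1": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
             "7": [[1, 1, 1], [0, 0, 1], [0, 0, 1]]}
print(recognize([[0, 1, 0], [0, 1, 0], [0, 1, 0]], templates))  # ('1', 1.0)
```

Because a new font only requires registering new templates, no retraining pass is needed, which matches the live-broadcast rationale given in the abstract.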

손동작 추적 및 인식을 이용한 비디오 편집 (Video Editing using Hand Gesture Tracking and Recognition)

  • 배철수
    • 한국정보통신학회논문지, Vol. 11, No. 1, pp. 102-107, 2007
  • In this paper, we propose a new gesture-based video editing method. The contents of electronic slides are automatically detected in a lecture video and synchronized with the video. The gestures over each synchronized slide are continuously tracked and recognized, and the region where a gesture occurs is identified by matching the registered frame against the slide content. By extracting slide information at the recognized gesture and registered location, the quality of the video can be improved, for example by partially magnifying the slide region or automatically editing the original video. Experiments with two videos yielded gesture recognition rates of 95.5% and 96.4%, respectively.

Method of extracting context from media data by using video sharing site

  • Kondoh, Satoshi;Ogawa, Takeshi
    • 한국방송∙미디어공학회:학술대회논문집, 한국방송공학회 2009년도 IWAIT, pp. 709-713, 2009
  • Recently, much research applying data acquired from devices such as cameras and RFIDs to context-aware services has been performed in the fields of life-logging and sensor networks. A variety of analytical techniques have been proposed to recognize information from the raw data, because video and audio data contain a larger volume of information than other sensor data. However, because these techniques generally use supervised learning, manually re-watching a huge amount of media data has been necessary to create supervised data whenever a class is updated or a new class is added. The problem, therefore, was that in most cases applications could only use recognition functions based on fixed supervised data. We propose a method of acquiring supervised data from a video sharing site where users comment on video scenes; such sites are remarkably popular, so many comments are generated. In the first step of this method, words with a high utility value are extracted by filtering the comments about the video. Second, a set of time-series feature data is calculated by applying feature-extraction functions to the media data. Finally, our learning system calculates the correlation coefficient between these two kinds of data and stores it in the system's database. By applying this correlation coefficient to new media data, various applications gain a recognition function driven by collective intelligence based on Web comments. In addition, flexible recognition that adjusts to new objects becomes possible by regularly acquiring and learning both media data and comments from a video sharing site, while reducing manual work. As a result, recognition of not only the name of the observed object but also indirect information, e.g., the impression of or the action toward the object, is enabled.

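
The final learning step described above boils down to correlating a word's per-scene comment frequency with a media feature measured over the same scenes. A pure-Python sketch of that correlation coefficient; the two series below are made-up examples, not data from the paper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

word_freq = [0, 1, 3, 5]            # e.g. mentions of one word per scene
motion    = [0.1, 0.4, 0.9, 1.6]    # e.g. a motion feature per scene
print(round(pearson(word_freq, motion), 3))  # 0.997
```

A coefficient near 1 would be stored in the database and later used to flag the same feature pattern in new, uncommented media data.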

Human Activity Recognition Using Spatiotemporal 3-D Body Joint Features with Hidden Markov Models

  • Uddin, Md. Zia;Kim, Jaehyoun
    • KSII Transactions on Internet and Information Systems (TIIS), Vol. 10, No. 6, pp. 2767-2780, 2016
  • Video-based human-activity recognition has become increasingly popular due to prominent applications in a variety of fields such as computer vision, image processing, smart-home healthcare, and human-computer interaction. The essential goal of a video-based activity-recognition system is to provide behavior-based information that proactively assists a person with his/her tasks. The target of this work is a novel approach to human-activity recognition that uses human-body-joint features extracted from depth videos. From the silhouette image of each depth frame, direction and magnitude features are first obtained from each connected body-joint pair; they are then augmented with the motion direction and magnitude features of each joint in the next frame. A generalized discriminant analysis (GDA) is applied to make the spatiotemporal features more robust, and the time-sequence features are then fed into a Hidden Markov Model (HMM) to train each activity. Lastly, all of the trained activity HMMs are used for depth-video activity recognition.
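
A sketch of the joint-pair features described above: for a connected pair of 3-D body joints, take the direction (unit vector) and magnitude of the vector between them. The joint coordinates below are illustrative, not from any real skeleton data.

```python
from math import sqrt

def pair_feature(j1, j2):
    """Direction (unit vector) and magnitude from joint j1 to joint j2."""
    d = [b - a for a, b in zip(j1, j2)]
    mag = sqrt(sum(c * c for c in d))
    direction = [c / mag for c in d]
    return direction, mag

shoulder, elbow = (0.0, 1.0, 0.0), (0.0, 1.0, 0.5)
direction, mag = pair_feature(shoulder, elbow)
print(direction, mag)  # [0.0, 0.0, 1.0] 0.5
```

Stacking such features per frame, plus the frame-to-frame motion terms, yields the time-sequence vectors that the GDA/HMM stages then consume.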

실시간 비정형객체 인식 기법 기반 지능형 이상 탐지 시스템에 관한 연구 (Research on Intelligent Anomaly Detection System Based on Real-Time Unstructured Object Recognition Technique)

  • 이석창;김영현;강수경;박명혜
    • 한국멀티미디어학회논문지, Vol. 25, No. 3, pp. 546-557, 2022
  • Recently, the demand to interpret image data with artificial intelligence in various fields is rapidly increasing. Object recognition and detection techniques using deep learning are mainly used, and integrated video analysis for recognizing unstructured objects is a particularly important problem. In natural or social disasters, object recognition alone is limited because such objects have unstructured shapes. In this paper, we propose an intelligent integrated video-analysis system that can recognize unstructured objects based on video turning points and object detection. We also introduce a method to apply and evaluate object recognition using virtual augmented images, from 2D to 3D, generated with a GAN.

Intelligent Activity Recognition based on Improved Convolutional Neural Network

  • Park, Jin-Ho;Lee, Eung-Joo
    • 한국멀티미디어학회논문지, Vol. 25, No. 6, pp. 807-818, 2022
  • To further improve the accuracy and time efficiency of behavior recognition in intelligent monitoring scenarios, a human behavior recognition algorithm based on YOLO combined with LSTM and CNN is proposed. Exploiting the real-time nature of YOLO target detection, the specific behavior in the surveillance video is first detected in real time, and deep feature extraction is performed after obtaining the target's size, location, and other information. Noise from irrelevant areas of the image is then removed. Finally, an LSTM models the time series and makes the final behavior discrimination for the action sequence in the surveillance video. Experiments on the MSR and KTH datasets show that the average recognition rate reaches 98.42% and 96.6%, respectively, with an average recognition speed of 210 ms and 220 ms. The proposed method performs well on intelligent behavior recognition.

Face Spoofing Attack Detection Using Spatial Frequency and Gradient-Based Descriptor

  • Ali, Zahid;Park, Unsang
    • KSII Transactions on Internet and Information Systems (TIIS), Vol. 13, No. 2, pp. 892-911, 2019
  • Biometric recognition systems have been widely used for information security. Fingerprint and face are among the most popular biometric traits due to their high recognition accuracy. However, security systems that use face recognition as the login method are vulnerable to face-spoofing attacks that use a printed photo or a video of the valid user. In this study, we propose a fast and robust method to detect face-spoofing attacks based on the analysis of spatial-frequency differences between real and fake videos. We found that the effect of a spoofing attack stands out more prominently in certain regions of the 2-D Fourier spectrum; therefore, it is adequate to use information about those regions to classify the input video or image as real or fake. We adopt a divide-conquer-aggregate approach: we first divide the frequency-domain image into local blocks, classify each local block independently, and then aggregate all the classification results with a weighted sum. The effectiveness of the methodology is demonstrated using two publicly available databases: 1) the Replay-Attack Database and 2) the CASIA Face Anti-Spoofing Database. Experimental results show that the proposed method provides state-of-the-art performance while processing fewer frames of each video.
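
A sketch of the divide-conquer-aggregate step from the abstract above: each local frequency block gets an independent real-vs-fake score, and the final decision is a weighted sum, with larger weights for more discriminative blocks. The per-block scores and weights below are illustrative, not learned values from the paper.

```python
def aggregate(block_scores, block_weights, threshold=0.5):
    """Fuse per-block 'real' probabilities into one decision."""
    total = sum(block_weights)
    fused = sum(s * w for s, w in zip(block_scores, block_weights)) / total
    return ("real" if fused >= threshold else "fake"), fused

scores  = [0.9, 0.8, 0.2, 0.7]   # per-block probability of "real"
weights = [1.0, 2.0, 0.5, 1.5]   # how discriminative each block is
print(aggregate(scores, weights))
```

Weighting the blocks lets spectrum regions where spoofing artifacts stand out dominate the decision, while uninformative blocks contribute little.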