• Title/Summary/Keyword: Video recognition

Search results: 679 items (processing time: 0.023 seconds)

TSN을 이용한 도로 감시 카메라 영상의 강우량 인식 방법 (Rainfall Recognition from Road Surveillance Videos Using TSN)

  • 현종환;최호진
    • 한국대기환경학회지 / Vol. 34, No. 5 / pp. 735-747 / 2018
  • Rainfall depth is important meteorological information. Generally, high-spatial-resolution rainfall data, such as road-level rainfall data, are more beneficial. However, it is expensive to set up sufficient Automatic Weather Systems to obtain road-level rainfall data. In this paper, we propose to use deep learning to recognize rainfall depth from road surveillance videos. To achieve this goal, we collect a new video dataset and propose a procedure to calculate refined rainfall depth from the original meteorological data. We also propose to utilize the differential frame as well as the optical flow image for better recognition of rainfall depth. Under the Temporal Segment Networks framework, the experimental results show that the combination of the video frame and the differential frame is a superior solution for rainfall depth recognition. The final model achieves high performance on the single-location low-sensitivity classification task and reasonable accuracy on the higher-sensitivity classification task in both the single-location and multi-location cases.
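
The differential-frame input named above is straightforward to reproduce. Below is a minimal sketch, assuming OpenCV and grayscale conversion; the paper's exact sampling interval, normalization, and TSN segment logic are not reproduced, and the function name is illustrative.

```python
import cv2

def differential_frames(video_path, step=1):
    """Yield absolute inter-frame differences (the 'differential frame'
    modality); the sampling step and lack of normalization are assumptions."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        for _ in range(step):
            ok, frame = cap.read()
            if not ok:
                cap.release()
                return
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Pixel-wise absolute difference emphasizes rain streaks and motion.
        yield cv2.absdiff(gray, prev)
        prev = gray
```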

윈도우 기반의 광학문자인식을 이용한 영상 번역 시스템 구현 (An Implementation of a System for Video Translation on Window Platform Using OCR)

  • 황선명;염희균
    • 사물인터넷융복합논문지 / Vol. 5, No. 2 / pp. 15-20 / 2019
  • As machine learning research has advanced, translation technology and image analysis technologies such as Optical Character Recognition (OCR) have made remarkable progress. However, video translation, which combines the two, has progressed slowly compared with those established fields. In this paper, we develop an image translator that combines existing OCR technology with translation technology and verify its effectiveness. Before development, we identify which functions are required to implement the system and which methods can realize each function, and then test their respective performance. With the application developed in this paper, users can access translation more conveniently and, going beyond translation functions limited to the special setting of video, can obtain that convenience in any environment.
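
As a concrete illustration of such a pipeline, here is a minimal sketch of the OCR-then-translate flow, assuming the open-source Tesseract engine via pytesseract; the paper does not name its OCR engine or translation backend, so both choices and the `translate` stub are assumptions.

```python
import cv2
import pytesseract

def ocr_frame(frame_bgr, lang="eng"):
    """Extract text from a captured frame with Tesseract OCR."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Otsu binarization usually improves OCR on screen captures.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary, lang=lang)

def translate(text, target="ko"):
    """Placeholder: wire in any translation backend here (hypothetical)."""
    return text  # identity until a real API is connected

frame = cv2.imread("captured_region.png")  # hypothetical window-region capture
print(translate(ocr_frame(frame)))
```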

Spatial-temporal texture features for 3D human activity recognition using laser-based RGB-D videos

  • Ming, Yue;Wang, Guangchao;Hong, Xiaopeng
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 11, No. 3 / pp. 1595-1613 / 2017
  • The IR camera and laser-based IR projector provide an effective solution for real-time capture of moving targets in RGB-D videos. Unlike traditional RGB videos, the captured depth videos are not affected by illumination variation. In this paper, we propose a novel feature extraction framework, spatial-temporal texture features for 3D human activity recognition, to describe human activities based on the above optical video capture method. The spatial-temporal texture feature with depth information is insensitive to illumination and occlusion, and efficient for describing fine motion. The framework of our proposed algorithm begins with video acquisition based on laser projection, performs video preprocessing with visual background extraction, and obtains spatial-temporal key images. Then, the texture features encoded from the key images are used to generate discriminative features for human activity information. Experimental results on different databases and practical scenarios demonstrate the effectiveness of our proposed algorithm on large-scale data sets.
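
The abstract does not specify the texture encoding itself; as an illustration only, the sketch below computes a local binary pattern (LBP) histogram from a depth key image with scikit-image, a common texture descriptor used here as a stand-in for the paper's encoding.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def texture_descriptor(key_image, points=8, radius=1):
    """LBP histogram of a (depth) key image; a generic stand-in for the
    paper's spatial-temporal texture encoding."""
    lbp = local_binary_pattern(key_image, points, radius, method="uniform")
    # 'uniform' LBP yields points + 2 distinct codes.
    hist, _ = np.histogram(lbp, bins=points + 2,
                           range=(0, points + 2), density=True)
    return hist
```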

비디오 영상 정보 검색을 위한 문자 추출 및 인식 (Caption Detection and Recognition for Video Image Information Retrieval)

  • 구건서
    • 한국컴퓨터산업학회논문지 / Vol. 3, No. 7 / pp. 901-914 / 2002
  • This paper presents a method for detecting the content of video captions: captions are automatically extracted from video frames for content-based retrieval, and the caption characters are recognized by a feature-extraction-based single-layer connected neural network recognizer (FE-MCBP). Caption extraction first finds key frames, through histogram analysis, among frames acquired from the video at regular time intervals; then, for each key frame, caption regions are extracted by color segmentation followed by a line-scan method; finally, individual characters are separated from the extracted caption region. In this work, performing the line scan after segmentation using the local maxima of the color histogram improved both processing speed and the accuracy of caption-region detection. Caption extraction from video is the first step in organizing video information into a multimedia database: the extracted captions feed directly into the character recognizer, and the recognized caption information is stored in a database and retrieved by content-based retrieval techniques.
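
The key-frame step (histogram analysis over frames taken at a fixed interval) might look like the sketch below with OpenCV; the Bhattacharyya metric, sampling interval, and threshold are assumptions, as the paper does not specify them.

```python
import cv2

def key_frames(video_path, interval=30, threshold=0.3):
    """Keep frames whose color histogram differs strongly from the
    previous key frame (illustrative metric and threshold)."""
    cap = cv2.VideoCapture(video_path)
    keys, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % interval == 0:
            hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                                [0, 256, 0, 256, 0, 256])
            cv2.normalize(hist, hist)
            if prev_hist is None or cv2.compareHist(
                    prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
                keys.append(frame)
                prev_hist = hist
        idx += 1
    cap.release()
    return keys
```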

Binary Hashing CNN Features for Action Recognition

  • Li, Weisheng;Feng, Chen;Xiao, Bin;Chen, Yanquan
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 9 / pp. 4412-4428 / 2018
  • The purpose of this work is to represent an entire video using Convolutional Neural Network (CNN) features for human action recognition. Recently, due to insufficient GPU memory, it has been difficult to take a whole video as the input of a CNN for end-to-end learning. A typical method is to use sampled video frames as inputs and the corresponding labels as supervision. One major issue with this popular approach is that the local samples may contain neither the information indicated by the global labels nor sufficient motion information. To address this issue, we propose a binary hashing method to enhance the local feature extractors. First, we extract the local features and aggregate them into global features using maximum/minimum pooling. Second, we use the binary hashing method to capture the motion features. Finally, we concatenate the hashing features with the global features using different normalization methods to train the classifier. Experimental results on the JHMDB and MPII-Cooking datasets show that, with these new local features, binary hash mapping of the sparsely sampled features leads to significant performance improvements.
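
The aggregation the abstract outlines (max/min pooling of per-frame CNN features plus a binary hash of the local features) can be sketched in a few lines of numpy. This is a minimal sketch under stated assumptions: the exact hashing projection is not given in the abstract, so a random sign-hash stands in for the paper's mapping.

```python
import numpy as np

rng = np.random.default_rng(0)

def aggregate(frame_features, hash_bits=128):
    """frame_features: (num_frames, dim) CNN features of sampled frames.

    Returns a global descriptor: max/min pooling concatenated with a
    sign-based binary hash (random projection; a placeholder for the
    paper's hashing)."""
    pooled = np.concatenate([frame_features.max(axis=0),
                             frame_features.min(axis=0)])
    proj = rng.standard_normal((frame_features.shape[1], hash_bits))
    # Binarize each frame's projection, then pool the bits over time.
    bits = (frame_features @ proj > 0).astype(np.float32).mean(axis=0)
    return np.concatenate([pooled, bits])
```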

차량정보 분석과 제스처 인식을 위한 AVN 소프트웨어 구현 (Development of AVN Software Using Vehicle Information for Hand Gesture)

  • 오규태;박인혜;이상엽;고재진
    • 한국통신학회논문지 / Vol. 42, No. 4 / pp. 892-898 / 2017
  • This paper describes the design and implementation of a software architecture for in-vehicle AVN (Audio Video Navigation) that supports vehicle information analysis and gesture recognition. The designed software implements a CAN (Controller Area Network) communication data analysis module to analyze the driving state of the vehicle, and the AVN software fuses the analyzed information with gesture information from a wearable device. The resulting fused information is matched to the driver's command execution stage and used to support services. The designed AVN software was implemented on a hardware platform similar to commercial products, and we confirmed that it supports functions such as vehicle information analysis and gesture recognition under conditions simulating actual driving.
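
A CAN-analysis module of this kind can be approximated with the open-source python-can library; this is an assumption, as the paper's implementation is not published, and the arbitration ID and scaling below are hypothetical (real values come from the vehicle's DBC file).

```python
import can  # python-can

VEHICLE_SPEED_ID = 0x123  # hypothetical arbitration ID

def read_speed(channel="can0"):
    """Read one (hypothetical) vehicle-speed frame from the CAN bus."""
    with can.interface.Bus(channel=channel, interface="socketcan") as bus:
        while True:
            msg = bus.recv(timeout=1.0)
            if msg is not None and msg.arbitration_id == VEHICLE_SPEED_ID:
                # Decoding is illustrative; real scaling comes from the DBC.
                return int.from_bytes(msg.data[:2], "big") * 0.1
```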

Design of Metaverse for Two-Way Video Conferencing Platform Based on Virtual Reality

  • Yoon, Dongeon;Oh, Amsuk
    • Journal of information and communication convergence engineering / Vol. 20, No. 3 / pp. 189-194 / 2022
  • As non-face-to-face activities have become commonplace, online video conferencing platforms have become popular collaboration tools. However, existing video conferencing platforms have a structure in which one side unilaterally delivers information, which can increase the fatigue of meeting participants. In this study, we designed a video conferencing platform utilizing virtual reality (VR), a metaverse technology, to enable various interactions. A virtual conferencing space and a support system for authoring realistic VR video conferencing content were designed using Meta's Oculus Quest 2 hardware, the Unity engine, and 3D Max software. With the Photon software development kit, voice recognition was designed to perform automatic text translation through the Watson application programming interface, allowing online video conferencing participants to communicate smoothly even when using different languages. It is expected that the proposed video conferencing platform will enable conference participants to interact and improve their work efficiency.
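
The speech-to-text-then-translate step described above can be sketched with IBM's Python SDK; this is a minimal sketch, assuming an API key for the (now legacy) Watson Language Translator service, and it does not reproduce the authors' Photon/Unity integration. The credentials and service URL are placeholders.

```python
from ibm_watson import LanguageTranslatorV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

def translate_utterance(text, model_id="ko-en"):
    """Translate recognized speech text; credentials are placeholders."""
    translator = LanguageTranslatorV3(
        version="2018-05-01",
        authenticator=IAMAuthenticator("YOUR_API_KEY"))
    translator.set_service_url(
        "https://api.us-south.language-translator.watson.cloud.ibm.com")
    result = translator.translate(text=text, model_id=model_id).get_result()
    return result["translations"][0]["translation"]
```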

Secured Authentication through Integration of Gait and Footprint for Human Identification

  • Murukesh, C.;Thanushkodi, K.;Padmanabhan, Preethi;Feroze, Naina Mohamed D.
    • Journal of Electrical Engineering and Technology / Vol. 9, No. 6 / pp. 2118-2125 / 2014
  • Gait recognition is a new technique for identifying people by the way they walk. Human gait is a spatio-temporal phenomenon that typifies the motion characteristics of an individual. The proposed method makes a simple but efficient attempt at gait recognition. For each video file, spatial silhouettes of a walker are extracted by an improved background subtraction procedure using a Gaussian Mixture Model (GMM). Here the GMM is used as a parametric probability density function represented as a weighted sum of Gaussian component densities. Then, the relevant features are extracted from the silhouette tracked in the given video file using Principal Component Analysis (PCA). A Fisher Linear Discriminant Analysis (FLDA) classifier then classifies the dimensionally reduced images derived by PCA for gait recognition. Although gait images can be easily acquired, gait recognition is affected by clothes, shoes, carrying status, and the specific physical condition of an individual. To overcome this problem, gait is combined with the footprint in a multimodal biometric system. Minutiae are extracted from the footprint and then fused with the silhouette image using the Discrete Stationary Wavelet Transform (DSWT). The experimental results show that the proposed fusion algorithm works well and attains better results than other fusion schemes.
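
The silhouette-then-classify chain maps onto standard library components. The sketch below uses OpenCV's MOG2 background subtractor (a GMM) and scikit-learn's PCA and LDA; it approximates, but does not reproduce, the paper's exact pipeline, and the silhouette size and component count are illustrative.

```python
import cv2
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def silhouettes(video_path):
    """GMM background subtraction -> flattened binary silhouette per frame."""
    subtractor = cv2.createBackgroundSubtractorMOG2()
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)
        yield cv2.resize(mask, (64, 128)).flatten()
    cap.release()

def train_gait_classifier(X, y, components=50):
    """X: stacked silhouette vectors, y: subject labels.
    PCA reduces dimensionality; FLDA classifies in the reduced space."""
    pca = PCA(n_components=components).fit(X)
    clf = LinearDiscriminantAnalysis().fit(pca.transform(X), y)
    return pca, clf
```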

SoC FPGA 기반 실시간 객체 인식 및 추적 시스템 구현 (An Implementation of SoC FPGA-based Real-time Object Recognition and Tracking System)

  • 김동진;주연정;박영석
    • 대한임베디드공학회논문지 / Vol. 10, No. 6 / pp. 363-372 / 2015
  • Some recent SoC FPGA releases that integrate an ARM processor with FPGA fabric show better performance than the ASIC SoCs used in typical embedded image processing systems. In this study, exploiting these advantages, we implement a SoC FPGA-based real-time object recognition and tracking system. In our system, the video input and output, image preprocessing, and background subtraction were implemented in FPGA logic, while the object recognition and tracking processes were implemented as ARM processor-based programs. Our system provides a processing performance of 5.3 fps for SVGA video input. This is about 79 times faster than a software approach based on the Nios II soft-core processor, and about 4 times faster than an approach based on the HPS processor. Consequently, if an object recognition and tracking system adopts a design that combines the FPGA logic and HPS processor-based processing of recent SoC FPGAs, real-time processing becomes possible because the processing speed improves over a system handled only by the software approach.
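
On the HPS/ARM side of such a split design, the software consumes foreground masks produced by the FPGA fabric. As a software-only illustration (not the authors' code), the sketch below labels blobs in one such mask and returns object centroids using OpenCV's connected-components routine; the area threshold is an assumption.

```python
import cv2

def track_objects(foreground_mask, min_area=200):
    """Label blobs in an FPGA-produced foreground mask and return the
    centroids of sufficiently large objects (illustrative threshold)."""
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(
        foreground_mask)
    # Label 0 is the background; filter small blobs by area.
    return [tuple(centroids[i]) for i in range(1, n)
            if stats[i, cv2.CC_STAT_AREA] >= min_area]
```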

Dense RGB-D Map-Based Human Tracking and Activity Recognition using Skin Joints Features and Self-Organizing Map

  • Farooq, Adnan;Jalal, Ahmad;Kamal, Shaharyar
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 9, No. 5 / pp. 1856-1869 / 2015
  • This paper addresses 3D human activity detection, tracking, and recognition from RGB-D video sequences using a feature-structured framework. During human tracking and activity recognition, dense depth images are first captured using a depth camera. To track human silhouettes, we consider spatial/temporal continuity and constraints on human motion information, and compute the centroid of each activity based on a chain-coding mechanism and centroid point extraction. For body skin-joint features, we estimate human body skin color to identify body parts (i.e., head, hands, and feet) and extract joint-point information. These joint points are further processed by a feature extraction process that includes distance-position features and centroid-distance features. Lastly, self-organizing maps are used to recognize the different activities. Experimental results demonstrate that the proposed method is reliable and efficient in recognizing human poses in different realistic scenes. The proposed system should be applicable to consumer application systems such as healthcare, video surveillance, and indoor monitoring systems that track and recognize the activities of multiple users.
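
For the final self-organizing-map stage, the small MiniSom library offers a workable sketch; treating each sample's joint-distance feature vector as the SOM input is an assumption, and the map size and training length below are illustrative. After training, each map cell is associated with an activity label during calibration, and recognition reduces to a best-matching-unit lookup.

```python
from minisom import MiniSom

def train_activity_som(features, map_size=8, iters=5000):
    """features: (n_samples, dim) joint-distance feature vectors."""
    som = MiniSom(map_size, map_size, features.shape[1],
                  sigma=1.0, learning_rate=0.5, random_seed=0)
    som.train_random(features, iters)
    return som

def activity_cell(som, x):
    """Map a feature vector to its best-matching SOM unit."""
    return som.winner(x)
```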