• Title/Abstract/Keyword: spatiotemporal features


Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion

  • Zhou, Xuan
    • Journal of Information Processing Systems
    • /
    • Vol.17 No.2
    • /
    • pp.337-351
    • /
    • 2021
  • Automatically recognizing facial expressions in video sequences is a challenging task because there is little direct correlation between facial features and subjective emotions in video. To overcome this problem, a video facial expression recognition method using a spatiotemporal recurrent neural network and feature fusion is proposed. First, the video is preprocessed, and a double-layer cascade structure is used to detect the face in each video image. Two deep convolutional neural networks are then used to extract the temporal and spatial facial features from the video: a spatial convolutional neural network extracts the spatial information features from each frame of the static expression images, while a temporal convolutional neural network extracts the dynamic information features from the optical flow computed over multiple frames of expression images. The spatiotemporal features learned by the two networks are fused by multiplication. Finally, the fused features are fed into a support vector machine to perform the facial expression classification task. Experimental results on the eNTERFACE, RML, and AFEW6.0 datasets show that the proposed method achieves recognition rates of 88.67%, 70.32%, and 63.84%, respectively. Comparative experiments show that it obtains higher recognition accuracy than other recently reported methods.
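
As an illustration of the fusion-then-classify stage described above, the sketch below multiplies two placeholder CNN feature vectors element-wise and trains an SVM on the result; the feature dimension, sample counts, and data are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder features: 200 clips, 256-D outputs from the spatial and
# temporal CNN streams (in the paper these come from the two networks).
spatial_feats = rng.normal(size=(200, 256))
temporal_feats = rng.normal(size=(200, 256))
labels = rng.integers(0, 6, size=200)  # e.g., six basic expressions

# Multiplicative (element-wise) fusion of the two streams.
fused = spatial_feats * temporal_feats

# Final classification with a support vector machine.
clf = SVC(kernel="rbf")
clf.fit(fused[:150], labels[:150])
print("toy accuracy:", clf.score(fused[150:], labels[150:]))
```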

신재생에너지 자원 관리를 위한 시공간 응용 기술 (Spatiotemporal Applications for Managing New&Renewable Energy Resources)

  • 이양구;류근호;김광득
    • Korean Solar Energy Society: Conference Proceedings
    • /
    • Proceedings of the Korean Solar Energy Society 2008 Autumn Annual Conference
    • /
    • pp.327-331
    • /
    • 2008
  • In this paper, we argue that new and renewable energy resources are difficult to manage with conventional GIS technology because of their spatiotemporal characteristics, and we suggest that spatiotemporal databases and sensor networks can be applied to new and renewable energy management systems as enabling technologies. To motivate these issues, we introduce and analyze the concepts of the spatiotemporal database and the sensor network, together with case studies of each application.

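To make the spatiotemporal-database idea concrete, here is a minimal Python sketch of storing time-stamped, geolocated sensor readings and running a spatiotemporal range query; the `Reading` record, the wind-speed field, and all coordinates are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Reading:
    sensor_id: str
    lat: float
    lon: float
    t: datetime
    wind_speed: float  # hypothetical measurement for a wind-power site

readings = [
    Reading("s1", 37.51, 127.02, datetime(2008, 10, 1, 10, 0), 6.2),
    Reading("s2", 37.52, 127.10, datetime(2008, 10, 1, 10, 5), 5.8),
    Reading("s1", 37.51, 127.02, datetime(2008, 10, 1, 11, 0), 7.1),
]

def spatiotemporal_query(data, bbox, t0, t1):
    """Select readings inside a lat/lon bounding box and a time window."""
    lat0, lon0, lat1, lon1 = bbox
    return [r for r in data
            if lat0 <= r.lat <= lat1 and lon0 <= r.lon <= lon1
            and t0 <= r.t <= t1]

hits = spatiotemporal_query(readings, (37.5, 127.0, 37.6, 127.2),
                            datetime(2008, 10, 1, 9, 0),
                            datetime(2008, 10, 1, 10, 30))
print(len(hits), "readings in window")
```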

No-reference quality assessment of dynamic sports videos based on a spatiotemporal motion model

  • Kim, Hyoung-Gook;Shin, Seung-Su;Kim, Sang-Wook;Lee, Gi Yong
    • ETRI Journal
    • /
    • Vol.43 No.3
    • /
    • pp.538-548
    • /
    • 2021
  • This paper proposes an approach to improving the performance of no-reference video quality assessment for sports videos with dynamic motion scenes using an efficient spatiotemporal model. In the proposed method, we divide the video sequences into video blocks and apply a 3D shearlet transform, which can efficiently extract primary spatiotemporal features, to capture dynamic natural motion scene statistics from the incoming video blocks. A deep residual bidirectional gated recurrent neural network concatenated with logistic regression is used to learn the spatiotemporal correlation more robustly and predict the perceptual quality score. In addition, conditional video block-wise constraints are incorporated into the objective function to improve quality estimation performance for the entire video. The experimental results show that the proposed method extracts spatiotemporal motion information more effectively and predicts video quality more accurately than conventional no-reference video quality assessment methods.
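
A minimal sketch of the recurrent regression stage might look as follows, assuming the 3D shearlet features have already been computed per video block; the plain bidirectional GRU, mean pooling, and sigmoid output stand in for the paper's deep residual bidirectional GRU plus logistic regression, and all dimensions are placeholders.

```python
import torch
import torch.nn as nn

class BiGRUQualityRegressor(nn.Module):
    """Bidirectional GRU over per-block spatiotemporal features,
    followed by a sigmoid (logistic) output for a quality score."""

    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True,
                          bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):            # x: (batch, n_blocks, feat_dim)
        out, _ = self.gru(x)         # (batch, n_blocks, 2*hidden)
        pooled = out.mean(dim=1)     # average over video blocks
        return torch.sigmoid(self.head(pooled)).squeeze(-1)

model = BiGRUQualityRegressor()
blocks = torch.randn(4, 10, 128)     # 4 videos, 10 blocks, 128-D features
print(model(blocks).shape)           # torch.Size([4])
```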

시.공간특징에 대해 적응할 수 있는 ROI 탐지 시스템 (An Adaptive ROI Detection System for Spatiotemporal Features)

  • 박민철;최경주
    • The Journal of the Korea Contents Association
    • /
    • Vol.6 No.1
    • /
    • pp.41-53
    • /
    • 2006
  • This paper introduces an ROI (Region of Interest) detection system that selectively uses temporal and spatial features in video. In addition to spatial saliency computed from spatial features such as intensity and color, motion is used as a temporal feature to obtain temporal saliency. When a video is input, the system detects the spatial and temporal features and analyzes them based on existing psychological findings related to these features. The analysis results are then integrated into a single map, from which the ROI is detected. Because a moving object or region in a video generally attracts attention before other objects or regions in the same image, the system gives priority to motion, the temporal feature, over the spatial features when integrating the analysis results. To analyze the system's performance, a psychological experiment with human subjects was first conducted on the same test videos, and the system's output was compared against these results. The experimental results show that using the temporal feature together with the spatial features improves the system's performance over using the spatial features alone.

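The motion-prioritized integration described in the abstract above could be sketched as below; the fixed motion weight and threshold are illustrative assumptions (the paper grounds its integration in psychological findings rather than a single hard-coded weight).

```python
import numpy as np

def combine_saliency(spatial_maps, motion_map, motion_weight=0.7):
    """Integrate spatial saliency maps (e.g., intensity, color) with a
    motion map, giving the temporal cue priority; the 0.7 weight is an
    illustrative choice, not a value from the paper."""
    spatial = np.mean(spatial_maps, axis=0)           # combined spatial cue
    total = motion_weight * motion_map + (1 - motion_weight) * spatial
    return total / (total.max() + 1e-8)               # normalize to [0, 1]

def detect_roi(saliency, threshold=0.6):
    """Binary ROI mask: locations whose saliency exceeds a threshold."""
    return saliency > threshold

h, w = 120, 160
rng = np.random.default_rng(1)
intensity = rng.random((h, w))
color = rng.random((h, w))
motion = np.zeros((h, w))
motion[40:80, 60:100] = 1.0                           # a moving object

sal = combine_saliency([intensity, color], motion)
print("ROI pixels:", detect_roi(sal).sum())
```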

객체기반의 시공간 단서와 이들의 동적결합된 돌출맵에 의한 상향식 인공시각주의 시스템 (A New Covert Visual Attention System by Object-based Spatiotemporal Cues and Their Dynamic Fusioned Saliency Map)

  • 최경주
    • Journal of Korea Multimedia Society
    • /
    • Vol.18 No.4
    • /
    • pp.460-472
    • /
    • 2015
  • Most previous visual attention systems find attention regions based on a saliency map combined from multiple extracted features, and they differ mainly in how the features are extracted and combined. This paper presents a new system that improves the feature extraction method for color and motion and the weight decision method for spatial and temporal features. Our system dynamically extracts the one color with the strongest response among the two opponent color pairs, and it detects moving objects rather than moving pixels. To combine the spatial and temporal features, the proposed system sets the weights dynamically according to each feature's relative activity. Comparative results show that the suggested feature extraction and integration methods improve the detection rate of attention regions.
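
A toy version of the activity-based dynamic weighting might look like this; measuring "activity" as the mean map response is a simplifying assumption made here for illustration.

```python
import numpy as np

def dynamic_weights(feature_maps):
    """Weight each cue by its relative activity (mean response),
    so a strongly responding cue dominates the saliency map."""
    activities = np.array([m.mean() for m in feature_maps])
    return activities / activities.sum()

def fuse(feature_maps):
    w = dynamic_weights(feature_maps)
    return sum(wi * m for wi, m in zip(w, feature_maps))

rng = np.random.default_rng(2)
color_map = rng.random((60, 80)) * 0.2    # weak color response
motion_map = rng.random((60, 80))         # strong motion response
saliency = fuse([color_map, motion_map])
print("weights:", dynamic_weights([color_map, motion_map]))
```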

Extraction and classification of tempo stimuli from electroencephalography recordings using convolutional recurrent attention model

  • Lee, Gi Yong;Kim, Min-Soo;Kim, Hyoung-Gook
    • ETRI Journal
    • /
    • Vol.43 No.6
    • /
    • pp.1081-1092
    • /
    • 2021
  • Electroencephalography (EEG) recordings taken during the perception of music tempo contain information from which the tempo of a music piece can be estimated. If information about this tempo stimulus can be extracted from EEG recordings and classified, it can be used effectively to build a music-based brain-computer interface. This study proposes a novel convolutional recurrent attention model (CRAM) to extract and classify features corresponding to tempo stimuli from the EEG recordings of listeners who concentrated on the tempo of music pieces. The proposed CRAM is composed of six modules, namely, network inputs, a two-dimensional convolutional bidirectional gated recurrent unit-based sample encoder, sample-level intuitive attention, a segment encoder, segment-level intuitive attention, and a softmax layer, to effectively model spatiotemporal features and improve the classification accuracy of tempo stimuli. To evaluate the proposed method's performance, we conducted experiments on two benchmark datasets. The proposed method achieves promising results, outperforming recent methods.
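
The attention-pooling idea at the heart of CRAM can be sketched as follows; the single GRU encoder, channel counts, sequence length, and four tempo classes are placeholder assumptions rather than the paper's six-module configuration.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Soft attention over a sequence of encoder outputs, in the spirit
    of the sample/segment-level attention modules in the abstract."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h):                           # h: (batch, time, dim)
        alpha = torch.softmax(self.score(h), dim=1)  # attention weights
        return (alpha * h).sum(dim=1)                # weighted sum over time

encoder = nn.GRU(32, 64, batch_first=True, bidirectional=True)
pool = AttentionPooling(128)
classify = nn.Linear(128, 4)           # e.g., four tempo classes (assumed)

eeg = torch.randn(8, 250, 32)          # 8 trials, 250 samples, 32 channels
h, _ = encoder(eeg)
logits = classify(pool(h))
print(logits.shape)                    # torch.Size([8, 4])
```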

Crime amount prediction based on 2D convolution and long short-term memory neural network

  • Dong, Qifen;Ye, Ruihui;Li, Guojun
    • ETRI Journal
    • /
    • Vol.44 No.2
    • /
    • pp.208-219
    • /
    • 2022
  • Crime amount prediction is crucial for optimizing the arrangement of police patrols across the regions of a city. First, we analyzed the spatiotemporal correlations of the crime data and the relationships between crime and related auxiliary data, including points of interest (POI), public service complaints, and demographics. We then proposed a crime amount prediction model based on 2D convolution and a long short-term memory neural network (2DCONV-LSTM). The proposed model captures the spatiotemporal correlations in the crime data, and the crime-related auxiliary data are used to enhance the regional spatial features. Extensive experiments on real-world datasets demonstrate that capturing both temporal and spatial correlations in crime data and using auxiliary data to extract regional spatial features improve the prediction performance. In the best case, the proposed model reduces the prediction error by at least 17.8% and 8.2% compared with support vector regression (SVR) and LSTM, respectively. Moreover, excessive auxiliary data reduce model performance because of the presence of redundant information.
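
A compact sketch of the 2DCONV-LSTM idea, assuming crime counts are arranged on a regular city grid; the grid size, channel counts, and pooling are illustrative choices, and the auxiliary-data branch is omitted.

```python
import torch
import torch.nn as nn

class Conv2DLSTM(nn.Module):
    """Per-time-step 2D convolution over a city grid of crime counts,
    followed by an LSTM over time; a sketch of the 2DCONV-LSTM idea."""

    def __init__(self, grid=16, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4))                 # spatial features per step
        self.lstm = nn.LSTM(8 * 4 * 4, hidden, batch_first=True)
        self.head = nn.Linear(hidden, grid * grid)   # next-step counts

    def forward(self, x):             # x: (batch, time, 1, grid, grid)
        b, t = x.shape[:2]
        f = self.conv(x.flatten(0, 1)).flatten(1)    # (b*t, 128)
        out, _ = self.lstm(f.view(b, t, -1))
        return self.head(out[:, -1])  # predict the next time step

model = Conv2DLSTM()
history = torch.randn(2, 12, 1, 16, 16)   # 2 samples, 12 past steps
print(model(history).shape)               # torch.Size([2, 256])
```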

Depth Images-based Human Detection, Tracking and Activity Recognition Using Spatiotemporal Features and Modified HMM

  • Kamal, Shaharyar;Jalal, Ahmad;Kim, Daijin
    • Journal of Electrical Engineering and Technology
    • /
    • Vol.11 No.6
    • /
    • pp.1857-1862
    • /
    • 2016
  • Human activity recognition using depth information is an emerging and challenging technology in computer vision, having attracted considerable attention from many practical applications such as smart home/office systems, personal health care, and 3D video games. This paper presents a novel framework for 3D human body detection, tracking, and recognition from depth video sequences using spatiotemporal features and a modified HMM. To detect the human silhouette, raw depth data are examined, taking into account spatial continuity and the constraints of human motion information, while frame differencing is used to track human movements. The feature extraction mechanism combines spatial depth shape features and temporal joint features to improve classification performance, and both feature types are fused to recognize different activities using the modified hidden Markov model (M-HMM). The proposed approach is evaluated on two challenging depth video datasets. Moreover, our system can handle rotated and missing body parts, which is a major contribution to human activity recognition.
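
The recognition stage can be approximated with one HMM per activity; the sketch below uses the standard `hmmlearn` GaussianHMM as a stand-in for the paper's modified HMM (M-HMM), with random placeholder feature sequences in place of the fused spatiotemporal features.

```python
import numpy as np
from hmmlearn import hmm   # pip install hmmlearn

rng = np.random.default_rng(3)

def train_activity_hmm(sequences, n_states=4):
    """Fit one Gaussian HMM per activity on its feature sequences."""
    X = np.concatenate(sequences)
    lengths = [len(s) for s in sequences]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=20, random_state=0)
    model.fit(X, lengths)
    return model

# Placeholder spatiotemporal feature sequences for two activities.
walk = [rng.normal(0.0, 1.0, size=(30, 10)) for _ in range(5)]
wave = [rng.normal(2.0, 1.0, size=(30, 10)) for _ in range(5)]
models = {"walk": train_activity_hmm(walk), "wave": train_activity_hmm(wave)}

# Classify a new sequence by the highest log-likelihood.
test = rng.normal(2.0, 1.0, size=(30, 10))
pred = max(models, key=lambda a: models[a].score(test))
print("predicted activity:", pred)
```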

Human Activity Recognition Using Spatiotemporal 3-D Body Joint Features with Hidden Markov Models

  • Uddin, Md. Zia;Kim, Jaehyoun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol.10 No.6
    • /
    • pp.2767-2780
    • /
    • 2016
  • Video-based human-activity recognition has become increasingly popular due to prominent applications in a variety of fields such as computer vision, image processing, smart-home healthcare, and human-computer interaction. The essential goal of a video-based activity-recognition system is to provide behavior-based information that enables functionality to proactively assist a person with his or her tasks. This work develops a novel approach to human-activity recognition based on human-body-joint features extracted from depth videos. From the silhouette images of each depth frame, direction and magnitude features are first obtained for each connected body-joint pair; these are then augmented with the motion direction and magnitude features of each joint in the next frame. A generalized discriminant analysis (GDA) is applied to make the spatiotemporal features more robust, and the time-sequence features are then fed into a hidden Markov model (HMM) to train each activity. Lastly, all of the trained-activity HMMs are used for depth-video activity recognition.
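
The per-frame joint-pair direction/magnitude features and the next-frame motion augmentation might be computed as in this sketch; the skeleton topology and joint count are hypothetical, and the GDA and HMM stages are omitted.

```python
import numpy as np

# Hypothetical skeleton: pairs of connected joint indices.
BONES = [(0, 1), (1, 2), (2, 3)]   # e.g., head-neck, neck-torso, ...

def joint_pair_features(frame):
    """Direction (unit vector) and magnitude for each connected joint
    pair in one frame of 3-D joint positions, shape (n_joints, 3)."""
    feats = []
    for a, b in BONES:
        v = frame[b] - frame[a]
        mag = np.linalg.norm(v)
        feats.extend(v / (mag + 1e-8))   # direction
        feats.append(mag)                # magnitude
    return np.array(feats)

def motion_features(frame_t, frame_t1):
    """Per-joint motion direction and magnitude between consecutive frames."""
    d = frame_t1 - frame_t
    mags = np.linalg.norm(d, axis=1, keepdims=True)
    return np.concatenate([d / (mags + 1e-8), mags], axis=1).ravel()

rng = np.random.default_rng(4)
f0, f1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
x = np.concatenate([joint_pair_features(f0), motion_features(f0, f1)])
print("feature vector length:", x.size)   # 3 bones * 4 + 4 joints * 4 = 28
```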

A Video Expression Recognition Method Based on Multi-mode Convolution Neural Network and Multiplicative Feature Fusion

  • Ren, Qun
    • Journal of Information Processing Systems
    • /
    • Vol.17 No.3
    • /
    • pp.556-570
    • /
    • 2021
  • Existing video expression recognition methods mainly focus on spatial feature extraction from video expression images but tend to ignore the dynamic features of video sequences. To solve this problem, a multi-mode convolutional neural network method is proposed to effectively improve the performance of facial expression recognition in video. First, OpenFace 2.0 is used to detect face images in the video, and two deep convolutional neural networks are used to extract spatiotemporal expression features: a spatial convolutional neural network extracts the spatial information features of each static expression image, while a temporal convolutional neural network extracts the dynamic information features from the optical flow of multiple expression images. The spatiotemporal features learned by the two networks are then fused by multiplication. Finally, the fused features are fed into a support vector machine for facial expression classification. Experimental results show that the recognition accuracy of the proposed method reaches 64.57% and 60.89% on the RML and BAUM-1s datasets, respectively, which is better than that of other comparison methods.
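
For the temporal stream, the optical-flow input could be prepared as below using OpenCV's dense Farneback flow; the random face crops are placeholders standing in for the OpenFace 2.0 detections, and Farneback is a stand-in choice since the abstract does not specify a flow algorithm.

```python
import cv2            # pip install opencv-python
import numpy as np

def optical_flow_stack(frames):
    """Dense Farneback optical flow between consecutive grayscale frames,
    forming the temporal-stream input described in the abstract."""
    flows = []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for f in frames[1:]:
        nxt = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(
            prev, nxt, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow)            # (H, W, 2): x/y displacement
        prev = nxt
    return np.stack(flows)            # (T-1, H, W, 2)

# Toy frames standing in for detected face crops.
frames = [np.random.randint(0, 255, (64, 64, 3), np.uint8) for _ in range(4)]
print(optical_flow_stack(frames).shape)   # (3, 64, 64, 2)
```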