• Title/Summary/Keyword: Deep Residual Network

Effective Hand Gesture Recognition by Key Frame Selection and 3D Neural Network

  • Hoang, Nguyen Ngoc;Lee, Guee-Sang;Kim, Soo-Hyung;Yang, Hyung-Jeong
    • Smart Media Journal / v.9 no.1 / pp.23-29 / 2020
  • This paper presents an approach to dynamic hand gesture recognition that uses an algorithm based on a 3D Convolutional Neural Network (3D_CNN), later extended to a 3D Residual Network (3D_ResNet), together with neural network-based key frame selection. Typically, a 3D deep neural network classifies gestures from image frames sampled at random from a video. In this work, to improve classification performance, we instead feed the classification network key frames that represent the overall video. The key frames are extracted by SegNet rather than by the conventional clustering algorithms for video summarization (VSUMM), which require heavy computation. Because the key frame selection is performed by a deep neural network, it can run in a real-time system. Experiments are conducted with 3D convolutional architectures, namely 3D_CNN, Inflated 3D_CNN (I3D), and 3D_ResNet, for gesture classification. Our algorithm achieves up to 97.8% classification accuracy on the Cambridge gesture dataset, and the experimental results show that the proposed approach is efficient and outperforms existing methods.
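
As a concrete illustration (not the authors' code), the sketch below shows the kind of 3D residual block a 3D_ResNet gesture classifier is built from, in PyTorch; the channel width, the clip of 8 key frames, and the 9-class output are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Residual3DBlock(nn.Module):
    """Basic 3D residual block: two 3x3x3 convolutions with an identity skip."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm3d(channels)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # residual connection

class TinyGestureNet3D(nn.Module):
    """Toy 3D_ResNet-style classifier over a stack of selected key frames."""
    def __init__(self, num_classes=9, channels=32):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv3d(3, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(Residual3DBlock(channels), Residual3DBlock(channels))
        self.head = nn.Linear(channels, num_classes)

    def forward(self, clip):               # clip: (batch, 3, frames, H, W)
        x = self.blocks(self.stem(clip))
        x = x.mean(dim=(2, 3, 4))          # global average pool over time and space
        return self.head(x)

# Example: 8 selected key frames of size 64x64 from one video.
logits = TinyGestureNet3D()(torch.randn(1, 3, 8, 64, 64))
print(logits.shape)  # torch.Size([1, 9])
```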

Indoor Environment Drone Detection through DBSCAN and Deep Learning

  • Ha Tran Thi;Hien Pham The;Yun-Seok Mun;Ic-Pyo Hong
    • Journal of IKEEE / v.27 no.4 / pp.439-449 / 2023
  • In an era marked by the increasing use of drones and the growing demand for indoor surveillance, developing a robust application that detects and tracks both drones and humans within indoor spaces has become imperative. This study presents an application that uses FMCW radar to detect human and drone motion from the radar point cloud. First, the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm groups the points into distinct clusters, each representing an object present in the tracking area. The algorithm is remarkably efficient, particularly in clustering drone point clouds, where it achieves an accuracy of up to 92.8%. The clusters are then classified as either humans or drones by a deep learning model. Three models are applied: a Deep Neural Network (DNN), a Residual Network (ResNet), and a Long Short-Term Memory (LSTM) network. The ResNet model achieves the highest accuracy: 98.62% for drone clusters and 96.75% for human clusters.
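
As a hedged illustration of the clustering stage described above (not the paper's implementation), the snippet below runs scikit-learn's DBSCAN on a synthetic point cloud; the eps and min_samples values and the generated points are placeholders.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic FMCW-style point cloud: (x, y, z) points for two objects plus noise.
rng = np.random.default_rng(0)
drone = rng.normal(loc=[2.0, 3.0, 1.5], scale=0.1, size=(40, 3))
human = rng.normal(loc=[-1.0, 2.0, 0.9], scale=0.2, size=(80, 3))
noise = rng.uniform(low=-4, high=4, size=(15, 3))
points = np.vstack([drone, human, noise])

# Group points into clusters; label -1 marks noise points.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(points)

for cluster_id in sorted(set(labels) - {-1}):
    cluster = points[labels == cluster_id]
    print(f"cluster {cluster_id}: {len(cluster)} points, "
          f"centroid {cluster.mean(axis=0).round(2)}")
# Each cluster would then be passed to a DNN/ResNet/LSTM classifier (human vs. drone).
```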

Multiple Hint Information-based Knowledge Transfer with Block-wise Retraining (블록 계층별 재학습을 이용한 다중 힌트정보 기반 지식전이 학습)

  • Bae, Ji-Hoon
    • IEMEK Journal of Embedded Systems and Applications / v.15 no.2 / pp.43-49 / 2020
  • In this paper, we propose a stage-wise knowledge transfer method that uses block-wise retraining to transfer the useful knowledge of a pre-trained residual network (ResNet) within a teacher-student framework (TSF). First, multiple hint-information transfers and block-wise supervised retraining are performed alternately between the teacher and student ResNet models. Next, softened-output knowledge transfer is additionally applied in the TSF. Experimental results show that the proposed method, which couples multiple hint-based bottom-up knowledge transfer with incremental block-wise retraining, yields a student ResNet with higher accuracy than the existing knowledge distillation (KD) and hint-based knowledge transfer methods considered in this study.
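
The sketch below illustrates, under stated assumptions, the two kinds of loss the abstract refers to: a hint loss that matches intermediate features (FitNets-style) and a softened-output knowledge distillation loss. The temperature, tensor shapes, and equal weighting are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def hint_loss(student_feat, teacher_feat):
    """Hint transfer: match a student block's features to the teacher's."""
    return F.mse_loss(student_feat, teacher_feat)

def softened_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Softened-output knowledge distillation between final logits."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Toy tensors standing in for one block's feature maps and the final logits.
s_feat, t_feat = torch.randn(8, 64, 8, 8), torch.randn(8, 64, 8, 8)
s_logits, t_logits = torch.randn(8, 10), torch.randn(8, 10)
total = hint_loss(s_feat, t_feat) + softened_kd_loss(s_logits, t_logits)
print(total.item())
```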

Research on Damage Identification of Buried Pipeline Based on Fiber Optic Vibration Signal

  • Weihong Lin;Wei Peng;Yong Kong;Zimin Shen;Yuzhou Du;Leihong Zhang;Dawei Zhang
    • Current Optics and Photonics / v.7 no.5 / pp.511-517 / 2023
  • Pipelines play an important role in urban water supply and drainage, oil and gas transmission, etc. This paper presents a technique for pattern recognition of fiber optic vibration signals collected by a distributed vibration sensing (DVS) system using a deep learning residual network (ResNet). The optical fiber is laid on the pipeline, and the signal is collected by the DVS system and converted into a 64 × 64 single-channel grayscale image. The grayscale image is input into the ResNet to extract features, and finally the K-nearest-neighbors (KNN) algorithm is used to achieve the classification and recognition of pipeline damage.
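
A minimal sketch of the described feature-extraction-plus-KNN pipeline, assuming a torchvision ResNet-18 as a stand-in feature extractor and random tensors in place of the DVS-derived 64 × 64 grayscale images; it is not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.neighbors import KNeighborsClassifier

# Pre-trained ResNet-18 with the classification head removed, used as a feature
# extractor (downloads ImageNet weights on first use).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()
backbone.eval()

def extract_features(gray_images):
    """gray_images: (N, 1, 64, 64); repeat to 3 channels for the ResNet stem."""
    with torch.no_grad():
        return backbone(gray_images.repeat(1, 3, 1, 1)).numpy()

# Toy stand-ins for DVS-derived grayscale images and their damage labels.
train_x, train_y = torch.rand(20, 1, 64, 64), [0, 1] * 10
test_x = torch.rand(4, 1, 64, 64)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(extract_features(train_x), train_y)
print(knn.predict(extract_features(test_x)))
```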

Deepfake Detection using Supervised Temporal Feature Extraction model and LSTM (지도 학습한 시계열적 특징 추출 모델과 LSTM을 활용한 딥페이크 판별 방법)

  • Lee, Chunghwan;Kim, Jaihoon;Yoon, Kijung
    • Proceedings of the Korean Society of Broadcast Engineers Conference / fall / pp.91-94 / 2021
  • As deep learning technology has developed, realistic fake videos synthesized by deep learning models, known as "Deepfake" videos, have become ever harder to distinguish from original videos. With fake news and Deepfake blackmail causing confusion and serious harm, this paper proposes a novel model for detecting Deepfake videos. We chose a residual convolutional neural network (ResNet50) as the feature extraction model and a Long Short-Term Memory (LSTM) network, a form of Recurrent Neural Network (RNN), as the classification model. We adopted cosine similarity with a hinge loss to train the extraction model to embed the features of Deepfake and original videos. The results demonstrate that temporal features in the videos are essential for detecting Deepfake videos.
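
A rough sketch of the two-stage pipeline the abstract describes: per-frame ResNet-50 embeddings fed to an LSTM that emits one real/fake logit per video. The frame count, hidden size, and the simple classification head are assumptions, and the paper's hinge-loss embedding training is not reproduced here.

```python
import torch
import torch.nn as nn
from torchvision import models

class DeepfakeLSTM(nn.Module):
    """Per-frame ResNet-50 embeddings -> LSTM -> real/fake logit (illustrative sizes)."""
    def __init__(self, hidden_size=256):
        super().__init__()
        resnet = models.resnet50(weights=None)   # pretrained weights would be loaded in practice
        resnet.fc = nn.Identity()                # expose the 2048-d feature vector
        self.extractor = resnet
        self.lstm = nn.LSTM(input_size=2048, hidden_size=hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, frames):                   # frames: (batch, time, 3, 224, 224)
        b, t = frames.shape[:2]
        feats = self.extractor(frames.flatten(0, 1)).view(b, t, -1)
        _, (h_n, _) = self.lstm(feats)
        return self.classifier(h_n[-1])          # one logit per video

logits = DeepfakeLSTM()(torch.randn(2, 6, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 1])
```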

A New Residual Attention Network based on Attention Models for Human Action Recognition in Video

  • Kim, Jee-Hyun;Cho, Young-Im
    • Journal of the Korea Society of Computer and Information / v.25 no.1 / pp.55-61 / 2020
  • With the development of deep learning technology and advances in computing power, video-based research is gaining more and more attention. Video data contains a large amount of temporal and spatial information, which is its biggest difference from image data, and it has attracted intense attention in computer vision; action recognition is one of the main research focuses. Recognizing human actions in video, however, is an extremely complex and challenging problem. Studies of human cognition have found that attention mechanisms are an efficient model of cognition, well suited to processing image information and complex, continuous video information. We introduce this attention mechanism into video action recognition, attending to the human actions in a video and effectively improving recognition efficiency. In this paper, we propose a new 3D residual attention network, a convolutional neural network built on two attention models, to identify human actions in video. Evaluation of our model showed up to 90.7% accuracy.
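
As one hedged reading of the "residual attention" idea (the paper's two specific attention models and its 3D structure are not reproduced), the sketch below adds squeeze-and-excitation-style channel attention inside a plain 2D residual block.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (one common attention model)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (N, C, H, W)
        weights = self.fc(x.mean(dim=(2, 3)))  # global average pool -> per-channel weights
        return x * weights[:, :, None, None]

class ResidualAttentionBlock(nn.Module):
    """Residual block whose output is re-weighted by channel attention before the skip."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels))
        self.attention = ChannelAttention(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.attention(self.body(x)))

print(ResidualAttentionBlock(32)(torch.randn(1, 32, 16, 16)).shape)
```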

Applicability of Image Classification Using Deep Learning in Small Area : Case of Agricultural Lands Using UAV Image (딥러닝을 이용한 소규모 지역의 영상분류 적용성 분석 : UAV 영상을 이용한 농경지를 대상으로)

  • Choi, Seok-Keun;Lee, Soung-Ki;Kang, Yeon-Bin;Seong, Seon-Kyeong;Choi, Do-Yeon;Kim, Gwang-Ho
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.38 no.1 / pp.23-33 / 2020
  • Recently, high-resolution images have become easy to acquire with a UAV (Unmanned Aerial Vehicle), making it possible to produce observations and spatial information for small areas at low cost. In particular, research on generating cover maps of crop production areas is being actively conducted for monitoring the agricultural environment. Comparisons of classification performance among RF (Random Forest), SVM (Support Vector Machine), and CNN (Convolutional Neural Network) approaches show that deep learning methods have many advantages for image classification. In particular, land cover classification from satellite images benefits in accuracy and classification time from established satellite image data sets and pre-trained parameters. However, UAV images differ from satellite images in characteristics such as spatial resolution, which makes those models difficult to apply directly. To address this, we studied the applicability of deep learning algorithms to Korean agricultural lands for which UAV data sets exist and the land cover is a small-scale mixture. We applied DeepLab V3+, FC-DenseNet (Fully Convolutional DenseNets), and FRRN-B (Full-Resolution Residual Networks), state-of-the-art semantic image classification algorithms, to the UAV data set. As a result, DeepLab V3+ and FC-DenseNet achieved an overall accuracy of 97% and a Kappa coefficient of 0.92, higher than conventional classification methods, demonstrating the applicability of cover classification using UAV images of small areas.
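
For illustration only: torchvision ships DeepLab V3 (not the V3+ variant used in the study), and the sketch below applies it to a fake UAV tile with a placeholder number of cover classes.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# The class count below is a placeholder for the crop/cover classes of a UAV data set;
# weights are left uninitialized so the snippet runs without downloading anything.
num_cover_classes = 6
model = deeplabv3_resnet50(weights=None, weights_backbone=None,
                           num_classes=num_cover_classes)
model.eval()

# One fake UAV image tile (3-band RGB, 256x256).
tile = torch.rand(1, 3, 256, 256)
with torch.no_grad():
    logits = model(tile)["out"]           # (1, num_classes, 256, 256)
cover_map = logits.argmax(dim=1)          # per-pixel class index
print(cover_map.shape)                    # torch.Size([1, 256, 256])
```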

Super-resolution in Music Score Images by Instance Normalization

  • Tran, Minh-Trieu;Lee, Guee-Sang
    • Smart Media Journal / v.8 no.4 / pp.64-71 / 2019
  • The performance of an OMR (Optical Music Recognition) system is usually determined by the characteristics of the input music score images, and low resolution is one of the main factors leading to degraded image quality. In this paper, we handle the low-resolution problem with a super-resolution technique. We propose a deep neural network with instance normalization to improve the quality of music score images. Instance normalization has proven beneficial for single-image enhancement: it works better than batch normalization, showing the effectiveness of shifting the mean and variance of deep features at the instance level. The proposed method provides an end-to-end mapping between low- and high-resolution images, producing new images whose resolution is four times that of the originals. Our model has been evaluated on the "DeepScores" dataset and outperforms other existing methods.
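
A minimal sketch, assuming an instance-normalized residual body and two PixelShuffle stages for the 4x upscaling the abstract mentions; the channel width and block count are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class INResidualBlock(nn.Module):
    """Residual block using instance normalization instead of batch normalization."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels, affine=True), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels, affine=True))

    def forward(self, x):
        return x + self.body(x)

class ScoreSR(nn.Module):
    """Toy x4 super-resolution network for single-channel score images."""
    def __init__(self, channels=64, num_blocks=4):
        super().__init__()
        self.head = nn.Conv2d(1, channels, 3, padding=1)
        self.body = nn.Sequential(*[INResidualBlock(channels) for _ in range(num_blocks)])
        self.upsample = nn.Sequential(               # two x2 PixelShuffle stages -> x4
            nn.Conv2d(channels, channels * 4, 3, padding=1), nn.PixelShuffle(2),
            nn.Conv2d(channels, channels * 4, 3, padding=1), nn.PixelShuffle(2),
            nn.Conv2d(channels, 1, 3, padding=1))

    def forward(self, x):
        return self.upsample(self.body(self.head(x)))

low_res = torch.rand(1, 1, 32, 32)        # low-resolution grayscale score patch
print(ScoreSR()(low_res).shape)           # torch.Size([1, 1, 128, 128])
```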

A Study on the Outlet Blockage Determination Technology of Conveyor System using Deep Learning

  • Jeong, Eui-Han;Suh, Young-Joo;Kim, Dong-Ju
    • Journal of the Korea Society of Computer and Information / v.25 no.5 / pp.11-18 / 2020
  • This study proposes a deep learning technique for determining outlet blockage in a conveyor system. The proposed method aims to apply the best model to the actual process: we train various CNN models to determine outlet blockage from images collected by CCTV at an industrial site. We used the well-known CNN models VGGNet, ResNet, DenseNet, and NASNet, with 18,000 CCTV images for model training and performance evaluation. In experiments with these models, VGGNet showed the best performance, with 99.03% accuracy and a 29.05 ms processing time, confirming that VGGNet is suitable for determining outlet blockage.
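
As a hedged sketch of the fine-tuning recipe such a comparison typically relies on (not the study's code), the snippet replaces VGG-16's final layer with a two-class blocked/clear head and runs one training step on fake CCTV frames; the label set and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Replace the final VGG-16 layer with a two-class head (blocked / clear); the same
# recipe applies to ResNet or DenseNet by swapping their classifier attribute.
model = models.vgg16(weights=None)            # pretrained weights would be used in practice
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# One illustrative training step on fake CCTV frames.
images = torch.rand(4, 3, 224, 224)           # a mini-batch of CCTV frames
labels = torch.tensor([0, 1, 0, 1])           # 0 = clear outlet, 1 = blocked outlet
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```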

FGW-FER: Lightweight Facial Expression Recognition with Attention

  • Huy-Hoang Dinh;Hong-Quan Do;Trung-Tung Doan;Cuong Le;Ngo Xuan Bach;Tu Minh Phuong;Viet-Vu Vu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.9 / pp.2505-2528 / 2023
  • The field of facial expression recognition (FER) has been actively researched to improve human-computer interaction. In recent years, deep learning techniques have gained popularity for addressing FER, with numerous studies proposing end-to-end frameworks that stack or widen significant convolutional neural network layers. While this has led to improved performance, it has also resulted in larger model sizes and longer inference times. To overcome this challenge, our work introduces a novel lightweight model architecture. The architecture incorporates three key factors: Depth-wise Separable Convolution, Residual Block, and Attention Modules. By doing so, we aim to strike a balance between model size, inference speed, and accuracy in FER tasks. Through extensive experimentation on popular benchmark FER datasets, our proposed method has demonstrated promising results. Notably, it stands out due to its substantial reduction in parameter count and faster inference time, while maintaining accuracy levels comparable to other lightweight models discussed in the existing literature.
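
A small sketch of the depthwise separable convolution and residual skip the abstract names, in PyTorch; the attention modules are omitted and the channel width is an assumption, so this is only an indication of how such a block cuts parameters, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, 3, padding=1,
                                   groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))

class LightweightResidualBlock(nn.Module):
    """Residual block built from depthwise separable convolutions to cut parameters."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = DepthwiseSeparableConv(channels, channels)
        self.conv2 = DepthwiseSeparableConv(channels, channels)

    def forward(self, x):
        return x + self.conv2(self.conv1(x))

block = LightweightResidualBlock(32)
params = sum(p.numel() for p in block.parameters())
print(block(torch.randn(1, 32, 48, 48)).shape, f"{params} parameters")
```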