• Title/Summary/Keyword: Fully convolutional network

Residual Learning Based CNN for Gesture Recognition in Robot Interaction

  • Han, Hua
    • Journal of Information Processing Systems / v.17 no.2 / pp.385-398 / 2021
  • The complexity of deep learning models affects the real-time performance of gesture recognition, which limits the application of gesture recognition algorithms in real scenarios. Hence, a residual learning neural network based on a deep convolutional neural network is proposed. First, small convolution kernels are used to extract the local details of gesture images. Then, a shallow residual structure with shared weights is built, avoiding vanishing or exploding gradients as the network deepens and thereby simplifying model optimisation. Additional convolutional layers are used to accelerate the refinement of deep abstract features according to the spatial importance of the gesture feature distribution. Finally, a fully connected cascade softmax classifier completes the gesture recognition. Compared with a densely connected network that reuses feature information, the proposed algorithm optimises feature reuse to avoid the performance fluctuations caused by feature redundancy. Experimental results on the ISOGD gesture dataset and the Gesture dataset show that the proposed algorithm achieves fast convergence and high accuracy.
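
A minimal sketch of the two ingredients the abstract names, small-kernel feature extraction and a shallow residual block feeding a fully connected softmax classifier, is shown below. This is not the authors' implementation; channel widths, input size, and the class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ShallowResidualBlock(nn.Module):
    """Two small 3x3 convolutions with an identity skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # The skip connection keeps gradients flowing as the network deepens.
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)

class GestureNet(nn.Module):
    def __init__(self, num_classes=10):  # class count is an assumption
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),  # small kernels for local detail
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.res = ShallowResidualBlock(32)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, num_classes),  # softmax is applied by the loss function
        )

    def forward(self, x):
        return self.head(self.res(self.stem(x)))

logits = GestureNet()(torch.randn(2, 3, 64, 64))  # -> shape (2, 10)
```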

DP-LinkNet: A convolutional network for historical document image binarization

  • Xiong, Wei; Jia, Xiuhong; Yang, Dichun; Ai, Meihui; Li, Lirong; Wang, Song
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.5 / pp.1778-1797 / 2021
  • Document image binarization is an important pre-processing step in document analysis and archiving. The state-of-the-art models for document image binarization are variants of encoder-decoder architectures, such as FCN (fully convolutional network) and U-Net. Despite their success, they still suffer from three limitations: (1) reduced feature map resolution due to consecutive strided pooling or convolutions, (2) multiple scales of target objects, and (3) reduced localization accuracy due to the built-in invariance of deep convolutional neural networks (DCNNs). To overcome these three challenges, we propose an improved semantic segmentation model, referred to as DP-LinkNet, which adopts the D-LinkNet architecture as its backbone, with the proposed hybrid dilated convolution (HDC) and spatial pyramid pooling (SPP) modules between the encoder and the decoder. Extensive experiments are conducted on recent document image binarization competition (DIBCO) and handwritten document image binarization competition (H-DIBCO) benchmark datasets. Results show that our proposed DP-LinkNet outperforms other state-of-the-art techniques by a large margin. Our implementation and the pre-trained models are available at https://github.com/beargolden/DP-LinkNet.
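
The abstract names two modules inserted between the encoder and the decoder. The sketch below shows generic versions of both under assumed channel widths: hybrid dilated convolutions (stacked 3x3 convolutions with dilations such as 1, 2, 5 to enlarge the receptive field without gridding) and spatial pyramid pooling (features pooled at several scales, upsampled, and concatenated). The exact DP-LinkNet module designs may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HDCBlock(nn.Module):
    """Hybrid dilated convolutions: stacked dilations widen the receptive field."""
    def __init__(self, channels, dilations=(1, 2, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )

    def forward(self, x):
        for conv in self.convs:
            x = F.relu(conv(x))
        return x

class SPPBlock(nn.Module):
    """Spatial pyramid pooling: pool at several bin sizes, upsample, concatenate."""
    def __init__(self, channels, bins=(1, 2, 4)):
        super().__init__()
        self.bins = bins
        self.project = nn.Conv2d(channels * (len(bins) + 1), channels, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [x]
        for b in self.bins:
            pooled = F.adaptive_avg_pool2d(x, b)
            feats.append(F.interpolate(pooled, size=(h, w), mode="bilinear",
                                       align_corners=False))
        return self.project(torch.cat(feats, dim=1))

x = torch.randn(1, 64, 32, 32)
y = SPPBlock(64)(HDCBlock(64)(x))  # same spatial size, multi-scale context fused
```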

Improvement of Face Recognition Algorithm for Residential Area Surveillance System Based on Graph Convolution Network

  • Tan Heyi; Byung-Won Min
    • Journal of Internet of Things and Convergence / v.10 no.2 / pp.1-15 / 2024
  • The construction of smart communities is a new method and an important measure for ensuring the security of residential areas. To address the low face recognition accuracy caused by facial features being distorted by surveillance camera angles and other external factors, this paper proposes the following optimization strategies in designing a face recognition network. First, a global graph convolution module is designed to encode facial features as graph nodes, and a multi-scale feature enhancement residual module is designed to extract facial keypoint features in conjunction with the global graph convolution module. Second, the obtained facial keypoints are organized into a directed graph structure, and graph attention mechanisms are used to enhance the representational power of the graph features. Finally, tensor computations are performed on the graph features of two faces, and the aggregated features are extracted and discriminated by a fully connected layer to determine whether the two identities are the same. In experiments, the proposed network achieves an AUC of 85.65% for facial keypoint localization on the public 300W dataset and 88.92% on a self-built dataset. For face recognition, it achieves an accuracy of 83.41% on the public IBUG dataset and 96.74% on a self-built dataset. The results demonstrate that the network exhibits high detection and recognition accuracy for faces in surveillance video.
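
As a rough illustration of the graph attention step described above, the sketch below implements a single-head graph attention layer over facial keypoint features connected by a directed adjacency matrix. The keypoint count, feature width, and adjacency are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention over keypoint node features."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (N, in_dim) keypoint features; adj: (N, N) directed 0/1 edge mask
        h = self.W(x)                                   # (N, out_dim)
        n = h.size(0)
        hi = h.unsqueeze(1).expand(n, n, -1)            # source node features
        hj = h.unsqueeze(0).expand(n, n, -1)            # target node features
        e = F.leaky_relu(self.a(torch.cat([hi, hj], -1)).squeeze(-1))
        e = e.masked_fill(adj == 0, float("-inf"))      # attend only along edges
        alpha = torch.softmax(e, dim=-1)                # per-node attention weights
        return alpha @ h                                # aggregate neighbours

kpts = torch.randn(68, 32)                # e.g. 68 keypoints, 32-d features (assumed)
adj = (torch.rand(68, 68) < 0.1).float()  # placeholder directed graph
adj.fill_diagonal_(1.0)                   # self-loops keep softmax well-defined
out = GraphAttentionLayer(32, 32)(kpts, adj)
```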

AANet: Adjacency auxiliary network for salient object detection

  • Li, Xialu; Cui, Ziguan; Gan, Zongliang; Tang, Guijin; Liu, Feng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.10 / pp.3729-3749 / 2021
  • At present, deep convolutional network-based salient object detection (SOD) has achieved impressive performance. However, it remains challenging to make full use of the multi-scale information in the extracted features and to choose an appropriate method for fusing the feature maps. In this paper, we propose a new adjacency auxiliary network (AANet) based on multi-scale feature fusion for SOD. First, we design a parallel connection feature enhancement module (PFEM) for each feature extraction layer, which increases feature density by connecting dilated convolution branches with different rates in parallel and adds a channel attention flow to fully extract the contextual information of the features. Then, adjacent-layer features, which have similar degrees of abstraction but different characteristics, are fused through the adjacency auxiliary module (AAM) to eliminate feature ambiguity and noise. In addition, to refine the features effectively and obtain more accurate object boundaries, we design an adjacency decoder (AAM_D) based on the AAM, which concatenates the features of adjacent layers, extracts their spatial attention, and combines them with the output of the AAM. The AAM_D outputs, which carry both semantic information and spatial detail, are used as saliency prediction maps for multi-level joint supervision. Experimental results on six benchmark SOD datasets demonstrate that the proposed method outperforms similar previous methods.
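
The PFEM described above combines parallel dilated convolution branches with channel attention. Below is a generic sketch of that pattern, with assumed dilation rates and an SE-style channel attention standing in for whatever exact attention flow the paper uses.

```python
import torch
import torch.nn as nn

class ParallelDilatedBlock(nn.Module):
    """Parallel dilated 3x3 branches fused by 1x1 conv, then channel attention."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)
        # Squeeze-and-excitation style channel attention (an assumption here).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Concatenate branch outputs, project back, reweight channels.
        y = self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
        return y * self.attn(y)

x = torch.randn(1, 64, 40, 40)
y = ParallelDilatedBlock(64)(x)   # -> (1, 64, 40, 40)
```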

Speech emotion recognition using attention mechanism-based deep neural networks

  • Ko, Sang-Sun; Cho, Hye-Seung; Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.36 no.6 / pp.407-412 / 2017
  • In this paper, we propose a speech emotion recognition method using a deep neural network based on an attention mechanism. The proposed method combines CNNs (Convolutional Neural Networks), a GRU (Gated Recurrent Unit), DNNs (Deep Neural Networks), and an attention mechanism. The spectrogram of a speech signal contains patterns characteristic of each emotion, so we model these emotion-dependent patterns by applying tuned Gabor filters as the convolutional filters of a typical CNN. In addition, we apply an attention mechanism over the CNN and FC (Fully-Connected) layers to obtain attention weights that account for the contextual information of the extracted features, and use them for emotion recognition. To verify the proposed method, we conducted recognition experiments on six emotions. The experimental results show that the proposed method achieves higher speech emotion recognition performance than conventional methods.
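
A compact sketch of the described pipeline (CNN front end over a spectrogram, GRU over time, attention pooling, FC classifier) follows. The tuned Gabor filter initialisation is omitted, and all filter counts and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SERNet(nn.Module):
    def __init__(self, n_mels=64, num_emotions=6):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d((2, 1)),   # pool frequency, keep time resolution
        )
        self.gru = nn.GRU(16 * (n_mels // 2), 64, batch_first=True)
        self.attn = nn.Linear(64, 1)        # scores one weight per time step
        self.fc = nn.Linear(64, num_emotions)

    def forward(self, spec):
        # spec: (batch, 1, n_mels, time)
        f = self.cnn(spec)                      # (B, 16, n_mels/2, T)
        b, c, m, t = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, t, c * m)  # time-major sequence
        h, _ = self.gru(f)                      # (B, T, 64)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over time
        context = (w * h).sum(dim=1)            # attention-weighted pooling
        return self.fc(context)

logits = SERNet()(torch.randn(2, 1, 64, 100))   # -> shape (2, 6)
```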

Real-time 3D Pose Estimation of Both Human Hands via RGB-Depth Camera and Deep Convolutional Neural Networks

  • Park, Na Hyeon; Ji, Yong Bin; Gi, Geon; Kim, Tae Yeon; Park, Hye Min; Kim, Tae-Seong
    • Proceedings of the Korea Information Processing Society Conference / 2018.10a / pp.686-689 / 2018
  • 3D hand pose estimation (HPE) is an important technology for smart human-computer interfaces. This study presents a deep learning-based hand pose estimation system that recognizes the 3D poses of both hands in real time from a single RGB-Depth camera. The system consists of four stages. First, both hands are detected and extracted from the RGB and depth images using skin detection and depth cutting algorithms. Second, a Convolutional Neural Network (CNN) classifier is used to distinguish the right hand from the left hand; it consists of three convolution layers and two fully connected layers and takes the extracted depth images as input. Third, a trained CNN regressor, composed of multiple convolutional, pooling, and fully connected layers, estimates the hand joints from the extracted depth images of the left and right hands. The CNN classifier and regressor are trained on a dataset of 22,000 depth images. Finally, the 3D pose of each hand is reconstructed from the estimated joint information. In tests, the CNN classifier distinguished the right and left hands with 96.9% accuracy, and the CNN regressor estimated the 3D hand joint positions with an average error of 8.48 mm. The proposed hand pose estimation system can be used in a wide range of applications, including virtual reality (VR), augmented reality (AR), and mixed reality (MR).
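
The abstract specifies the left/right-hand CNN classifier exactly: three convolution layers followed by two fully connected layers over an extracted depth image. A sketch consistent with that description, with assumed input resolution and channel widths, might look like:

```python
import torch
import torch.nn as nn

class HandSideClassifier(nn.Module):
    """Three conv layers + two FC layers, as the abstract describes."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 2),          # right hand vs. left hand
        )

    def forward(self, depth):           # depth: (B, 1, 64, 64), size assumed
        return self.classifier(self.features(depth))

logits = HandSideClassifier()(torch.randn(4, 1, 64, 64))  # -> shape (4, 2)
```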

CNN-based damage identification method of tied-arch bridge using spatial-spectral information

  • Duan, Yuanfeng; Chen, Qianyi; Zhang, Hongmei; Yun, Chung Bang; Wu, Sikai; Zhu, Qi
    • Smart Structures and Systems / v.23 no.5 / pp.507-520 / 2019
  • In the structural health monitoring field, damage detection has commonly been carried out based on a structural model and engineering features related to that model. However, the extracted features are often subject to various errors, which keeps pattern recognition for damage detection challenging. In this study, an automated damage identification method is presented for the hanger cables of a tied-arch bridge using a convolutional neural network (CNN). Raw Fourier amplitude spectra (FAS) of the measured acceleration responses are used, without complex data pre-processing for modal identification. A CNN is a kind of deep neural network that typically consists of convolution, pooling, and fully connected layers. A numerical simulation study was performed for multiple-damage detection in the hangers using ambient wind vibration data on the bridge deck. The results show that the proposed CNN using FAS data performs better across various damage states than a CNN using time-history data and a traditional neural network using FAS. The robustness of the proposed CNN was also demonstrated under various observational noise levels and wind speeds.
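
The input pipeline implied by the abstract, computing the Fourier amplitude spectrum (FAS) of a raw acceleration record and feeding it to a CNN, can be sketched as below. A 1D CNN is assumed here, and the record length, channel widths, and number of damage states are illustrative, not the paper's configuration.

```python
import numpy as np
import torch
import torch.nn as nn

def fourier_amplitude_spectrum(accel: np.ndarray) -> np.ndarray:
    """One-sided FFT amplitude of an acceleration time history."""
    return np.abs(np.fft.rfft(accel))

class FASDamageCNN(nn.Module):
    def __init__(self, spectrum_len=1025, num_states=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, 7, padding=3), nn.ReLU(inplace=True), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, 7, padding=3), nn.ReLU(inplace=True), nn.MaxPool1d(4),
            nn.Flatten(),
            nn.Linear(32 * (spectrum_len // 16), num_states),
        )

    def forward(self, fas):             # fas: (B, 1, spectrum_len)
        return self.net(fas)

accel = np.random.randn(2048)           # placeholder ambient vibration record
fas = fourier_amplitude_spectrum(accel)            # length 1025
x = torch.tensor(fas, dtype=torch.float32).view(1, 1, -1)
logits = FASDamageCNN(spectrum_len=fas.shape[0])(x)  # -> shape (1, 5)
```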

Association Analysis of Convolution Layer, Kernel and Accuracy in CNN

  • Kong, Jun-Bea; Jang, Min-Seok
    • The Journal of the Korea institute of electronic communication sciences / v.14 no.6 / pp.1153-1160 / 2019
  • In this paper, we experimented to find out how the number of convolution layers and the size and number of kernels affect a CNN. A general CNN was also tested for comparison with the CNNs used in the experiment. The networks used for the analysis are CNN-based, and each experimental model varies the number of layers, the kernel size, or the number of kernels while holding the other settings constant. All experiments used two fully connected layers, kept fixed, and all other variables were tested at the same values. The analysis shows that when the number of layers is small, the variance of the results is small regardless of kernel size and count, giving stable accuracy. As the number of layers increases, accuracy improves, but beyond a certain depth accuracy decreases and the variance grows, producing large accuracy deviations. The number of kernels had a greater effect on learning speed than the other variables.
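
The experimental setup described above amounts to sweeping three hyperparameters while holding two fully connected layers fixed. A sketch of such a parameterised model builder (with illustrative values, not the paper's grid) is:

```python
import torch
import torch.nn as nn

def build_cnn(num_conv, kernel_size, num_kernels, num_classes=10):
    """Vary conv depth, kernel size, and kernel count; FC head stays fixed."""
    layers, in_ch = [], 1
    for _ in range(num_conv):
        layers += [
            nn.Conv2d(in_ch, num_kernels, kernel_size,
                      padding=kernel_size // 2),   # keep spatial size
            nn.ReLU(inplace=True),
        ]
        in_ch = num_kernels
    # The two fully connected layers are held fixed across all models.
    layers += [
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(num_kernels, 64), nn.ReLU(inplace=True),
        nn.Linear(64, num_classes),
    ]
    return nn.Sequential(*layers)

# Sweep over the three variables; each model would then be trained
# and its accuracy and variance recorded.
for num_conv in (1, 2, 4):
    for k in (3, 5):
        model = build_cnn(num_conv, k, num_kernels=16)
        out = model(torch.randn(1, 1, 28, 28))   # e.g. MNIST-sized input
```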

A fully deep learning model for the automatic identification of cephalometric landmarks

  • Kim, Young Hyun; Lee, Chena; Ha, Eun-Gyu; Choi, Yoon Jeong; Han, Sang-Sun
    • Imaging Science in Dentistry / v.51 no.3 / pp.299-306 / 2021
  • Purpose: This study aimed to propose a fully automatic landmark identification model based on a deep learning algorithm using real clinical data and to verify its accuracy considering inter-examiner variability. Materials and Methods: In total, 950 lateral cephalometric images from Yonsei Dental Hospital were used. Two calibrated examiners manually identified the 13 most important landmarks to set as references. The proposed deep learning model has a 2-step structure (a region of interest machine and a detection machine), each consisting of 8 convolution layers, 5 pooling layers, and 2 fully connected layers. The distance errors of detection between the 2 examiners were used as a clinically acceptable range for performance evaluation. Results: The 13 landmarks were automatically detected using the proposed model. Inter-examiner agreement for all landmarks indicated excellent reliability based on the 95% confidence interval. The average clinically acceptable range for all 13 landmarks was 1.24 mm. The mean radial error between the reference values assigned by 1 expert and the proposed model was 1.84 mm, with a successful detection rate of 36.1%. The A-point, the incisal tips of the maxillary and mandibular incisors, and the ANS showed lower mean radial errors than the calibrated expert variability. Conclusion: This experiment demonstrated that the proposed deep learning model can perform fully automatic identification of cephalometric landmarks and achieve better results than examiners for some landmarks. It is meaningful to consider between-examiner variability for clinical applicability when evaluating the performance of deep learning methods in cephalometric landmark identification.
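
The abstract gives the layer counts of each machine (8 convolution, 5 pooling, 2 fully connected). Below is a sketch of one such machine regressing (x, y) coordinates for the 13 landmarks; the input resolution and channel widths are assumptions.

```python
import torch
import torch.nn as nn

def conv(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.ReLU(inplace=True))

class LandmarkMachine(nn.Module):
    """8 conv layers, 5 pooling layers, 2 FC layers, per the abstract."""
    def __init__(self, num_landmarks=13):
        super().__init__()
        self.num_landmarks = num_landmarks
        self.features = nn.Sequential(
            conv(1, 16),  conv(16, 16),  nn.MaxPool2d(2),   # conv 1-2, pool 1
            conv(16, 32), conv(32, 32),  nn.MaxPool2d(2),   # conv 3-4, pool 2
            conv(32, 64), conv(64, 64),  nn.MaxPool2d(2),   # conv 5-6, pool 3
            conv(64, 128),               nn.MaxPool2d(2),   # conv 7,   pool 4
            conv(128, 128),              nn.MaxPool2d(2),   # conv 8,   pool 5
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 256), nn.ReLU(inplace=True),  # FC 1
            nn.Linear(256, num_landmarks * 2),                   # FC 2: (x, y)
        )

    def forward(self, ceph):            # ceph: (B, 1, 256, 256), size assumed
        out = self.regressor(self.features(ceph))
        return out.view(-1, self.num_landmarks, 2)

coords = LandmarkMachine()(torch.randn(1, 1, 256, 256))  # -> shape (1, 13, 2)
```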

Pointwise CNN for 3D Object Classification on Point Cloud

  • Song, Wei; Liu, Zishu; Tian, Yifei; Fong, Simon
    • Journal of Information Processing Systems / v.17 no.4 / pp.787-800 / 2021
  • Three-dimensional (3D) object classification tasks using point clouds are widely used in 3D modeling, face recognition, and robotic missions. However, processing raw point clouds directly is problematic for a traditional convolutional network due to their irregular data format. This paper proposes a pointwise convolutional neural network (CNN) structure that can process point cloud data directly without preprocessing. First, a 2D convolutional layer is introduced to perceive the coordinate information of each point. Then, multiple 2D convolutional layers and a global max pooling layer are applied to extract global features. Finally, based on the extracted features, fully connected layers predict the class labels of the objects. We evaluated the proposed pointwise CNN structure on the ModelNet10 dataset. The proposed structure obtained higher accuracy than existing methods. Experiments on the ModelNet10 dataset also show that differences in the number of points per point cloud do not significantly influence the proposed pointwise CNN structure.
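
The described structure maps naturally onto 1x1 2D convolutions: a first layer consumes the (x, y, z) coordinates of each point, later pointwise layers build per-point features, and a global max pool yields a permutation-invariant descriptor for the FC classifier. A sketch with assumed channel widths (ModelNet10 has 10 classes):

```python
import torch
import torch.nn as nn

class PointwiseCNN(nn.Module):
    """Per-point 2D convolutions + global max pooling + FC classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.pointwise = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=(1, 3)),    # consumes x, y, z of each point
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=1),        # per-point feature refinement
            nn.ReLU(inplace=True),
        )
        self.fc = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(inplace=True),
            nn.Linear(64, num_classes),
        )

    def forward(self, points):           # points: (B, N, 3), any N
        x = points.unsqueeze(1)           # (B, 1, N, 3)
        x = self.pointwise(x)             # (B, 128, N, 1)
        x = torch.amax(x, dim=2).flatten(1)  # max over points: order-invariant
        return self.fc(x)

logits = PointwiseCNN()(torch.randn(2, 1024, 3))  # -> shape (2, 10)
```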