Search | Korea Science

A Framework for Facial Expression Recognition Combining Contextual Information and Attention Mechanism

Jianzeng Chen;Ningning Chen
Journal of Information Processing Systems, v.20 no.4, pp.535-549, 2024
Facial expressions (FEs) serve as fundamental components for human emotion assessment and human-computer interaction. Traditional convolutional neural networks tend to overlook valuable information during FE feature extraction, resulting in suboptimal recognition rates. To address this problem, we propose a deep learning framework that incorporates hierarchical feature fusion, contextual data, and an attention mechanism for precise FE recognition. In our approach, we used an enhanced VGGNet16 as the backbone network and introduced an improved group convolutional channel attention (GCCA) module in each block to emphasize the crucial expression features. A partial decoder was added at the end of the backbone network to fuse multilevel features into a comprehensive feature map. A reverse attention mechanism guides the model to refine details layer by layer while introducing contextual information and extracting richer expression features. To enhance feature distinguishability, we combined islanding loss with softmax loss to form a joint loss function. Experiments on two open datasets demonstrated the effectiveness of our framework, which achieved an average accuracy of 74.08% on the FER2013 dataset and 98.66% on the CK+ dataset, outperforming advanced methods in both recognition accuracy and stability.
https://doi.org/10.3745/JIPS.01.0107

Deep Learning-based Super Resolution Method Using Combination of Channel Attention and Spatial Attention (채널 강조와 공간 강조의 결합을 이용한 딥 러닝 기반의 초해상도 방법)

Lee, Dong-Woo;Lee, Sang-Hun;Han, Hyun Ho
Journal of the Korea Convergence Society, v.11 no.12, pp.15-22, 2020
In this paper, we propose a deep learning-based super-resolution method that combines Channel Attention and Spatial Attention for feature enhancement. During super-resolution processing it is important to restore high-frequency components, such as texture and fine features, where the surrounding pixels change sharply. Existing CNN (Convolutional Neural Network) based super-resolution methods are difficult to train as deep networks and place little emphasis on high-frequency components, resulting in blurry contours and distortion. To solve this problem, we use an emphasis block that combines Channel Attention and Spatial Attention with a Skip Connection, together with a Residual Block. The emphasized feature map extracted in this way is upscaled through Sub-pixel Convolution to obtain the super-resolution image. As a result, PSNR improved by approximately 5% and SSIM by 3% compared with the conventional SRCNN, and PSNR improved by approximately 2% and SSIM by 1% compared with VDSR.
https://doi.org/10.15207/JKCS.2020.11.12.015
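
Sub-pixel Convolution, as named in the abstract above, is commonly realized as an ordinary convolution followed by a "pixel shuffle" rearrangement that trades channels for spatial resolution. The sketch below shows only that rearrangement step in plain Python. The function name `pixel_shuffle`, the nested-list tensor layout, and the channel ordering are assumptions made for illustration, not details taken from the paper.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Assumed channel ordering: input channel c*r*r + i*r + j supplies the
    output pixel at row offset i and column offset j inside each r x r cell.
    """
    in_channels = len(x)
    if in_channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    out_channels = in_channels // (r * r)
    height, width = len(x[0]), len(x[0][0])
    # Allocate the enlarged output, one plane per output channel.
    out = [[[0.0] * (width * r) for _ in range(height * r)]
           for _ in range(out_channels)]
    for c in range(out_channels):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Four 1x1 channels become a single 2x2 plane when r = 2.
low_res = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]
high_res = pixel_shuffle(low_res, 2)
```

In this toy case the four input channels fill the four positions of one 2x2 output cell, which is how the upscaling factor r is obtained without interpolation.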

Channel Attention Module in Convolutional Neural Network and Its Application to SAR Target Recognition Under Limited Angular Diversity Condition (합성곱 신경망의 Channel Attention 모듈 및 제한적인 각도 다양성 조건에서의 SAR 표적영상 식별로의 적용)

Park, Ji-Hoon;Seo, Seung-Mo;Yoo, Ji Hee
Journal of the Korea Institute of Military Science and Technology, v.24 no.2, pp.175-186, 2021
In the field of automatic target recognition (ATR) with synthetic aperture radar (SAR) imagery, it is usually impractical to obtain SAR target images covering a full range of aspect views. A database of SAR target images with limited angular diversity can degrade the performance of the SAR-ATR system. To address this problem, this paper proposes a deep learning-based method in which channel attention modules (CAMs) are inserted into a convolutional neural network (CNN). Motivated by the idea of the squeeze-and-excitation (SE) network, the CAM is expected to improve recognition performance by selectively emphasizing discriminative features and suppressing less informative ones. After testing various CAM types in a ResNet18-type base network, the SE CAM and its modified forms are applied, with different reduction ratios, to SAR target recognition on the MSTAR dataset in order to validate the improvement in recognition performance under the limited angular diversity condition.
https://doi.org/10.9766/KIMST.2021.24.2.175
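
The squeeze-and-excitation idea referred to in the abstract above can be sketched in a few lines: pool each channel to a single value, pass the result through a small bottleneck whose width is set by the reduction ratio, and use the sigmoid output to rescale the channels. The plain-Python sketch below is illustrative only. The function name `se_channel_attention`, the nested-list layout, the omission of biases, and the placeholder weights are assumptions, and the code is not the CAM variants evaluated in the paper.

```python
import math


def se_channel_attention(x, w1, w2):
    """Squeeze-and-excitation style channel reweighting (biases omitted).

    x  : feature map as a nested list with shape (C, H, W)
    w1 : weights of the reducing layer, shape (C // r, C)
    w2 : weights of the expanding layer, shape (C, C // r)
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in x]
    # Excitation, part 1: reduce C -> C // r, then apply ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w1]
    # Excitation, part 2: expand C // r -> C, then a sigmoid gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch]
              for ch, g in zip(x, gates)]
    return scaled, gates


# Toy input: C = 4 channels of size 2x2, reduction ratio r = 2.
features = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
w_reduce = [[0.1, 0.1, 0.1, 0.1], [0.2, -0.1, 0.0, 0.1]]      # placeholder weights
w_expand = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]  # placeholder weights
scaled, gates = se_channel_attention(features, w_reduce, w_expand)
```

A larger reduction ratio shrinks the hidden layer and therefore the parameter count of the module, which is the trade-off the abstract explores by varying that ratio.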

Convolutional GRU and Attention based Fall Detection Integrating with Human Body Keypoints and DensePose

Yi Zheng;Cunyi Liao;Ruifeng Xiao;Qiang He
KSII Transactions on Internet and Information Systems (TIIS), v.18 no.9, pp.2782-2804, 2024
The integration of artificial intelligence technology with medicine has evolved rapidly as demands for quality of life increase. However, falls remain a significant risk leading to severe injuries and fatalities, especially among the elderly. The development and application of computer vision-based fall detection technologies have therefore become increasingly important. In this paper, the keypoint detection algorithm ViTPose++ is first used to obtain the coordinates of human body keypoints from camera images, and human skeletal feature maps are generated from this keypoint coordinate information. Meanwhile, human dense feature maps are produced with the DensePose algorithm. These two types of feature maps are then fused as dual-channel inputs for the model. A convolutional gated recurrent unit (ConvGRU) is introduced to extract the frame-to-frame relevance in the process of falling. To further integrate features across three dimensions (spatial, temporal, and channel), a dual-channel fall detection algorithm based on video streams is proposed by combining the Convolutional Block Attention Module (CBAM) with the ConvGRU. Finally, experiments on the public UR Fall Detection Dataset demonstrate that the improved ConvGRU-CBAM achieves an F1 score of 92.86% and an AUC of 95.34%.
https://doi.org/10.3837/tiis.2024.09.016

Attention-based for Multiscale Fusion Underwater Image Enhancement

Huang, Zhixiong;Li, Jinjiang;Hua, Zhen
KSII Transactions on Internet and Information Systems (TIIS), v.16 no.2, pp.544-564, 2022
Underwater images often suffer from color distortion, blurring, and low contrast because light propagating underwater is affected by two processes: absorption and scattering. To cope with the poor quality of underwater images, this paper proposes a multiscale fusion underwater image enhancement method based on a channel attention mechanism and the local binary pattern (LBP). The network consists of three modules: feature aggregation, image reconstruction, and LBP enhancement. The feature aggregation module aggregates feature information at different scales of the image, and the image reconstruction module restores the output features to high-quality underwater images. The network also introduces a channel attention mechanism so that it pays more attention to the channels containing important information. Detail information is preserved by real-time superposition with the feature information. Experimental results demonstrate that the proposed method produces results with correct colors and complete details, and outperforms existing methods on quantitative metrics.
https://doi.org/10.3837/tiis.2022.02.010

Performance Evaluation of AHDR Model using Channel Attention (채널 어텐션을 이용한 AHDR 모델의 성능 평가)

Youn, Seok Jun;Lee, Keuntek;Cho, Nam Ik
Proceedings of the Korean Society of Broadcast Engineers Conference, 2021.06a, pp.335-338, 2021
In this paper, we evaluate how performance changes when a channel attention technique is applied to the existing AHDRNet. Channel attention was applied by adding another convolutional layer between the DRDBs (Dilated Residual Dense Blocks) in the merging network of the original model, and after the dilated convolutional layers inside each DRDB. Kalantari's dataset was used, and a comparison by PSNR (Peak Signal-to-Noise Ratio) confirmed that the PSNR rose from 42.1656 for the original AHDRNet to 42.8135 for the proposed model.
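
For readers unfamiliar with the metric, PSNR is conventionally computed as 10·log10(MAX² / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the reference and the test image. The sketch below is a minimal plain-Python version that assumes flattened pixel lists normalized to [0, 1]. The function name `psnr` and the sample values are made up for illustration and are not data from the paper.

```python
import math


def psnr(reference, test, max_value=1.0):
    """PSNR in dB between two equally sized lists of pixel values."""
    if len(reference) != len(test):
        raise ValueError("inputs must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


# Made-up pixel values, not data from the paper.
reference = [0.0, 0.5, 1.0, 0.25]
noisy = [0.1, 0.5, 0.9, 0.25]
score = psnr(reference, noisy)
```

Assuming the two reported figures are in dB, the difference between 42.8135 and 42.1656 corresponds to a gain of about 0.65 dB.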

Convolutional Network with Densely Backward Attention for Facial Expression Recognition (얼굴 표정 인식을 위한 Densely Backward Attention 기반 컨볼루션 네트워크)

Seo, Hyun-Seok;Hua, Cam-Hao;Lee, Sung-Young
Annual Conference of KIPS, 2019.10a, pp.958-961, 2019
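The advent of convolutional neural networks (CNNs) has brought considerable progress to facial expression recognition research. However, existing CNN approaches suffer from an attention-embedding problem: the pretrained models do not incorporate multi-level semantic context. Human facial emotion is observed through the movement and combination of various muscles, and features combined from the outputs of deep CNN layers lose semantic information, such as class discrimination, over many subsampling stages, which makes it difficult to build a properly trained model through transfer learning. This paper therefore proposes the Densely Backward Attention (DBA) CNN, which achieves high recognition performance by integrating channel-wise attention and semantic information across the multi-level features of the backbone network. The proposed method uses cross-channel semantic information from high-level features to recalibrate the fine-grained semantic information in their low-level counterparts. An additional step then integrates the multi-level data so that the descriptions of important facial expressions are clearly included. In experiments, the proposed approach achieved an accuracy of 79.37%, demonstrating its effectiveness.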
https://doi.org/10.3745/PKIPS.y2019m10a.958

Multimode-fiber Speckle Image Reconstruction Based on Multiscale Convolution and a Multidimensional Attention Mechanism

Kai Liu;Leihong Zhang;Runchu Xu;Dawei Zhang;Haima Yang;Quan Sun
Current Optics and Photonics, v.8 no.5, pp.463-471, 2024
Multimode fibers (MMFs) possess high information throughput and small core diameter, making them highly promising for applications such as endoscopy and communication. However, modal dispersion hinders the direct use of MMFs for image transmission. By training neural networks on time-series waveforms collected from MMFs it is possible to reconstruct images, transforming blurred speckle patterns into recognizable images. This paper proposes a fully convolutional neural-network model, MSMDFNet, for image restoration in MMFs. The network employs an encoder-decoder architecture, integrating multiscale convolutional modules in the decoding layers to enhance the receptive field for feature extraction. Additionally, attention mechanisms are incorporated from both spatial and channel dimensions, to improve the network's feature-perception capabilities. The algorithm demonstrates excellent performance on MNIST and Fashion-MNIST datasets collected through MMFs, showing significant improvements in various metrics such as SSIM.
https://doi.org/10.3807/COPP.2024.8.5.463

Deep Learning-Based Plant Health State Classification Using Image Data (영상 데이터를 이용한 딥러닝 기반 작물 건강 상태 분류 연구)

Ali Asgher Syed;Jaehawn Lee;Alvaro Fuentes;Sook Yoon;Dong Sun Park
Journal of Internet of Things and Convergence, v.10 no.4, pp.43-53, 2024
Tomatoes are rich in nutrients like lycopene, β-carotene, and vitamin C. However, they often suffer from biological and environmental stressors, resulting in significant yield losses. Traditional manual plant health assessments are error-prone and inefficient for large-scale production. To address this need, we collected a comprehensive dataset covering the entire life span of tomato plants, annotated across 5 health states from 1 to 5. Our study introduces an Attention-Enhanced DS-ResNet architecture with Channel-wise attention and Grouped convolution, refined with new training techniques. Our model achieved an overall accuracy of 80.2% using 5-fold cross-validation, showcasing its robustness in precisely classifying the health states of tomato plants.
https://doi.org/10.20465/KIOTS.2024.10.4.043

A Study on Lane Detection Based on Split-Attention Backbone Network (Split-Attention 백본 네트워크를 활용한 차선 인식에 관한 연구)

Song, In seo;Lee, Seon woo;Kwon, Jang woo;Won, Jong hoon
The Journal of The Korea Institute of Intelligent Transport Systems, v.19 no.5, pp.178-188, 2020
This paper proposes a lane recognition CNN that uses a split-attention network as its backbone for feature extraction. Split-attention is a method of assigning a weight to each channel of a feature map during CNN feature extraction; it can reliably extract image features in the rapidly changing driving environment of a vehicle. The deep neural networks proposed in this paper were trained and evaluated on the TuSimple dataset, and the change in performance with the number of backbone layers was compared and analyzed. A result comparable to the latest research was obtained, with an accuracy of up to 96.26, and FN showed the best result. Therefore, stable lane recognition without misrecognition is possible even in the driving environment of an actual vehicle using the model proposed in this study.
https://doi.org/10.12815/kits.2020.19.5.178
