Integrated Search | Korea Science

Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

Liu, Min;Tang, Jun
- Journal of Information Processing Systems
- /
- Vol. 17, No. 4
- /
- pp.754-771
- /
- 2021
In continuous dimensional emotion recognition, the parts that highlight emotional expression differ across modalities, and each modality influences the emotional state differently. This paper therefore studies the fusion of the two most important modalities in emotion recognition (voice and visual expression) and proposes a bimodal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio and video signals, audio features are first extracted using prior knowledge. Then, facial expression features are extracted by the improved AlexNet network. Finally, a multimodal attention mechanism fuses the facial expression features and audio features, and an improved loss function is used to address the missing-modality problem, improving the robustness of the model and the performance of emotion recognition. The experimental results show that the concordance correlation coefficients of the proposed model in the arousal and valence dimensions were 0.729 and 0.718, respectively, which are superior to those of several comparative algorithms.
https://doi.org/10.3745/JIPS.02.0161
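
The abstract above reports its results as concordance correlation coefficients for arousal and valence. As a reading aid, here is a minimal sketch of how that metric is commonly computed from a predicted and a reference sequence. The function name `concordance_cc` and the use of population (biased) variances are assumptions made for illustration; the paper's own evaluation code is not shown in this listing.

```python
def concordance_cc(pred, gold):
    """Concordance correlation coefficient of two equal-length sequences.

    CCC = 2 * cov(pred, gold) / (var(pred) + var(gold) + (mean_pred - mean_gold)^2)
    Population (biased) variance and covariance are used here (an assumption).
    """
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0.0:
        # Both sequences are constant and identical: treat as perfect agreement.
        return 1.0
    return 2.0 * cov / denom
```

Unlike the Pearson correlation, this measure penalizes a constant offset between prediction and reference: `concordance_cc([2, 3, 4], [1, 2, 3])` is 4/7 even though the two sequences are perfectly linearly related.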

Tobacco Retail License Recognition Based on Dual Attention Mechanism

Shan, Yuxiang;Ren, Qin;Wang, Cheng;Wang, Xiuhui
- Journal of Information Processing Systems
- /
- Vol. 18, No. 4
- /
- pp.480-488
- /
- 2022
Images of tobacco retail licenses have complex unstructured characteristics, and recognizing them is an urgent technical problem in the robotic process automation of tobacco marketing. In this paper, a novel recognition approach using a dual attention mechanism is presented to realize automatic recognition and information extraction from such images. First, we utilized a DenseNet network to extract the license information from the input tobacco retail license data. Second, bi-directional long short-term memory was used for coding and decoding, with a continuous decoder integrating dual attention, to realize the recognition and information extraction of tobacco retail license images without segmentation. Finally, several performance experiments were conducted using a large-scale dataset of tobacco retail licenses. The experimental results show that the proposed approach achieves a correction accuracy of 98.36% on the ZY-LQ dataset, outperforming most existing methods.
https://doi.org/10.3745/JIPS.02.0177

A Consecutive Motion and Situation Recognition Mechanism to Detect a Vulnerable Condition Based on Android Smartphone

Choi, Hoan-Suk;Lee, Gyu Myoung;Rhee, Woo-Seop
- International Journal of Contents
- /
- Vol. 16, No. 3
- /
- pp.1-17
- /
- 2020
Human motion recognition is essential for user-centric services such as surveillance-based security, elderly condition monitoring, exercise tracking, and daily calorie expenditure analysis. It is typically based on the analysis of movement data, such as the acceleration and angular velocity of a target user. Existing motion recognition studies are intended only to measure basic information (e.g., the user's stride, number of steps, speed) or to recognize a single motion (e.g., sitting, running, walking). A new mechanism is therefore required that identifies the transitions between single motions, so that a user's consecutive motion can be assessed more accurately, and that recognizes the user's body and surrounding situations arising from the motion. In this paper, we collect human movement data through Android smartphones in real time for five target single motions and propose a mechanism that recognizes a consecutive motion, including transitions among various motions, and the resulting situation, using a state transition model to check whether a vulnerable (life-threatening) condition has occurred, especially for the elderly. Through implementation and experiments, we demonstrate that the proposed mechanism recognizes a consecutive motion and a user's situation accurately and quickly. In a recognition experiment on a mixed sequence resembling daily motion, the proposed adaptive weighting method showed improvements of 4% (holding time = 15 sec), 88% (30 sec), and 6.5% (60 sec) compared to the static method.
https://doi.org/10.5392/IJoC.2020.16.3.001

Speech emotion recognition using attention mechanism-based deep neural networks

고상선;조혜승;김형국
- The Journal of the Acoustical Society of Korea
- /
- Vol. 36, No. 6
- /
- pp.407-412
- /
- 2017
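This paper proposes a speech emotion recognition method that uses attention mechanism-based deep neural networks. The proposed method consists of an attention mechanism and a deep neural network structure that combines CNN (Convolution Neural Networks), GRU (Gated Recurrent Unit), and DNN (Deep Neural Networks). Because the spectrogram of speech contains characteristic patterns that depend on emotion, the proposed method models these patterns effectively with a GCNN (Gabor CNN), which uses tuned Gabor filters as the convolution filters of an ordinary CNN. In addition, an attention mechanism based on CNN and FC (Fully-Connected) layers is applied to obtain attention weights that take into account the contextual information of the extracted features, and these weights are used for emotion recognition. To verify the proposed method, recognition experiments were conducted on six emotions. The experimental results show that the proposed method achieved higher performance in speech emotion recognition than conventional methods.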
https://doi.org/10.7776/ASK.2017.36.6.407
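
The abstract above describes replacing ordinary convolution filters with tuned Gabor filters. The sketch below generates the real part of a standard 2D Gabor kernel, which is one way such a filter bank could be initialized. The function name `gabor_kernel`, its parameter names, and the parameterization (Gaussian envelope times cosine carrier) are assumptions for illustration; the abstract does not say how the paper tunes its filters.

```python
import math

def gabor_kernel(size, sigma, theta, lambd, gamma=1.0, psi=0.0):
    """Real part of a 2D Gabor filter as a size x size list of lists.

    size  : odd kernel width and height
    sigma : standard deviation of the Gaussian envelope
    theta : orientation of the carrier, in radians
    lambd : wavelength of the cosine carrier
    gamma : spatial aspect ratio of the envelope
    psi   : phase offset of the carrier
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center")
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * cos_t + y * sin_t
            yr = -x * sin_t + y * cos_t
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel
```

Calling `gabor_kernel(5, 2.0, 0.0, 4.0)` yields a 5 x 5 kernel whose center value is 1.0. Varying `theta` and `lambd` gives a bank of filters sensitive to different orientations and spacings of spectrogram patterns.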

Object Recognition using Smart Tag and Stereo Vision System on Pan-Tilt Mechanism

Kim, Jin-Young;Im, Chang-Jun;Lee, Sang-Won;Lee, Ho-Gil
- Institute of Control, Robotics and Systems: Conference Proceedings
- /
- Institute of Control, Robotics and Systems, ICCAS 2005
- /
- pp.2379-2384
- /
- 2005
We propose a novel method for object recognition using a smart tag system with stereo vision on a pan-tilt mechanism. We developed a smart tag that includes an IRED device. The smart tag is attached to the object. We also developed a stereo vision system that pans and tilts so that the object image is centered in each whole image view. The stereo vision system on the pan-tilt mechanism can map the position of the IRED to the robot coordinate system by using the pan-tilt angles. Then, to map the size and pose of the object to the robot coordinate system, we used a simple model-based vision algorithm. To increase the possibility of tag-based object recognition, we implemented our approach using techniques that are as easy and simple as possible.

Adaptive low-resolution palmprint image recognition based on channel attention mechanism and modified deep residual network

Xu, Xuebin;Meng, Kan;Xing, Xiaomin;Chen, Chen
- KSII Transactions on Internet and Information Systems (TIIS)
- /
- Vol. 16, No. 3
- /
- pp.757-770
- /
- 2022
Palmprint recognition has drawn increasing attention in the past decade due to its uniqueness and reliability. Traditional palmprint recognition methods usually use high-resolution images as the basis for identification so that they can achieve relatively high precision. However, high-resolution images incur a higher computation cost in the recognition process, which usually cannot be afforded in mobile computing. Therefore, this paper proposes an improved low-resolution palmprint image recognition method based on residual networks. The main contributions are: 1) We introduce a channel attention mechanism to reweight the extracted feature maps, which pays more attention to the informative feature maps and suppresses the useless ones. 2) Our proposed ResStage group structure divides the original residual block into three stages, and we stabilize the signal characteristics before each stage by means of a BN normalization operation to enhance the feature channels. Comparison experiments are conducted on a public dataset provided by the Hong Kong Polytechnic University. Experimental results show that the proposed method achieves a rank-1 accuracy of 98.17% when tested on low-resolution images at 12 dpi, clearly outperforming all the compared methods.
https://doi.org/10.3837/tiis.2022.03.001
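
The abstract above describes a channel attention mechanism that emphasizes informative feature maps and suppresses useless ones. The sketch below shows one common form of channel attention, in the squeeze-and-excitation style: global average pooling, a small bottleneck, and a sigmoid gate per channel. This design, the function name `channel_attention`, and the weight arguments `w_reduce` and `w_expand` are all assumptions for illustration, since the abstract does not specify the paper's exact module.

```python
import math

def channel_attention(feature_maps, w_reduce, w_expand):
    """Squeeze-and-excitation style channel gating (illustrative sketch).

    feature_maps : list of C channels, each an H x W list of lists of floats
    w_reduce     : R x C weight matrix of the bottleneck layer (hypothetical)
    w_expand     : C x R weight matrix of the gating layer (hypothetical)
    Returns (gates, scaled_maps), where gates holds one value in (0, 1) per channel.
    """
    # Squeeze: one descriptor per channel via global average pooling.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excitation, step 1: bottleneck projection followed by ReLU.
    hidden = [
        max(0.0, sum(w * z for w, z in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, step 2: expand back to C channels and apply a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Scale every value in each channel by that channel's gate.
    scaled_maps = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_maps)
    ]
    return gates, scaled_maps
```

In a trained network the two weight matrices are learned, so channels that help the task receive gates near 1 and uninformative channels are pushed toward 0.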

A Study on Improving License Plate Recognition Performance Using Super-Resolution Techniques

Kyeongseok JANG;Kwangchul SON
- Korean Journal of Artificial Intelligence
- /
- Vol. 12, No. 3
- /
- pp.1-7
- /
- 2024
In this paper, we propose an innovative super-resolution technique to address the issue of reduced accuracy in license plate recognition caused by low-resolution images. Conventional vehicle license plate recognition systems have relied on images obtained from fixed surveillance cameras for traffic detection to perform vehicle detection, tracking, and license plate recognition. However, during this process, image quality degradation occurred due to the physical distance between the camera and the vehicle, vehicle movement, and external environmental factors such as weather and lighting conditions. In particular, the acquisition of low-resolution images due to camera performance limitations has been a major cause of significantly reduced accuracy in license plate recognition. To solve this problem, we propose a Single Image Super-Resolution (SISR) model with a parallel structure that combines Multi-Scale and Attention Mechanism. This model is capable of effectively extracting features at various scales and focusing on important areas. Specifically, it generates feature maps of various sizes through a multi-branch structure and emphasizes the key features of license plates using an Attention Mechanism. Experimental results show that the proposed model demonstrates significantly improved recognition accuracy compared to existing vehicle license plate super-resolution methods using Bicubic Interpolation.
https://doi.org/10.24225/kjai.2024.12.3.1

ADD-Net: Attention Based 3D Dense Network for Action Recognition

Man, Qiaoyue;Cho, Young Im
- Journal of the Korea Society of Computer and Information
- /
- Vol. 24, No. 6
- /
- pp.21-28
- /
- 2019
In recent years, with the development of artificial intelligence and the success of deep models, such models have been deployed in all fields of computer vision. Action recognition, as an important branch of human perception and computer vision system research, has attracted more and more attention. Action recognition is a challenging task because of the particular complexity of human movement: the same movement may be performed by multiple individuals. Human action exists as continuous image frames in a video, so action recognition requires more computational power than processing static images, and simply using a CNN cannot achieve the desired results. Recently, the attention model has achieved good results in computer vision and natural language processing. In particular, for video action classification, adding the attention model makes it more effective to focus on motion features and improves performance. It also intuitively explains which part the model attends to when making a particular decision, which is very helpful in real applications. In this paper, we propose a 3D dense convolutional network based on the attention mechanism (ADD-Net) for recognizing human motion behavior in video.
https://doi.org/10.9708/jksci.2019.24.06.021

A Framework for Facial Expression Recognition Combining Contextual Information and Attention Mechanism

Jianzeng Chen;Ningning Chen
- Journal of Information Processing Systems
- /
- Vol. 20, No. 4
- /
- pp.535-549
- /
- 2024
Facial expressions (FEs) serve as fundamental components for human emotion assessment and human-computer interaction. Traditional convolutional neural networks tend to overlook valuable information during the FE feature extraction, resulting in suboptimal recognition rates. To address this problem, we propose a deep learning framework that incorporates hierarchical feature fusion, contextual data, and an attention mechanism for precise FE recognition. In our approach, we leveraged an enhanced VGGNet16 as the backbone network and introduced an improved group convolutional channel attention (GCCA) module in each block to emphasize the crucial expression features. A partial decoder was added at the end of the backbone network to facilitate the fusion of multilevel features for a comprehensive feature map. A reverse attention mechanism guides the model to refine details layer-by-layer while introducing contextual information and extracting richer expression features. To enhance feature distinguishability, we employed islanding loss in combination with softmax loss, creating a joint loss function. Using two open datasets, our experimental results demonstrated the effectiveness of our framework. Our framework achieved an average accuracy rate of 74.08% on the FER2013 dataset and 98.66% on the CK+ dataset, outperforming advanced methods in both recognition accuracy and stability.
https://doi.org/10.3745/JIPS.01.0107

A Facial Expression Recognition Method Using Two-Stream Convolutional Networks in Natural Scenes

Zhao, Lixin
- Journal of Information Processing Systems
- /
- Vol. 17, No. 2
- /
- pp.399-410
- /
- 2021
To address the problem that complex external variables in natural scenes strongly affect facial expression recognition results, a facial expression recognition method based on a two-stream convolutional neural network is proposed. The model introduces exponentially enhanced shared input weights before each level of convolution input and applies soft attention mechanism modules to the space-time features that combine the static and dynamic streams. This enables the network to autonomously find areas that are more relevant to the expression category and pay more attention to them, so that information from irrelevant interference areas is suppressed. To solve the problem of poor local robustness caused by lighting and expression changes, this paper also performs lighting preprocessing with the lighting preprocessing chain algorithm to eliminate most of the lighting effects. Experimental results on the AFEW6.0 and Multi-PIE datasets show that the recognition rates of this method are 95.05% and 61.40%, respectively, which are better than those of the other comparison methods.
https://doi.org/10.3745/JIPS.01.0070

373 search results (processing time: 0.021 s)