• Title/Summary/Keyword: recognition mechanism


Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

  • Liu, Min; Tang, Jun
    • Journal of Information Processing Systems, v.17 no.4, pp.754-771, 2021
  • In continuous dimensional emotion recognition, the parts that highlight emotional expression differ across modalities, and different modalities influence the emotional state to different degrees. This paper therefore studies the fusion of the two most important modalities in emotion recognition (voice and facial expression) and proposes a dual-modal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio and video signals, prior knowledge is first used to extract audio features. Facial expression features are then extracted by the improved AlexNet network. Finally, a multimodal attention mechanism fuses the facial expression and audio features (a minimal sketch of this fusion idea follows below), and an improved loss function mitigates the missing-modality problem, improving both the robustness of the model and emotion recognition performance. Experimental results show that the concordance correlation coefficients (CCC) of the proposed model in the arousal and valence dimensions are 0.729 and 0.718, respectively, which is superior to several comparative algorithms.
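
A minimal sketch of such an attention-based fusion step, assuming both streams are already encoded as fixed-size vectors; the dimensions and the per-modality softmax weighting are illustrative assumptions, not the authors' implementation:

```python
# Hedged sketch: learn a per-sample attention weight for each modality
# and fuse the two projected feature vectors by a weighted sum.
import torch
import torch.nn as nn

class BimodalAttentionFusion(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=256, fused_dim=128):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, fused_dim)
        self.visual_proj = nn.Linear(visual_dim, fused_dim)
        self.attn = nn.Linear(fused_dim, 1)  # scores each modality

    def forward(self, audio_feat, visual_feat):
        # Project both modalities into a shared space.
        a = torch.tanh(self.audio_proj(audio_feat))    # (batch, fused_dim)
        v = torch.tanh(self.visual_proj(visual_feat))  # (batch, fused_dim)
        stacked = torch.stack([a, v], dim=1)           # (batch, 2, fused_dim)
        # Softmax over the two modalities gives per-sample fusion weights.
        weights = torch.softmax(self.attn(stacked), dim=1)  # (batch, 2, 1)
        return (weights * stacked).sum(dim=1)          # (batch, fused_dim)

fused = BimodalAttentionFusion()(torch.randn(4, 128), torch.randn(4, 256))
print(fused.shape)  # torch.Size([4, 128])
```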

Tobacco Retail License Recognition Based on Dual Attention Mechanism

  • Shan, Yuxiang; Ren, Qin; Wang, Cheng; Wang, Xiuhui
    • Journal of Information Processing Systems, v.18 no.4, pp.480-488, 2022
  • Images of tobacco retail licenses have complex, unstructured characteristics, which poses an urgent technical problem for robotic process automation in tobacco marketing. In this paper, a novel recognition approach using a dual attention mechanism is presented to automatically recognize and extract information from such images. First, we use a DenseNet network to extract license features from the input tobacco retail license data. Second, a bidirectional long short-term memory network is used for encoding and decoding, with a continuous decoder integrating dual attention, to recognize and extract information from tobacco retail license images without segmentation (a hedged sketch of this segmentation-free decoding idea follows below). Finally, several performance experiments were conducted on a large-scale dataset of tobacco retail licenses. The experimental results show that the proposed approach achieves a recognition accuracy of 98.36% on the ZY-LQ dataset, outperforming most existing methods.
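
For the segmentation-free decoding idea, a hedged sketch follows: a recurrent encoder reads CNN feature columns and an additive-attention decoder emits one character per step, so no explicit character segmentation is needed. This shows a single attention head rather than the paper's dual attention, and all layer sizes are assumptions:

```python
# Illustrative sketch: Bahdanau-style attention decoding over encoded
# feature columns of a license image; not the authors' architecture.
import torch
import torch.nn as nn

class AttnTextDecoder(nn.Module):
    def __init__(self, feat_dim=256, hidden=256, vocab_size=100):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden // 2, bidirectional=True,
                               batch_first=True)
        self.w_enc = nn.Linear(hidden, hidden)
        self.w_state = nn.Linear(hidden, hidden)
        self.v = nn.Linear(hidden, 1)              # additive attention score
        self.rnn = nn.GRUCell(hidden, hidden)      # decoder state update
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, feats, steps=10):
        # feats: (batch, width, feat_dim) -- CNN feature columns of the image.
        enc, _ = self.encoder(feats)               # (batch, width, hidden)
        state = enc.mean(dim=1)                    # initial decoder state
        logits = []
        for _ in range(steps):
            # Score every column against the current decoder state.
            e = self.v(torch.tanh(self.w_enc(enc)
                                  + self.w_state(state).unsqueeze(1)))
            attn = torch.softmax(e, dim=1)         # (batch, width, 1)
            context = (attn * enc).sum(dim=1)      # attended glimpse
            state = self.rnn(context, state)
            logits.append(self.out(state))         # one character per step
        return torch.stack(logits, dim=1)          # (batch, steps, vocab)

print(AttnTextDecoder()(torch.randn(2, 40, 256)).shape)  # (2, 10, 100)
```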

A Consecutive Motion and Situation Recognition Mechanism to Detect a Vulnerable Condition Based on Android Smartphone

  • Choi, Hoan-Suk; Lee, Gyu Myoung; Rhee, Woo-Seop
    • International Journal of Contents, v.16 no.3, pp.1-17, 2020
  • Human motion recognition is essential for user-centric services such as surveillance-based security, elderly condition monitoring, exercise tracking, and daily calorie expenditure analysis. It is typically based on analysis of movement data such as the acceleration and angular velocity of a target user. Existing motion recognition studies only measure basic information (e.g., the user's stride, number of steps, speed) or recognize a single motion (e.g., sitting, running, walking). A new mechanism is therefore required to identify transitions between single motions, so as to assess a user's consecutive motion more accurately and to recognize the body and surrounding situations arising from the motion. In this paper, we collect human movement data in real time through Android smartphones for five target single motions and propose a mechanism that recognizes a consecutive motion, including transitions among the various motions and any resulting situation, using a state transition model to check whether a vulnerable (life-threatening) condition, especially for the elderly, has occurred (a minimal sketch of such a state-transition check follows below). Through implementation and experiments, we demonstrate that the proposed mechanism recognizes a consecutive motion and a user's situation accurately and quickly. In a recognition experiment on a mixed sequence resembling daily motion, the proposed adaptive weighting method showed improvements of 4% (holding time = 15 sec), 88% (30 sec), and 6.5% (60 sec) over the static method.
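
A minimal sketch of a state-transition check of the kind described, with motion labels and a holding-time threshold that are assumptions for illustration (the paper's actual model and thresholds may differ):

```python
# Hedged sketch: flag a vulnerable condition when an abrupt transition
# from an upright motion to "lying" is followed by a long "lying" hold.
VULNERABLE_HOLD_SEC = 30  # assumed holding-time threshold

def detect_vulnerable(motion_stream):
    """motion_stream: iterable of (timestamp_sec, label) pairs, where
    label is one of the recognized single motions."""
    prev_label, lying_since = None, None
    for t, label in motion_stream:
        # A sudden upright-to-lying transition suggests a possible fall.
        if label == "lying" and prev_label in ("walking", "standing", "running"):
            lying_since = t
        elif label != "lying":
            lying_since = None  # user moved again; reset the hold timer
        if lying_since is not None and t - lying_since >= VULNERABLE_HOLD_SEC:
            return True  # lying too long after an abrupt transition
        prev_label = label
    return False

stream = [(0, "walking"), (2, "lying")] + [(2 + s, "lying") for s in range(1, 35)]
print(detect_vulnerable(stream))  # True
```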

Speech emotion recognition using attention mechanism-based deep neural networks

  • Ko, Sang-Sun; Cho, Hye-Seung; Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea, v.36 no.6, pp.407-412, 2017
  • In this paper, we propose a speech emotion recognition method using a deep neural network based on the attention mechanism. The proposed method combines CNN (convolutional neural networks), GRU (gated recurrent unit), DNN (deep neural networks), and an attention mechanism. The spectrogram of a speech signal contains characteristic patterns that vary with emotion, so we model these patterns by applying tuned Gabor filters as the convolutional filters of a typical CNN (a sketch of such a Gabor filter bank follows below). In addition, we apply an attention mechanism over the CNN and FC (fully connected) layers to obtain attention weights that take the context of the extracted features into account, and use them for emotion recognition. To verify the proposed method, we conducted emotion recognition experiments on six emotions. The experimental results show that the proposed method achieves higher speech emotion recognition performance than conventional methods.
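
A sketch of the Gabor-filters-as-convolution idea: build a bank of 2D Gabor kernels and load them as fixed first-layer filters. Kernel size and the parameter grid are assumptions, not the tuned values from the paper:

```python
# Illustrative sketch: a bank of oriented Gabor kernels installed as
# non-trainable convolutional filters over a 1-channel spectrogram.
import numpy as np
import torch
import torch.nn as nn

def gabor_kernel(size=11, theta=0.0, sigma=3.0, lambd=6.0, gamma=0.5):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) \
        * np.cos(2 * np.pi * xr / lambd)         # Gaussian-windowed cosine

# One output channel per orientation.
thetas = np.linspace(0, np.pi, 8, endpoint=False)
bank = np.stack([gabor_kernel(theta=t) for t in thetas])[:, None]  # (8,1,11,11)

conv = nn.Conv2d(1, 8, kernel_size=11, padding=5, bias=False)
conv.weight = nn.Parameter(torch.tensor(bank, dtype=torch.float32),
                           requires_grad=False)  # fixed, tuned filters
out = conv(torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 8, 64, 64])
```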

Object Recognition using Smart Tag and Stereo Vision System on Pan-Tilt Mechanism

  • Kim, Jin-Young; Im, Chang-Jun; Lee, Sang-Won; Lee, Ho-Gil
    • Institute of Control, Robotics and Systems (ICROS) Conference Proceedings, 2005.06a, pp.2379-2384, 2005
  • We propose a novel method for object recognition using a smart tag system with stereo vision on a pan-tilt mechanism. We developed a smart tag that includes an IRED (infrared-emitting diode) device; the tag is attached to the object. We also developed a stereo vision system that pans and tilts so that the object image is centered in each camera view. The stereo vision system on the pan-tilt mechanism can map the position of the IRED into the robot coordinate system using the pan and tilt angles (a back-of-envelope sketch of this mapping follows below). Then, to map the size and pose of the object into the robot coordinate system, we used a simple model-based vision algorithm. To increase the feasibility of tag-based object recognition, we implemented our approach using techniques that are as easy and simple as possible.
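
The angle-to-coordinate mapping can be illustrated with simple trigonometry. A back-of-envelope sketch, assuming the camera at the origin and ignoring mounting offsets (the paper's calibration will differ):

```python
# Hedged sketch: once the pan-tilt head has centered the IRED tag and
# stereo triangulation gives its range, the tag's position in the robot
# frame follows from spherical coordinates.
import math

def tag_position(pan_deg, tilt_deg, range_m):
    """Pan about the vertical axis, tilt above the horizon, both in
    degrees; range from stereo triangulation in meters. Returns (x, y, z)
    with x forward, y left, z up, camera at the origin."""
    pan, tilt = math.radians(pan_deg), math.radians(tilt_deg)
    x = range_m * math.cos(tilt) * math.cos(pan)
    y = range_m * math.cos(tilt) * math.sin(pan)
    z = range_m * math.sin(tilt)
    return x, y, z

print(tag_position(30.0, 10.0, 2.0))  # tag 2 m away, 30 deg left, 10 deg up
```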

Adaptive low-resolution palmprint image recognition based on channel attention mechanism and modified deep residual network

  • Xu, Xuebin; Meng, Kan; Xing, Xiaomin; Chen, Chen
    • KSII Transactions on Internet and Information Systems (TIIS), v.16 no.3, pp.757-770, 2022
  • Palmprint recognition has drawn increasing attention over the past decade due to its uniqueness and reliability. Traditional palmprint recognition methods usually use high-resolution images as the identification basis, so they can achieve relatively high precision. However, high-resolution images mean a higher computation cost in the recognition process, which usually cannot be afforded in mobile computing. This paper therefore proposes an improved low-resolution palmprint image recognition method based on residual networks. The main contributions are: 1) We introduce a channel attention mechanism to refactor the extracted feature maps, paying more attention to informative feature maps and suppressing useless ones (a sketch of this style of channel attention follows below). 2) Our proposed ResStage group structure divides the original residual block into three stages, and we stabilize the signal characteristics before each stage with a BN (batch normalization) operation to enhance the feature channels. Comparison experiments are conducted on a public dataset provided by The Hong Kong Polytechnic University. Experimental results show that the proposed method achieves a rank-1 accuracy of 98.17% when tested on low-resolution images at 12 dpi, clearly outperforming all compared methods.
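
A sketch of a squeeze-and-excitation style channel attention block, the general form of the mechanism described; the paper's exact module may differ, and the reduction ratio used here is an assumption:

```python
# Illustrative sketch: re-weight feature channels so informative maps
# are emphasized and useless ones are suppressed.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Squeeze: global average pool each feature map to one scalar.
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))  # (b, c) channel weights in (0, 1)
        # Excite: rescale each channel by its learned weight.
        return x * w.view(b, c, 1, 1)

print(ChannelAttention(64)(torch.randn(2, 64, 8, 8)).shape)
```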

A Study on Improving License Plate Recognition Performance Using Super-Resolution Techniques

  • Jang, Kyeongseok; Son, Kwangchul
    • Korean Journal of Artificial Intelligence, v.12 no.3, pp.1-7, 2024
  • In this paper, we propose a super-resolution technique to address the reduced accuracy of license plate recognition caused by low-resolution images. Conventional vehicle license plate recognition systems rely on images obtained from fixed surveillance cameras for traffic detection to perform vehicle detection, tracking, and license plate recognition. During this process, image quality degrades due to the physical distance between the camera and the vehicle, vehicle movement, and external environmental factors such as weather and lighting conditions. In particular, the acquisition of low-resolution images due to camera performance limitations has been a major cause of significantly reduced license plate recognition accuracy. To solve this problem, we propose a single image super-resolution (SISR) model with a parallel structure that combines multi-scale feature extraction with an attention mechanism. The model effectively extracts features at various scales and focuses on important areas: it generates feature maps of various sizes through a multi-branch structure and emphasizes the key features of license plates using an attention mechanism (a rough sketch of this multi-branch idea follows below). Experimental results show that the proposed model achieves significantly improved recognition accuracy compared with existing vehicle license plate super-resolution methods using bicubic interpolation.
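
A rough sketch of the multi-branch idea: parallel convolutions at several kernel sizes extract multi-scale features, and a channel attention step re-weights them before a residual fusion. Layer widths and kernel sizes are illustrative assumptions, not the paper's configuration:

```python
# Hedged sketch: three parallel branches with different receptive fields,
# channel attention over the concatenated maps, then a 1x1 fusion.
import torch
import torch.nn as nn

class MultiScaleAttentionBlock(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2)
            for k in (3, 5, 7)  # three receptive-field scales
        ])
        self.attn = nn.Sequential(  # channel attention over all branches
            nn.Linear(3 * channels, 3 * channels), nn.Sigmoid())
        self.fuse = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        w = self.attn(feats.mean(dim=(2, 3)))          # (batch, 3*channels)
        feats = feats * w.unsqueeze(-1).unsqueeze(-1)  # emphasize key maps
        return x + self.fuse(feats)                    # residual fusion

print(MultiScaleAttentionBlock()(torch.randn(1, 32, 24, 24)).shape)
```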

ADD-Net: Attention Based 3D Dense Network for Action Recognition

  • Man, Qiaoyue; Cho, Young Im
    • Journal of the Korea Society of Computer and Information, v.24 no.6, pp.21-28, 2019
  • In recent years, with the development of artificial intelligence and the success of deep models, deep networks have been deployed in all fields of computer vision. Action recognition, an important branch of human perception and computer vision research, has attracted more and more attention. Action recognition is a challenging task due to the special complexity of human movement: the same movement may be performed differently by different individuals. Human actions exist as continuous image frames in video, so action recognition requires more computational power than processing static images, and simple use of a CNN cannot achieve the desired results. Recently, attention models have achieved good results in computer vision and natural language processing. In particular, for video action classification, adding an attention model makes it more effective to focus on motion features and improves performance. It also intuitively explains which part the model attends to when making a particular decision, which is very helpful in real applications. In this paper, we propose a 3D dense convolutional network based on an attention mechanism (ADD-Net) for recognizing human motion behavior in video (a sketch of a 3D dense block follows below).
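
A sketch of a 3D dense block of the kind such a network builds on: each layer's output is concatenated to all earlier feature maps along the channel axis, and 3D convolutions cover the temporal dimension. Growth rate and depth are assumptions for illustration:

```python
# Illustrative sketch: dense connectivity with 3D convolutions so the
# block sees both spatial and temporal structure of a video clip.
import torch
import torch.nn as nn

class DenseBlock3D(nn.Module):
    def __init__(self, in_ch=16, growth=8, layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm3d(ch), nn.ReLU(inplace=True),
                nn.Conv3d(ch, growth, kernel_size=3, padding=1)))
            ch += growth  # each layer sees all previous feature maps

    def forward(self, x):  # x: (batch, ch, frames, h, w)
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)  # dense connectivity
        return x

clip = torch.randn(1, 16, 8, 32, 32)  # an 8-frame clip
print(DenseBlock3D()(clip).shape)     # torch.Size([1, 40, 8, 32, 32])
```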

A Framework for Facial Expression Recognition Combining Contextual Information and Attention Mechanism

  • Chen, Jianzeng; Chen, Ningning
    • Journal of Information Processing Systems, v.20 no.4, pp.535-549, 2024
  • Facial expressions (FEs) serve as fundamental components of human emotion assessment and human-computer interaction. Traditional convolutional neural networks tend to overlook valuable information during FE feature extraction, resulting in suboptimal recognition rates. To address this problem, we propose a deep learning framework that incorporates hierarchical feature fusion, contextual information, and an attention mechanism for precise FE recognition. In our approach, we leverage an enhanced VGGNet16 as the backbone network and introduce an improved group convolutional channel attention (GCCA) module in each block to emphasize crucial expression features. A partial decoder added at the end of the backbone network fuses multilevel features into a comprehensive feature map. A reverse attention mechanism guides the model to refine details layer by layer while introducing contextual information and extracting richer expression features. To enhance feature distinguishability, we employ island loss in combination with softmax loss, creating a joint loss function (a hedged sketch of this joint loss idea follows below). Experimental results on two open datasets demonstrate the effectiveness of our framework: it achieves average accuracy rates of 74.08% on the FER2013 dataset and 98.66% on the CK+ dataset, outperforming advanced methods in both recognition accuracy and stability.
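
A hedged sketch of the joint loss idea, combining cross-entropy (softmax loss) with an island-style term that pulls features toward their class center and pushes different class centers apart; the weighting factors are assumptions, not the paper's values:

```python
# Illustrative sketch: island-style loss = center term + penalty on the
# cosine similarity between distinct class centers.
import torch
import torch.nn.functional as F

def island_loss(features, labels, centers, lam1=0.01):
    # Center term: pull each feature toward its own class center.
    center_term = ((features - centers[labels]) ** 2).sum(dim=1).mean()
    # Island term: penalize similarity between different class centers.
    c = F.normalize(centers, dim=1)
    sim = c @ c.t()                             # pairwise cosine similarities
    off_diag = sim - torch.diag(torch.diag(sim))
    island_term = (off_diag + 1).triu(1).sum()  # (cos + 1) over center pairs
    return center_term + lam1 * island_term

features = torch.randn(8, 64)
labels = torch.randint(0, 7, (8,))
centers = torch.randn(7, 64)                    # 7 expression classes
logits = torch.randn(8, 7)
loss = F.cross_entropy(logits, labels) \
    + 0.1 * island_loss(features, labels, centers)  # assumed 0.1 weight
print(loss.item())
```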

A Facial Expression Recognition Method Using Two-Stream Convolutional Networks in Natural Scenes

  • Zhao, Lixin
    • Journal of Information Processing Systems, v.17 no.2, pp.399-410, 2021
  • Aiming at the problem that complex external variables in natural scenes strongly affect facial expression recognition results, a facial expression recognition method based on a two-stream convolutional neural network is proposed. The model introduces exponentially enhanced shared input weights before each level of convolution input, and applies soft attention mechanism modules to the spatio-temporal features of the combined static and dynamic streams. This enables the network to autonomously find areas that are more relevant to the expression category and to pay more attention to those areas, suppressing information from irrelevant interference areas. To address the poor local robustness caused by lighting and expression changes, this paper also performs illumination preprocessing with a lighting preprocessing chain algorithm to eliminate most lighting effects (a sketch of such a chain follows below). Experimental results on the AFEW6.0 and Multi-PIE datasets show that the recognition rates of this method are 95.05% and 61.40%, respectively, which is better than the other compared methods.
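
A sketch of a standard illumination-normalization chain (gamma correction, difference-of-Gaussians filtering, two-stage contrast equalization) in the spirit of the lighting preprocessing chain mentioned; the parameter values are commonly cited defaults, assumed rather than taken from the paper:

```python
# Illustrative sketch: normalize illumination before recognition so that
# most lighting variation is removed while facial structure is kept.
import numpy as np
from scipy.ndimage import gaussian_filter

def lighting_chain(img, gamma=0.2, s1=1.0, s2=2.0, alpha=0.1, tau=10.0):
    x = np.power(img.astype(np.float64) / 255.0, gamma)  # gamma correction
    x = gaussian_filter(x, s1) - gaussian_filter(x, s2)  # DoG band-pass
    # Two-stage contrast equalization, then a tanh squash to [-tau, tau].
    x /= np.mean(np.abs(x) ** alpha) ** (1 / alpha) + 1e-8
    x /= np.mean(np.minimum(np.abs(x), tau) ** alpha) ** (1 / alpha) + 1e-8
    return tau * np.tanh(x / tau)

normalized = lighting_chain(np.random.randint(0, 256, (64, 64)))
print(normalized.shape, float(normalized.min()), float(normalized.max()))
```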