• Title/Summary/Keyword: recognition mechanism

Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

  • Liu, Min;Tang, Jun
    • Journal of Information Processing Systems / v.17 no.4 / pp.754-771 / 2021
  • In continuous dimensional emotion recognition, the parts that highlight emotional expression differ between modalities, and the modalities also differ in how strongly they influence the emotional state. Therefore, this paper studies the fusion of the two most important modalities in emotion recognition (voice and visual expression) and proposes a bimodal emotion recognition method that combines an attention mechanism with an improved AlexNet network. After simple preprocessing of the audio and video signals, audio features are first extracted using prior knowledge. Then, facial expression features are extracted by the improved AlexNet network. Finally, a multimodal attention mechanism is used to fuse the facial expression features and audio features, and an improved loss function is used to address the missing-modality problem, so as to improve the robustness of the model and the performance of emotion recognition. The experimental results show that the concordance correlation coefficients of the proposed model in the two dimensions of arousal and valence were 0.729 and 0.718, respectively, which are superior to those of several comparative algorithms.
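The abstract above reports its results as concordance correlation coefficients. As a minimal sketch of how that metric is usually computed (Lin's definition, with population variances), and not the authors' evaluation code, the function name `concordance_cc` and the sample values below are illustrative assumptions:

```python
from statistics import fmean


def concordance_cc(predicted, target):
    """Lin's concordance correlation coefficient for two equal-length sequences.

    CCC = 2 * cov(p, t) / (var(p) + var(t) + (mean(p) - mean(t)) ** 2),
    using population (1/N) variance and covariance.
    """
    if len(predicted) != len(target) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = fmean(predicted)
    mean_t = fmean(target)
    var_p = fmean([(p - mean_p) ** 2 for p in predicted])
    var_t = fmean([(t - mean_t) ** 2 for t in target])
    cov = fmean([(p - mean_p) * (t - mean_t) for p, t in zip(predicted, target)])
    denominator = var_p + var_t + (mean_p - mean_t) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both sequences are the same constant")
    return 2 * cov / denominator


# Hypothetical arousal predictions that track the labels but with a constant offset.
labels = [1.0, 2.0, 3.0]
shifted = [2.0, 3.0, 4.0]
score = concordance_cc(shifted, labels)
```

Unlike Pearson correlation, the squared mean difference in the denominator penalizes a constant offset, so the shifted predictions above score below 1 even though they are perfectly correlated with the labels.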

Tobacco Retail License Recognition Based on Dual Attention Mechanism

  • Shan, Yuxiang;Ren, Qin;Wang, Cheng;Wang, Xiuhui
    • Journal of Information Processing Systems / v.18 no.4 / pp.480-488 / 2022
  • Images of tobacco retail licenses have complex unstructured characteristics, which poses an urgent technical problem for robotic process automation in tobacco marketing. In this paper, a novel recognition approach using a dual attention mechanism is presented to realize automatic recognition and information extraction from such images. First, we utilized a DenseNet network to extract the license information from the input tobacco retail license data. Second, bi-directional long short-term memory was used for coding and decoding, with a continuous decoder integrating dual attention, to realize the recognition and information extraction of tobacco retail license images without segmentation. Finally, several performance experiments were conducted using a large-scale dataset of tobacco retail licenses. The experimental results show that the proposed approach achieves a correction accuracy of 98.36% on the ZY-LQ dataset, outperforming most existing methods.

A Consecutive Motion and Situation Recognition Mechanism to Detect a Vulnerable Condition Based on Android Smartphone

  • Choi, Hoan-Suk;Lee, Gyu Myoung;Rhee, Woo-Seop
    • International Journal of Contents / v.16 no.3 / pp.1-17 / 2020
  • Human motion recognition is essential for user-centric services such as surveillance-based security, elderly condition monitoring, exercise tracking, and daily calorie expenditure analysis. It is typically based on the analysis of movement data such as the acceleration and angular velocity of a target user. Existing motion recognition studies are only intended to measure basic information (e.g., the user's stride, number of steps, speed) or to recognize a single motion (e.g., sitting, running, walking). Thus, a new mechanism is required to identify the transitions between single motions, in order to assess a user's consecutive motion more accurately and to recognize the user's body and surrounding situations arising from the motion. In this paper, we collect human movement data through Android smartphones in real time for five target single motions and propose a mechanism that recognizes a consecutive motion, including transitions among various motions, and the resulting situation, using a state transition model to check whether a vulnerable (life-threatening) condition has occurred, especially for the elderly. Through implementation and experiments, we demonstrate that the proposed mechanism recognizes a consecutive motion and a user's situation accurately and quickly. In the recognition experiment on a mixed sequence resembling daily motion, the proposed adaptive weighting method showed improvements of 4% (holding time = 15 sec), 88% (30 sec), and 6.5% (60 sec) compared to the static method.

Speech emotion recognition using attention mechanism-based deep neural networks (주목 메커니즘 기반의 심층신경망을 이용한 음성 감정인식)

  • Ko, Sang-Sun;Cho, Hye-Seung;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.36 no.6 / pp.407-412 / 2017
  • In this paper, we propose a speech emotion recognition method using a deep neural network based on the attention mechanism. The proposed method combines a CNN (Convolutional Neural Network), a GRU (Gated Recurrent Unit), a DNN (Deep Neural Network), and an attention mechanism. The spectrogram of a speech signal contains characteristic patterns that depend on the emotion. Therefore, we modeled these emotion-dependent patterns by applying tuned Gabor filters as the convolutional filters of a typical CNN. In addition, we applied the attention mechanism with a CNN and an FC (Fully-Connected) layer to obtain attention weights that take the context information of the extracted features into account, and used them for emotion recognition. To verify the proposed method, we conducted emotion recognition experiments on six emotions. The experimental results show that the proposed method achieves higher performance in speech emotion recognition than conventional methods.
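The attention weights mentioned in the abstract above can be illustrated with a generic attention-pooling step. This is a minimal sketch under stated assumptions, not the paper's architecture: the abstract derives its weights from a CNN and an FC layer, whereas this example scores each frame with a plain dot product against a hypothetical learned vector `query`. The names `softmax` and `attention_pool` and the toy feature values are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Weight per-frame feature vectors by attention and sum them.

    frames: list of T feature vectors, each of length D.
    query:  length-D scoring vector (a hypothetical learned parameter).
    Returns (weights, pooled), where weights sums to 1 over the T frames.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(dim)]
    return weights, pooled


# Two toy frames; the query favors frames with a large first feature.
frames = [[1.0, 0.0], [0.0, 1.0]]
weights, pooled = attention_pool(frames, query=[10.0, 0.0])
```

With this query, almost all of the weight lands on the first frame, so the pooled vector is close to that frame's features. A zero query gives uniform weights and reduces the step to plain average pooling.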

Object Recognition using Smart Tag and Stereo Vision System on Pan-Tilt Mechanism

  • Kim, Jin-Young;Im, Chang-Jun;Lee, Sang-Won;Lee, Ho-Gil
    • Proceedings of the Institute of Control, Robotics and Systems Conference (제어로봇시스템학회:학술대회논문집) / 2005.06a / pp.2379-2384 / 2005
  • We propose a novel method for object recognition using a smart tag system with stereo vision on a pan-tilt mechanism. We developed a smart tag that includes an IRED device. The smart tag is attached to the object. We also developed a stereo vision system that pans and tilts so that the object image is centered in each camera's view. The stereo vision system on the pan-tilt mechanism can map the position of the IRED to the robot coordinate system by using the pan-tilt angles. Then, to map the size and pose of the object to the robot coordinate system, we used a simple model-based vision algorithm. To increase the feasibility of tag-based object recognition, we implemented our approach using techniques that are as simple as possible.

Adaptive low-resolution palmprint image recognition based on channel attention mechanism and modified deep residual network

  • Xu, Xuebin;Meng, Kan;Xing, Xiaomin;Chen, Chen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.3 / pp.757-770 / 2022
  • Palmprint recognition has drawn increasing attention in the past decade due to its uniqueness and reliability. Traditional palmprint recognition methods usually use high-resolution images as the identification basis so that they can achieve relatively high precision. However, high-resolution images mean a higher computational cost in the recognition process, which usually cannot be guaranteed in mobile computing. Therefore, this paper proposes an improved low-resolution palmprint image recognition method based on residual networks. The main contributions include: 1) We introduce a channel attention mechanism to refactor the extracted feature maps, which pays more attention to the informative feature maps and suppresses the useless ones. 2) Our proposed ResStage group structure divides the original residual block into three stages, and we stabilize the signal characteristics before each stage by means of a batch normalization (BN) operation to enhance the feature channels. Comparison experiments are conducted on a public dataset provided by the Hong Kong Polytechnic University. Experimental results show that the proposed method achieves a rank-1 accuracy of 98.17% when tested on low-resolution images at 12 dpi, which clearly outperforms all the compared methods.
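The channel attention idea in contribution 1) above can be sketched as a squeeze-and-excitation style gate. This is a swapped-in, widely used formulation chosen for illustration, not necessarily the module the paper uses. The function name `channel_attention`, the weight matrices `w1` and `w2`, and the toy feature maps are all hypothetical.

```python
import math


def channel_attention(feature_maps, w1, w2):
    """Reweight channels with a squeeze-and-excitation style gate.

    feature_maps: list of C channels, each an H x W list of lists.
    w1: R x C weight matrix (channel reduction), hypothetical learned values.
    w2: C x R weight matrix (channel expansion), hypothetical learned values.
    Returns (gates, scaled), where gates holds one value in (0, 1) per channel.
    """
    # Squeeze: global average pooling collapses each channel to one number.
    squeezed = [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
                for channel in feature_maps]
    # Excite: a small bottleneck (linear, ReLU, linear, sigmoid) yields the gates.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: every value in a channel is multiplied by that channel's gate.
    scaled = [[[gate * value for value in row] for row in channel]
              for gate, channel in zip(gates, feature_maps)]
    return gates, scaled


# Two 2x2 channels and toy weights that favor the first channel.
maps = [[[1.0, 1.0], [1.0, 1.0]],
        [[0.0, 0.0], [0.0, 0.0]]]
gates, scaled = channel_attention(maps, w1=[[1.0, 1.0]], w2=[[2.0], [-2.0]])
```

Channels whose gate is near 1 pass through almost unchanged, while channels with a gate near 0 are suppressed, which matches the abstract's description of emphasizing informative feature maps over useless ones.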

ADD-Net: Attention Based 3D Dense Network for Action Recognition

  • Man, Qiaoyue;Cho, Young Im
    • Journal of the Korea Society of Computer and Information / v.24 no.6 / pp.21-28 / 2019
  • In recent years, with the development of artificial intelligence and the success of deep models, such models have been deployed in all fields of computer vision. Action recognition, as an important branch of human perception and computer vision research, has attracted more and more attention. Action recognition is a challenging task due to the particular complexity of human movement, since the same movement may occur across multiple individuals. A human action exists as a sequence of continuous image frames in a video, so action recognition requires more computational power than processing static images, and the simple use of a CNN cannot achieve the desired results. Recently, attention models have achieved good results in computer vision and natural language processing. In particular, for video action classification, adding an attention model makes it more effective to focus on motion features and improves performance. It also intuitively explains which part the model attends to when making a particular decision, which is very helpful in real applications. In this paper, we propose a 3D dense convolutional network based on an attention mechanism (ADD-Net) for recognizing human motion behavior in video.

A Facial Expression Recognition Method Using Two-Stream Convolutional Networks in Natural Scenes

  • Zhao, Lixin
    • Journal of Information Processing Systems / v.17 no.2 / pp.399-410 / 2021
  • To address the problem that complex external variables in natural scenes have a large impact on facial expression recognition results, a facial expression recognition method based on a two-stream convolutional neural network is proposed. The model introduces exponentially enhanced shared input weights before each level of convolution input, and uses soft attention mechanism modules on the spatio-temporal features that combine the static and dynamic streams. This enables the network to autonomously find areas that are more relevant to the expression category and pay more attention to these areas, while the information from irrelevant interference areas is suppressed. In order to solve the problem of poor local robustness caused by lighting and expression changes, this paper also performs lighting preprocessing with the lighting preprocessing chain algorithm to eliminate most of the lighting effects. Experimental results on the AFEW6.0 and Multi-PIE datasets show that the recognition rates of this method are 95.05% and 61.40%, respectively, which are better than those of other comparison methods.

Liquid Chromatographic Resolution of N-Protected α-Amino Acids as Their Anilide and 3,5-Dimethylanilide Derivatives on Chiral Stationary Phases Derived from (S)-Leucine

  • Hyun, Myung-Ho;Cho, Yoon-Jae;Baik, In-Kyu
    • Bulletin of the Korean Chemical Society / v.23 no.9 / pp.1291-1294 / 2002
  • Various racemic N-protected α-amino acids such as N-t-BOC-(tert-butoxycarbonyl), N-CBZ-(benzyloxycarbonyl) and N-FMOC-(9-fluorenylmethyloxycarbonyl) α-amino acids were resolved as their anilide and 3,5-dimethylanilide derivatives on an HPLC chiral stationary phase (CSP) developed by modifying a commercial (S)-leucine CSP. The chromatographic resolution results were compared to those on the commercial (S)-leucine CSP. The resolutions were greater on the modified CSP than those on the commercial CSP with only one exception, the resolution of N-t-BOC-phenylglycine anilide. In addition, the chromatographic resolution behaviors were quite consistent except for the resolution of N-protected phenylglycine derivatives, the (S)-enantiomers being retained longer. Based on the chromatographic resolution behaviors and with the aid of CPK molecular model studies, we proposed a chiral recognition mechanism for the resolution of N-protected α-amino acid derivatives. However, for the resolution of N-protected phenylglycine derivatives, a second chiral recognition mechanism, which competes in the opposite sense with the first chiral recognition mechanism, was proposed. The two competing chiral recognition mechanisms were successfully used to rationalize the chromatographic behaviors for the resolution of N-protected phenylglycine derivatives.

Information Processing in Primate Retinal Ganglion

  • Je, Sung-Kwan;Cho, Jae-Hyun;Kim, Gwang-Baek
    • Journal of information and communication convergence engineering / v.2 no.2 / pp.132-137 / 2004
  • Most current computer vision theories are based on hypotheses that are difficult to apply to the real world, and they simply imitate a coarse form of the human visual system. As a result, they have not shown satisfactory results. In the human visual system, there is a mechanism that processes information to cope with memory degradation over time and limited storage space. Starting from research on the human visual system, this study analyzes a mechanism that processes input information when it is transferred from the retina to ganglion cells. In this study, a model for the characteristics of ganglion cells in the retina is proposed after considering the structure of the retina and the efficiency of storage space. The MNIST database of handwritten digits is used as data for this research, and ART2 and SOM are used as recognizers. The results of this study show that the proposed recognition model is not much different from the general recognition model in terms of recognition rate, but the efficiency of storage space can be improved by constructing a mechanism that processes input information.