• Title/Summary/Keyword: Multimodal Context Fusion

Multimodal Attention-Based Fusion Model for Context-Aware Emotion Recognition

  • Vo, Minh-Cong; Lee, Guee-Sang
    • International Journal of Contents / v.18 no.3 / pp.11-20 / 2022
  • Human emotion recognition is an exciting topic that has attracted researchers for a long time. In recent years, there has been increasing interest in exploiting contextual information for emotion recognition. Previous explorations in psychology show that emotional perception is influenced not only by facial expressions but also by contextual information from the scene, such as human activities, interactions, and body poses. These findings initiated a trend in computer vision of treating such contexts as additional modalities, alongside facial expressions, for inferring emotion. However, contextual information has not been fully exploited: the scene emotion created by the surrounding environment can also shape how people perceive emotion. In addition, simple additive fusion is impractical in multimodal training, because the modalities do not contribute equally to the final prediction. The purpose of this paper is to contribute to this growing area of research by exploring the effectiveness of the emotional scene gist in the input image for inferring the emotional state of the primary target. The emotional scene gist includes the emotion, emotional feelings, and the actions or events that directly trigger emotional reactions in the input image. We also present an attention-based fusion network that combines multimodal features according to their impact on the target's emotional state. We demonstrate the effectiveness of the method through a significant improvement on the EMOTIC dataset.
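
As a rough illustration of the attention-based fusion idea in this abstract, the following PyTorch sketch scores each modality feature (face, body, scene context), turns the scores into softmax weights, and classifies the weighted sum. The module name, feature dimensions, scoring MLP, and the 26-way output (EMOTIC's discrete categories) are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Sketch of attention-weighted fusion of modality features (assumed sizes)."""

    def __init__(self, feat_dim: int = 256, num_emotions: int = 26):
        super().__init__()
        # Small MLP that scores how much each modality should contribute.
        self.score = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.classifier = nn.Linear(feat_dim, num_emotions)

    def forward(self, face, body, scene):
        feats = torch.stack([face, body, scene], dim=1)    # (batch, 3, feat_dim)
        weights = torch.softmax(self.score(feats), dim=1)  # (batch, 3, 1)
        fused = (weights * feats).sum(dim=1)               # attention-weighted sum
        return self.classifier(fused), weights.squeeze(-1)

# Usage with dummy modality features:
model = AttentionFusion()
face, body, scene = (torch.randn(4, 256) for _ in range(3))
logits, attn = model(face, body, scene)
print(logits.shape, attn.shape)  # torch.Size([4, 26]) torch.Size([4, 3])
```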

Multi-Object Goal Visual Navigation Based on Multimodal Context Fusion (멀티모달 맥락정보 융합에 기초한 다중 물체 목표 시각적 탐색 이동)

  • Jeong Hyun Choi; In Cheol Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.9 / pp.407-418 / 2023
  • Multi-Object Goal Visual Navigation (MultiOn) is a visual navigation task in which an agent must visit multiple object goals in an unknown indoor environment in a given order. Existing models for the MultiOn task are limited in that they cannot exploit an integrated view of multimodal context, because they use only a unimodal context map. To overcome this limitation, this paper proposes a novel deep neural network-based agent model for the MultiOn task. The proposed model, MCFMO, uses a multimodal context map containing visual appearance features, semantic features of environmental objects, and goal object features. Moreover, it effectively fuses these three heterogeneous feature maps into a global multimodal context map using a point-wise convolutional neural network module. Lastly, the model adopts an auxiliary task learning module that predicts the observation status, goal direction, and goal distance, which guides the agent to learn the navigation policy efficiently. Through various quantitative and qualitative experiments in the Habitat-Matterport3D simulation environment and scene dataset, we demonstrate the superiority of the proposed model.
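
The point-wise fusion step described here can be pictured as concatenating the three spatially aligned context maps along the channel axis and mixing them per grid cell with a 1x1 convolution. The sketch below illustrates only that pattern; the channel sizes, module name, and grid resolution are assumptions, not MCFMO's actual configuration.

```python
import torch
import torch.nn as nn

class PointwiseContextFusion(nn.Module):
    """Sketch: fuse appearance, object-semantic, and goal maps with a 1x1 conv."""

    def __init__(self, c_vis: int = 32, c_sem: int = 16, c_goal: int = 8, c_out: int = 64):
        super().__init__()
        # kernel_size=1 mixes channels independently at every map cell.
        self.fuse = nn.Sequential(
            nn.Conv2d(c_vis + c_sem + c_goal, c_out, kernel_size=1),
            nn.ReLU(),
        )

    def forward(self, vis_map, sem_map, goal_map):
        # All maps are assumed to share the same (batch, C, H, W) spatial grid.
        return self.fuse(torch.cat([vis_map, sem_map, goal_map], dim=1))

# Usage with dummy 64x64 egocentric map features:
fusion = PointwiseContextFusion()
vis, sem, goal = torch.randn(2, 32, 64, 64), torch.randn(2, 16, 64, 64), torch.randn(2, 8, 64, 64)
print(fusion(vis, sem, goal).shape)  # torch.Size([2, 64, 64, 64]) - global multimodal context map
```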

Dialog-based multi-item recommendation using automatic evaluation

  • Euisok Chung; Hyun Woo Kim; Byunghyun Yoo; Ran Han; Jeongmin Yang; Hwa Jeon Song
    • ETRI Journal / v.46 no.2 / pp.277-289 / 2024
  • In this paper, we describe a neural network-based application that recommends multiple items from dialog context input while simultaneously generating a response sentence. We frame the multi-item recommendation concretely as a set of clothing recommendations, which requires a multimodal fusion approach that can process both clothing-related text and images. We also examine how a pretrained language model can satisfy the requirements of the downstream models. Moreover, we propose gate-based multimodal fusion and multiprompt learning built on a pretrained language model, together with an automatic evaluation technique that addresses the one-to-many mapping problem of multi-item recommendation. A Korean fashion-domain multimodal dataset is constructed and tested, and various experimental settings are verified using the automatic evaluation method. The results show that the proposed method yields confidence scores for multi-item recommendation results, unlike traditional accuracy-based evaluation.
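
The gate-based fusion mentioned in this abstract can be sketched as a learned sigmoid gate that decides, per dimension, how much of the image feature to blend into the text (dialog-context) feature before the pretrained language model consumes it. The feature sizes, projection layers, and single-gate design below are assumptions for illustration, not the paper's exact module.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sketch of gate-based fusion of dialog-text and clothing-image features."""

    def __init__(self, text_dim: int = 768, image_dim: int = 512, hidden: int = 768):
        super().__init__()
        self.proj_txt = nn.Linear(text_dim, hidden)
        self.proj_img = nn.Linear(image_dim, hidden)
        self.gate = nn.Linear(hidden * 2, hidden)  # decides how much image signal to admit

    def forward(self, text_feat, image_feat):
        t = self.proj_txt(text_feat)
        v = self.proj_img(image_feat)
        g = torch.sigmoid(self.gate(torch.cat([t, v], dim=-1)))
        return g * v + (1.0 - g) * t  # gated mixture passed on to the language model

# Usage with dummy dialog-context and clothing-image embeddings:
fusion = GatedFusion()
print(fusion(torch.randn(4, 768), torch.randn(4, 512)).shape)  # torch.Size([4, 768])
```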

A Survey of Multimodal Systems and Techniques for Motor Learning

  • Tadayon, Ramin; McDaniel, Troy; Panchanathan, Sethuraman
    • Journal of Information Processing Systems / v.13 no.1 / pp.8-25 / 2017
  • This survey paper explores the application of multimodal feedback in automated systems for motor learning. We review findings from recent studies in this field, using rehabilitation and various motor training scenarios as context. We discuss popular feedback delivery and sensing mechanisms for motion capture and processing in terms of their requirements, benefits, and limitations. For modality selection, we review best-practice approaches for each modality relative to motor task complexity, with example implementations from recent work. We then summarize the advantages and disadvantages of several approaches to integrating modalities, in terms of fusion and the frequency of feedback during motor tasks. Finally, we review the limitations of perceptual bandwidth and provide an evaluation of the information transfer of each modality.

The Effect of AI Agent's Multi Modal Interaction on the Driver Experience in the Semi-autonomous Driving Context : With a Focus on the Existence of Visual Character (반자율주행 맥락에서 AI 에이전트의 멀티모달 인터랙션이 운전자 경험에 미치는 효과 : 시각적 캐릭터 유무를 중심으로)

  • Suh, Min-soo; Hong, Seung-Hye; Lee, Jeong-Myeong
    • The Journal of the Korea Contents Association / v.18 no.8 / pp.92-101 / 2018
  • As interactive AI speakers become popular, voice recognition is regarded as an important vehicle-driver interaction method for autonomous driving situations. The purpose of this study is to determine whether multimodal interaction, in which feedback is delivered both auditorily and visually through an on-screen AI character, optimizes the user experience better than an auditory-only mode. Participants performed music selection and adjustment tasks through the AI speaker while driving, and we measured information and system quality, presence, perceived usefulness and ease of use, and continuance intention. The analysis showed no multimodal effect of the visual character on most user experience factors, nor on continuance intention. Rather, the auditory-only mode was more effective than the multimodal mode for the information quality factor. In the semi-autonomous driving stage, which demands the driver's cognitive effort, multimodal interaction is therefore not effective in optimizing user experience compared with single-mode interaction.

Fuzzy Bayesian Network for Fusion of Multimodal Context Information (다양한 형태의 상황 정보 합성을 위한 퍼지 베이지안 네트워크)

  • Yoo Ji-Oh; Cho Sung-Bae
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.631-633 / 2005
  • Bayesian networks are widely used to combine and reason over various forms of context information. However, because each node of an ordinary Bayesian network has discrete states, it is difficult to handle real-world context information that is continuous or in which several states can hold simultaneously. To compensate for this shortcoming, this paper proposes a fuzzy Bayesian network in which various forms of context information are preprocessed with fuzzy logic and then inferred through a Bayesian network. To show its usefulness, we designed a music recommendation agent and compared it experimentally with an ordinary Bayesian network, confirming that the proposed method can handle diverse context information flexibly.

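The fuzzify-then-infer pattern described in this abstract can be sketched as follows: a continuous context reading is mapped to fuzzy membership degrees over discrete states, and those degrees are used as soft evidence when computing the posterior of a hidden node. The noise/mood variables, membership functions, and probability tables below are hypothetical, chosen only to echo the music-recommendation setting; they are not the paper's network.

```python
import numpy as np

def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function peaking at b on the interval [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzify_noise(db: float) -> np.ndarray:
    """Map a continuous noise level (dB) to degrees for {quiet, moderate, loud}."""
    degrees = np.array([
        triangular(db, -1, 30, 55),   # quiet
        triangular(db, 40, 60, 80),   # moderate
        triangular(db, 65, 90, 121),  # loud
    ])
    return degrees / degrees.sum()    # normalized soft evidence

# Hypothetical two-node network: mood -> observed noise state.
prior_mood = np.array([0.5, 0.5])          # P(mood): [relaxed, energetic]
p_noise_given_mood = np.array([            # P(noise state | mood)
    [0.6, 0.3, 0.1],                       # relaxed
    [0.1, 0.4, 0.5],                       # energetic
])

def infer_mood(db: float) -> np.ndarray:
    """Posterior over mood, weighting each noise state by its membership degree."""
    soft_evidence = fuzzify_noise(db)
    likelihood = p_noise_given_mood @ soft_evidence   # membership-weighted likelihood
    posterior = prior_mood * likelihood
    return posterior / posterior.sum()

print(infer_mood(72.0))  # a loud environment mostly supports the "energetic" mood
```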

Activity Recognition of Workers and Passengers onboard Ships Using Multimodal Sensors in a Smartphone (선박 탑승자를 위한 다중 센서 기반의 스마트폰을 이용한 활동 인식 시스템)

  • Piyare, Rajeev Kumar; Lee, Seong Ro
    • The Journal of Korean Institute of Communications and Information Sciences / v.39C no.9 / pp.811-819 / 2014
  • Activity recognition is a key component in identifying a user's context so that services can be provided for applications such as medical, entertainment, and tactical scenarios. Instead of deploying numerous sensor devices, as in many previous investigations, we propose using a smartphone with its built-in multimodal sensors as an unobtrusive sensing device for recognizing six physical daily activities. As an improvement over previous work, accelerometer, gyroscope, and magnetometer data are fused to recognize activities more reliably. The evaluation indicates that the IBk classifier, using a window size of 2 s with 50% overlap, yields the highest accuracy (up to 99.33%). To achieve this peak accuracy, simple time-domain and frequency-domain features were extracted from the smartphone's raw sensor data.
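
As a rough sketch of the pipeline this abstract describes, the code below segments fused 9-axis smartphone data into 2 s windows with 50% overlap, extracts simple time- and frequency-domain features, and trains a nearest-neighbour classifier (the role played by WEKA's IBk in the paper). The sampling rate, feature set, k value, and the dummy data are assumptions for illustration only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

FS = 50                 # assumed sampling rate (Hz)
WIN = 2 * FS            # 2-second window
STEP = WIN // 2         # 50% overlap

def window_features(signals: np.ndarray) -> np.ndarray:
    """signals: (n_samples, 9) fused accelerometer/gyroscope/magnetometer axes."""
    feats = []
    for start in range(0, len(signals) - WIN + 1, STEP):
        w = signals[start:start + WIN]
        spec = np.abs(np.fft.rfft(w, axis=0))
        feats.append(np.concatenate([
            w.mean(axis=0), w.std(axis=0),   # time-domain features per axis
            spec[1:].sum(axis=0),            # spectral energy per axis (DC removed)
        ]))
    return np.array(feats)

# Placeholder data standing in for labelled recordings of six daily activities.
rng = np.random.default_rng(0)
raw = rng.normal(size=(FS * 60, 9))          # one minute of 9-axis sensor data
X = window_features(raw)
y = rng.integers(0, 6, size=len(X))          # placeholder activity labels

clf = KNeighborsClassifier(n_neighbors=1)    # 1-NN, mirroring IBk's default
clf.fit(X, y)
print(X.shape, clf.score(X, y))
```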