• Title/Abstract/Keywords: multimodal fusion

Search results: 53 (processing time: 0.02 s)

Multimodal Medical Image Fusion Based on Sugeno's Intuitionistic Fuzzy Sets

  • Tirupal, Talari;Mohan, Bhuma Chandra;Kumar, Samayamantula Srinivas
    • ETRI Journal / Vol. 39, No. 2 / pp. 173-180 / 2017
  • Multimodal medical image fusion is the process of retrieving valuable information from medical images. The primary goal of medical image fusion is to combine several images obtained from various sources into a single image suitable for improved diagnosis. Medical images are highly complex, and researchers apply many soft-computing methods to process them. Intuitionistic fuzzy sets are well suited to medical images because such images carry many uncertainties. In this paper, a new method based on Sugeno's intuitionistic fuzzy set (SIFS) is proposed. First, the medical images are converted into Sugeno's intuitionistic fuzzy images (SIFIs). An exponential intuitionistic fuzzy entropy determines the optimum values of the membership, non-membership, and hesitation degree functions. The two SIFIs are then divided into image blocks, and the counts of blackness and whiteness of each block are computed. Finally, the fused image is rebuilt by recombining the SIFI image blocks. The efficiency of SIFS for multimodal medical image fusion is demonstrated on several pairs of images, and the results are compared with those of existing studies in the recent literature.
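
A minimal, illustrative sketch of the block-wise fusion step summarized in this abstract. The membership mapping and the "blackness/whiteness" block score below are simplified stand-ins (assumptions), not the paper's exact Sugeno-type formulation; all function names and parameters are hypothetical.

```python
import numpy as np

def to_membership(img, lam=0.5):
    """Map a grayscale image to [0, 1] membership values (simplified stand-in)."""
    mu = (img - img.min()) / (img.max() - img.min() + 1e-12)
    nu = (1.0 - mu) / (1.0 + lam * mu)   # hypothetical Sugeno-like non-membership
    pi = 1.0 - mu - nu                   # hesitation degree
    return mu, nu, pi

def fuse_blocks(img_a, img_b, block=8):
    """Per block, keep the source whose block has the larger spread between
    'white' and 'black' pixels (a proxy for the blackness/whiteness count)."""
    fused = np.zeros_like(img_a, dtype=float)
    h, w = img_a.shape
    for i in range(0, h, block):
        for j in range(0, w, block):
            a = img_a[i:i+block, j:j+block]
            b = img_b[i:i+block, j:j+block]
            score_a = np.count_nonzero(a > a.mean()) - np.count_nonzero(a <= a.mean())
            score_b = np.count_nonzero(b > b.mean()) - np.count_nonzero(b <= b.mean())
            fused[i:i+block, j:j+block] = a if abs(score_a) >= abs(score_b) else b
    return fused

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ct, mri = rng.random((64, 64)), rng.random((64, 64))
    mu_ct, _, _ = to_membership(ct)
    mu_mri, _, _ = to_membership(mri)
    print(fuse_blocks(mu_ct, mu_mri).shape)  # (64, 64)
```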

Usability Test Guidelines for Speech-Oriented Multimodal User Interface (음성기반 멀티모달 사용자 인터페이스의 사용성 평가 방법론)

  • 홍기형
    • 대한음성학회지: 말소리 / No. 67 / pp. 103-120 / 2008
  • Basic components of a multimodal interface, such as speech recognition, speech synthesis, gesture recognition, and multimodal fusion, have their own technological limitations. For example, speech recognition accuracy decreases for large vocabularies and in noisy environments. Despite these limitations, there are many applications in which speech-oriented multimodal user interfaces are very helpful to users. However, to expand the application areas of speech-oriented multimodal interfaces, the interfaces must be developed with a focus on usability. In this paper, we introduce usability and user-centered design methodology in general. There has been considerable work on evaluating spoken dialogue systems; we summarize PARADISE (PARAdigm for Dialogue System Evaluation) and PROMISE (PROcedure for Multimodal Interactive System Evaluation), the generalized evaluation frameworks for voice and multimodal user interfaces. We then present usability components for speech-oriented multimodal user interfaces and usability testing guidelines that can be used in a user-centered multimodal interface design process.


Multimodal Parametric Fusion for Emotion Recognition

  • Kim, Jonghwa
    • International Journal of Advanced Smart Convergence / Vol. 9, No. 1 / pp. 193-201 / 2020
  • The main objective of this study is to investigate the impact of additional modalities on the performance of emotion recognition using speech, facial expression, and physiological measurements. To compare different approaches, we designed a feature-based recognition system as a benchmark that carries out linear supervised classification followed by leave-one-out cross-validation. For the classification of four emotions, bimodal fusion improved the recognition accuracy of the unimodal approach in our experiment, while the performance of trimodal fusion varied strongly from individual to individual. Furthermore, we observed an extremely high disparity between single-class recognition rates, and no single modality performed best across the experiment. Based on these observations, we developed a novel fusion method, called parametric decision fusion (PDF), which builds emotion-specific classifiers and exploits the advantages of a parameterized decision process. Using the PDF scheme, we achieved a 16% improvement in the accuracy of subject-dependent recognition and 10% for subject-independent recognition compared with the best unimodal results.
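
A minimal sketch of decision-level fusion with per-class, per-modality weights, loosely in the spirit of the parametric decision fusion (PDF) idea described in this abstract. The specific weighting scheme, names, and values below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def parametric_decision_fusion(prob_per_modality, weights):
    """
    prob_per_modality: list of (n_samples, n_classes) probability arrays,
                       one per modality (e.g., speech, face, physiology).
    weights:           (n_modalities, n_classes) array of class-specific
                       modality weights.
    Returns fused class predictions (argmax over weighted sums).
    """
    fused = np.zeros_like(prob_per_modality[0])
    for m, probs in enumerate(prob_per_modality):
        fused += weights[m] * probs        # broadcast per-class weights
    return fused.argmax(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    speech = rng.dirichlet(np.ones(4), size=10)   # 4 emotions, 10 samples
    face = rng.dirichlet(np.ones(4), size=10)
    physio = rng.dirichlet(np.ones(4), size=10)
    # Hypothetical class-specific weights, one row per modality
    w = np.array([[1.0, 0.8, 0.6, 1.0],
                  [0.7, 1.0, 1.0, 0.5],
                  [0.9, 0.6, 1.2, 1.1]])
    print(parametric_decision_fusion([speech, face, physio], w))
```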

Dialog-based multi-item recommendation using automatic evaluation

  • Euisok Chung;Hyun Woo Kim;Byunghyun Yoo;Ran Han;Jeongmin Yang;Hwa Jeon Song
    • ETRI Journal / Vol. 46, No. 2 / pp. 277-289 / 2024
  • In this paper, we describe a neural network-based application that recommends multiple items from dialog context input and simultaneously outputs a response sentence. We specify multi-item recommendation as a set of clothing recommendations, which requires a multimodal fusion approach that can process both clothing-related text and images. We also examine how to meet the requirements of the downstream models using a pretrained language model, and we propose gate-based multimodal fusion and multiprompt learning on top of the pretrained language model. In addition, we propose an automatic evaluation technique to solve the one-to-many mapping problem of multi-item recommendation. A Korean fashion-domain multimodal dataset is constructed and tested, and various experimental settings are verified using the automatic evaluation method. The results show that our proposed method can produce confidence scores for multi-item recommendation results, which differs from traditional accuracy evaluation.
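
A minimal sketch of a gate-based fusion layer of the general kind referenced in this abstract: a learned gate decides, per dimension, how much of the text embedding and the image embedding to pass through. The dimensions, layer choices, and class name are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, hidden=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)
        self.image_proj = nn.Linear(image_dim, hidden)
        self.gate = nn.Linear(text_dim + image_dim, hidden)

    def forward(self, text_emb, image_emb):
        t = torch.tanh(self.text_proj(text_emb))
        v = torch.tanh(self.image_proj(image_emb))
        g = torch.sigmoid(self.gate(torch.cat([text_emb, image_emb], dim=-1)))
        return g * t + (1.0 - g) * v       # gated mixture of the two modalities

if __name__ == "__main__":
    fusion = GatedFusion()
    fused = fusion(torch.randn(4, 768), torch.randn(4, 512))
    print(fused.shape)   # torch.Size([4, 256])
```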

MosaicFusion: Merging Modalities with Partial Differential Equation and Discrete Cosine Transformation

  • Gargi Trivedi;Rajesh Sanghavi
    • Journal of Applied and Pure Mathematics / Vol. 5, No. 5-6 / pp. 389-406 / 2023
  • In the pursuit of enhancing image fusion techniques, this research presents a novel approach for fusing multimodal images, specifically infrared (IR) and visible (VIS) images, using a combination of partial differential equations (PDEs) and the discrete cosine transformation (DCT). The proposed method leverages the thermal and structural information provided by IR imaging and the fine-grained details offered by VIS imaging to create composite images that are superior in quality and informativeness. Through a meticulous fusion process involving PDE-guided fusion, DCT component selection, and weighted combination, the methodology aims to strike a balance that optimally preserves essential features while minimizing artifacts. Rigorous objective and subjective evaluations are conducted to validate the effectiveness of the approach. This research contributes to the ongoing advancement of multimodal image fusion, addressing applications in fields such as medical imaging, surveillance, and remote sensing, where the combination of IR and VIS data is of paramount importance.
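
A minimal sketch of the DCT part of the pipeline described in this abstract: transform both source images, combine the coefficients, and invert. The PDE-guided step is omitted, and the simple max-magnitude and weighted-average rules below are assumptions for illustration only.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_fuse(ir, vis, alpha=0.5):
    """Fuse an infrared and a visible image in the DCT domain (illustrative rule)."""
    c_ir = dctn(ir, norm="ortho")
    c_vis = dctn(vis, norm="ortho")
    # Keep the larger-magnitude coefficient, then blend with the plain average.
    fused = np.where(np.abs(c_ir) > np.abs(c_vis), c_ir, c_vis)
    fused = alpha * fused + (1 - alpha) * (c_ir + c_vis) / 2.0
    return idctn(fused, norm="ortho")

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    ir, vis = rng.random((128, 128)), rng.random((128, 128))
    print(dct_fuse(ir, vis).shape)   # (128, 128)
```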

Predicting Session Conversion on E-commerce: A Deep Learning-based Multimodal Fusion Approach

  • Minsu Kim;Woosik Shin;SeongBeom Kim;Hee-Woong Kim
    • Asia Pacific Journal of Information Systems / Vol. 33, No. 3 / pp. 737-767 / 2023
  • With the availability of big customer data and advances in machine learning techniques, the prediction of customer behavior at the session level has attracted considerable attention from marketing practitioners and scholars. This study aims to predict customer purchase conversion at the session level by employing customer profile, transaction, and clickstream data. For this purpose, we develop a multimodal deep learning fusion model with dynamic and static features (DS-fusion). Specifically, we use page views within the focal visit as dynamic features and recency, frequency, monetary value, and clumpiness (RFMC) as static features to comprehensively capture customer characteristics related to buying behavior. Our deep learning architecture combines these features for conversion prediction. We validate the proposed model using real-world e-commerce data. The experimental results reveal that our model outperforms unimodal classifiers built on each feature set as well as classical machine learning models with dynamic and static features, including random forest and logistic regression. In this regard, this study sheds light on the promise of machine learning approaches that combine complementary modalities in predicting customer behavior.
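
A minimal sketch of combining a dynamic branch (clickstream sequence) and a static branch (RFMC profile) for conversion prediction, in the spirit of the DS-fusion model summarized above. The GRU choice, layer sizes, and all names are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DSFusion(nn.Module):
    def __init__(self, n_page_types=50, emb=16, static_dim=4, hidden=32):
        super().__init__()
        self.page_emb = nn.Embedding(n_page_types, emb)
        self.seq_enc = nn.GRU(emb, hidden, batch_first=True)     # dynamic branch
        self.static_enc = nn.Sequential(nn.Linear(static_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden * 2, 1)                      # conversion logit

    def forward(self, page_seq, static_feats):
        _, h = self.seq_enc(self.page_emb(page_seq))              # h: (1, B, hidden)
        s = self.static_enc(static_feats)
        return torch.sigmoid(self.head(torch.cat([h[-1], s], dim=-1)))

if __name__ == "__main__":
    model = DSFusion()
    pages = torch.randint(0, 50, (8, 20))   # 8 sessions, 20 page views each
    rfmc = torch.randn(8, 4)                # recency, frequency, monetary, clumpiness
    print(model(pages, rfmc).shape)         # torch.Size([8, 1])
```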

Automatic Human Emotion Recognition from Speech and Face Display - A New Approach (인간의 언어와 얼굴 표정에 통하여 자동적으로 감정 인식 시스템 새로운 접근법)

  • 딩�E령;이영구;이승룡
    • Proceedings of the KIISE Conference / 2011 Korea Computer Congress, Vol. 38, No. 1(B) / pp. 231-234 / 2011
  • Audiovisual human emotion recognition can be considered a good approach for multimodal human-computer interaction. However, optimal multimodal information fusion remains a challenge. To overcome these limitations and bring robustness to the interface, we propose a framework for automatic human emotion recognition from speech and face display. In this paper, we develop a new approach to model-level information fusion based on the relationship between speech and facial expression, detecting temporal segments automatically and performing multimodal information fusion.

Incomplete Cholesky Decomposition based Kernel Cross Modal Factor Analysis for Audiovisual Continuous Dimensional Emotion Recognition

  • Li, Xia;Lu, Guanming;Yan, Jingjie;Li, Haibo;Zhang, Zhengyan;Sun, Ning;Xie, Shipeng
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 2 / pp. 810-831 / 2019
  • Recently, continuous dimensional emotion recognition from audiovisual cues has attracted increasing attention in both theory and practice. The large amount of data involved in the recognition process decreases the efficiency of most bimodal information fusion algorithms. In this paper, a novel algorithm, incomplete Cholesky decomposition based kernel cross-modal factor analysis (ICDKCFA), is presented and employed for continuous dimensional audiovisual emotion recognition. After the ICDKCFA feature transformation, two basic fusion strategies, feature-level fusion and decision-level fusion, are explored to combine the transformed visual and audio features for emotion recognition. Finally, extensive experiments are conducted to evaluate the ICDKCFA approach on the AVEC 2016 Multimodal Affect Recognition Sub-Challenge dataset. The experimental results show that the ICDKCFA method is faster than the original kernel cross-modal factor analysis while offering comparable performance. Moreover, the ICDKCFA method outperforms other common information fusion methods, such as canonical correlation analysis, kernel canonical correlation analysis, and cross-modal factor analysis based fusion.
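
A minimal sketch contrasting the two fusion strategies named in this abstract, feature-level (concatenate transformed features) and decision-level (combine per-modality predictions), for a continuous target such as arousal. The ridge regressor, the equal-weight averaging rule, and the synthetic data are illustrative assumptions; the ICDKCFA transformation itself is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
audio = rng.standard_normal((200, 30))     # transformed audio features (stand-in)
video = rng.standard_normal((200, 40))     # transformed visual features (stand-in)
arousal = rng.standard_normal(200)         # continuous emotion label

# Feature-level fusion: one regressor on the concatenated feature vector.
feat_pred = Ridge().fit(np.hstack([audio, video]), arousal).predict(np.hstack([audio, video]))

# Decision-level fusion: one regressor per modality, predictions averaged.
audio_pred = Ridge().fit(audio, arousal).predict(audio)
video_pred = Ridge().fit(video, arousal).predict(video)
dec_pred = 0.5 * audio_pred + 0.5 * video_pred

print(feat_pred.shape, dec_pred.shape)     # (200,) (200,)
```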

Development of Gas Type Identification Deep-learning Model through Multimodal Method (멀티모달 방식을 통한 가스 종류 인식 딥러닝 모델 개발)

  • 안서희;김경영;김동주
    • KIPS Transactions on Software and Data Engineering / Vol. 12, No. 12 / pp. 525-534 / 2023
  • A gas leak detection system is a key device for minimizing casualties caused by the explosiveness and toxicity of gases. Most leak detection systems rely on a single sensor, detecting leaks with either a gas sensor or a thermal imaging camera. To improve the performance of such single-sensor gas leak detection systems, this study applies multimodal deep learning to gas-sensor and thermal-image data. Performance is compared with a previous study using MultimodalGasData, a public multimodal dataset, and four multimodal models are designed and trained on top of unimodal models for the gas sensor and the thermal camera. For the gas sensor and the thermal camera, the 1D CNN and GasNet models achieved the highest unimodal accuracies of 96.3% and 96.4%, respectively. An early-fusion multimodal model built on these two unimodal models achieved the highest accuracy of 99.3%, which is 3.3% higher than the multimodal model of the previous study. We expect this highly reliable gas leak detection system to minimize further damage caused by gas leaks.
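
A minimal sketch of an early-fusion model that combines a 1-D gas-sensor branch and a thermal-image branch before a shared classifier, as in the study above. The channel counts, kernel sizes, sensor count, and 4-class output are illustrative assumptions, not the paper's exact 1D CNN or GasNet design.

```python
import torch
import torch.nn as nn

class EarlyFusionGasNet(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.sensor_branch = nn.Sequential(               # 1-D gas-sensor signal
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())         # -> (B, 16)
        self.thermal_branch = nn.Sequential(               # thermal image
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())         # -> (B, 16)
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, sensor_seq, thermal_img):
        s = self.sensor_branch(sensor_seq)                 # (B, 1, n_sensors)
        t = self.thermal_branch(thermal_img)               # (B, 1, H, W)
        return self.classifier(torch.cat([s, t], dim=-1))  # early fusion of features

if __name__ == "__main__":
    model = EarlyFusionGasNet()
    logits = model(torch.randn(2, 1, 7), torch.randn(2, 1, 64, 64))
    print(logits.shape)   # torch.Size([2, 4])
```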

Multimodal Medical Image Fusion Based on Double-Layer Decomposer and Fine Structure Preservation Model (복층 분해기와 상세구조 보존모델에 기반한 다중모드 의료영상 융합)

  • 장영매;이효종
    • KIPS Transactions on Computer and Communication Systems / Vol. 11, No. 6 / pp. 185-192 / 2022
  • Multimodal medical image fusion (MMIF) integrates images of multiple modalities, each exhibiting different characteristics, into a single result image that contains rich information. Such fusion can help physicians accurately observe and treat a patient's lesions. Motivated by this goal, this paper proposes a new method based on a double-layer decomposer and a fine-structure preservation model. First, a double-layer decomposer decomposes the source images into energy layers and structural layers that preserve fine detail. Second, the structural layers are fused by combining a structure tensor operator with a max-abs rule. For fusing the energy layers, a fine-structure preservation model is proposed, which greatly improves fusion performance. Finally, the fused image is constructed by summing the two fused sub-images produced by these fusion rules. Experiments verify that the proposed method outperforms current state-of-the-art fusion methods.
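
A minimal sketch of the two-layer fusion scheme outlined in this abstract: decompose each source into an energy (base) layer and a structural (detail) layer, fuse the structural layers with a max-abs rule, fuse the energy layers, and sum. A Gaussian filter stands in for the paper's double-layer decomposer, and simple averaging replaces the fine-structure preservation model; both substitutions are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def decompose(img, sigma=2.0):
    energy = gaussian_filter(img, sigma)     # smooth base (energy) layer
    structure = img - energy                 # fine-detail residual (structural layer)
    return energy, structure

def fuse(img_a, img_b):
    ea, sa = decompose(img_a)
    eb, sb = decompose(img_b)
    fused_structure = np.where(np.abs(sa) >= np.abs(sb), sa, sb)   # max-abs rule
    fused_energy = 0.5 * (ea + eb)                                  # simplified energy fusion
    return fused_energy + fused_structure                           # sum of fused sub-images

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    ct, mri = rng.random((64, 64)), rng.random((64, 64))
    print(fuse(ct, mri).shape)   # (64, 64)
```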