• Title/Summary/Keyword: multimodal information fusion

Incomplete Cholesky Decomposition based Kernel Cross Modal Factor Analysis for Audiovisual Continuous Dimensional Emotion Recognition

  • Li, Xia; Lu, Guanming; Yan, Jingjie; Li, Haibo; Zhang, Zhengyan; Sun, Ning; Xie, Shipeng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.810-831 / 2019
  • Recently, continuous dimensional emotion recognition from audiovisual cues has attracted increasing attention in both theory and practice. The large amount of data involved in the recognition process reduces the efficiency of most bimodal information fusion algorithms. In this paper, a novel algorithm, the incomplete Cholesky decomposition based kernel cross-modal factor analysis (ICDKCFA), is presented and employed for continuous dimensional audiovisual emotion recognition. After the ICDKCFA feature transformation, two basic fusion strategies, feature-level fusion and decision-level fusion, are explored to combine the transformed visual and audio features for emotion recognition. Finally, extensive experiments are conducted to evaluate the ICDKCFA approach on the AVEC 2016 Multimodal Affect Recognition Sub-Challenge dataset. The experimental results show that the ICDKCFA method is faster than the original kernel cross-modal factor analysis while achieving comparable performance. Moreover, the ICDKCFA method outperforms other common information fusion methods, such as fusion based on canonical correlation analysis, kernel canonical correlation analysis, and cross-modal factor analysis.
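
The efficiency gain in ICDKCFA comes from approximating each n × n kernel (Gram) matrix by a low-rank factor obtained with pivoted incomplete Cholesky decomposition, so the subsequent cross-modal factor analysis works on an n × m factor with m ≪ n. Below is a minimal sketch of that decomposition step; the RBF kernel choice, tolerance, and names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """RBF Gram matrix of the rows of X (placeholder kernel choice)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def incomplete_cholesky(K, tol=1e-6, max_rank=None):
    """Pivoted incomplete Cholesky: K ~ G @ G.T with G of shape (n, m), m << n."""
    n = K.shape[0]
    m_max = max_rank or n
    d = np.diag(K).astype(float).copy()    # residual diagonal of K
    G = np.zeros((n, m_max))
    for m in range(m_max):
        i = int(np.argmax(d))              # pivot: largest residual entry
        if d[i] <= tol:                    # remaining energy below tolerance
            return G[:, :m]
        G[:, m] = (K[:, i] - G[:, :m] @ G[i, :m]) / np.sqrt(d[i])
        d -= G[:, m] ** 2
    return G
```

In the full method, the factors of the audio and visual kernel matrices would stand in for the full matrices in the KCFA eigenproblem, which is where the reported speedup over the original kernel cross-modal factor analysis arises.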

Automatic Human Emotion Recognition from Speech and Face Display - A New Approach (인간의 언어와 얼굴 표정에 통하여 자동적으로 감정 인식 시스템 새로운 접근법)

  • Luong, Dinh Dong; Lee, Young-Koo; Lee, Sung-Young
    • Proceedings of the Korean Information Science Society Conference / 2011.06b / pp.231-234 / 2011
  • Audiovisual human emotion recognition is a promising approach for multimodal human-computer interaction. However, optimal multimodal information fusion remains a challenge. To overcome these limitations and make the interface robust, we propose a framework for automatic human emotion recognition from speech and face display. In this paper, we develop a new approach to model-level information fusion based on the relationship between speech and facial expression, which automatically detects temporal segments and then performs the multimodal fusion.

Multimodal Medical Image Fusion Based on Sugeno's Intuitionistic Fuzzy Sets

  • Tirupal, Talari; Mohan, Bhuma Chandra; Kumar, Samayamantula Srinivas
    • ETRI Journal / v.39 no.2 / pp.173-180 / 2017
  • Multimodal medical image fusion is the process of retrieving valuable information from medical images. Its primary goal is to combine several images obtained from various sources into a single image suitable for improved diagnosis. Medical images are highly complex, and researchers apply many soft-computing methods to process them. Intuitionistic fuzzy sets are especially appropriate for medical images because such images carry many uncertainties. In this paper, a new method based on Sugeno's intuitionistic fuzzy set (SIFS) is proposed. First, the medical images are converted into Sugeno's intuitionistic fuzzy images (SIFIs). An exponential intuitionistic fuzzy entropy determines the optimum values of the membership, non-membership, and hesitation degree functions. Then, the two SIFIs are divided into image blocks, and the counts of blackness and whiteness in the blocks are calculated. Finally, the fused image is rebuilt by recombining the SIFI image blocks. The efficiency of SIFS in multimodal medical image fusion is demonstrated on several pairs of images, and the results are compared with recent studies in the literature.
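
For readers unfamiliar with Sugeno's construction, a rough sketch of the fuzzification and the block-selection idea follows. The fixed λ and the whiteness-count selection rule are simplifying assumptions; in the paper, the optimum degrees are found via an exponential intuitionistic fuzzy entropy.

```python
import numpy as np

def sugeno_ifi(img, lam=0.8):
    """Convert a grayscale image into a Sugeno intuitionistic fuzzy image:
    membership mu, Sugeno non-membership nu, hesitation degree pi."""
    g = img.astype(float)
    mu = (g - g.min()) / (g.max() - g.min() + 1e-12)  # normalized intensity
    nu = (1.0 - mu) / (1.0 + lam * mu)                # Sugeno negation
    pi = 1.0 - mu - nu                                # hesitation degree
    return mu, nu, pi

def blockwise_fuse(mu_a, mu_b, bs=8):
    """Per block, keep the source whose block has the larger whiteness count."""
    out = np.empty_like(mu_a)
    h, w = mu_a.shape
    for i in range(0, h, bs):
        for j in range(0, w, bs):
            a, b = mu_a[i:i+bs, j:j+bs], mu_b[i:i+bs, j:j+bs]
            out[i:i+bs, j:j+bs] = a if (a > 0.5).sum() >= (b > 0.5).sum() else b
    return out
```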

MosaicFusion: Merging Modalities with Partial Differential Equation and Discrete Cosine Transformation

  • Gargi Trivedi; Rajesh Sanghavi
    • Journal of Applied and Pure Mathematics / v.5 no.5_6 / pp.389-406 / 2023
  • In the pursuit of enhancing image fusion techniques, this research presents a novel approach for fusing multimodal images, specifically infrared (IR) and visible (VIS) images, using a combination of partial differential equations (PDEs) and the discrete cosine transform (DCT). The proposed method leverages the thermal and structural information provided by IR imaging and the fine-grained detail offered by VIS imaging to create composite images that are superior in quality and informativeness. Through a meticulous fusion process involving PDE-guided fusion, DCT component selection, and weighted combination, the methodology aims to strike a balance that optimally preserves essential features and minimizes artifacts. Rigorous objective and subjective evaluations validate the effectiveness of the approach. This research contributes to the ongoing advancement of multimodal image fusion, addressing applications in fields such as medical imaging, surveillance, and remote sensing, where the marriage of IR and VIS data is of paramount importance.
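
A minimal sketch of this style of pipeline is shown below: a linear diffusion PDE extracts a smooth base layer from each image, the residual details are compared coefficient-wise in the DCT domain, and a weighted combination forms the result. The specific PDE, the max-abs selection rule, and the weight value are assumptions for illustration, not the paper's exact choices.

```python
import numpy as np
from scipy.fft import dctn, idctn

def heat_diffuse(img, steps=20, dt=0.2):
    """Explicit linear diffusion (heat PDE) extracting a smooth base layer."""
    u = img.astype(float).copy()
    for _ in range(steps):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
               np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
        u += dt * lap                      # dt < 0.25 keeps the scheme stable
    return u

def fuse_ir_vis(ir, vis, w=0.5):
    """Weighted blend of PDE base layers + max-abs selection of DCT details."""
    base_ir, base_vis = heat_diffuse(ir), heat_diffuse(vis)
    base = w * base_ir + (1.0 - w) * base_vis
    d_ir = dctn(ir.astype(float) - base_ir, norm='ortho')
    d_vis = dctn(vis.astype(float) - base_vis, norm='ortho')
    detail = np.where(np.abs(d_ir) >= np.abs(d_vis), d_ir, d_vis)
    return base + idctn(detail, norm='ortho')
```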

Multimodal Medical Image Fusion Based on Double-Layer Decomposer and Fine Structure Preservation Model (복층 분해기와 상세구조 보존모델에 기반한 다중모드 의료영상 융합)

  • Zhang, Yingmei; Lee, Hyo Jong
    • KIPS Transactions on Computer and Communication Systems / v.11 no.6 / pp.185-192 / 2022
  • Multimodal medical image fusion (MMIF) fuses two images containing different structural details, generated in two different modes, into one comprehensive, information-rich image, which can help doctors observe and treat patients' diseases more accurately. Therefore, a method based on a double-layer decomposer and a fine-structure preservation model is proposed. First, the double-layer decomposer splits the source images into energy layers and structure layers, which preserves details well. Second, the structure layers are fused by combining the structure tensor operator (STO) with a max-abs rule. For the energy layers, a fine-structure preservation model is proposed to guide the fusion, further improving image quality. Finally, the fused image is obtained by adding the two sub-fused images formed through these fusion rules. Experiments show that our method performs excellently compared with several typical fusion methods.
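
The sketch below illustrates the overall structure under simplifying assumptions: Gaussian smoothing stands in for the paper's double-layer decomposer, the structure-tensor trace supplies the salience for a max rule on the structure layers, and a plain average stands in for the fine-structure preservation model on the energy layers.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def decompose(img, sigma=3.0):
    """Double-layer split: smooth energy layer plus residual structure layer."""
    f = img.astype(float)
    energy = gaussian_filter(f, sigma)
    return energy, f - energy

def st_salience(layer, sigma=1.5):
    """Trace of the smoothed structure tensor as a per-pixel salience map."""
    gx, gy = sobel(layer, axis=1), sobel(layer, axis=0)
    return gaussian_filter(gx * gx + gy * gy, sigma)

def fuse(img_a, img_b):
    ea, sa = decompose(img_a)
    eb, sb = decompose(img_b)
    # structure layers: per-pixel max rule driven by structure-tensor salience
    s_fused = np.where(st_salience(sa) >= st_salience(sb), sa, sb)
    # energy layers: plain average as a stand-in for the preservation model
    e_fused = 0.5 * (ea + eb)
    return e_fused + s_fused
```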

A Multimodal Fusion Method Based on a Rotation Invariant Hierarchical Model for Finger-based Recognition

  • Zhong, Zhen; Gao, Wanlin; Wang, Minjuan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.1 / pp.131-146 / 2021
  • Multimodal biometric recognition has been an active topic in recent years because of its greater convenience. Since fingers offer high user convenience, finger-based personal identification is widely used in practice. Hence, taking the fingerprint (FP), finger vein (FV), and finger-knuckle print (FKP) as the constituent traits, their feature representations help improve the universality and reliability of identification. To fuse the multimodal finger features effectively, a new robust representation algorithm based on a hierarchical model is proposed. First, to obtain more robust features, feature maps are computed by Gabor magnitude feature coding and then described by the Local Binary Pattern (LBP). Second, the LGBP-based feature maps are processed hierarchically in a bottom-up manner using variable rectangular and circular granules, respectively. Finally, the intensity of each granule is represented by Local-invariant Gray Features (LGFs), yielding Hierarchical Local-Gabor-based Gray Invariant Features (HLGGIFs). Experimental results reveal that the proposed algorithm is robust to rotation variation of finger pose and achieves a lower Equal Error Rate (EER) on our in-house database.
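
A rough sketch of the first stage, under the assumption that "LGBP" means LBP codes computed over Gabor magnitude responses (as in the local Gabor binary pattern literature), might look like this; the granule histogram below is a simplified stand-in for the paper's hierarchical rectangle/circle granules and LGF descriptors.

```python
import numpy as np
from skimage.filters import gabor
from skimage.feature import local_binary_pattern

def lgbp_maps(img, frequencies=(0.1, 0.2), n_orient=4):
    """LGBP maps: LBP codes computed over Gabor magnitude responses."""
    maps = []
    for f in frequencies:
        for k in range(n_orient):
            real, imag = gabor(img, frequency=f, theta=k * np.pi / n_orient)
            magnitude = np.hypot(real, imag)          # Gabor magnitude coding
            maps.append(local_binary_pattern(magnitude, P=8, R=1,
                                             method='uniform'))
    return maps

def granule_histograms(code_map, bs=16, n_codes=10):
    """Histogram per rectangular granule (uniform LBP with P=8 yields 10 codes)."""
    h, w = code_map.shape
    feats = []
    for i in range(0, h - bs + 1, bs):
        for j in range(0, w - bs + 1, bs):
            hist, _ = np.histogram(code_map[i:i+bs, j:j+bs],
                                   bins=n_codes, range=(0, n_codes))
            feats.append(hist / (hist.sum() + 1e-12))
    return np.concatenate(feats)
```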

Development of Real-Time Verification System by Features Extraction of Multimodal Biometrics Using Hybrid Method (조합기법을 이용한 다중생체신호의 특징추출에 의한 실시간 인증시스템 개발)

  • Cho, Yong-Hyun
    • Journal of the Korean Society of Industry Convergence / v.9 no.4 / pp.263-268 / 2006
  • This paper presents a real-time verification system that extracts features from multimodal biometrics using a hybrid method combining moment balance and independent component analysis (ICA). Moment balance is applied to reduce the computational load by extracting the valid signal region and excluding unneeded background from the multimodal biometrics. ICA is applied to increase verification performance by removing overlapping signals and extracting a statistically independent basis for the signals. The multimodal biometrics are faces and fingerprints, acquired with a web camera and a fingerprint acquisition device, respectively. The proposed system has been applied to fusion problems with 48 face and 48 fingerprint images (24 persons x 2 scenes) of 320x240 pixels each. The experimental results show that the proposed system achieves superior verification performance in both speed and accuracy.
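
Interpreting "moment balance" as moment-based selection of the valid signal region (an assumption on our part, not the paper's definition), the two stages could be sketched as follows:

```python
import numpy as np
from sklearn.decomposition import FastICA

def moment_crop(img, k=2.0):
    """Crop around the intensity centroid, +- k standard deviations,
    to drop needless background before feature extraction."""
    f = img.astype(float)
    ys, xs = np.indices(f.shape)
    m = f.sum() + 1e-12
    cy, cx = (ys * f).sum() / m, (xs * f).sum() / m   # intensity centroid
    sy = np.sqrt(((ys - cy) ** 2 * f).sum() / m)      # vertical spread
    sx = np.sqrt(((xs - cx) ** 2 * f).sum() / m)      # horizontal spread
    y0, y1 = int(max(cy - k * sy, 0)), int(min(cy + k * sy, f.shape[0]))
    x0, x1 = int(max(cx - k * sx, 0)), int(min(cx + k * sx, f.shape[1]))
    return f[y0:y1, x0:x1]

def ica_features(samples, n_components=16):
    """ICA over a stack of flattened, cropped-and-resized biometric images
    (rows = samples); the independent components form the feature basis."""
    ica = FastICA(n_components=n_components, random_state=0)
    return ica.fit_transform(samples)      # per-sample feature vectors
```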

Development of Gas Type Identification Deep-learning Model through Multimodal Method (멀티모달 방식을 통한 가스 종류 인식 딥러닝 모델 개발)

  • Seo Hee Ahn; Gyeong Yeong Kim; Dong Ju Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.12 / pp.525-534 / 2023
  • A gas leak detection system is key to minimizing loss of life, given the explosiveness and toxicity of gases. Most leak detection systems rely on gas sensors or thermal imaging cameras. To improve on single-modal gas leak detection, this paper proposes a multimodal approach that combines gas sensor data and thermal camera data in a gas type identification model. MultimodalGasData, an open multimodal dataset, is used to compare the four models developed through the multimodal approach against existing single-modal models. As a result, the 1D CNN and GasNet models show the highest single-modal performance, at 96.3% and 96.4%. The early fusion model combining 1D CNN and GasNet reaches 99.3%, 3.3% higher than the existing model. We hope that damage caused by gas leaks can be minimized through the gas leak detection system proposed in this study.
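
An early (feature-level) fusion model of this kind can be sketched as two CNN branches whose pooled features are concatenated before a shared classification head. The layer sizes, sensor count, and class count below are placeholders, not the architecture from the paper:

```python
import torch
import torch.nn as nn

class EarlyFusionGasNet(nn.Module):
    """Sketch of early fusion: a 1D CNN branch for gas-sensor time series and
    a 2D CNN branch for thermal images, concatenated before the classifier."""
    def __init__(self, n_sensors=7, n_classes=4):
        super().__init__()
        self.sensor_branch = nn.Sequential(
            nn.Conv1d(n_sensors, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())    # -> (B, 32)
        self.thermal_branch = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())    # -> (B, 32)
        self.head = nn.Linear(32 + 32, n_classes)

    def forward(self, sensors, thermal):
        z = torch.cat([self.sensor_branch(sensors),
                       self.thermal_branch(thermal)], dim=1)
        return self.head(z)

# usage sketch: logits for a batch of 8 samples with 100 sensor time steps
# logits = EarlyFusionGasNet()(torch.randn(8, 7, 100), torch.randn(8, 1, 64, 64))
```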

A Survey of Multimodal Systems and Techniques for Motor Learning

  • Tadayon, Ramin; McDaniel, Troy; Panchanathan, Sethuraman
    • Journal of Information Processing Systems / v.13 no.1 / pp.8-25 / 2017
  • This survey paper explores the application of multimodal feedback in automated systems for motor learning. We review the findings of recent studies in this field, using rehabilitation and various motor training scenarios as context. We discuss popular feedback delivery and sensing mechanisms for motion capture and processing in terms of requirements, benefits, and limitations. The selection of modalities is presented through a review of best-practice approaches for each modality relative to motor task complexity, with example implementations from recent work. We summarize the advantages and disadvantages of several approaches for integrating modalities in terms of fusion and feedback frequency during motor tasks. Finally, we review the limitations of perceptual bandwidth and evaluate the information transfer of each modality.

An Implementation of Multimodal Speaker Verification System using Teeth Image and Voice on Mobile Environment (이동환경에서 치열영상과 음성을 이용한 멀티모달 화자인증 시스템 구현)

  • Kim, Dong-Ju; Ha, Kil-Ram; Hong, Kwang-Seok
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.5 / pp.162-172 / 2008
  • In this paper, we propose a multimodal speaker verification method that uses teeth images and voice as biometric traits for personal verification on mobile terminal equipment. The proposed method obtains the biometric traits through the image and sound input devices of a smart-phone and performs verification with them. To enhance overall performance, the method combines the two biometric authentication scores in a multimodal fashion using weighted summation, which has a comparatively simple structure and good performance given the limited resources of the system. The performance of the proposed multimodal speaker authentication system is evaluated on a database acquired with a smart-phone from 40 subjects. The experiments yield an EER of 8.59% for teeth verification and 11.73% for voice verification, while the multimodal speaker authentication achieves an EER of 4.05%. These results show that the simple weighted-summation fusion outperforms either the teeth or the voice modality alone.
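
The weighted-summation score fusion itself is simple enough to sketch directly; the min-max score normalization and the weight value below are illustrative assumptions (the paper would tune the weight on development data):

```python
import numpy as np

def minmax_norm(scores):
    """Map raw matcher scores to [0, 1] so the two modalities are comparable."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

def fuse_scores(teeth_scores, voice_scores, w=0.6):
    """Weighted-summation score fusion; w weights the teeth matcher.
    0.6 is an illustrative value, not the weight used in the paper."""
    return w * minmax_norm(teeth_scores) + (1 - w) * minmax_norm(voice_scores)

def verify(fused, threshold=0.5):
    """Accept/reject decisions; the threshold would be set at the desired EER."""
    return fused >= threshold
```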