• Title/Summary/Keyword: Multiple Modalities

A Study on Multiple Modalities for Face Anti-Spoofing (얼굴 스푸핑 방지를 위한 다중 양식에 관한 연구)

  • Wu, Chenmou;Lee, Hyo Jong
    • Annual Conference of KIPS / 2021.11a / pp.651-654 / 2021
  • Face anti-spoofing (FAS) techniques play a significant role in defending facial recognition systems against spoofing attacks. Existing FAS methods achieve strong performance by relying on annotated additional modalities; however, labeling these high-cost modalities requires considerable manpower, device resources, and time. In this work, we propose using self-transforming modalities instead of annotated modalities. Three different modalities based on the frequency domain and the temporal domain are applied and analyzed. Intuitive visualization analysis shows the advantages of each modality. Comprehensive experiments on both CNN-based and transformer-based architectures with various modality combinations demonstrate that self-transforming modalities substantially improve the vanilla network. The code is available at https://github.com/chenmou0410/FAS-Challenge2021.
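
The "self-transforming" modalities above are derived from the input video itself rather than from extra annotations. Below is a minimal Python sketch of that idea, using an FFT log-magnitude spectrum as a frequency-domain modality and frame differencing as a temporal modality; it is illustrative only and not the authors' implementation (their code is at the GitHub link in the abstract).

```python
# Illustrative sketch: deriving label-free "self-transformed" modalities from RGB frames.
import numpy as np

def frequency_modality(frame):
    """Log-magnitude FFT spectrum of a grayscale frame (frequency-domain view)."""
    gray = frame.mean(axis=-1)                      # H x W x 3 -> H x W
    spectrum = np.fft.fftshift(np.fft.fft2(gray))   # center the low frequencies
    return np.log1p(np.abs(spectrum))

def temporal_modality(prev_frame, frame):
    """Absolute inter-frame difference (temporal-domain view)."""
    return np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32))

# Usage with dummy frames
f0 = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
f1 = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
print(frequency_modality(f1).shape, temporal_modality(f0, f1).shape)
```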

An evaluation of the effects of VDT tasks on multiple resources processing in working memory using MD, PD method (MD, PD법을 이용한 VDT 직무의 단기기억 다중자원처리에의 영향평가)

  • 윤철호;노병옥
    • Journal of the Ergonomics Society of Korea / v.16 no.1 / pp.85-96 / 1997
  • This article reviews the effects of VDT tasks on multiple resources for processing and storage in short-term working memory. The MD and PD methods were introduced to evaluate the modalities (auditory-visual) in the multiple resources model. The subjects performed two sessions of 50-minute VDT tasks. Before, between, and after the VDT tasks, MD and PD task performance scores and CFF (critical flicker frequency) values were measured. The results suggested that the modalities of human information processing in working memory were affected by VDT tasks with different task contents.

Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning

  • Maity, Sayan;Abdel-Mottaleb, Mohamed;Asfour, Shihab S.
    • Journal of Information Processing Systems / v.16 no.1 / pp.6-29 / 2020
  • Biometrics identification using multiple modalities has attracted the attention of many researchers, as it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing the different modalities present in the facial video clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising auto-encoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers to perform multimodal recognition. Moreover, the proposed technique has proven robust when some of the above modalities were missing during testing. The proposed system has three main components: detection, which consists of modality-specific detectors that automatically detect images of the different modalities present in facial video clips; feature selection, which uses a supervised denoising sparse auto-encoder network to capture discriminative representations that are robust to illumination and pose variations; and classification, which consists of a set of modality-specific sparse representation classifiers for unimodal recognition, followed by score-level fusion of the recognition results of the available modalities. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) resulted in Rank-1 recognition rates of 99.17% and 97.14%, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips, even when modalities are missing.
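
The classification stage described above ends with score-level fusion of whichever modalities are available. A minimal sketch of that fusion step follows, with hypothetical modality names and weights; the detectors, denoising auto-encoders, and sparse classifiers themselves are omitted.

```python
# Score-level fusion that tolerates missing modalities (illustrative, not the authors' code).
import numpy as np

def fuse_scores(modality_scores, weights):
    """Weighted sum of the available modality scores; missing modalities (None) are skipped."""
    total, weight_sum = None, 0.0
    for name, scores in modality_scores.items():
        if scores is None:                      # modality missing at test time
            continue
        w = weights[name]
        s = w * np.asarray(scores, dtype=float)
        total = s if total is None else total + s
        weight_sum += w
    return int(np.argmax(total / weight_sum))   # index of the best-matching identity

# Example: the right-ear modality is missing during testing
scores = {
    "frontal_face": [0.2, 0.7, 0.1],
    "left_ear":     [0.3, 0.6, 0.1],
    "right_ear":    None,
}
weights = {"frontal_face": 0.5, "left_ear": 0.3, "right_ear": 0.2}
print(fuse_scores(scores, weights))             # -> 1
```

Because missing modalities are simply skipped and the remaining weights are renormalized, the fused decision degrades gracefully when, for example, an ear view is absent at test time.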

Hybrid feature extraction of multimodal images for face recognition

  • Cheema, Usman;Moon, Seungbin
    • Annual Conference of KIPS / 2018.10a / pp.880-881 / 2018
  • Recent technological advancements have made visible, infrared, and thermal imaging systems readily available for security and access control. The increasing use of facial recognition for security and access control has led to emerging spoofing methodologies. To overcome the challenges of occlusion, replay attacks, and disguise, researchers have proposed using multiple imaging modalities. Using infrared and thermal modalities alongside visible imaging helps to overcome the shortcomings of visible imaging. In this paper, we review and propose hybrid feature extraction methods to combine data from multiple imaging systems simultaneously.
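
As a rough illustration of combining data from multiple imaging systems, the sketch below concatenates per-modality descriptors from visible, infrared, and thermal images into one hybrid feature vector; the histogram extractor is a placeholder, not the method proposed in the paper.

```python
# Hypothetical feature-level fusion across imaging modalities.
import numpy as np

def extract_features(image):
    """Stand-in feature extractor: a coarse intensity histogram per image."""
    hist, _ = np.histogram(image, bins=32, range=(0, 255), density=True)
    return hist

def hybrid_feature(visible, infrared, thermal):
    """Concatenate per-modality features into one hybrid descriptor."""
    return np.concatenate([extract_features(m) for m in (visible, infrared, thermal)])

vis = np.random.randint(0, 256, (128, 128))
ir  = np.random.randint(0, 256, (128, 128))
th  = np.random.randint(0, 256, (128, 128))
print(hybrid_feature(vis, ir, th).shape)   # (96,)
```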

Emotion Recognition based on Multiple Modalities

  • Kim, Dong-Ju;Lee, Hyeon-Gu;Hong, Kwang-Seok
    • Journal of the Institute of Convergence Signal Processing / v.12 no.4 / pp.228-236 / 2011
  • Emotion recognition plays an important role in human-computer interaction research, as it allows more natural, human-like communication between humans and computers. Most previous work on emotion recognition focused on extracting emotions from face, speech, or EEG information separately. Therefore, this paper presents a novel approach that combines face, speech, and EEG to recognize human emotion. The individual matching scores obtained from face, speech, and EEG are combined using a weighted-summation operation, and the fused score is used to classify the human emotion. In the experiments, the proposed approach gives an improvement of more than 18.64% over the most successful unimodal approach and also outperforms approaches that integrate only two modalities. These results confirm that the proposed approach achieves a significant performance improvement and is effective.
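
The weighted-summation fusion described above can be sketched as follows. The min-max normalization step, emotion labels, and weight values are illustrative assumptions, since raw matching scores from face, speech, and EEG typically lie on different scales.

```python
# Weighted-summation fusion of per-modality matching scores (illustrative sketch).
import numpy as np

EMOTIONS = ["happy", "sad", "angry", "neutral"]

def minmax(scores):
    """Rescale one modality's matching scores to [0, 1] so modalities are comparable."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo + 1e-9)

def fuse(face, speech, eeg, weights=(0.5, 0.3, 0.2)):
    """Weighted summation of normalized per-modality scores, then pick the top emotion."""
    fused = sum(w * minmax(s) for w, s in zip(weights, (face, speech, eeg)))
    return EMOTIONS[int(np.argmax(fused))]

print(fuse([0.9, 0.2, 0.4, 0.3],          # face matching scores per emotion
           [2.0, 1.1, 0.5, 0.9],          # speech scores on a different scale
           [0.1, 0.4, 0.2, 0.6]))         # EEG scores
```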

A Personal Video Event Classification Method based on Multi-Modalities by DNN-Learning (DNN 학습을 이용한 퍼스널 비디오 시퀀스의 멀티 모달 기반 이벤트 분류 방법)

  • Lee, Yu Jin;Nang, Jongho
    • Journal of KIISE / v.43 no.11 / pp.1281-1297 / 2016
  • In recent years, personal videos have grown tremendously due to the substantial increase in the use of smart devices and networking services, with which users create and share video content easily and with few restrictions. Videos generally contain multiple modalities, and the frame data in a video varies over time; taking both into account can significantly improve event detection performance. This paper proposes an event detection method in which high-level features are first extracted from the multiple modalities in the videos and rearranged according to time sequence, and the association among the modalities is then learned by means of a DNN to produce a personal video event detector. In the proposed method, audio and image data are first synchronized and then extracted. The results are input into GoogLeNet and a Multi-Layer Perceptron (MLP) to extract high-level features, which are then rearranged in time sequence, and each video is processed into a single feature for DNN training.
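
The pipeline shape described above (synchronize audio and image, extract per-segment features, and arrange them in time order into one video-level feature) can be sketched as follows, with random placeholder extractors standing in for GoogLeNet and the MLP; this is not the authors' implementation.

```python
# Sketch of building a time-ordered, multi-modal video-level feature.
import numpy as np

def image_features(frame):    # stand-in for a GoogLeNet embedding
    return np.random.rand(16)

def audio_features(clip):     # stand-in for an MLP audio embedding
    return np.random.rand(8)

def video_level_feature(frames, audio_clips):
    """Concatenate synchronized per-segment features in time order."""
    assert len(frames) == len(audio_clips)        # audio/image already synchronized
    segments = [np.concatenate([image_features(f), audio_features(a)])
                for f, a in zip(frames, audio_clips)]
    return np.concatenate(segments)               # one fixed-order feature per video

feat = video_level_feature(frames=[None] * 5, audio_clips=[None] * 5)
print(feat.shape)   # (120,) -> input to a DNN event classifier
```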

Emotion Recognition Implementation with Multimodalities of Face, Voice and EEG

  • Udurume, Miracle;Caliwag, Angela;Lim, Wansu;Kim, Gwigon
    • Journal of information and communication convergence engineering / v.20 no.3 / pp.174-180 / 2022
  • Emotion recognition is an essential component of complete interaction between human and machine. The challenges of emotion recognition stem from the different forms in which emotions are expressed, such as visual, sound, and physiological signals. Recent advancements in the field show that combined modalities, such as visual, voice, and electroencephalography signals, lead to better results than single modalities used separately. Previous studies have explored the use of multiple modalities for accurate prediction of emotion; however, the number of studies on real-time implementation is limited because of the difficulty of implementing multiple modalities of emotion recognition simultaneously. In this study, we propose an emotion recognition system for real-time implementation. Our model is built with a multithreading block that runs each modality in a separate thread for continuous synchronization. First, we achieved emotion recognition for each modality separately before enabling the multithreaded system. To verify the correctness of the results, we compared the accuracy of unimodal and multimodal emotion recognition in real time. The experimental results demonstrate real-time user emotion recognition with the proposed model and show the effectiveness of multiple modalities: the multimodal model obtained an accuracy of 80.1%, compared to unimodal accuracies of 70.9%, 54.3%, and 63.1%.
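
A hypothetical sketch of the multithreaded layout described above: one thread per modality pushes its latest class scores into a shared queue, and a fusion loop combines the most recent result from each modality. The modality names, weights, and four-class score vectors are invented for illustration.

```python
# One worker thread per modality plus a fusion loop (illustrative, standard library only).
import threading, time, random, queue

def modality_worker(name, out_q, stop):
    """Stand-in per-modality inference loop: pushes (modality name, 4 class scores)."""
    while not stop.is_set():
        out_q.put((name, [random.random() for _ in range(4)]))
        time.sleep(0.1)                          # simulated per-frame latency

stop, q = threading.Event(), queue.Queue()
weights = {"face": 0.5, "voice": 0.3, "eeg": 0.2}
for m in weights:
    threading.Thread(target=modality_worker, args=(m, q, stop), daemon=True).start()

latest = {}
t_end = time.time() + 1.0                        # run the fusion loop for about a second
while time.time() < t_end:
    name, scores = q.get()                       # newest result from any modality
    latest[name] = scores                        # keep only the most recent per modality
    fused = [sum(weights[m] * latest[m][i] for m in latest) for i in range(4)]
    print("fused class index:", fused.index(max(fused)))
stop.set()
```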

Enhancing Recommender Systems by Fusing Diverse Information Sources through Data Transformation and Feature Selection

  • Thi-Linh Ho;Anh-Cuong Le;Dinh-Hong Vu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.5 / pp.1413-1432 / 2023
  • Recommender systems aim to recommend items to users by taking into account their probable interests. This study focuses on creating a model that utilizes multiple sources of information about users and items by employing a multimodality approach. The study addresses how to gather information from different sources (modalities) and transform it into a uniform format, resulting in a multi-modal feature description for users and items. This work also aims to transform and represent the features extracted from different modalities so that the information is in a compatible format for integration and contains important, useful information for the prediction model. To achieve this goal, we propose a novel multi-modal recommendation model that extracts latent features of users and items from a utility matrix using matrix factorization techniques. Various transformation techniques are used to extract features from other sources of information such as user reviews, item descriptions, and item categories. We also propose the use of Principal Component Analysis (PCA) and feature selection techniques to reduce the data dimensionality, extract important features, and remove noisy features, thereby increasing the accuracy of the model. We conducted several experiments based on different subsets of modalities on the MovieLens and Amazon sub-category datasets. According to the experimental results, the proposed model significantly enhances the accuracy of recommendations compared to SVD, which is acknowledged as one of the most effective models for recommender systems. Specifically, the proposed model reduces the RMSE by 4.8% to 21.43% and increases the Precision by 2.07% to 26.49% on the Amazon datasets. Similarly, on the MovieLens dataset, the proposed model reduces the RMSE by 45.61% and increases the Precision by 14.06%. Additionally, the experimental results on both datasets demonstrate that combining information from multiple modalities in the proposed model leads to superior outcomes compared to relying on a single type of information.
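
A rough sketch of the feature-construction idea described above, with made-up dimensions: latent item factors from the utility matrix (via truncated SVD as a basic matrix-factorization step) are concatenated with PCA-reduced side-information features. It is a sketch of the general technique, not the paper's exact pipeline.

```python
# Combining utility-matrix latent factors with PCA-reduced side information (illustrative).
import numpy as np

rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(50, 40)).astype(float)   # user x item utility matrix
side = rng.random((40, 300))                                 # e.g., item description features

# Latent item factors via truncated SVD (a basic matrix-factorization step)
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
item_latent = (np.diag(s[:10]) @ Vt[:10]).T                  # 40 x 10

# PCA on the side information to reduce dimensionality and drop noisy directions
centered = side - side.mean(axis=0)
_, _, Wt = np.linalg.svd(centered, full_matrices=False)
side_pca = centered @ Wt[:10].T                              # 40 x 10

item_features = np.hstack([item_latent, side_pca])           # unified item description
print(item_features.shape)                                   # (40, 20)
```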

W3C based Interoperable Multimodal Communicator (W3C 기반 상호연동 가능한 멀티모달 커뮤니케이터)

  • Park, Daemin;Gwon, Daehyeok;Choi, Jinhuyck;Lee, Injae;Choi, Haechul
    • Journal of Broadcast Engineering / v.20 no.1 / pp.140-152 / 2015
  • HCI (Human-Computer Interaction) enables interaction between people and computers through human-familiar interfaces called modalities. Recently, to provide an optimal interface according to various devices and service environments, advanced HCI methods using multiple modalities have been intensively studied. However, a multimodal interface is difficult to build because modalities have different data formats and are hard to coordinate efficiently. To solve this problem, a multimodal communicator is introduced, based on EMMA (Extensible Multimodal Annotation Markup Language) and the MMI (Multimodal Interaction Framework) of the W3C (World Wide Web Consortium) standards. This standards-based framework, consisting of a modality component, an interaction manager, and a presentation component, makes multiple modalities interoperable and provides wide expansion capability for other modalities. Experimental results show the multimodal communicator operating with the multiple modalities of eye tracking and gesture recognition in a map-browsing scenario.
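
As a simplified illustration of handing a modality result to an interaction manager in a common format, the sketch below wraps a recognition result in an EMMA-style XML document. Only a few EMMA 1.0 attributes are shown, and the payload, attribute values, and function names are invented for illustration; it is not the communicator described in the paper.

```python
# Wrapping one modality result in a simplified EMMA-style document.
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
ET.register_namespace("emma", EMMA_NS)

def to_emma(mode, medium, confidence, payload):
    """Build an EMMA-style interpretation element for one modality result."""
    root = ET.Element(f"{{{EMMA_NS}}}emma", {"version": "1.0"})
    interp = ET.SubElement(root, f"{{{EMMA_NS}}}interpretation", {
        f"{{{EMMA_NS}}}mode": mode,
        f"{{{EMMA_NS}}}medium": medium,
        f"{{{EMMA_NS}}}confidence": str(confidence),
    })
    interp.text = payload
    return ET.tostring(root, encoding="unicode")

# e.g., a gesture-recognition result handed to the interaction manager
print(to_emma(mode="gesture", medium="visual", confidence=0.9, payload="zoom_in"))
```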

Uncooperative Person Recognition Based on Stochastic Information Updates and Environment Estimators

  • Kim, Hye-Jin;Kim, Dohyung;Lee, Jaeyeon;Jeong, Il-Kwon
    • ETRI Journal / v.37 no.2 / pp.395-405 / 2015
  • We address the problem of uncooperative person recognition through continuous monitoring. Multiple modalities, such as face, height, clothes color, and voice, can be used when attempting to recognize a person. In general, not all modalities are available in a given frame; furthermore, only some modalities will be useful, as some frames in a video sequence are of too low a quality to recognize a person. We propose a method that uses stochastic information updates of temporal modalities and environment estimators to improve person recognition performance. The environment estimators indicate whether a given modality is reliable enough to be used in a particular instance, so meaningless data can easily be identified and eliminated, increasing the overall efficiency of the method. Our proposed method was tested using movie clips acquired in an unconstrained environment that included wide variations in scale and rotation; illumination changes; uncontrolled distances from the camera to users (varying from 0.5 m to 5 m); and natural views of the human body with various types of noise. In this real and challenging scenario, our proposed method delivered outstanding performance.
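
A minimal sketch of the idea described above, under invented thresholds and modality names: environment estimators gate unreliable modalities, and the per-identity belief is updated only with the modalities that pass. It illustrates the general scheme, not the paper's exact estimators or update rule.

```python
# Reliability-gated, per-frame update of an identity belief (illustrative sketch).
import numpy as np

def reliable(modality, env):
    """Crude environment checks deciding whether a modality is worth using this frame."""
    if modality == "face":
        return env["face_size_px"] >= 40          # face too small -> skip
    if modality == "voice":
        return env["snr_db"] >= 10                # audio too noisy -> skip
    return True                                   # e.g., clothes color, height

def update_belief(belief, modality_scores, env, alpha=0.3):
    """Exponential update of the per-identity belief using reliable modalities only."""
    for modality, scores in modality_scores.items():
        if scores is None or not reliable(modality, env):
            continue
        belief = (1 - alpha) * belief + alpha * np.asarray(scores, dtype=float)
    return belief / belief.sum()

belief = np.ones(3) / 3                           # uniform over 3 enrolled people
frame_scores = {"face": [0.7, 0.2, 0.1], "voice": [0.5, 0.3, 0.2], "height": None}
belief = update_belief(belief, frame_scores, {"face_size_px": 64, "snr_db": 5})
print(belief)
```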