• Title/Summary/Keyword: Multimodal Deep Learning


Multimodal Supervised Contrastive Learning for Crop Disease Diagnosis (멀티 모달 지도 대조 학습을 이용한 농작물 병해 진단 예측 방법)

  • Hyunseok Lee;Doyeob Yeo;Gyu-Sung Ham;Kanghan Oh
    • IEMEK Journal of Embedded Systems and Applications / v.18 no.6 / pp.285-292 / 2023
  • With the wide spread of smart farms and advances in IoT technology, it is now easy to obtain additional data beyond crop images. Consequently, deep learning-based crop disease diagnosis research utilizing multimodal data has become important. This study proposes a crop disease diagnosis method using multimodal supervised contrastive learning, extending multimodal self-supervised learning. The RandAugment method was used to augment crop images and time series of environmental data. The augmented data passed through an encoder and projection head for each modality, yielding low-dimensional features. The proposed multimodal supervised contrastive loss then pulled features from the same class closer together while pushing apart those from different classes. The pretrained model was subsequently fine-tuned for crop disease diagnosis. t-SNE visualizations and comparative assessments of diagnosis performance substantiate that the proposed method outperforms multimodal self-supervised learning.
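
A minimal sketch of a multimodal supervised contrastive loss of the kind this abstract describes, written in PyTorch. The concatenation of per-modality projections, the batch size, embedding size, and temperature are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (assumed names/shapes): supervised contrastive loss over
# concatenated per-modality projections, as in SupCon-style training.
import torch
import torch.nn.functional as F

def sup_con_loss(z, labels, temperature=0.1):
    """z: (N, d) projection vectors, labels: (N,) integer class ids."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature                          # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))        # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(1) / pos_counts
    return loss[pos_mask.any(1)].mean()                    # anchors with >=1 positive

# Toy usage: image and environment time-series projections are concatenated,
# then pulled together/pushed apart by disease label.
img_z, env_z = torch.randn(8, 64), torch.randn(8, 64)
labels = torch.randint(0, 4, (8,))
print(sup_con_loss(torch.cat([img_z, env_z], dim=1), labels))
```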

Predicting Session Conversion on E-commerce: A Deep Learning-based Multimodal Fusion Approach

  • Minsu Kim;Woosik Shin;SeongBeom Kim;Hee-Woong Kim
    • Asia Pacific Journal of Information Systems / v.33 no.3 / pp.737-767 / 2023
  • With the availability of big customer data and advances in machine learning techniques, the prediction of customer behavior at the session level has attracted considerable attention from marketing practitioners and scholars. This study aims to predict customer purchase conversion at the session level by employing customer profile, transaction, and clickstream data. For this purpose, we develop a multimodal deep learning fusion model with dynamic and static features (i.e., DS-fusion). Specifically, we use page views within the focal visit as dynamic features and recency, frequency, monetary value, and clumpiness (RFMC) as static features to comprehensively capture customer characteristics related to buying behavior. Our deep learning architecture combines these features for conversion prediction. We validate the proposed model using real-world e-commerce data. The experimental results reveal that our model outperforms unimodal classifiers built on each feature set as well as classical machine learning models with dynamic and static features, including random forest and logistic regression. In this regard, this study sheds light on the promise of machine learning approaches that combine complementary modalities for predicting customer behavior.
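
A hedged sketch of a dynamic/static fusion classifier in the spirit of DS-fusion: a GRU summarizes the within-session clickstream, an MLP embeds the static RFMC features, and the two representations are concatenated to produce a purchase-conversion logit. All layer sizes, the vocabulary size, and the choice of a GRU are illustrative assumptions.

```python
# Illustrative sketch (assumed sizes): GRU over the session clickstream plus
# an MLP over static RFMC features, fused for a conversion logit.
import torch
import torch.nn as nn

class DSFusionSketch(nn.Module):
    def __init__(self, page_vocab=500, emb=32, hidden=64, static_dim=4):
        super().__init__()
        self.embed = nn.Embedding(page_vocab, emb, padding_idx=0)
        self.gru = nn.GRU(emb, hidden, batch_first=True)      # dynamic branch
        self.static_mlp = nn.Sequential(nn.Linear(static_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden * 2, 1)

    def forward(self, page_seq, rfmc):
        # page_seq: (B, T) page-view ids within the focal visit; rfmc: (B, 4)
        _, h = self.gru(self.embed(page_seq))
        fused = torch.cat([h[-1], self.static_mlp(rfmc)], dim=1)
        return self.head(fused).squeeze(1)                    # conversion logit

model = DSFusionSketch()
logit = model(torch.randint(1, 500, (2, 10)), torch.randn(2, 4))
```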

Driver Drowsiness Detection Model using Image and PPG data Based on Multimodal Deep Learning (이미지와 PPG 데이터를 사용한 멀티모달 딥 러닝 기반의 운전자 졸음 감지 모델)

  • Choi, Hyung-Tak;Back, Moon-Ki;Kang, Jae-Sik;Yoon, Seung-Won;Lee, Kyu-Chul
    • Database Research / v.34 no.3 / pp.45-57 / 2018
  • Drowsiness while driving is a very dangerous condition that can lead directly to major accidents. Traditional drowsiness detection methods exist for assessing the driver's state, but they are limited in providing generalized state recognition that reflects the individual characteristics of drivers. In recent years, deep learning-based state recognition studies have been proposed. Deep learning has the advantage of extracting features automatically, without hand-crafted rules, and of deriving a more generalized recognition model. In this study, we propose a state recognition model that is more accurate than existing deep learning approaches by learning from images and PPG simultaneously to assess the driver's state. This paper examines the effect of the driver's image and PPG data on drowsiness detection and tests whether using them together improves the performance of the learning model. We confirmed an accuracy improvement of around 3% when using image and PPG data together compared with images alone. In addition, the multimodal deep learning-based model that classifies the driver's state into three categories achieved a classification accuracy of 96%.
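
A minimal sketch, under assumed architectures, of the image + PPG fusion idea described above: a small 2D CNN encodes the face image, a 1D CNN encodes a PPG window, and the concatenated features feed a three-way classifier.

```python
# Sketch with assumed architectures: a 2D CNN for the face image and a 1D CNN
# for the PPG window, concatenated for three-way driver-state classification.
import torch
import torch.nn as nn

class DrowsinessNetSketch(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        self.img_net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())             # -> (B, 32)
        self.ppg_net = nn.Sequential(
            nn.Conv1d(1, 16, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())             # -> (B, 16)
        self.classifier = nn.Linear(32 + 16, n_classes)

    def forward(self, image, ppg):
        feats = torch.cat([self.img_net(image), self.ppg_net(ppg)], dim=1)
        return self.classifier(feats)                          # state logits

model = DrowsinessNetSketch()
logits = model(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 256))
```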

Convergence evaluation method using multisensory and matching painting and music using deep learning based on imaginary soundscape (Imaginary Soundscape 기반의 딥러닝을 활용한 회화와 음악의 매칭 및 다중 감각을 이용한 융합적 평가 방법)

  • Jeong, Hayoung;Kim, Youngjun;Cho, Jundong
    • Journal of the Korea Convergence Society / v.11 no.11 / pp.175-182 / 2020
  • In this study, we introduce a technique for matching classical music to paintings using deep learning in order to design a soundscape that helps viewers appreciate the painting, and we propose an evaluation index for how well the painting and music match. The evaluation consisted of a suitability rating on a 5-point Likert scale and an evaluation in a multimodal aspect. The 13 test participants' suitability ratings for the deep learning-based best painting-music matches averaged 3.74/5.0, and the average cosine similarity in the multimodal evaluation across the 13 participants was 0.79. We expect the multimodal evaluation to serve as an index that can measure a new kind of user experience. In addition, this study aims to improve the experience of multisensory artworks by proposing an interaction between the visual and the auditory. The proposed painting-music matching method can be used in multisensory artwork exhibitions and, furthermore, can increase accessibility for visually impaired people to appreciate artworks.
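
The multimodal evaluation above reports an average cosine similarity; a tiny illustrative sketch of that metric is given below, with placeholder rating vectors that are not the study's data.

```python
# Toy illustration of the cosine-similarity metric; the rating vectors below
# are placeholders, not the study's data.
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

participant_response = [4, 5, 3, 4, 2]   # hypothetical per-dimension ratings
system_match_profile = [5, 5, 2, 4, 1]
print(round(cosine_similarity(participant_response, system_match_profile), 2))
```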

Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning

  • Maity, Sayan;Abdel-Mottaleb, Mohamed;Asfour, Shihab S.
    • Journal of Information Processing Systems / v.16 no.1 / pp.6-29 / 2020
  • Biometric identification using multiple modalities has attracted the attention of many researchers, as it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing the different modalities present in the facial video clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising auto-encoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers to perform the multimodal recognition. Moreover, the proposed technique has proven robust when some of the above modalities are missing during testing. The proposed system has three main components: detection, which consists of modality-specific detectors to automatically detect images of different modalities present in facial video clips; feature selection, which uses a supervised denoising sparse auto-encoder network to capture discriminative representations that are robust to illumination and pose variations; and classification, which consists of a set of modality-specific sparse representation classifiers for unimodal recognition, followed by score-level fusion of the recognition results of the available modalities. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) resulted in 99.17% and 97.14% Rank-1 recognition rates, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips, even when modalities are missing.
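
A hedged sketch of two building blocks named in the abstract: a denoising auto-encoder for per-modality feature learning and a simple score-level fusion over whichever modalities are available. The dimensions, noise model, and averaging fusion rule are illustrative assumptions rather than the authors' exact design.

```python
# Assumed-dimension sketch: a basic denoising auto-encoder (the paper's variant
# is supervised, i.e., it also uses class information) and a simple mean-rule
# score-level fusion over the modalities detected in a clip.
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, in_dim=1024, code=128, noise=0.1):
        super().__init__()
        self.noise = noise
        self.enc = nn.Sequential(nn.Linear(in_dim, code), nn.ReLU())
        self.dec = nn.Linear(code, in_dim)

    def forward(self, x):
        corrupted = x + self.noise * torch.randn_like(x)   # corrupt the input
        return self.dec(self.enc(corrupted))               # reconstruct clean x

def fuse_scores(per_modality_scores):
    """Average class scores over available modalities; None marks a missing one."""
    available = [s for s in per_modality_scores if s is not None]
    return torch.stack(available).mean(0)

ae = DenoisingAE()
recon = ae(torch.randn(4, 1024))
fused = fuse_scores([torch.randn(4, 10), None, torch.randn(4, 10)])  # one modality missing
```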

Development of Gas Type Identification Deep-learning Model through Multimodal Method (멀티모달 방식을 통한 가스 종류 인식 딥러닝 모델 개발)

  • Seo Hee Ahn;Gyeong Yeong Kim;Dong Ju Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.12 / pp.525-534 / 2023
  • A gas leak detection system is key to minimizing loss of life given the explosiveness and toxicity of gases. Most leak detection systems rely on gas sensors or thermal imaging cameras. To improve on single-modal gas leak detection, this paper proposes a multimodal approach that combines gas sensor data and thermal camera data in a gas type identification model. MultimodalGasData, a multimodal open dataset, is used to compare the four models developed through this multimodal approach against existing models. As a result, the 1D CNN and GasNet models show the highest performance, at 96.3% and 96.4%, respectively. The early fusion model combining 1D CNN and GasNet reached 99.3%, 3.3% higher than the existing model. We hope that damage caused by gas leaks can be minimized through the gas leak detection system proposed in this study.
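
One plausible reading of combining a sensor branch with a thermal-image branch for gas-type classification, sketched in PyTorch. The small CNN standing in for GasNet, the fusion point, the sensor count, and all shapes are assumptions for illustration only.

```python
# Sketch under assumptions: a 1D CNN branch for gas-sensor readings and a small
# 2D CNN standing in for GasNet on thermal frames, fused before the classifier.
import torch
import torch.nn as nn

class GasFusionSketch(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.sensor_branch = nn.Sequential(
            nn.Conv1d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())            # -> (B, 16)
        self.thermal_branch = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())            # -> (B, 16)
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, sensors, thermal):
        # sensors: (B, 1, n_sensors) readings; thermal: (B, 1, H, W) frame
        feats = torch.cat([self.sensor_branch(sensors),
                           self.thermal_branch(thermal)], dim=1)
        return self.classifier(feats)                         # gas-type logits

model = GasFusionSketch()
print(model(torch.randn(2, 1, 7), torch.randn(2, 1, 96, 96)).shape)  # (2, 4)
```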

Automated detection of panic disorder based on multimodal physiological signals using machine learning

  • Eun Hye Jang;Kwan Woo Choi;Ah Young Kim;Han Young Yu;Hong Jin Jeon;Sangwon Byun
    • ETRI Journal / v.45 no.1 / pp.105-118 / 2023
  • We tested the feasibility of automated discrimination of patients with panic disorder (PD) from healthy controls (HCs) based on multimodal physiological responses using machine learning. Electrocardiogram (ECG), electrodermal activity (EDA), respiration (RESP), and peripheral temperature (PT) of the participants were measured during three experimental phases: rest, stress, and recovery. Eleven physiological features were extracted from each phase and used as input data. Logistic regression (LoR), k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) algorithms were implemented with nested cross-validation. Linear regression analysis showed that ECG and PT features obtained in the stress and recovery phases were significant predictors of PD. We achieved the highest accuracy (75.61%) with MLP using all 33 features. With the exception of MLP, applying the significant predictors led to a higher accuracy than using 24 ECG features. These results suggest that combining multimodal physiological signals measured during various states of autonomic arousal has the potential to differentiate patients with PD from HCs.
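
A minimal scikit-learn sketch of the nested cross-validation setup described above, with random placeholder data in place of the 33 physiological features and PD/HC labels; the SVM and its hyperparameter grid are illustrative choices.

```python
# Placeholder-data sketch of nested cross-validation: an inner grid search
# tunes the classifier, an outer loop estimates generalization accuracy.
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.random.randn(60, 33)               # stand-in for the 33 features
y = np.random.randint(0, 2, 60)           # stand-in for PD vs. HC labels

inner = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1, 10]},
    cv=StratifiedKFold(5),
)
outer_scores = cross_val_score(inner, X, y, cv=StratifiedKFold(5))
print(outer_scores.mean())                 # nested-CV accuracy estimate
```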

Multimodal Face Biometrics by Using Convolutional Neural Networks

  • Tiong, Leslie Ching Ow;Kim, Seong Tae;Ro, Yong Man
    • Journal of Korea Multimedia Society / v.20 no.2 / pp.170-178 / 2017
  • Biometric recognition is a major and challenging topic that demands high recognition accuracy. Most existing methods rely on a single biometric source for recognition, and recognition accuracy is affected by sources of variability, including illumination and appearance variations. In this paper, we propose a new multimodal biometric recognition method using convolutional neural networks, focusing on multimodal biometrics from the face and periocular regions. Through experiments, we demonstrate that a deep learning framework over multimodal facial biometric features is helpful for achieving high recognition performance.
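
A brief sketch, under assumed layer sizes and an assumed number of identities, of two-stream face + periocular fusion with CNNs as described above.

```python
# Assumed-size sketch: one small CNN per region (face, periocular), features
# concatenated and classified over a placeholder number of identities.
import torch
import torch.nn as nn

def region_cnn(out_dim=64):
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, out_dim, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())

class FacePeriocularSketch(nn.Module):
    def __init__(self, n_identities=100):
        super().__init__()
        self.face_cnn, self.peri_cnn = region_cnn(), region_cnn()
        self.classifier = nn.Linear(128, n_identities)

    def forward(self, face, periocular):
        feats = torch.cat([self.face_cnn(face), self.peri_cnn(periocular)], dim=1)
        return self.classifier(feats)

model = FacePeriocularSketch()
logits = model(torch.randn(2, 3, 96, 96), torch.randn(2, 3, 48, 96))
```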

Multi-modal Representation Learning for Classification of Imported Goods (수입물품의 품목 분류를 위한 멀티모달 표현 학습)

  • Apgil Lee;Keunho Choi;Gunwoo Kim
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.203-214 / 2023
  • The Korea Customs Service handles its work efficiently with an electronic customs system that supports one-stop processing, yet a more effective method for item classification is still needed. All imported and exported goods require an HS Code (Harmonized System Code) for classification and tax rate application, and HS Code classification is a highly difficult task that requires specialized knowledge and experience and is an important part of customs clearance procedures. Therefore, this study develops a deep learning model based on multimodal representation learning that jointly learns from the various types of information in the item classification request form, such as the product name, product description, and product image. By classifying and recommending HS Codes, the model is expected to reduce the burden of customs work and to help customs procedures by promptly classifying items.
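
A hedged sketch of multimodal HS Code classification from a product's text fields and image. The EmbeddingBag text encoder, the small image CNN, and the vocabulary and code counts are placeholders standing in for the paper's encoders.

```python
# Placeholder sketch: averaged token embeddings for the product name and
# description, a small CNN for the product image, concatenated for HS Code logits.
import torch
import torch.nn as nn

class HSCodeSketch(nn.Module):
    def __init__(self, vocab=30000, n_codes=1000, dim=64):
        super().__init__()
        self.text_enc = nn.EmbeddingBag(vocab, dim)        # mean of token embeddings
        self.img_enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, dim))
        self.head = nn.Linear(2 * dim, n_codes)

    def forward(self, token_ids, offsets, image):
        fused = torch.cat([self.text_enc(token_ids, offsets),
                           self.img_enc(image)], dim=1)
        return self.head(fused)                            # HS Code logits

model = HSCodeSketch()
tokens = torch.tensor([3, 15, 27, 8, 4])                   # two texts, flattened
offsets = torch.tensor([0, 3])                             # where each text starts
logits = model(tokens, offsets, torch.randn(2, 3, 64, 64))
```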

Building Detection by Convolutional Neural Network with Infrared Image, LiDAR Data and Characteristic Information Fusion (적외선 영상, 라이다 데이터 및 특성정보 융합 기반의 합성곱 인공신경망을 이용한 건물탐지)

  • Cho, Eun Ji;Lee, Dong-Cheon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.38 no.6 / pp.635-644 / 2020
  • Object recognition, detection, and instance segmentation based on DL (Deep Learning) are being used in various applications, and optical images are mainly used as training data for DL models. The major objective of this paper is object segmentation and building detection by utilizing multimodal datasets, in addition to optical images, to train the Detectron2 model, one of the improved R-CNN (Region-based Convolutional Neural Network) frameworks. For the implementation, infrared aerial images, LiDAR (Light Detection And Ranging) data, edges derived from the images, and Haralick features representing statistical texture information derived from the LiDAR data were generated. The performance of DL models depends not only on the amount and characteristics of the training data, but also on the fusion method, especially for multimodal data. Segmenting objects and detecting buildings with hybrid fusion, a mixed method of early fusion and late fusion, yielded a 32.65% improvement in building detection rate compared with training on optical images only. The experiments demonstrate the complementary effect of training on multimodal data with distinct characteristics and of the fusion strategy.
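
A small NumPy sketch of an early-fusion step consistent with the description above: co-registered infrared, LiDAR-derived height, and a texture band are normalized and stacked into one multi-channel training input. Band names, shapes, and normalization are illustrative assumptions, not the paper's exact preprocessing.

```python
# Illustrative early-fusion step with synthetic bands; the real pipeline would
# use co-registered rasters of the same extent.
import numpy as np

H, W = 512, 512
infrared = np.random.rand(H, W).astype(np.float32)           # IR aerial image band
lidar_height = np.random.rand(H, W).astype(np.float32)       # normalized DSM from LiDAR
haralick_texture = np.random.rand(H, W).astype(np.float32)   # texture feature band

def early_fuse(*bands):
    """Scale each band to [0, 1] and stack along the channel axis."""
    normed = [(b - b.min()) / (b.max() - b.min() + 1e-8) for b in bands]
    return np.stack(normed, axis=-1)                          # (H, W, C) fused input

fused = early_fuse(infrared, lidar_height, haralick_texture)
print(fused.shape)   # (512, 512, 3), one multi-channel input for the detector
```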