• Title/Summary/Keyword: Multimodal Learning

Search Result 77

Multimodal Supervised Contrastive Learning for Crop Disease Diagnosis (멀티 모달 지도 대조 학습을 이용한 농작물 병해 진단 예측 방법)

  • Hyunseok Lee;Doyeob Yeo;Gyu-Sung Ham;Kanghan Oh
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.18 no.6
    • /
    • pp.285-292
    • /
    • 2023
  • With the spread of smart farms and advances in IoT technology, it has become easy to collect data beyond crop images. Consequently, deep learning-based crop disease diagnosis using multimodal data has become important. This study proposes a crop disease diagnosis method based on multimodal supervised contrastive learning, extending multimodal self-supervised learning. The RandAugment method was used to augment crop images and time series of environmental data. The augmented data passed through an encoder and a projection head for each modality, yielding low-dimensional features. The proposed multimodal supervised contrastive loss then pulled features from the same class closer together while pushing apart those from different classes. Afterward, the pretrained model was fine-tuned for crop disease diagnosis. t-SNE visualizations and comparative assessments of diagnosis performance show that the proposed method outperforms multimodal self-supervised learning.
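
    The pull-together, push-apart behavior described in this abstract can be illustrated with a generic supervised contrastive loss. The sketch below is an assumption-laden illustration, not the authors' formulation: it pools the projected features of both modalities into one batch, and the function names, the temperature value, and the toy feature vectors are all hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over one pooled batch of embeddings.

    features: list of embedding vectors (hypothetically, image and sensor
              projections stacked into a single batch).
    labels:   class label for each embedding.
    Anchors without any same-class partner are skipped.
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of the anchor to every sample.
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n)]
        # Denominator runs over all samples except the anchor itself.
        log_denom = math.log(sum(math.exp(sims[a]) for a in range(n) if a != i))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two image embeddings followed by two sensor embeddings.
toy_features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(supcon_loss(toy_features, [0, 1, 0, 1]))
```

    The loss is small when same-class embeddings already point in similar directions and grows when the labels group dissimilar embeddings together.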

Student Experiences in a Multimodal Composition Class

  • Park, Hyechong;Selfe, Cynthia L.
    • English Language & Literature Teaching
    • /
    • v.17 no.4
    • /
    • pp.229-250
    • /
    • 2011
  • Despite the social turn in literacy studies, few empirical studies have investigated the practical applications and learning experiences of multimodal composition pedagogy. Using a qualitative research approach, this study examines undergraduates' experiences in producing multimodal texts. The findings show that students' experiences in a multimodal composition class epitomize enjoyable learning. Students enjoyed their learning process because (a) the multimodal literacy curriculum filled the pedagogical gap between the conventional school-sponsored alphabetic literacy pedagogy and widespread out-of-school multimodal literacy practices and (b) the usefulness of the curriculum helped students enhance their intrinsic motivation to learn and compose. By questioning fundamental assumptions about what counts as knowledge in the current ecology of literacies, the authors argue for putting a dynamic view of literacy into practice.

Designing a Framework of Multimodal Contents Creation and Playback System for Immersive Textbook (실감형 교과서를 위한 멀티모달 콘텐츠 저작 및 재생 프레임워크 설계)

  • Kim, Seok-Yeol;Park, Jin-Ah
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.8
    • /
    • pp.1-10
    • /
    • 2010
  • For virtual education, a multimodal learning environment with haptic feedback, termed an 'immersive textbook', is necessary to enhance learning effectiveness. However, learning contents for immersive textbooks are not widely available because of constraints in the creation and playback environments. To address this problem, we propose a framework for producing and displaying multimodal contents for immersive textbooks. Our framework provides an XML-based meta-language for producing multimodal learning contents in the form of an intuitive script. It can thus help users without any prior knowledge of multimodal interactions produce their own learning contents. The contents are then interpreted by a script engine and delivered to the user through visual and haptic rendering loops. We also implemented a prototype based on these proposals and performed a user evaluation to verify the validity of our framework.

The influence of learning style in understanding analogies and 2D animations in embryology course

  • Narayanan, Suresh;Ananthy, Vimala
    • Anatomy and Cell Biology
    • /
    • v.51 no.4
    • /
    • pp.260-265
    • /
    • 2018
  • Undergraduate students struggle to comprehend embryology because of its dynamic nature. Studies have recommended using a combination of teaching methods to match the student's learning style. However, no study has described the effect of such a teaching strategy on different types of learners. In the present study, an attempt has been made to teach embryology using a combination of analogies and simple 2D animations made with Microsoft PowerPoint. The objective of the study is to estimate the difference in academic improvement and perception scale between the different types of learners after introducing analogies and 2D animation in a lecture environment. Based on the Visual, Aural, Read/Write, and Kinesthetic (VARK) scoring system, the learners were grouped into unimodal and multimodal learners. There was significant improvement in post-test score among the unimodal (P<0.001) and multimodal learners (P<0.001). When the post-test score was compared between the two groups, the multimodal learners performed better than the unimodal learners (P=0.018). However, there was no difference between the groups in the perception of animations and analogies or in long-term assessment. The multimodal learners performed better than the unimodal learners in short-term recall, but learning style did not influence long-term retention of knowledge.

Automated detection of panic disorder based on multimodal physiological signals using machine learning

  • Eun Hye Jang;Kwan Woo Choi;Ah Young Kim;Han Young Yu;Hong Jin Jeon;Sangwon Byun
    • ETRI Journal
    • /
    • v.45 no.1
    • /
    • pp.105-118
    • /
    • 2023
  • We tested the feasibility of automated discrimination of patients with panic disorder (PD) from healthy controls (HCs) based on multimodal physiological responses using machine learning. Electrocardiogram (ECG), electrodermal activity (EDA), respiration (RESP), and peripheral temperature (PT) of the participants were measured during three experimental phases: rest, stress, and recovery. Eleven physiological features were extracted from each phase and used as input data. Logistic regression (LoR), k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) algorithms were implemented with nested cross-validation. Linear regression analysis showed that ECG and PT features obtained in the stress and recovery phases were significant predictors of PD. We achieved the highest accuracy (75.61%) with MLP using all 33 features. With the exception of MLP, applying the significant predictors led to a higher accuracy than using 24 ECG features. These results suggest that combining multimodal physiological signals measured during various states of autonomic arousal has the potential to differentiate patients with PD from HCs.
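
    The nested cross-validation mentioned in this abstract separates hyperparameter selection (inner loop) from performance estimation (outer loop). The sketch below shows only that control flow under stated assumptions: the fold counts, the interleaved fold assignment, and the `score_fn` callback are hypothetical placeholders, not details taken from the study.

```python
def k_folds(n_samples, k):
    """Split the positions 0..n_samples-1 into k interleaved folds."""
    return [list(range(start, n_samples, k)) for start in range(k)]


def nested_cv(n_samples, candidates, score_fn, outer_k=5, inner_k=3):
    """Return the mean outer-fold score of a nested cross-validation.

    score_fn(param, train_idx, test_idx) -> float, higher is better.
    It is a hypothetical callback that trains a model with `param` on
    train_idx and evaluates it on test_idx.
    """
    outer_scores = []
    for held_out in k_folds(n_samples, outer_k):
        held_out_set = set(held_out)
        development = [i for i in range(n_samples) if i not in held_out_set]

        def inner_score(param):
            # Inner loop: only development samples are used for tuning.
            scores = []
            for fold in k_folds(len(development), inner_k):
                fold_set = set(fold)
                validation = [development[j] for j in fold]
                training = [development[j] for j in range(len(development))
                            if j not in fold_set]
                scores.append(score_fn(param, training, validation))
            return sum(scores) / len(scores)

        best_param = max(candidates, key=inner_score)
        # Outer loop: the held-out fold is touched once, after tuning.
        outer_scores.append(score_fn(best_param, development, held_out))
    return sum(outer_scores) / len(outer_scores)
```

    Because the held-out fold never influences the choice of `best_param`, the averaged outer score is not biased by the tuning step.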

Predicting Session Conversion on E-commerce: A Deep Learning-based Multimodal Fusion Approach

  • Minsu Kim;Woosik Shin;SeongBeom Kim;Hee-Woong Kim
    • Asia pacific journal of information systems
    • /
    • v.33 no.3
    • /
    • pp.737-767
    • /
    • 2023
  • With the availability of big customer data and advances in machine learning techniques, the prediction of customer behavior at the session level has attracted considerable attention from marketing practitioners and scholars. This study aims to predict customer purchase conversion at the session level by employing customer profile, transaction, and clickstream data. For this purpose, we develop a multimodal deep learning fusion model with dynamic and static features (i.e., DS-fusion). Specifically, we use page views within the focal visit as dynamic features and recency, frequency, monetary value, and clumpiness (RFMC) as static features to comprehensively capture the customer characteristics behind buying behaviors. Our model combines these features through deep learning architectures for conversion prediction. We validate the proposed model using real-world e-commerce data. The experimental results reveal that our model outperforms unimodal classifiers using each feature alone and classical machine learning models using both dynamic and static features, including random forest and logistic regression. In this regard, this study sheds light on the promise of a machine learning approach that combines complementary modalities in predicting customer behaviors.

A Survey of Multimodal Systems and Techniques for Motor Learning

  • Tadayon, Ramin;McDaniel, Troy;Panchanathan, Sethuraman
    • Journal of Information Processing Systems
    • /
    • v.13 no.1
    • /
    • pp.8-25
    • /
    • 2017
  • This survey paper explores the application of multimodal feedback in automated systems for motor learning. We review the findings of recent studies in this field, using rehabilitation and various motor training scenarios as context. We discuss popular feedback delivery and sensing mechanisms for motion capture and processing in terms of requirements, benefits, and limitations. We address the selection of modalities by reviewing best-practice approaches for each modality relative to motor task complexity, with example implementations from recent work. We summarize the advantages and disadvantages of several approaches for integrating modalities in terms of fusion and frequency of feedback during motor tasks. Finally, we review the limitations of perceptual bandwidth and provide an evaluation of the information transfer for each modality.

Character-based Subtitle Generation by Learning of Multimodal Concept Hierarchy from Cartoon Videos (멀티모달 개념계층모델을 이용한 만화비디오 컨텐츠 학습을 통한 등장인물 기반 비디오 자막 생성)

  • Kim, Kyung-Min;Ha, Jung-Woo;Lee, Beom-Jin;Zhang, Byoung-Tak
    • Journal of KIISE
    • /
    • v.42 no.4
    • /
    • pp.451-458
    • /
    • 2015
  • Previous multimodal learning methods focus on problem-solving aspects, such as image and video search and tagging, rather than on knowledge acquisition via content modeling. In this paper, we propose the Multimodal Concept Hierarchy (MuCH), a content modeling method that uses a cartoon video dataset, together with a character-based subtitle generation method that uses the learned model. The MuCH model has a multimodal hypernetwork layer, in which the patterns of the words and image patches are represented, and a concept layer, in which each concept variable is represented by a probability distribution over the words and the image patches. Using a Bayesian learning method, the model learns the characteristics of the characters as concepts from the video subtitles and scene images, and it can generate character-based subtitles from the learned model when text queries are provided. As an experiment, the MuCH model learned concepts from 'Pororo' cartoon videos totaling 268 minutes and generated character-based subtitles. Finally, we compare the results with those of other multimodal learning models. The experimental results indicate that, given the same text query, our model generates more accurate and more character-specific subtitles than other models.

Development of Gas Type Identification Deep-learning Model through Multimodal Method (멀티모달 방식을 통한 가스 종류 인식 딥러닝 모델 개발)

  • Seo Hee Ahn;Gyeong Yeong Kim;Dong Ju Kim
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.12
    • /
    • pp.525-534
    • /
    • 2023
  • A gas leak detection system is key to minimizing loss of life caused by the explosiveness and toxicity of gas. Most leak detection systems rely on gas sensors or thermal imaging cameras. To improve on the performance of single-modal gas leak detection systems, this paper proposes a multimodal approach that combines gas sensor data and thermal camera data to develop a gas type identification model. MultimodalGasData, an open multimodal dataset, is used to compare the four models developed through the multimodal approach with existing models. The 1D CNN and GasNet models show the highest performance, 96.3% and 96.4%. The combined early fusion model of 1D CNN and GasNet reached 99.3%, 3.3% higher than the existing model. We hope that further damage caused by gas leaks can be minimized through the gas leak detection system proposed in this study.
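
    Early fusion, as named in this abstract, joins the modalities at the feature level before a single classifier sees them. The sketch below is a toy stand-in under stated assumptions: it concatenates min-max-scaled sensor readings and thermal pixel values and uses a nearest-centroid classifier in place of the 1D CNN and GasNet models; the function names, class labels, and sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale one modality to [0, 1] so neither modality dominates by magnitude."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def early_fuse(sensor_readings, thermal_pixels):
    """Early fusion: concatenate per-modality features into one input vector."""
    return min_max_scale(sensor_readings) + min_max_scale(thermal_pixels)


def fit_centroids(fused_samples, labels):
    """Compute the mean fused vector of each class."""
    sums, counts = {}, {}
    for sample, label in zip(fused_samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(sample)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], sample)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, fused_sample):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(centroids[label], fused_sample))


# Hypothetical training samples: (sensor readings, flattened thermal pixels).
train = [
    early_fuse([1, 5, 2], [10, 20, 30, 40]),
    early_fuse([5, 1, 2], [40, 30, 20, 10]),
]
centroids = fit_centroids(train, ["gas_a", "gas_b"])
print(predict(centroids, early_fuse([2, 9, 3], [12, 18, 33, 41])))
```

    Scaling each modality separately before concatenation is a design choice of this sketch; without it, the larger thermal values would dominate the distance computation.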

Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning

  • Maity, Sayan;Abdel-Mottaleb, Mohamed;Asfour, Shihab S.
    • Journal of Information Processing Systems
    • /
    • v.16 no.1
    • /
    • pp.6-29
    • /
    • 2020
  • Biometric identification using multiple modalities has attracted the attention of many researchers because it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing the different modalities present in the facial video clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising auto-encoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers that perform the multimodal recognition. Moreover, the proposed technique proved robust when some of the above modalities were missing during testing. The proposed system has three main components: detection, which consists of modality-specific detectors that automatically detect images of the different modalities present in facial video clips; feature selection, which uses a supervised denoising sparse auto-encoder network to capture discriminative representations that are robust to illumination and pose variations; and classification, which consists of a set of modality-specific sparse representation classifiers for unimodal recognition, followed by score-level fusion of the recognition results of the available modalities. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) resulted in Rank-1 recognition rates of 99.17% and 97.14%, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips, even when modalities are missing.
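
    Score-level fusion over only the available modalities, as described in this abstract, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it assumes each unimodal classifier returns non-negative scores per identity, normalizes them to sum to one, and averages them; the function names, modality keys, and score values are hypothetical.

```python
def fuse_scores(modality_scores):
    """Average sum-normalized scores across the modalities that are present.

    modality_scores: dict mapping a modality name to a dict of
    identity -> non-negative score, or to None when that modality
    could not be detected in the video clip.
    """
    available = [scores for scores in modality_scores.values() if scores is not None]
    if not available:
        raise ValueError("at least one modality must be available")
    fused = {}
    for scores in available:
        total = sum(scores.values())  # assumed to be positive
        for identity, score in scores.items():
            fused[identity] = fused.get(identity, 0.0) + score / total / len(available)
    return fused


def rank1(fused):
    """Return the identity with the highest fused score."""
    return max(fused, key=fused.get)


# Hypothetical clip in which the left ear was not detected.
clip_scores = {
    "frontal_face": {"alice": 0.6, "bob": 0.4},
    "left_ear": None,
    "right_ear": {"alice": 0.3, "bob": 0.7},
}
print(rank1(fuse_scores(clip_scores)))
```

    Dividing by the number of available modalities, rather than the total number, keeps the fused scores on the same scale whether or not some modalities are missing.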