• Title/Abstract/Keywords: Multimodal Learning

77 search results (processing time: 0.023 s)

Multimodal Supervised Contrastive Learning for Crop Disease Diagnosis

  • 이현석;여도엽;함규성;오강한
    • 대한임베디드공학회논문지 / Vol. 18 No. 6 / pp.285-292 / 2023
  • With the widespread adoption of smart farms and advances in IoT technology, it has become easy to obtain additional data beyond crop images. Consequently, deep learning-based crop disease diagnosis research utilizing multimodal data has become important. This study proposes a crop disease diagnosis method using multimodal supervised contrastive learning, extending multimodal self-supervised learning. The RandAugment method was used to augment crop images and time series of environmental data. The augmented data were passed through an encoder and a projection head for each modality, yielding low-dimensional features. The proposed multimodal supervised contrastive loss then pulled features from the same class closer together while pushing apart those from different classes. Finally, the pretrained model was fine-tuned for crop disease diagnosis. t-SNE visualizations and comparative assessments of diagnosis performance substantiate that the proposed method outperforms multimodal self-supervised learning.
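  • The method above hinges on a supervised contrastive objective applied to per-modality projections. The following is a minimal PyTorch sketch of such a loss, under stated assumptions: the tensor names (img_z, env_z), the batch construction in the usage comment, and the temperature value are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of projected features.

    features: (N, D) projection-head outputs (any modality); labels: (N,).
    Samples sharing a label are treated as positives and pulled together.
    """
    z = F.normalize(features, dim=1)
    sim = z @ z.T / temperature                              # (N, N) similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # Numerically stable log-softmax over all samples except the anchor itself.
    logits = sim - sim.max(dim=1, keepdim=True).values.detach()
    exp_logits = torch.exp(logits).masked_fill(self_mask, 0.0)
    log_prob = logits - torch.log(exp_logits.sum(dim=1, keepdim=True))

    # Mean log-likelihood of positives per anchor; anchors without positives are skipped.
    pos_per_anchor = pos_mask.sum(dim=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_per_anchor.clamp(min=1)
    return loss[pos_per_anchor > 0].mean()

# Illustrative usage: img_z and env_z are (B, D) projections from the image
# and environment-sensor encoders, y the (B,) disease labels. Stacking them
# lets positive pairs form both within and across modalities.
# loss = supervised_contrastive_loss(torch.cat([img_z, env_z]), torch.cat([y, y]))
```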

Student Experiences in a Multimodal Composition Class

  • Park, Hyechong;Selfe, Cynthia L.
    • 영어어문교육 / Vol. 17 No. 4 / pp.229-250 / 2011
  • Despite the social turn in literacy studies, few empirical studies have investigated the practical applications and learning experiences of multimodal composition pedagogy. Using a qualitative research approach, this study examines undergraduates' experiences in producing multimodal texts. Findings report that students' experiences in a multimodal composition class epitomize enjoyable learning. Students enjoyed their learning process because (a) the multimodal literacy curriculum filled the pedagogical gap between conventional school-sponsored alphabetic literacy pedagogy and widespread out-of-school multimodal literacy practices, and (b) the usefulness of the curriculum helped students enhance their intrinsic motivation to learn and compose. By questioning fundamental assumptions about what counts as knowledge in the current ecology of literacies, the authors argue for putting a dynamic view of literacy into practice.

Designing a Framework of Multimodal Contents Creation and Playback System for Immersive Textbook

  • 김석열;박진아
    • 한국콘텐츠학회논문지 / Vol. 10 No. 8 / pp.1-10 / 2010
  • For more effective knowledge transfer in virtual education environments, it is necessary to move beyond conventional learning media that rely only on audiovisual information and introduce an 'immersive textbook' that provides context-appropriate haptic feedback. However, due to constraints in authoring and playback environments, securing and utilizing learning content for immersive textbooks remains difficult. Motivated by this problem, we propose a highly accessible framework for authoring and playing back multimodal learning content for immersive textbooks. The framework consists of a script format for intuitive content authoring and a content player that plays it back. In defining the script specification, we identified the elements required of learning content and defined an XML-based meta-language reflecting them. The content player is designed to interpret the authored content and, in response to user input, deliver multimodal feedback to the user through visual and haptic rendering loops. Based on the proposed design, we implemented a prototype and conducted a user evaluation to verify the effectiveness of the framework, and we discuss directions for future improvement.
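  • Because the framework centers on an XML-based authoring script that a content player interprets to drive visual and haptic feedback, the following is a minimal Python sketch of that flow. The tag vocabulary (lesson, scene, visual, haptic) and its attributes are purely hypothetical and do not reproduce the meta-language defined in the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical authoring script; the tag and attribute names are illustrative
# only and are not the elements of the paper's XML meta-language.
SCRIPT = """
<lesson title="Heart anatomy">
  <scene id="1">
    <visual model="heart.obj" caption="Ventricles and atria"/>
    <haptic target="left_ventricle" effect="pulse" frequency="1.2"/>
  </scene>
</lesson>
"""

def play(script_xml):
    """Interpret an authored script and drive visual/haptic feedback stubs."""
    lesson = ET.fromstring(script_xml)
    for scene in lesson.findall("scene"):
        for node in scene:
            if node.tag == "visual":
                print(f"[visual] render {node.get('model')} ({node.get('caption')})")
            elif node.tag == "haptic":
                print(f"[haptic] {node.get('effect')} on {node.get('target')} "
                      f"at {node.get('frequency')} Hz")

play(SCRIPT)
```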

The influence of learning style in understanding analogies and 2D animations in embryology course

  • Narayanan, Suresh;Ananthy, Vimala
    • Anatomy and Cell Biology / Vol. 51 No. 4 / pp.260-265 / 2018
  • Undergraduate students struggle to comprehend embryology because of its dynamic nature. Studies have recommended using a combination of teaching methods to match students' learning styles, but no study has described the effect of such a teaching strategy across different types of learners. In the present study, an attempt was made to teach embryology using a combination of analogies and simple 2D animations made with Microsoft PowerPoint. The objective of the study is to estimate the difference in academic improvement and perception scores between the different types of learners after introducing analogies and 2D animations in a lecture environment. Based on the Visual, Aural, Read/Write, and Kinesthetic (VARK) scoring system, the learners were grouped into unimodal and multimodal learners. There was significant improvement in post-test scores among both the unimodal (P<0.001) and multimodal learners (P<0.001). When post-test scores were compared between the two groups, the multimodal learners performed better than the unimodal learners (P=0.018), but there was no difference between the groups in the perception of animations and analogies or in long-term assessment. The multimodal learners outperformed the unimodal learners in short-term recollection, but learning style did not influence long-term retention of knowledge.

Automated detection of panic disorder based on multimodal physiological signals using machine learning

  • Eun Hye Jang;Kwan Woo Choi;Ah Young Kim;Han Young Yu;Hong Jin Jeon;Sangwon Byun
    • ETRI Journal / Vol. 45 No. 1 / pp.105-118 / 2023
  • We tested the feasibility of automated discrimination of patients with panic disorder (PD) from healthy controls (HCs) based on multimodal physiological responses using machine learning. Electrocardiogram (ECG), electrodermal activity (EDA), respiration (RESP), and peripheral temperature (PT) of the participants were measured during three experimental phases: rest, stress, and recovery. Eleven physiological features were extracted from each phase and used as input data. Logistic regression (LoR), k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) algorithms were implemented with nested cross-validation. Linear regression analysis showed that ECG and PT features obtained in the stress and recovery phases were significant predictors of PD. We achieved the highest accuracy (75.61%) with MLP using all 33 features. With the exception of MLP, applying the significant predictors led to a higher accuracy than using 24 ECG features. These results suggest that combining multimodal physiological signals measured during various states of autonomic arousal has the potential to differentiate patients with PD from HCs.
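  • A minimal scikit-learn sketch of the nested cross-validation protocol mentioned above, shown for the SVM case; the placeholder data, fold counts, and hyperparameter grid are assumptions for illustration, not the study's actual settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: (n_subjects, n_features) physiological features from rest/stress/recovery
# phases (11 features per phase, 33 in total); y: 1 = panic disorder, 0 = healthy
# control. Random data stands in for the real measurements here.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(60, 33)), rng.integers(0, 2, size=60)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Inner loop tunes hyperparameters; outer loop estimates generalization accuracy.
svm = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1, 10]},
    cv=inner_cv,
)
scores = cross_val_score(svm, X, y, cv=outer_cv)
print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```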

Predicting Session Conversion on E-commerce: A Deep Learning-based Multimodal Fusion Approach

  • Minsu Kim;Woosik Shin;SeongBeom Kim;Hee-Woong Kim
    • Asia Pacific Journal of Information Systems / Vol. 33 No. 3 / pp.737-767 / 2023
  • With the availability of big customer data and advances in machine learning techniques, the prediction of customer behavior at the session level has attracted considerable attention from marketing practitioners and scholars. This study aims to predict customer purchase conversion at the session level by employing customer profile, transaction, and clickstream data. For this purpose, we develop a multimodal deep learning fusion model with dynamic and static features (i.e., DS-fusion). Specifically, we use page views within the focal visit as dynamic features and recency, frequency, monetary value, and clumpiness (RFMC) as static features to comprehensively capture customer characteristics related to buying behavior. Our deep learning model combines these features for conversion prediction. We validate the proposed model using real-world e-commerce data. The experimental results reveal that our model outperforms unimodal classifiers built on each feature set as well as classical machine learning models with dynamic and static features, including random forest and logistic regression. In this regard, this study sheds light on the promise of machine learning approaches that combine complementary modalities in predicting customer behavior.
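  • One way to fuse a dynamic clickstream sequence with static RFMC features in a single network is sketched below in PyTorch; the GRU sequence encoder, layer widths, and input shapes are assumptions for illustration, not the DS-fusion architecture reported in the paper.

```python
import torch
import torch.nn as nn

class SessionConversionModel(nn.Module):
    """Fuses a dynamic page-view sequence with static RFMC features."""
    def __init__(self, n_page_types=100, emb_dim=32, static_dim=4, hidden=64):
        super().__init__()
        self.page_emb = nn.Embedding(n_page_types, emb_dim, padding_idx=0)
        self.seq_enc = nn.GRU(emb_dim, hidden, batch_first=True)
        self.static_enc = nn.Sequential(nn.Linear(static_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, page_seq, static_feats):
        # page_seq: (B, T) page-type ids within the focal visit (0 = padding)
        # static_feats: (B, 4) recency, frequency, monetary value, clumpiness
        _, h = self.seq_enc(self.page_emb(page_seq))          # h: (1, B, hidden)
        fused = torch.cat([h.squeeze(0), self.static_enc(static_feats)], dim=1)
        return torch.sigmoid(self.head(fused)).squeeze(1)     # conversion probability

model = SessionConversionModel()
p = model(torch.randint(1, 100, (8, 20)), torch.randn(8, 4))  # (8,) probabilities
```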

A Survey of Multimodal Systems and Techniques for Motor Learning

  • Tadayon, Ramin;McDaniel, Troy;Panchanathan, Sethuraman
    • Journal of Information Processing Systems / Vol. 13 No. 1 / pp.8-25 / 2017
  • This survey paper explores the application of multimodal feedback in automated systems for motor learning. We review the findings of recent studies in this field, using rehabilitation and various motor training scenarios as context. We discuss popular feedback delivery and sensing mechanisms for motion capture and processing in terms of requirements, benefits, and limitations. We then present modality selection by reviewing best-practice approaches for each modality relative to motor task complexity, with example implementations from recent work. We summarize the advantages and disadvantages of several approaches for integrating modalities in terms of fusion and frequency of feedback during motor tasks. Finally, we review the limitations of perceptual bandwidth and provide an evaluation of the information transfer for each modality.

Character-based Subtitle Generation by Learning of Multimodal Concept Hierarchy from Cartoon Videos

  • 김경민;하정우;이범진;장병탁
    • 정보과학회 논문지 / Vol. 42 No. 4 / pp.451-458 / 2015
  • Most existing multimodal learning methods have focused on solving specific problems such as image or video retrieval and tagging rather than on acquiring knowledge by modeling the content contained in the data. In this paper, we propose a method for learning content from cartoon videos using a multimodal concept hierarchy model and present a way to generate subtitles that reflect the traits of the characters from the learned model. The multimodal concept hierarchy model consists of a concept-variable layer and a multimodal hypernetwork layer that represents higher-order patterns of words and image patches; through this structure, each concept variable is represented as a probability distribution over word and image-patch variables. The proposed model learns character traits as concepts from the video's subtitles and screen images, which is formulated as sequential Bayesian learning. Based on the learned concepts, the model generates video subtitles that reflect character traits when given a text query. For the experiments, character concepts were learned from 268 minutes of the children's video 'Pororo', subtitle sentences reflecting the traits of each character were generated from the learned model, and the results were compared with existing multimodal learning models. The experimental results show that the multimodal concept hierarchy model generates more accurate subtitle sentences than the other models. We also confirmed that, for the same query, diverse sentences reflecting character traits are generated.
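  • As a toy illustration of the idea that a learned character concept is a probability distribution over words, from which character-flavored subtitle words can be sampled given a text query, consider the sketch below. The characters, vocabularies, and probabilities are invented, and the actual model additionally covers image patches through a multimodal hypernetwork layer.

```python
import random

# Toy stand-in for learned concepts: each character concept is a probability
# distribution over words. The words and weights below are invented.
concepts = {
    "Pororo": {"fly": 0.4, "play": 0.4, "friend": 0.2},
    "Crong":  {"crong": 0.5, "mischief": 0.3, "play": 0.2},
}

def generate_subtitle(character, query_words, length=3, seed=0):
    """Sample a character-flavored word sequence conditioned on a text query."""
    rng = random.Random(seed)
    words, weights = zip(*concepts[character].items())
    sampled = rng.choices(words, weights=weights, k=length)
    return " ".join(list(query_words) + sampled)

print(generate_subtitle("Pororo", ["let's"]))
```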

Development of Gas Type Identification Deep-learning Model through Multimodal Method

  • 안서희;김경영;김동주
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 12 No. 12 / pp.525-534 / 2023
  • A gas leak detection system is a key device for minimizing casualties caused by the explosiveness and toxicity of gases. Most leak detection systems rely on a single sensor, detecting leaks with either a gas sensor or a thermal imaging camera. To improve the performance of such single-sensor gas leak detection systems, this study applies multimodal deep learning to gas sensor and thermal image data. Performance was compared with a previous study using the public multimodal dataset MultimodalGasData, and four multimodal models were designed and trained on top of unimodal models for the gas sensor and the thermal camera. The best unimodal models were a 1D CNN for the gas sensor and GasNet for the thermal camera, with accuracies of 96.3% and 96.4%, respectively. The early-fusion multimodal model built on these two unimodal models achieved the highest accuracy of 99.3%, which is also 3.3% higher than the multimodal model of the previous study. We expect the highly reliable gas leak detection system presented in this study to minimize additional damage caused by gas leaks.
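  • A minimal PyTorch sketch of the early-fusion idea described above: shallow per-modality feature extractors for the gas-sensor readings and the thermal image feed one joint classifier. The layer sizes are illustrative, and the small image branch only stands in for GasNet.

```python
import torch
import torch.nn as nn

class EarlyFusionGasClassifier(nn.Module):
    """Early fusion of gas-sensor readings and a thermal-camera frame."""
    def __init__(self, n_sensors=7, n_classes=4):
        super().__init__()
        self.sensor_branch = nn.Sequential(          # 1D CNN over sensor channels
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())   # -> (B, 16)
        self.thermal_branch = nn.Sequential(         # small CNN standing in for GasNet
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())   # -> (B, 16)
        self.classifier = nn.Sequential(             # joint head over fused features
            nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, sensors, thermal):
        # sensors: (B, 1, n_sensors) gas-sensor values; thermal: (B, 1, H, W) frame
        fused = torch.cat([self.sensor_branch(sensors),
                           self.thermal_branch(thermal)], dim=1)
        return self.classifier(fused)                # (B, n_classes) gas-type logits

model = EarlyFusionGasClassifier()
logits = model(torch.randn(2, 1, 7), torch.randn(2, 1, 64, 64))
```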

Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning

  • Maity, Sayan;Abdel-Mottaleb, Mohamed;Asfour, Shihab S.
    • Journal of Information Processing Systems / Vol. 16 No. 1 / pp.6-29 / 2020
  • Biometric identification using multiple modalities has attracted the attention of many researchers because it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing the different modalities present in the facial video clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising auto-encoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers to perform multimodal recognition. Moreover, the proposed technique proves robust when some of the above modalities are missing during testing. The proposed system has three main components: detection, which consists of modality-specific detectors that automatically detect images of the different modalities present in facial video clips; feature selection, which uses a supervised denoising sparse auto-encoder network to capture discriminative representations that are robust to illumination and pose variations; and classification, which consists of a set of modality-specific sparse representation classifiers for unimodal recognition, followed by score-level fusion of the recognition results of the available modalities. Experiments conducted on a constrained facial video dataset (WVU) and an unconstrained facial video dataset (HONDA/UCSD) resulted in Rank-1 recognition rates of 99.17% and 97.14%, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips, even when modalities are missing.
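  • Robustness to missing modalities follows from fusing scores only over the modalities actually detected in a probe video. Below is a generic Python sketch of such score-level fusion; the min-max normalization and uniform weighting are common defaults, not necessarily the paper's exact rule.

```python
import numpy as np

def fuse_scores(modality_scores, weights=None):
    """Score-level fusion over whichever modalities are available.

    modality_scores: dict mapping modality name -> (n_identities,) match scores
    from that modality's classifier; modalities missing at test time are simply
    absent from the dict.
    """
    fused, total_w = None, 0.0
    for name, scores in modality_scores.items():
        s = np.asarray(scores, dtype=float)
        s = (s - s.min()) / (s.max() - s.min() + 1e-12)   # min-max normalize
        w = 1.0 if weights is None else weights.get(name, 1.0)
        fused = w * s if fused is None else fused + w * s
        total_w += w
    return fused / total_w

# Example: the right ear was not detected in this probe video, so only three
# modalities contribute; the identity with the highest fused score is chosen.
scores = {
    "frontal_face": [0.9, 0.2, 0.4],
    "left_profile": [0.7, 0.1, 0.5],
    "left_ear":     [0.6, 0.3, 0.2],
}
print(np.argmax(fuse_scores(scores)))   # index of the predicted identity
```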