• Title/Summary/Keyword: Multimodal model

Building Detection by Convolutional Neural Network with Infrared Image, LiDAR Data and Characteristic Information Fusion (적외선 영상, 라이다 데이터 및 특성정보 융합 기반의 합성곱 인공신경망을 이용한 건물탐지)

  • Cho, Eun Ji;Lee, Dong-Cheon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.38 no.6 / pp.635-644 / 2020
  • Object recognition, detection, and instance segmentation based on DL (Deep Learning) have been used in various practical applications, and optical images are mainly used as training data for DL models. The major objective of this paper is object segmentation and building detection using multimodal datasets in addition to optical images to train the Detectron2 model, which is one of the improved R-CNNs (Region-based Convolutional Neural Networks). For the implementation, infrared aerial images, LiDAR (Light Detection And Ranging) data, edges extracted from the images, and Haralick features, which represent statistical texture information, derived from the LiDAR data were generated. The performance of DL models depends not only on the amount and characteristics of the training data but also on the fusion method, especially for multimodal data. Segmenting objects and detecting buildings with hybrid fusion, a mix of early fusion and late fusion, resulted in a 32.65% improvement in building detection rate compared to training with optical images only. The experiments demonstrated the complementary effect of multimodal training data with unique characteristics and of the fusion strategy.
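
The difference between early, late, and hybrid fusion mentioned in this abstract can be illustrated with a minimal sketch. It assumes NumPy and uses random arrays in place of real imagery; the function names, array shapes, the mean-based stand-in "detectors", and the 0.5 threshold are all hypothetical and are not taken from the paper or from Detectron2.

```python
import numpy as np

def early_fusion(*modalities):
    """Stack co-registered modalities (H, W, C_i) along the channel axis
    so that a single model sees them as one multi-channel input."""
    return np.concatenate(modalities, axis=-1)

def late_fusion(score_maps, weights=None):
    """Combine per-model score maps (H, W) by a (weighted) average."""
    return np.average(np.stack(score_maps), axis=0, weights=weights)

rng = np.random.default_rng(0)
h, w = 4, 4
optical = rng.random((h, w, 3))    # stand-in for an optical image
infrared = rng.random((h, w, 1))   # stand-in for an infrared band
lidar = rng.random((h, w, 1))      # stand-in for a LiDAR-derived raster

# Early fusion: infrared and LiDAR become one 2-channel input.
stacked = early_fusion(infrared, lidar)

# Stand-ins for the building-probability maps of two trained models.
score_optical = optical.mean(axis=-1)
score_stacked = stacked.mean(axis=-1)

# Hybrid fusion (illustrative): late-fuse a model trained on early-fused
# data with a model trained on optical data alone.
hybrid = late_fusion([score_optical, score_stacked])
building_mask = hybrid > 0.5       # hypothetical decision threshold
```

Which modalities the authors stacked early and which outputs they merged late is not stated in the abstract, so the grouping above is only one possible arrangement.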

Multimodal Emotional State Estimation Model for Implementation of Intelligent Exhibition Services (지능형 전시 서비스 구현을 위한 멀티모달 감정 상태 추정 모형)

  • Lee, Kichun;Choi, So Yun;Kim, Jae Kyeong;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.1-14 / 2014
  • Both researchers and practitioners are showing increased interest in interactive exhibition services. Interactive exhibition services are designed to respond directly to visitor responses in real time, so as to fully engage visitors' interest and enhance their satisfaction. To install an effective interactive exhibition service, it is essential to adopt intelligent technologies that enable accurate estimation of a visitor's emotional state from responses to exhibited stimuli. Studies undertaken so far have attempted to estimate the human emotional state, most of them by gauging either facial expressions or audio responses. However, the most recent research suggests that a multimodal approach that uses people's multiple responses simultaneously may lead to better estimation. Given this context, we propose a new multimodal emotional state estimation model that uses various responses, including facial expressions, gestures, and movements measured by the Microsoft Kinect Sensor. To handle a large amount of sensory data effectively, we propose to use stratified sampling-based MRA (multiple regression analysis) as our estimation method. To validate the usefulness of the proposed model, we collected 602,599 responses and emotional state data with 274 variables from 15 people. When we applied our model to the data set, we found that it estimated the levels of valence and arousal within a 10 to 15% error range. Since our proposed model is simple and stable, we expect that it will be applied not only in intelligent exhibition services, but also in other areas such as e-learning and personalized advertising.
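
Stratified sampling followed by multiple regression, as named in this abstract, might be sketched as follows. This is an illustration on synthetic data, assuming NumPy; the choice of stratum (one per participant), the 10% sampling fraction, the three predictors, and the coefficients are all hypothetical and do not come from the paper.

```python
import numpy as np

def stratified_sample(strata, fraction, rng):
    """Return row indices holding roughly `fraction` of every stratum."""
    picked = []
    for label in np.unique(strata):
        idx = np.flatnonzero(strata == label)
        k = max(1, int(round(fraction * idx.size)))
        picked.append(rng.choice(idx, size=k, replace=False))
    return np.concatenate(picked)

def fit_mra(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 3000
strata = rng.integers(0, 15, size=n)            # assumed: one stratum per participant
X = rng.normal(size=(n, 3))                     # synthetic sensor features
true_coef = np.array([0.5, 1.0, -2.0, 0.3])     # synthetic ground truth
y = true_coef[0] + X @ true_coef[1:] + rng.normal(scale=0.1, size=n)

idx = stratified_sample(strata, 0.1, rng)       # about 10% of each stratum
coef = fit_mra(X[idx], y[idx])                  # regression on the sample only
```

Sampling within each stratum keeps every group represented in the reduced data set, which is the usual motivation for stratifying before fitting a model on a subset.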

Multimodal Medical Image Fusion Based on Double-Layer Decomposer and Fine Structure Preservation Model (복층 분해기와 상세구조 보존모델에 기반한 다중모드 의료영상 융합)

  • Zhang, Yingmei;Lee, Hyo Jong
    • KIPS Transactions on Computer and Communication Systems / v.11 no.6 / pp.185-192 / 2022
  • Multimodal medical image fusion (MMIF) fuses two images, generated in two different modes and containing different structural details, into a comprehensive image with saturated information, which can help doctors improve the accuracy of observation and treatment of patients' diseases. To this end, a method based on a double-layer decomposer and a fine structure preservation model is proposed. First, a double-layer decomposer is applied to decompose the source images into energy layers and structure layers, which preserves details well. Second, the structure layers are processed by combining the structure tensor operator (STO) and max-abs. For the energy layers, a fine structure preservation model is proposed to guide the fusion, further improving the image quality. Finally, the fused image is obtained by adding the two sub-fused images formed through the fusion rules. Experiments show that our method performs very well compared with several typical fusion methods.
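
The max-abs rule and the final addition step named in this abstract can be sketched as below, assuming NumPy. The tiny arrays are made-up values, and the averaging used for the energy layers is only a placeholder: the paper's fine structure preservation model, structure tensor operator, and decomposer are not reproduced here.

```python
import numpy as np

def max_abs_fuse(a, b):
    """Per pixel, keep the coefficient with the larger absolute value
    (ties go to the first input)."""
    return np.where(np.abs(a) >= np.abs(b), a, b)

# Made-up structure layers of two source images.
s1 = np.array([[0.2, -0.9], [0.0, 0.4]])
s2 = np.array([[-0.5, 0.3], [0.1, -0.4]])
fused_structure = max_abs_fuse(s1, s2)

# Made-up energy layers; plain averaging is a stand-in fusion rule only.
e1 = np.array([[10.0, 12.0], [11.0, 9.0]])
e2 = np.array([[14.0, 8.0], [11.0, 13.0]])
fused_energy = 0.5 * (e1 + e2)

# Final image: sum of the two sub-fused layers.
fused = fused_energy + fused_structure
```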

Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems

  • Sanghun Jeon;Jieun Lee;Dohyeon Yeo;Yong-Ju Lee;SeungJun Kim
    • ETRI Journal / v.46 no.1 / pp.22-34 / 2024
  • Exposure to varied noisy environments impairs the recognition performance of artificial intelligence-based speech recognition technologies. Degraded-performance services can be utilized as limited systems that assure good performance in certain environments, but impair the general quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model robust to various noise settings, mimicking human dialogue recognition elements. The model converts word embeddings and log-Mel spectrograms into feature vectors for audio recognition. A dense spatial-temporal convolutional neural network model extracts features from log-Mel spectrograms, transformed for visual-based recognition. This approach exhibits improved aural and visual recognition capabilities. We assess the signal-to-noise ratio in nine synthesized noise environments, with the proposed model exhibiting lower average error rates. The error rate for the AVSR model using a three-feature multi-fusion method is 1.711%, compared to the general 3.939% rate. This model is applicable in noise-affected environments owing to its enhanced stability and recognition rate.

Data model of Multimodal Visual Interface (멀티모달 비주얼 인터페이스의 테이터형)

  • Malyanov, Ilya;d'Auriol, Brian J.;Lee, Sung-Young;Lee, Young-Koo
    • Proceedings of the Korean Information Science Society Conference / 2011.06b / pp.240-241 / 2011
  • Contemporary electronic healthcare systems are becoming more and more complex and provide users with broad functionality, but they often lack accessible interfaces. However, a good interface is nearly as important as the rest of the system. The goal of our research is to develop an intuitive multimodal interface for a healthcare system. This paper discusses the data model of the interface.

Prediction of Concrete Pumping Using Various Rheological Models

  • Choi, Myoung Sung;Kim, Young Jin;Kim, Jin Keun
    • International Journal of Concrete Structures and Materials / v.8 no.4 / pp.269-278 / 2014
  • When concrete is transported through a pipe, a lubrication layer forms at the interface between the concrete and the pipe wall and is the major factor facilitating concrete pumping. A possible mechanism that explains the formation of this layer is shear-induced particle migration, and determining the rheological parameters is a paramount factor in simulating concrete flow in a pipe. In this study, numerical simulations considering various rheological models in shear-induced particle migration were conducted and compared with 170 m full-scale pumping tests. It was found that the multimodal viscosity model, which represents concrete as a three-phase suspension consisting of cement paste, sand, and gravel, can accurately simulate the lubrication layer. Moreover, considering the particle shape effects of concrete constituents through an increased intrinsic viscosity can predict the pipe flow of pumped concrete more exactly.

Image classification and captioning model considering a CAM-based disagreement loss

  • Yoon, Yeo Chan;Park, So Young;Park, Soo Myoung;Lim, Heuiseok
    • ETRI Journal / v.42 no.1 / pp.67-77 / 2020
  • Image captioning has received significant interest in recent years, and notable results have been achieved. Most previous approaches have focused on generating visual descriptions from images, whereas a few approaches have exploited visual descriptions for image classification. This study demonstrates that good performance can be achieved for both description generation and image classification through an end-to-end joint learning approach with a loss function that encourages the two tasks to reach a consensus. When given images and visual descriptions, the proposed model learns a multimodal intermediate embedding, which can represent both the textual and visual characteristics of an object. The performance can be improved for both tasks by sharing the multimodal embedding. Through a novel loss function based on class activation mapping, which localizes the discriminative image region of a model, we achieve a higher score when the captioning and classification models reach a consensus on the key parts of the object. Using the proposed model, we achieved substantially improved performance for each task on the UCSD Birds and Oxford Flowers datasets.

Electromechanical Modeling and Analysis of a Multimodal Piezoelectric Energy Harvester Comprising Three Connected Beams (연결된 세 보 구조를 갖는 다모드 압전 에너지 하베스터의 전기-역학적 모델링 및 해석)

  • Jeong, Sin-Woo;Yoo, Hong Hee
    • Transactions of the Korean Society for Noise and Vibration Engineering / v.26 no.4 / pp.458-468 / 2016
  • An electromechanical model for analyzing a multimodal piezoelectric energy harvester comprising three connected beams is presented in this paper. The system consists of three beams that are connected alternately. The piezoelectric layer is attached only to the middle beam. With this special structural configuration, the first, second, and third natural frequencies are brought close together, so that the energy harvester can consistently generate a meaningful amount of power when the main frequency component of the excitation varies around the lowest three natural frequencies of the harvester. To investigate the dynamic and electric response of the piezoelectric energy harvester, an electromechanical model is developed using Kane's method, and the accuracy of the model is validated by comparing its results with those obtained with the commercial software ANSYS. The results show that the piezoelectric energy harvester comprising three connected beams has a much broader power-generating frequency range than a conventional piezoelectric energy harvester.

Multimodal MRI analysis model based on deep neural network for glioma grading classification (신경교종 등급 분류를 위한 심층신경망 기반 멀티모달 MRI 영상 분석 모델)

  • Kim, Jonghun;Park, Hyunjin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.425-427 / 2022
  • The grade of a glioma is important information related to survival, and it is therefore important to classify the grade before treatment in order to evaluate tumor progression and plan treatment. Gliomas are mostly divided into high-grade glioma (HGG) and low-grade glioma (LGG). In this study, image preprocessing techniques are applied to analyze magnetic resonance imaging (MRI) with a deep neural network model, and the classification performance of the model is evaluated. The best-performing model, EfficientNet-B6, achieves an accuracy of 0.9046, sensitivity of 0.9570, specificity of 0.7976, AUC of 0.8702, and F1-score of 0.8152 in 5-fold cross-validation.
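
The metrics reported in this abstract, other than AUC, are standard functions of a binary confusion matrix. The sketch below shows the definitions in plain Python. The counts are hypothetical and were chosen only to keep the arithmetic simple; they are not the paper's data, and the abstract does not say which grade was treated as the positive class.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts for one validation fold.
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
```

In k-fold cross-validation such metrics are typically computed per fold and then averaged, although the abstract does not specify how the five folds were aggregated.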

Study on the resistance of the transshipment of transport logistics according to the mode choice - focus of cement (물류수송의 환적저항에 따른 수단선택 행태 변화 - 양회 중심으로)

  • Lee, Won-Tae;Kim, Sung-Eun;Kim, Si-Gon;Chung, Sung-Bong
    • Proceedings of the KSR Conference / 2010.06a / pp.1615-1622 / 2010
  • Recently, interest in transshipment and in connections between means of transportation has increased, not only for passengers but also for freight, as the need for transportation efficiency grows and the importance of rail logistics emerges. Domestic freight is carried by roads, railroads, ships, and ports. However, because means other than road cannot provide door-to-door service, multimodal transportation accompanied by road transportation is carried out. Although 'transshipment' occurs here, the lack of basic data on it makes it difficult to reflect in demand forecasting. With respect to the Korean freight O-D, it has been very difficult to compare on an equal basis the competitiveness and availability of transportation services between the point of departure and the final destination. Research implementing a logit model that considers the time and cost of transshipment in multimodal transportation, and the transshipment resistance value in the choice of freight transportation means, has been comparatively insufficient. This study conducted a questionnaire survey of shippers and, based on it, calculated the transshipment resistance value by deriving a utility function. In doing so, the study intends to examine the effect that 'transshipment' has on the choice of transportation means in freight transportation.
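
A logit mode-choice model with a transshipment term, of the general kind this abstract describes, can be sketched in plain Python. Every number here is hypothetical: the coefficients, travel times, costs, and mode names are illustrative assumptions, and the linear utility form is a common textbook choice rather than the specification estimated in the study.

```python
import math

def utility(time_h, cost, n_transship, beta):
    """Linear-in-parameters utility of one freight mode."""
    return (beta["time"] * time_h
            + beta["cost"] * cost
            + beta["transship"] * n_transship)

def choice_probabilities(utilities):
    """Multinomial logit shares; subtracting the max keeps exp() stable."""
    m = max(utilities.values())
    expv = {mode: math.exp(v - m) for mode, v in utilities.items()}
    total = sum(expv.values())
    return {mode: e / total for mode, e in expv.items()}

# Hypothetical coefficients (all negative: each attribute is a disutility).
beta = {"time": -0.10, "cost": -0.02, "transship": -0.8}

# Road only versus rail with road access legs and two transshipments.
p_with = choice_probabilities({
    "road": utility(6, 100, 0, beta),
    "rail+road": utility(9, 70, 2, beta),
})

# Same trip if transshipment carried no penalty at all.
p_without = choice_probabilities({
    "road": utility(6, 100, 0, beta),
    "rail+road": utility(9, 70, 0, beta),
})

# One way to express a transshipment penalty: hours of travel time
# that a shipper would trade for avoiding one transshipment.
equivalent_hours = beta["transship"] / beta["time"]
```

Comparing `p_with` and `p_without` shows how a transshipment penalty shifts predicted share away from the multimodal option, which is the kind of effect the study sets out to quantify.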