• Title/Summary/Keyword: multimodal (멀티 모달)


Driver Drowsiness Detection Model using Image and PPG data Based on Multimodal Deep Learning (이미지와 PPG 데이터를 사용한 멀티모달 딥 러닝 기반의 운전자 졸음 감지 모델)

  • Choi, Hyung-Tak;Back, Moon-Ki;Kang, Jae-Sik;Yoon, Seung-Won;Lee, Kyu-Chul
    • Database Research / v.34 no.3 / pp.45-57 / 2018
  • Drowsiness while driving is a dangerous driver condition that can lead directly to a major accident. Traditional drowsiness detection methods exist to assess the driver's condition, but they are limited in achieving generalized recognition that accounts for drivers' individual characteristics. In recent years, deep learning based state recognition studies have been proposed to recognize the driver's condition. Deep learning has the advantage that features are extracted by the machine rather than by a human, which yields a more generalized recognition model. In this study, we propose a state recognition model that is more accurate than existing deep learning methods by learning from image and PPG data at the same time. We examine the effect of driver image and PPG data on drowsiness detection and test experimentally whether using them together improves model performance. Using image and PPG data together improved accuracy by around 3% compared with using images alone. In addition, the multimodal deep learning model that classifies the driver's condition into three categories achieved a classification accuracy of 96%.
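
The idea of learning from image and PPG data together can be illustrated with a minimal feature-level fusion sketch. This is not the authors' network: the feature vectors, the single linear layer, the weights, and the three state names are all assumptions made for illustration, since the abstract only states that two modalities are combined and that there are three output categories.

```python
import math

# Hypothetical state names; the abstract only says there are three categories.
STATES = ["alert", "drowsy", "asleep"]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_and_classify(image_feat, ppg_feat, weights, biases):
    """Concatenate both feature vectors, then apply one linear layer and softmax."""
    fused = list(image_feat) + list(ppg_feat)
    scores = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)

# Toy inputs: two image features and one PPG feature (made-up numbers).
image_feat = [0.2, 0.9]
ppg_feat = [0.7]
# Untrained, illustrative weights: one row per output state.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
biases = [0.0, 0.0, 0.0]

probs = fuse_and_classify(image_feat, ppg_feat, weights, biases)
predicted = STATES[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

In a real model the two feature vectors would come from learned encoders (for example a CNN for images and a sequence model for PPG), and the fusion layer would be trained jointly with them.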

Improved Transformer Model for Multimodal Fashion Recommendation Conversation System (멀티모달 패션 추천 대화 시스템을 위한 개선된 트랜스포머 모델)

  • Park, Yeong Joon;Jo, Byeong Cheol;Lee, Kyoung Uk;Kim, Kyung Sun
    • The Journal of the Korea Contents Association / v.22 no.1 / pp.138-147 / 2022
  • Chatbots have recently been applied in various fields with good results, and e-commerce platforms are making many attempts to use chatbots in shopping mall product recommendation services. In this paper, for a conversation system that recommends the fashion items a user wants based on the conversation between the user and the system and on fashion image information, we adopt the transformer model, which currently performs well in various AI fields such as natural language processing, voice recognition, and image recognition. We propose an improved multimodal transformer model that increases recommendation accuracy by using dialogue (text) and fashion (image) information together in data preprocessing and data representation. We also propose a method that improves accuracy by analyzing and refining the data. The proposed system achieves a recommendation accuracy of 0.6563 WKT (Weighted Kendall's tau), an improvement of 0.3191 WKT over the existing system's 0.3372 WKT.
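
For readers unfamiliar with the WKT metric, the sketch below shows a simplified pairwise form of a weighted Kendall's tau. It is an illustration under stated assumptions, not the exact scoring used in the paper: the function name, the position-based hyperbolic weight, and the toy rankings are hypothetical, and tied pairs simply contribute zero to the numerator.

```python
def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)

def weighted_kendall_tau(x, y, weight=lambda i, j: 1.0):
    """Weighted share of concordant minus discordant pairs, in [-1, 1].

    x and y are rank lists for the same items. weight(i, j) sets how much
    the pair at positions i and j counts; the default weights all pairs equally.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    numerator = 0.0
    denominator = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            w = weight(i, j)
            numerator += w * sign(x[i] - x[j]) * sign(y[i] - y[j])
            denominator += w
    return numerator / denominator

# Toy rankings of four items, listed in gold-rank order (made-up data).
gold = [1, 2, 3, 4]
pred = [1, 2, 4, 3]  # only the two lowest-ranked items are swapped

plain = weighted_kendall_tau(gold, pred)
# Assumed weighting: pairs that involve top positions count more.
top_heavy = weighted_kendall_tau(
    gold, pred, weight=lambda i, j: 1 / (i + 1) + 1 / (j + 1))

# The reported gain is the difference between the two WKT scores.
improvement = round(0.6563 - 0.3372, 4)
print(plain, top_heavy, improvement)
```

Because the swap here happens at the bottom of the list, the top-heavy weighting penalizes it less than the unweighted version does, which is the usual motivation for weighting a rank correlation in recommendation settings.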

Multi-modal Representation Learning for Classification of Imported Goods (수입물품의 품목 분류를 위한 멀티모달 표현 학습)

  • Apgil Lee;Keunho Choi;Gunwoo Kim
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.203-214 / 2023
  • The Korea Customs Service handles its work efficiently through an electronic customs system that supports one-stop processing. Even so, a more effective method is needed. Import and export require an HS Code (Harmonized System Code) for the classification of, and tax rate application to, all goods. Item classification, which assigns the HS Code, is a highly difficult task that requires specialized knowledge and experience, and it is an important part of customs clearance procedures. Therefore, this study uses the various types of data in the item classification request form, such as product name, product description, and product image, to train a deep learning model based on multimodal representation learning that reflects this information well. The model is expected to reduce the burden of customs duties by classifying and recommending HS Codes, and to help with customs procedures by classifying items promptly.

Various Modal Interruption Research in Digital Convergence of Mobile Service (디지털 컨버전스 기기에서 모달리티와 인터럽션간의 상호관계에 대한 실험적 연구)

  • Lee, Ki-Ho;Jung, Seung-Ki;Kim, Hae-Jin;Kim, Jin-Woo
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회 학술대회논문집) / 2006.02b / pp.233-239 / 2006
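  • Terrestrial DMB (Digital Multimedia Broadcasting), a next-generation digital broadcasting technology, was launched in Korea as the first such service in the world. Together with the many functions of digital devices, DMB services are currently leading "digital convergence", and services built on the new technology are further enriching users' experience of digital devices. Because these products support multitasking across their various functions, they differ considerably from earlier products in many respects, including usability and operation. This paper proposes a way to give users a better experience, in terms of sensory modality, when designing products that integrate diverse functions and support multitasking. Until now, research on modality and interruption has been grounded in multiple resource theory and has focused on the point that users bear a cognitive burden when the sensory modalities of tasks conflict while successive tasks are performed. In this paper, however, we examine how users' task performance and satisfaction are affected when task modalities interrupt the user in various orders, depending on whether multitasking is supported.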


A Study on Method for User Gender Prediction Using Multi-Modal Smart Device Log Data (스마트 기기의 멀티 모달 로그 데이터를 이용한 사용자 성별 예측 기법 연구)

  • Kim, Yoonjung;Choi, Yerim;Kim, Solee;Park, Kyuyon;Park, Jonghun
    • The Journal of Society for e-Business Studies / v.21 no.1 / pp.147-163 / 2016
  • Gender information about a smart device user is essential for providing personalized services, and multi-modal data obtained from the device is useful for predicting the user's gender. However, the way each type of multi-modal data should be used for gender prediction differs according to the characteristics of the data. Therefore, this study proposes an ensemble method that predicts the gender of a smart device user by using three classifiers that take text, application, and acceleration data as inputs, respectively. To alleviate the privacy issues that arise when text data generated on a smart device is sent outside, the text classifier scans the text data only on the device and classifies the user's gender by matching it against predefined word sets. The application-based classifier assigns gender labels to executed applications and predicts the user's gender by comparing the label ratio. Acceleration data is classified with a Support Vector Machine. The proposed method was evaluated using actual smart device log data collected from an Android application. The experimental results showed that the proposed method outperformed the compared methods.
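
The three-classifier ensemble described above can be sketched as follows. This is a rough illustration under assumptions: the abstract does not say how the three predictions are combined, so majority voting is used here, and the word sets, application labels, tie-breaking rule, and the stand-in for the SVM output are all hypothetical placeholders.

```python
from collections import Counter

def classify_by_text(tokens, female_words, male_words):
    """Match on-device tokens against predefined word sets (ties fall to 'M')."""
    female_hits = sum(token in female_words for token in tokens)
    male_hits = sum(token in male_words for token in tokens)
    return "F" if female_hits > male_hits else "M"

def classify_by_apps(executed_apps, app_labels):
    """Compare the ratio of gender labels over executed apps (ties fall to 'M')."""
    counts = Counter(app_labels[app] for app in executed_apps if app in app_labels)
    return "F" if counts["F"] > counts["M"] else "M"

def majority_vote(predictions):
    """Return the label predicted by most classifiers."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical lookup tables and log data.
female_words = {"word_x"}
male_words = {"word_y"}
app_labels = {"app_a": "F", "app_b": "M", "app_c": "F"}

text_pred = classify_by_text(["word_y", "hello"], female_words, male_words)
app_pred = classify_by_apps(["app_a", "app_c", "app_b"], app_labels)
accel_pred = "F"  # stand-in for the output of an SVM trained on acceleration data

ensemble_pred = majority_vote([text_pred, app_pred, accel_pred])
print(text_pred, app_pred, accel_pred, ensemble_pred)
```

With three binary classifiers a strict majority always exists, so the vote never needs a tie-break of its own.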

Multimodal Emotional State Estimation Model for Implementation of Intelligent Exhibition Services (지능형 전시 서비스 구현을 위한 멀티모달 감정 상태 추정 모형)

  • Lee, Kichun;Choi, So Yun;Kim, Jae Kyeong;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.1-14 / 2014
  • Both researchers and practitioners are showing increased interest in interactive exhibition services. Interactive exhibition services are designed to respond directly to visitor responses in real time, so as to fully engage visitors' interest and enhance their satisfaction. To install an effective interactive exhibition service, it is essential to adopt intelligent technologies that enable accurate estimation of a visitor's emotional state from responses to exhibited stimuli. Most studies so far have estimated the human emotional state by gauging either facial expressions or audio responses. However, the most recent research suggests that a multimodal approach that uses people's multiple responses simultaneously may lead to better estimation. Given this context, we propose a new multimodal emotional state estimation model that uses various responses, including facial expressions, gestures, and movements, measured by the Microsoft Kinect Sensor. To handle a large amount of sensory data effectively, we propose stratified sampling-based MRA (multiple regression analysis) as our estimation method. To validate the usefulness of the proposed model, we collected 602,599 responses and emotional state data with 274 variables from 15 people. When we applied our model to the data set, it estimated the levels of valence and arousal within a 10 to 15% error range. Since the proposed model is simple and stable, we expect it to be applied not only in intelligent exhibition services but also in other areas such as e-learning and personalized advertising.
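
The sampling step of a stratified sampling-based regression can be sketched as below. This only illustrates stratified sampling in general: the abstract does not say which variable defines the strata, so grouping by subject is an assumption, and the function name, sampling fraction, and toy rows are hypothetical. The sampled rows would then be passed to whatever multiple regression routine is used.

```python
import random
from collections import defaultdict

def stratified_sample(rows, stratum_of, fraction, seed=0):
    """Draw the same fraction of rows from every stratum.

    stratum_of maps a row to its stratum key. At least one row is kept
    per stratum so that no group disappears from the sample.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    strata = defaultdict(list)
    for row in rows:
        strata[stratum_of(row)].append(row)
    sample = []
    for key in sorted(strata):
        group = strata[key]
        size = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, size))
    return sample

# Toy data: 15 subjects with 100 sensor frames each (made-up sizes).
rows = [{"subject": s, "frame": f} for s in range(15) for f in range(100)]
sample = stratified_sample(rows, lambda row: row["subject"], fraction=0.1)
print(len(rows), len(sample))
```

Sampling per stratum keeps every group represented in proportion, which matters when a few subjects or conditions would otherwise dominate a very large data set.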

Multi-Object Goal Visual Navigation Based on Multimodal Context Fusion (멀티모달 맥락정보 융합에 기초한 다중 물체 목표 시각적 탐색 이동)

  • Jeong Hyun Choi;In Cheol Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.9 / pp.407-418 / 2023
  • Multi-Object Goal Visual Navigation (MultiOn) is a visual navigation task in which an agent must visit multiple object goals in an unknown indoor environment in a given order. Existing models for the MultiOn task are limited in that they cannot use an integrated view of multimodal context, because they use only a unimodal context map. To overcome this limitation, we propose a novel deep neural network-based agent model for the MultiOn task. The proposed model, MCFMO, uses a multimodal context map that contains visual appearance features, semantic features of environmental objects, and goal object features. Moreover, the proposed model fuses these three heterogeneous features into a global multimodal context map by using a point-wise convolutional neural network module. Lastly, the proposed model adopts an auxiliary task learning module that predicts the observation status, the goal direction, and the goal distance, which helps the agent learn the navigation policy efficiently. Through various quantitative and qualitative experiments using the Habitat-Matterport3D simulation environment and scene dataset, we demonstrate the superiority of the proposed model.
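
A point-wise (1x1) convolution fuses features by mixing channels independently at each map cell. The sketch below shows that operation on three small feature maps. It is not the MCFMO implementation: the map sizes, channel counts, weights, and the ReLU activation are assumptions chosen to keep the example tiny.

```python
def pointwise_fuse(appearance, semantic, goal, weights, bias):
    """Fuse three H x W x C feature maps with a 1x1 convolution.

    At each cell the channels of the three maps are concatenated, then
    every output channel is a weighted sum of them followed by ReLU.
    weights has one row per output channel.
    """
    fused = []
    for row_a, row_s, row_g in zip(appearance, semantic, goal):
        fused_row = []
        for cell_a, cell_s, cell_g in zip(row_a, row_s, row_g):
            channels = cell_a + cell_s + cell_g  # channel-wise concatenation
            fused_row.append([
                max(0.0, sum(w * x for w, x in zip(w_row, channels)) + b)
                for w_row, b in zip(weights, bias)
            ])
        fused.append(fused_row)
    return fused

# Toy 1 x 2 maps: 2 appearance channels, 1 semantic channel, 1 goal channel.
appearance = [[[1.0, 2.0], [0.0, 1.0]]]
semantic = [[[3.0], [1.0]]]
goal = [[[1.0], [0.0]]]
# Illustrative weights: 2 output channels over 4 concatenated input channels.
weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
bias = [0.0, -1.0]

context_map = pointwise_fuse(appearance, semantic, goal, weights, bias)
print(context_map)
```

Because the kernel covers a single cell, the spatial layout of the map is preserved and only the channel dimension changes, which is why this operation suits merging heterogeneous features that share the same grid.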

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
  • Maintaining ICT infrastructure and preventing failures through anomaly detection is becoming increasingly important. System monitoring data is multidimensional time series data, and it is difficult to account for the characteristics of multidimensional data and of time series data at the same time. With multidimensional data, the correlation between variables must be considered. Existing probability-based, linear, and distance-based methods degrade because of the curse of dimensionality. In addition, time series data is typically preprocessed with a sliding window and time series decomposition for autocorrelation analysis. These techniques increase the dimension of the data, so they need to be supplemented. Anomaly detection is a long-established research field; statistical methods and regression analysis were used in the early days, and machine learning and artificial neural networks are now being actively applied. Statistical methods are difficult to apply when data is non-homogeneous, and they do not detect local outliers well. Regression-based methods learn a regression formula based on parametric statistics and then detect anomalies by comparing predicted values with actual values. Their performance drops when the model is not robust or when the data contains noise or outliers, so the training data must be free of noise and outliers. An autoencoder built on an artificial neural network is trained to produce output as similar as possible to its input. It has many advantages over existing probabilistic and linear models, cluster analysis, and supervised learning: it can be applied to data that does not satisfy a probability distribution or linearity assumption, and it can be trained in an unsupervised manner without labeled data. However, it is limited in identifying local outliers in multidimensional data, and the dimension of the data grows greatly because of the characteristics of time series data. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that improves anomaly detection performance by considering local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to address the limitation in identifying local outliers in multidimensional data. Multimodal models are commonly used to learn from different types of input, such as voice and image. The different modalities share the autoencoder's bottleneck, through which the correlations between them are learned. In addition, a CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the dimension of the data. Conditional inputs are usually categorical variables, but in this study time was used as the condition in order to learn periodicity. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance of the autoencoders for 41 variables was checked for the proposed and comparison models. Reconstruction performance differs by variable; the Memory, Disk, and Network modalities are reconstructed well, with small loss values, in all three autoencoder models. The Process modality showed no significant difference among the three models, and the CPU modality was reconstructed best by CMAE. To evaluate anomaly detection performance, ROC curves were drawn for the proposed and comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On every indicator, performance ranked in the order CMAE, MAE, UAE. In particular, recall was 0.9828 for CMAE, meaning that it detected almost all of the anomalies. Accuracy also improved, to 87.12%, and the F1-score was 0.8883, which is considered suitable for anomaly detection. From a practical standpoint, the proposed model has advantages beyond the performance improvement. Techniques such as time series decomposition and sliding windows add procedures that must be managed, and the resulting increase in dimension can slow down inference. The proposed model is easy to apply in practice in terms of inference speed and model management.
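
The evaluation metrics named in this abstract are related by simple formulas, sketched below. The abstract reports a detection rate (recall) of 0.9828 and an F1-score of 0.8883 for CMAE but does not state precision, so the second function back-solves it from the F1 definition. The confusion-matrix counts are made up, and the back-solved value is only as reliable as the rounding of the two reported figures.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of flagged points that are true anomalies
    recall = tp / (tp + fn)     # share of true anomalies that were flagged
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

def implied_precision(recall, f1):
    """Solve F1 = 2PR / (P + R) for P, given R and F1."""
    return f1 * recall / (2 * recall - f1)

# Hypothetical counts, for illustration only.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=10)

# Precision implied by the two figures reported in the abstract.
cmae_precision = implied_precision(recall=0.9828, f1=0.8883)
print(precision, recall, f1, round(cmae_precision, 4))
```

The back-solved precision comes out near 0.81, which would be consistent with a detector tuned to miss very few anomalies at the cost of some false alarms.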

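Design and Implementation of a User Interface for the HANbit ACE Switching System Operation and Management System (HANbit ACE 교환기 운용관리 시스템의 사용자인터페이스 설계 및 구현)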

  • 이재흠;김해숙
    • Korea Information Processing Society Review / v.5 no.1 / pp.75-85 / 1998
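  • A user interface for operating a switching system is designed not for general users but for the operators at the switching office where the system is installed. Because an ATM switching system must be highly reliable as a switching node in a high-speed information and communication network, it needs efficient operation, management, and maintenance. Meeting this need requires a user interface that makes a very difficult and complex switching system easy to control and monitor by incorporating graphics, software engineering, psychology, and ergonomics. The operation system was designed with a client-server architecture, provides a variety of user interfaces (multimodal) through a web browser, and was implemented to run on multiple platforms. This paper presents the characteristics and structure of the ATM switch operation system and the user interface model that was implemented.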


A Soccer Video Analysis Using Product Hierarchical Hidden Markov Model (PHHMM(Product Hierarchical Hidden Markov Model)을 이용한 축구 비디오 분석)

  • Kim, Moo-Sung;Kang, Hang-Bong
    • Proceedings of the IEEK Conference / 2006.06a / pp.681-682 / 2006
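  • Soccer video data generally has multimodal and multilayer properties. A model suited to handling such data is the Hierarchical Hidden Markov Model (HHMM), expressed as a Dynamic Bayesian Network (DBN). Among HHMMs, the Product Hierarchical Hidden Markov Model (PHHMM) is one in which features from multiple attributes interact with one another. In this paper, we applied the PHHMM to the retrieval and analysis of Play/Break events in soccer matches and obtained favorable results.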

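
To give a feel for Play/Break labeling with a hidden Markov model, the sketch below decodes a flat two-state HMM with the Viterbi algorithm. This is a deliberate simplification and not the PHHMM from the paper: it has a single layer and a single observation stream, and the shot-type observations and all probabilities are hypothetical values chosen for illustration. All probabilities are assumed to be strictly positive so that logarithms are defined.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence (log-space Viterbi)."""
    # Score of the best path ending in each state after the first observation.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s] = best predecessor of state s at time t
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical model: wide shots are typical of play, close-ups of breaks.
states = ["play", "break"]
start_p = {"play": 0.5, "break": 0.5}
trans_p = {"play": {"play": 0.9, "break": 0.1},
           "break": {"play": 0.1, "break": 0.9}}
emit_p = {"play": {"wide": 0.8, "close": 0.2},
          "break": {"wide": 0.2, "close": 0.8}}

shots = ["wide", "wide", "close", "close", "close"]
labels = viterbi(shots, states, start_p, trans_p, emit_p)
print(labels)
```

A hierarchical or product model would add further layers of states and several interacting feature streams on top of this basic decoding step.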