• Title/Summary/Keyword: multi-modal learning

Search Result 48, Processing Time 0.024 seconds

Efficient Multimodal Background Modeling and Motion Defection (효과적인 다봉 배경 모델링 및 물체 검출)

  • Park, Dae-Yong;Byun, Hae-Ran
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.15 no.6
    • /
    • pp.459-463
    • /
    • 2009
  • Background modeling and motion detection is the one of the most significant real time video processing technique. Until now, many researches are conducted into the topic but it still needs much time for robustness. It is more important when other algorithms are used together such as object tracking, classification or behavior understanding. In this paper, we propose efficient multi-modal background modeling methods which can be understood as simplified learning method of Gaussian mixture model. We present its validity using numerical methods and experimentally show detecting performance.

Anomaly Detection Methodology Based on Multimodal Deep Learning (멀티모달 딥 러닝 기반 이상 상황 탐지 방법론)

  • Lee, DongHoon;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.101-125
    • /
    • 2022
  • Recently, with the development of computing technology and the improvement of the cloud environment, deep learning technology has developed, and attempts to apply deep learning to various fields are increasing. A typical example is anomaly detection, which is a technique for identifying values or patterns that deviate from normal data. Among the representative types of anomaly detection, it is very difficult to detect a contextual anomaly that requires understanding of the overall situation. In general, detection of anomalies in image data is performed using a pre-trained model trained on large data. However, since this pre-trained model was created by focusing on object classification of images, there is a limit to be applied to anomaly detection that needs to understand complex situations created by various objects. Therefore, in this study, we newly propose a two-step pre-trained model for detecting abnormal situation. Our methodology performs additional learning from image captioning to understand not only mere objects but also the complicated situation created by them. Specifically, the proposed methodology transfers knowledge of the pre-trained model that has learned object classification with ImageNet data to the image captioning model, and uses the caption that describes the situation represented by the image. Afterwards, the weight obtained by learning the situational characteristics through images and captions is extracted and fine-tuning is performed to generate an anomaly detection model. To evaluate the performance of the proposed methodology, an anomaly detection experiment was performed on 400 situational images and the experimental results showed that the proposed methodology was superior in terms of anomaly detection accuracy and F1-score compared to the existing traditional pre-trained model.

Analytical Study of Static and Dynamic Responses of Multi-story Brick Pagoda of Silleuksa Temple (신륵사 다층전탑의 구조해석에 대한 연구)

  • Lee, Ga-Yoon;Lee, Sung-Min;Lee, Kihak
    • Journal of Korean Association for Spatial Structures
    • /
    • v.22 no.3
    • /
    • pp.33-40
    • /
    • 2022
  • Recently, cultural heritages in South Korea gain many interests of restoration and preservation from the government since many of that have been severely damaged during earthquakes. Many previous studies in both terms of experimental and analytical approaches have been done to examine structural behavior and decide appropriate methods of preservation. Being motivated by such researches, this research aims to investigate a religious stone pagoda dated back to the Goryeo Dynasty in Korea. The structure consists of a granite stone foundation and baked bricks, which resembles the shape of traditional pagodas. In order to examine the structural behavior of the pagoda, an analytical model is implemented using ANSYS, a comprehensive engineering simulation platform. For the time history analysis of the pagoda, several earthquake excitations are chosen and input to simulation modeling. Seismic response of the tower such as time domain, natural frequency, modal shapes and peak acceleration measured at each layer are presented and discussed. In addition, the amplification ratio of the tower is calculated from the accelerations of each layer to determine tower stability in accordance with Korean seismic design guide. The determination and evaluation of status and response of the brick tower by simulation analysis play an important role in the preservation of history as well as valuable architectural heritages in South Korea.

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.57-73
    • /
    • 2021
  • Maintenance and prevention of failure through anomaly detection of ICT infrastructure is becoming important. System monitoring data is multidimensional time series data. When we deal with multidimensional time series data, we have difficulty in considering both characteristics of multidimensional data and characteristics of time series data. When dealing with multidimensional data, correlation between variables should be considered. Existing methods such as probability and linear base, distance base, etc. are degraded due to limitations called the curse of dimensions. In addition, time series data is preprocessed by applying sliding window technique and time series decomposition for self-correlation analysis. These techniques are the cause of increasing the dimension of data, so it is necessary to supplement them. The anomaly detection field is an old research field, and statistical methods and regression analysis were used in the early days. Currently, there are active studies to apply machine learning and artificial neural network technology to this field. Statistically based methods are difficult to apply when data is non-homogeneous, and do not detect local outliers well. The regression analysis method compares the predictive value and the actual value after learning the regression formula based on the parametric statistics and it detects abnormality. Anomaly detection using regression analysis has the disadvantage that the performance is lowered when the model is not solid and the noise or outliers of the data are included. There is a restriction that learning data with noise or outliers should be used. The autoencoder using artificial neural networks is learned to output as similar as possible to input data. It has many advantages compared to existing probability and linear model, cluster analysis, and map learning. It can be applied to data that does not satisfy probability distribution or linear assumption. In addition, it is possible to learn non-mapping without label data for teaching. However, there is a limitation of local outlier identification of multidimensional data in anomaly detection, and there is a problem that the dimension of data is greatly increased due to the characteristics of time series data. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that enhances the performance of anomaly detection by considering local outliers and time series characteristics. First, we applied Multimodal Autoencoder (MAE) to improve the limitations of local outlier identification of multidimensional data. Multimodals are commonly used to learn different types of inputs, such as voice and image. The different modal shares the bottleneck effect of Autoencoder and it learns correlation. In addition, CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the dimension of data. In general, conditional input mainly uses category variables, but in this study, time was used as a condition to learn periodicity. The CMAE model proposed in this paper was verified by comparing with the Unimodal Autoencoder (UAE) and Multi-modal Autoencoder (MAE). The restoration performance of Autoencoder for 41 variables was confirmed in the proposed model and the comparison model. The restoration performance is different by variables, and the restoration is normally well operated because the loss value is small for Memory, Disk, and Network modals in all three Autoencoder models. The process modal did not show a significant difference in all three models, and the CPU modal showed excellent performance in CMAE. ROC curve was prepared for the evaluation of anomaly detection performance in the proposed model and the comparison model, and AUC, accuracy, precision, recall, and F1-score were compared. In all indicators, the performance was shown in the order of CMAE, MAE, and AE. Especially, the reproduction rate was 0.9828 for CMAE, which can be confirmed to detect almost most of the abnormalities. The accuracy of the model was also improved and 87.12%, and the F1-score was 0.8883, which is considered to be suitable for anomaly detection. In practical aspect, the proposed model has an additional advantage in addition to performance improvement. The use of techniques such as time series decomposition and sliding windows has the disadvantage of managing unnecessary procedures; and their dimensional increase can cause a decrease in the computational speed in inference.The proposed model has characteristics that are easy to apply to practical tasks such as inference speed and model management.

A study on the Pattern Recognition of the EMG signals using Neural Network and Probabilistic modal for the two dimensional Motions described by External Coordinate (신경회로망과 확률모델을 이용한 2차원운동의 외부좌표에 대한 EMG신호의 패턴인식에 관한 연구)

  • Jang, Young-Gun;Kwon, Jang-Woo;Hong, Seung-Hong
    • Proceedings of the KOSOMBE Conference
    • /
    • v.1991 no.05
    • /
    • pp.65-70
    • /
    • 1991
  • A hybrid model which uses a probabilistic model and a MLP(multi layer perceptron) model for pattern recognition of EMG(electromyogram) signals is proposed in this paper. MLP model has problems which do not guarantee global minima of error due to learning method and have different approximation grade to bayesian probabilities due to different amounts and quality of training data, the number of hidden layers and hidden nodes, etc. Especially in the case of new test data which exclude design samples, the latter problem produces quite different results. The error probability of probabilistic model is closely related to the estimation error of the parameters used in the model and fidelity of assumtion. Generally, it is impossible to introduce the bayesian classifier to the probabilistic model of EMG signals because of unknown priori probabilities and is estimated by MLE(maximum likelihood estimate). In this paper we propose the method which get the MAP(maximum a posteriori probability) in the probabilistic model by estimating the priori probability distribution which minimize the error probability using the MLP. This method minimize the error probability of the probabilistic model as long as the realization of the MLP is optimal and approximate the minimum of error probability of each class of both models selectively. Alocating the reference coordinate of EMG signal to the outside of the body make it easy to suit to the applications which it is difficult to define and seperate using internal body coordinate. Simulation results show the benefit of the proposed model compared to use the MLP and the probabilistic model seperately.

  • PDF

Lip and Voice Synchronization Using Visual Attention (시각적 어텐션을 활용한 입술과 목소리의 동기화 연구)

  • Dongryun Yoon;Hyeonjoong Cho
    • The Transactions of the Korea Information Processing Society
    • /
    • v.13 no.4
    • /
    • pp.166-173
    • /
    • 2024
  • This study explores lip-sync detection, focusing on the synchronization between lip movements and voices in videos. Typically, lip-sync detection techniques involve cropping the facial area of a given video, utilizing the lower half of the cropped box as input for the visual encoder to extract visual features. To enhance the emphasis on the articulatory region of lips for more accurate lip-sync detection, we propose utilizing a pre-trained visual attention-based encoder. The Visual Transformer Pooling (VTP) module is employed as the visual encoder, originally designed for the lip-reading task, predicting the script based solely on visual information without audio. Our experimental results demonstrate that, despite having fewer learning parameters, our proposed method outperforms the latest model, VocaList, on the LRS2 dataset, achieving a lip-sync detection accuracy of 94.5% based on five context frames. Moreover, our approach exhibits an approximately 8% superiority over VocaList in lip-sync detection accuracy, even on an untrained dataset, Acappella.

LH-FAS v2: Head Pose Estimation-Based Lightweight Face Anti-Spoofing (LH-FAS v2: 머리 자세 추정 기반 경량 얼굴 위조 방지 기술)

  • Hyeon-Beom Heo;Hye-Ri Yang;Sung-Uk Jung;Kyung-Jae Lee
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.1
    • /
    • pp.309-316
    • /
    • 2024
  • Facial recognition technology is widely used in various fields but faces challenges due to its vulnerability to fraudulent activities such as photo spoofing. Extensive research has been conducted to overcome this challenge. Most of them, however, require the use of specialized equipment like multi-modal cameras or operation in high-performance environments. In this paper, we introduce LH-FAS v2 (: Lightweight Head-pose-based Face Anti-Spoofing v2), a system designed to operate on a commercial webcam without any specialized equipment, to address the issue of facial recognition spoofing. LH-FAS v2 utilizes FSA-Net for head pose estimation and ArcFace for facial recognition, effectively assessing changes in head pose and verifying facial identity. We developed the VD4PS dataset, incorporating photo spoofing scenarios to evaluate the model's performance. The experimental results show the model's balanced accuracy and speed, indicating that head pose estimation-based facial anti-spoofing technology can be effectively used to counteract photo spoofing.

Social Network Analysis of TV Drama via Location Knowledge-learned Deep Hypernetworks (장소 정보를 학습한 딥하이퍼넷 기반 TV드라마 소셜 네트워크 분석)

  • Nan, Chang-Jun;Kim, Kyung-Min;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.11
    • /
    • pp.619-624
    • /
    • 2016
  • Social-aware video displays not only the relationships between characters but also diverse information on topics such as economics, politics and culture as a story unfolds. Particularly, the speaking habits and behavioral patterns of people in different situations are very important for the analysis of social relationships. However, when dealing with this dynamic multi-modal data, it is difficult for a computer to analyze the drama data effectively. To solve this problem, previous studies employed the deep concept hierarchy (DCH) model to automatically construct and analyze social networks in a TV drama. Nevertheless, since location knowledge was not included, they can only analyze the social network as a whole in stories. In this research, we include location knowledge and analyze the social relations in different locations. We adopt data from approximately 4400 minutes of a TV drama Friends as our dataset. We process face recognition on the characters by using a convolutional- recursive neural networks model and utilize a bag of features model to classify scenes. Then, in different scenes, we establish the social network between the characters by using a deep concept hierarchy model and analyze the change in the social network while the stories unfold.