• Title/Summary/Keyword: unsupervised feature learning

Search Result 78, Processing Time 0.031 seconds

The Optimal Column Grouping Technique for the Compensation of Column Shortening (기둥축소량 보정을 위한 기둥의 최적그루핑기법)

  • Kim, Yeong-Min
    • Journal of the Computational Structural Engineering Institute of Korea / v.24 no.2 / pp.141-148 / 2011
  • This study presents an optimal column grouping technique that groups columns with similar shortening trends in order to improve the efficiency of column shortening compensation. Kohonen's self-organizing feature map, which classifies patterns in the input data by itself through unsupervised learning, was used as the grouping algorithm. The Kohonen network applied in this study consists of two input neurons and a variable number of output neurons, where the number of output neurons equals the number of column groups to be classified. The normalized mean and standard deviation of the shortening of each column are fed to the input neurons, and the output neurons present the classified column groups. The applicability of the proposed algorithm was evaluated by applying it to two buildings for which column shortening analyses had already been performed. The algorithm classified columns with similar shortening trends into the same group, which confirms its field applicability for the optimal grouping of columns in shortening compensation.
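The network described in this abstract (two inputs, one output neuron per column group) can be sketched as a one-dimensional Kohonen map. This is a minimal illustration, not the author's implementation: the shortening values are made up, min-max scaling is an assumed form of the normalization, and the function names, learning-rate schedule, and neighbourhood schedule are hypothetical choices.

```python
import numpy as np

def normalize(raw):
    """Min-max scale each column to [0, 1] (an assumed normalization)."""
    lo, hi = raw.min(axis=0), raw.max(axis=0)
    return (raw - lo) / (hi - lo)

def train_som(features, n_groups, epochs=200, lr0=0.5, seed=0):
    """Train a 1-D Kohonen map with one output neuron per group."""
    rng = np.random.default_rng(seed)
    # Spread the initial weights evenly between the feature minima and maxima.
    weights = np.linspace(features.min(axis=0), features.max(axis=0), n_groups)
    neuron_idx = np.arange(n_groups)
    sigma0 = max(n_groups / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood width
        for x in features[rng.permutation(len(features))]:
            # Best-matching unit: the neuron whose weights are closest to x.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Gaussian neighbourhood on the 1-D neuron index.
            h = np.exp(-((neuron_idx - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

def assign_groups(features, weights):
    """Assign each column to its nearest output neuron."""
    dists = np.linalg.norm(features[:, None, :] - weights[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Hypothetical data: (mean shortening, standard deviation) per column, in mm.
raw = np.array([[10.2, 1.1], [10.8, 1.3], [9.9, 1.0],
                [24.5, 3.9], [25.1, 4.2], [23.8, 4.0]])
features = normalize(raw)
weights = train_som(features, n_groups=2)
groups = assign_groups(features, weights)
print(groups)
```

Setting `n_groups` to the desired number of column groups mirrors the abstract's statement that the number of output neurons equals the number of groups.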

Anomaly Detection In Real Power Plant Vibration Data by MSCRED Base Model Improved By Subset Sampling Validation (Subset 샘플링 검증 기법을 활용한 MSCRED 모델 기반 발전소 진동 데이터의 이상 진단)

  • Hong, Su-Woong;Kwon, Jang-Woo
    • Journal of Convergence for Information Technology / v.12 no.1 / pp.31-38 / 2022
  • This paper applies MSCRED (Multi-Scale Convolutional Recurrent Encoder-Decoder), an expert-independent, unsupervised neural-network model for multivariate time series analysis. Because MSCRED is based on an autoencoder, its training data must not be contaminated with abnormal samples; to overcome this limitation, the paper uses a training-data sampling technique called Subset Sampling Validation. Using labeled vibration data from power plant equipment, the classification performance of MSCRED is evaluated with the anomaly score in two cases: 1) abnormal data is mixed into the training data, and 2) the abnormal data of case 1 is removed from the training data. Through this, the paper presents an expert-independent anomaly diagnosis framework that is robust to erroneous data, and offers a concise and accurate solution for various fields that use multivariate time series data.

A Study on Automatic Classification Technique of Malware Packing Type (악성코드 패킹유형 자동분류 기술 연구)

  • Kim, Su-jeong;Ha, Ji-hee;Lee, Tae-jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.28 no.5 / pp.1119-1127 / 2018
  • Most cyber attacks are carried out with malicious code. The damage caused by cyber attacks is gradually extending to IoT and CPS, so it is no longer limited to cyberspace and poses a serious threat to real life. Accordingly, various malicious code analysis techniques have appeared. Dynamic analysis has been widely used because it easily identifies the resulting malicious behavior, but it struggles with the increase in anti-VM malware that does not run once it detects a VM environment. Static analysis, on the other hand, is hampered by various packing techniques. In this paper, we propose a technique that classifies malware by packing type regardless of whether the packer is known or unknown. To do this, we designed a model that applies supervised and unsupervised learning to features available in the PE structure, and verified the results on 98,000 samples. We expect that analysis technology customized for each class will enable accurate analysis.

An Efficient Multidimensional Scaling Method based on CUDA and Divide-and-Conquer (CUDA 및 분할-정복 기반의 효율적인 다차원 척도법)

  • Park, Sung-In;Hwang, Kyu-Baek
    • Journal of KIISE:Computing Practices and Letters / v.16 no.4 / pp.427-431 / 2010
  • Multidimensional scaling (MDS) is a widely used method for dimensionality reduction whose purpose is to represent high-dimensional data in a low-dimensional space while preserving the distances among objects as much as possible. MDS has mainly been applied to data visualization and feature selection. Among the various MDS methods, classical MDS is not readily applicable to data with a large number of objects on ordinary desktop computers because of its computational complexity. More precisely, it needs to solve eigenpair problems on dissimilarity matrices based on Euclidean distance. Thus, the running time and memory required by classical MDS increase sharply as n (the number of objects) grows, restricting its use in large-scale domains. In this paper, we propose an efficient approximation algorithm for classical MDS based on divide-and-conquer and CUDA. Through a set of experiments, we show that our approach is highly efficient and effective for the analysis and visualization of data consisting of several thousand objects.
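To make the cost mentioned in this abstract concrete, the following is a minimal NumPy sketch of plain classical MDS, the baseline that the paper approximates. It is not the authors' divide-and-conquer or CUDA algorithm. It stores a dense n × n matrix and runs a full eigendecomposition, which is where the memory and running-time growth comes from. The function names and the synthetic data are assumptions for illustration.

```python
import numpy as np

def pairwise_distances(points):
    """Dense n x n matrix of Euclidean distances."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=2)

def classical_mds(dist, k=2):
    """Embed n objects in k dimensions from an n x n distance matrix."""
    n = dist.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Full symmetric eigendecomposition: the expensive eigenpair step.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:k]
    # Clip tiny negative eigenvalues caused by round-off.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Synthetic check: 5-D points that actually lie on a 2-D plane.
rng = np.random.default_rng(0)
planar = rng.normal(size=(30, 2))
high_dim = np.hstack([planar, np.zeros((30, 3))])
dist = pairwise_distances(high_dim)
embedding = classical_mds(dist, k=2)
max_error = np.abs(pairwise_distances(embedding) - dist).max()
print(embedding.shape, max_error)
```

Because the synthetic points lie on a plane, a two-dimensional embedding should reproduce the pairwise distances up to round-off, which gives a simple sanity check.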

Feature-based Gene Classification and Region Clustering using Gene Expression Grid Data in Mouse Hippocampal Region (쥐 해마의 유전자 발현 그리드 데이터를 이용한 특징기반 유전자 분류 및 영역 군집화)

  • Kang, Mi-Sun;Kim, HyeRyun;Lee, Sukchan;Kim, Myoung-Hee
    • Journal of KIISE / v.43 no.1 / pp.54-60 / 2016
  • Brain gene expression information is closely related to the structural and functional characteristics of the brain. Thus, extensive research has been carried out on the relationship between gene expression patterns and the brain's structural organization. In this study, Principal Component Analysis was used to extract features of gene expression patterns, and genes were automatically classified by spatial distribution. Voxels were then clustered using the genes classified as region-specific. Finally, we visualized the clustering results for gene expression in the mouse hippocampal region together with the Allen Brain Atlas. This experiment allowed us to classify region-specific gene expression in the mouse hippocampal region and to visualize the clustering results and a brain atlas in an integrated manner. This study has the potential to let neuroscientists search for experimental groups of genes more quickly and design effective tests around this new form of data. It is also expected to enable the discovery of more specific sub-regions beyond the currently known anatomical regions of the brain.
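The PCA feature-extraction step named in this abstract can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: the expression matrix is synthetic (rows are genes, columns are voxels), and splitting genes by the sign of the first component score is a stand-in for the paper's classification step, which the abstract does not specify.

```python
import numpy as np

def pca_scores(data, n_components):
    """Project rows of `data` onto the leading principal components via SVD."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)   # fraction of variance per component
    return scores, explained[:n_components]

# Synthetic grid data: 20 genes x 50 voxels with two spatial patterns.
rng = np.random.default_rng(0)
n_voxels = 50
pattern_a = np.r_[np.ones(25), np.zeros(25)]   # expressed in the first half
pattern_b = np.r_[np.zeros(25), np.ones(25)]   # expressed in the second half
genes = np.vstack([
    pattern_a + 0.05 * rng.normal(size=(10, n_voxels)),
    pattern_b + 0.05 * rng.normal(size=(10, n_voxels)),
])

scores, explained = pca_scores(genes, n_components=2)
# Stand-in classifier: split genes by the sign of the first component score.
labels = (scores[:, 0] > 0).astype(int)
print(labels, explained)
```

With real grid data, the component scores would feed whatever classifier or clustering method is appropriate; the sign split only works here because the synthetic data has exactly two well-separated patterns.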

Classification of hysteretic loop feature for runoff generation through a unsupervised machine learning algorithm (비지도 기계학습을 통한 유출 발생 내 이력 현상 구분)

  • Lee, Eunhyung;Jeon, Hangtak;Kim, Dahong;Friday, Bassey Bassey;Kim, Sanghyun
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.360-360 / 2022
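  • Quantifying the relationship between soil moisture and runoff provides important information for understanding hydrological mechanisms and the runoff generation process. In particular, characterizing the runoff process is essential for controlling soil water and sediment loss in the unsaturated zone during hydrological events, and for predicting landslides and non-point source pollution. To identify the nonlinearity and complexity associated with the runoff process, the hysteretic behavior between soil moisture and runoff has been investigated. In particular, qualitative visual classifications and hysteresis indices for quantitative evaluation have been developed to specify hysteresis in hydrological processes. The qualitative visual classification divides multi-loop shapes into clockwise and counterclockwise directions over time, while the quantitative evaluation characterizes hysteresis with indices based on the difference between the rising limb and the falling limb of the hysteretic loop. Previously proposed methods are not universal because they involve the researcher's judgment, and they lose data because the hysteresis is fitted to the developed indices. An appropriate unsupervised machine learning methodology is therefore needed to automatically extract, without data loss, the representative hysteresis patterns that can occur in the unsaturated zone. In this study, we used soil moisture time series collected during rainfall events at 56 soil moisture measurement points at multiple depths (10, 30, and 60 cm) on a mountain hillslope in Korea, together with runoff time series obtained from a weir on the hillslope. First, based on existing classification methods, we characterized the soil moisture-runoff hysteresis that occurs predominantly according to seasonal and spatial characteristics. Next, we developed an algorithm that renders soil moisture-runoff hysteresis patterns as shapes without data loss and automatically stores them in a database. Finally, using an unsupervised learning method, we extracted representative hysteresis patterns by building, through iterative reconstruction learning, a hidden layer that estimates as closely as possible the probability distribution of the observed hysteresis stored in the database.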


Anomaly Detection for User Action with Generative Adversarial Networks (적대적 생성 모델을 활용한 사용자 행위 이상 탐지 방법)

  • Choi, Nam woong;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.43-62 / 2019
  • At one time, anomaly detection was dominated by methods that determined whether an abnormality existed based on statistics derived from specific data. This was possible because data used to be low-dimensional, so classical statistical methods worked effectively. However, as data has grown more complex in the era of big data, it has become more difficult to accurately analyze and predict the data generated throughout industry in the conventional way. Supervised learning algorithms based on SVMs and decision trees were therefore adopted. However, a supervised model predicts test data accurately only when the classes are balanced, whereas most data generated in industry is class-imbalanced, so the predictions of a supervised model are not always valid. To overcome these drawbacks, many studies now use unsupervised models that are not influenced by the class distribution, such as autoencoders or generative adversarial networks. In this paper, we propose a method to detect anomalies using generative adversarial networks. AnoGAN, introduced by Schlegl et al. (2017), is a model that detects anomalies in medical images; it is built from convolutional neural networks. In contrast, there are far fewer studies on anomaly detection for sequence data with generative adversarial networks than for image data. Li et al. (2018) proposed a model based on LSTM, a type of recurrent neural network, to classify anomalies in numerical sequence data, but it has not been applied to categorical sequence data, nor has the feature matching method of Salimans et al. (2016). This suggests that much remains to be explored in anomaly classification of sequence data with generative adversarial networks. To learn sequence data, the generative adversarial network is built from LSTMs: the generator is a 2-layer stacked LSTM with a 32-dimensional and a 64-dimensional hidden layer, and the discriminator is an LSTM with a 64-dimensional hidden layer. Existing work on anomaly detection for sequence data derives anomaly scores from the entropy of the probability of the actual data, whereas this paper derives anomaly scores using feature matching. In addition, the process of optimizing the latent variables was designed with an LSTM to improve model performance. The modified generative adversarial model was more precise than the autoencoder in all experiments and approximately 7% higher in accuracy. Generative adversarial networks were also more robust than the autoencoder: because they learn the data distribution from real categorical sequence data, they are not unduly affected by individual normal samples, whereas the autoencoder is. In the robustness test, accuracy was 92% for the autoencoder and 96% for the generative adversarial network, and sensitivity was 40% for the autoencoder and 51% for the generative adversarial network. Experiments were also conducted to show how much performance changes with the structure used to optimize the latent variables; sensitivity improved by about 1%. These results offer a new perspective on latent variable optimization, which has previously received relatively little attention.
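For orientation, a feature-matching anomaly score can be written in the form published for AnoGAN, the 2017 model cited in this abstract. This is a sketch of that published score and not necessarily the exact score used in this paper. The symbols are notation assumed here: G is the generator, f is an intermediate layer of the discriminator, z* is the optimized latent variable, and λ is a weighting factor between 0 and 1.

```latex
A(x) = (1-\lambda)\,\sum \lvert x - G(z^{*}) \rvert
     + \lambda\,\sum \lvert f(x) - f(G(z^{*})) \rvert
```

The first term measures how well the generator can reconstruct the sample, and the second compares discriminator features of the sample and its reconstruction; a larger A(x) indicates a more anomalous sample.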

Estimation of Inundation Area by Linking of Rainfall-Duration-Flooding Quantity Relationship Curve with Self-Organizing Map (강우량-지속시간-침수량 관계곡선과 자기조직화 지도의 연계를 통한 범람범위 추정)

  • Kim, Hyun Il;Keum, Ho Jun;Han, Kun Yeun
    • KSCE Journal of Civil and Environmental Engineering Research / v.38 no.6 / pp.839-850 / 2018
  • Flood damage in urban areas due to torrential rain is increasing with urbanization. For this reason, accurate and rapid flood forecasting and expected inundation maps are needed. Predicting the extent of flooding for a given rainfall is very important for preparing for floods in advance. Recently, government agencies have been trying to provide expected inundation maps to the public. However, methods for quantifying the extent of inundation caused by a particular rainfall scenario, and for predicting the flood extent in real time within a short period, are lacking. Real-time prediction of flood extent based on rainfall-runoff-inundation analysis is therefore needed. One- and two-dimensional models are linked to analyze the drainage network, manhole overflow, and inundation propagation under given rainfall conditions. By applying various rainfall scenarios that consider rainfall duration, distribution, and return period, the inundation volume and depth can be estimated and stored in a database. The Rainfall-Duration-Flooding Quantity (RDF) relationship curve, based on the hydraulic analysis results, and a Self-Organizing Map (SOM), which performs unsupervised learning, are applied to predict the flooded area for a particular rainfall condition. The validity of the proposed methodology was examined by comparing the predicted flood map with the results of the 2-dimensional hydraulic model. Based on the results, this methodology is judged to be useful for providing flood maps for medium-sized rainfall or frequency scenarios for which no map exists. Furthermore, the RDF curve, which accounts for the rainfall-outflow-flood relationship, and the database of expected inundation maps can serve as fundamental data for flood forecasting.