• Title/Summary/Keyword: data anomaly classification

Anomaly Detection for User Action with Generative Adversarial Networks (적대적 생성 모델을 활용한 사용자 행위 이상 탐지 방법)

  • Choi, Nam woong;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.43-62 / 2019
  • For a long time, anomaly detection was dominated by methods that decide whether an abnormality exists based on statistics derived from specific data. This worked because data used to be low-dimensional, so classical statistical methods were effective. In the era of big data, however, data has become far more complex, and it is harder to accurately analyze and predict industrial data in the conventional way. Supervised learning algorithms such as SVM and Decision Tree were therefore adopted. A supervised model, however, predicts test data accurately only when the abnormal and normal classes are of comparable size, whereas most data generated in industry is class-imbalanced. As a result, the predictions of a supervised model are not always valid. To overcome these drawbacks, many studies now use unsupervised models that are not influenced by class distribution, such as autoencoders or generative adversarial networks (GANs). In this paper, we propose a method to detect anomalies using GANs. AnoGAN, introduced by Schlegl et al. (2017), is a model that detects anomalies in medical images; it is built on a convolutional neural network and was applied to anomaly detection. By contrast, research on anomaly detection in sequence data using GANs is scarce compared with image data. Li et al. (2018) proposed a model based on LSTM, a type of recurrent neural network, to detect anomalies in numerical sequence data. However, it has not been applied to categorical sequence data, nor does it use the feature matching method of Salimans et al. (2016).
This suggests that much remains to be explored in the anomaly classification of sequence data with GANs. To learn sequence data, the GAN is built from LSTMs. The generator is a 2-layer stacked LSTM with a 32-dim hidden layer and a 64-dim hidden layer, and the discriminator is an LSTM with a 64-dim hidden layer. Existing work on anomaly detection for sequence data derives anomaly scores from the entropy of the probabilities of the actual data. In this paper, as mentioned above, anomaly scores are instead derived with the feature matching technique. In addition, the latent-variable optimization process was designed with an LSTM to improve model performance. The modified GAN achieved higher precision than the autoencoder in all experiments and approximately 7% higher accuracy. The GAN was also more robust than the autoencoder. Because a GAN learns the data distribution from real categorical sequence data, it is not swayed by any single normal sample, whereas an autoencoder is. In the robustness test, accuracy was 92% for the autoencoder and 96% for the GAN, and sensitivity was 40% for the autoencoder and 51% for the GAN. Further experiments measured how performance changes with the structure used to optimize the latent variables; sensitivity improved by about 1%. These results offer a new perspective on latent-variable optimization, which has received relatively little attention.
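For background, the anomaly score of the original image-based AnoGAN (Schlegl et al., 2017) combines a residual term with a feature-matching discrimination term. In this formula, $G$ is the generator, $f(\cdot)$ is the output of an intermediate discriminator layer, $\mathbf{z}_\Gamma$ is the latent vector after $\Gamma$ optimization steps, and $\lambda$ is a weighting factor. The abstract above does not give the exact score used for its LSTM-based model, so this should be read as the general AnoGAN form rather than the authors' equation.

```latex
R(\mathbf{x}) = \sum \left\lvert \mathbf{x} - G(\mathbf{z}_\Gamma) \right\rvert, \qquad
D(\mathbf{x}) = \sum \left\lvert f(\mathbf{x}) - f\big(G(\mathbf{z}_\Gamma)\big) \right\rvert

A(\mathbf{x}) = (1 - \lambda)\, R(\mathbf{x}) + \lambda\, D(\mathbf{x})
```

A larger $A(\mathbf{x})$ means the query is harder to reproduce from the learned normal distribution. The $D$ term is the feature-matching part that the abstract contrasts with entropy-based scores.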

A hybrid intrusion detection system based on CBA and OCSVM for unknown threat detection (알려지지 않은 위협 탐지를 위한 CBA와 OCSVM 기반 하이브리드 침입 탐지 시스템)

  • Shin, Gun-Yoon;Kim, Dong-Wook;Yun, Jiyoung;Kim, Sang-Soo;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.22 no.3 / pp.27-35 / 2021
  • With the development of the Internet, IT technologies such as IoT and Cloud have matured, and governments and companies have built many systems on them. Because these systems generate and share vast amounts of data, threat detection systems are needed to protect the critical data they hold, and such systems have been actively studied. Typical techniques are anomaly detection and misuse detection. Misuse detection catches known threats, and anomaly detection catches behavior that differs from normal. However, as IT technology advances, so do the technologies that threaten systems, and these detection methods struggle to keep up. Advanced Persistent Threat (APT) attacks target national or corporate systems to steal important information and carry out attacks such as taking systems down. These threats use previously unknown malware and attack techniques. Therefore, in this paper, we propose a hybrid intrusion detection system that combines anomaly detection and misuse detection to detect unknown threats. Combining both techniques allows known and unknown threats to be detected, and applying machine learning makes detection more accurate. For misuse detection, we applied Classification based on Association Rules (CBA) to generate rules for known threats. For anomaly detection, we used a One-Class SVM (OCSVM) to detect unknown threats. Experiments show an unknown-threat detection accuracy of about 94%, confirming that unknown threats can be detected.
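The hybrid decision flow described above can be sketched as follows. Known-threat rules are checked first (misuse detection), and records that match no rule fall back to an anomaly score (anomaly detection). This is a minimal sketch under stated assumptions. The rule format, the feature names (`proto`, `port`, `flag`, `bytes`), and `stand_in_score` are hypothetical. The score is a toy stand-in, not an OCSVM, and no association-rule mining is shown.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
# Hypothetical rule format: {"if": {feature: value, ...}, "then": label},
# loosely mimicking the "antecedent -> class" rules that CBA produces.
Rule = Dict[str, object]


def match_rules(record: Record, rules: List[Rule]) -> Optional[str]:
    """Misuse stage: return the label of the first rule whose conditions all hold."""
    for rule in rules:
        conditions = rule["if"]
        if all(record.get(key) == value for key, value in conditions.items()):
            return str(rule["then"])
    return None


def hybrid_detect(
    record: Record,
    rules: List[Rule],
    anomaly_score: Callable[[Record], float],
    threshold: float,
) -> str:
    """Check known-threat rules first, then fall back to an anomaly score."""
    known = match_rules(record, rules)
    if known is not None:
        return known
    # Anomaly stage: a higher score means further from the learned normal profile.
    if anomaly_score(record) > threshold:
        return "unknown_threat"
    return "normal"


# Hypothetical rule and a toy scorer standing in for a trained one-class model.
rules = [{"if": {"proto": "tcp", "port": 23, "flag": "REJ"}, "then": "known_scan"}]


def stand_in_score(record: Record) -> float:
    """Relative deviation of transferred bytes from an assumed normal value of 500."""
    return abs(float(record["bytes"]) - 500.0) / 500.0


print(hybrid_detect({"proto": "udp", "port": 53, "flag": "SF", "bytes": 5000},
                    rules, stand_in_score, threshold=2.0))
```

To use an actual OCSVM, one option is scikit-learn's `OneClassSVM`, whose `decision_function` is positive for inliers and negative for outliers. Its negated value can serve as `anomaly_score` with a threshold of 0. Turning a record into a feature vector is left out here.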

Network Anomaly Detection Technologies Using Unsupervised Learning AutoEncoders (비지도학습 오토 엔코더를 활용한 네트워크 이상 검출 기술)

  • Kang, Koohong
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.4 / pp.617-629 / 2020
  • Rule-based intrusion detection systems are limited by changing Internet computing environments, the emergence of new services, and the creativity of attackers. To overcome these limits, network anomaly detection (NAD) using machine learning and deep learning has received much attention. Most existing machine learning and deep learning approaches to NAD use supervised learning on training data labeled 'normal' and 'attack'. This paper shows the feasibility of applying an unsupervised AutoEncoder (AE) to NAD using collected network traffic without labeled responses. To verify the performance of the proposed AE model, we report accuracy, precision, recall, F1-score, and ROC AUC on the NSL-KDD training and test data sets. In particular, we build a reference AE through an in-depth analysis of diverse AEs. This analysis varies hyper-parameters such as the number of layers and also considers regularization and denoising effects. The anomaly threshold is set at the 82nd percentile of the AE reconstruction error on the training data set. With this threshold, the reference model achieves binary-classification F1-scores of 90.4% on KDDTest+ and 89% on KDDTest-21.
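The thresholding step described above works like this: compute reconstruction errors on the training set, take their 82nd percentile, and flag any test sample whose error exceeds it. The sketch below assumes mean squared error as the reconstruction error, which the abstract does not specify, and omits the AE itself. The percentile uses linear interpolation between sorted values, the same convention as NumPy's default `np.percentile`.

```python
import math
from typing import List, Sequence


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between an input vector and its AE reconstruction (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100), linearly interpolating between sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence")
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def classify(test_errors: Sequence[float], train_errors: Sequence[float],
             q: float = 82.0) -> List[str]:
    """Label a sample 'attack' when its error exceeds the q-th percentile of training errors."""
    threshold = percentile(train_errors, q)
    return ["attack" if e > threshold else "normal" for e in test_errors]


# Toy training errors 0..100, so the 82nd-percentile threshold is exactly 82.
print(classify([10, 82, 90], list(range(101))))
```

Fitting the threshold on training errors is unsupervised: no attack labels are needed. The percentile only sets the trade-off between false alarms and missed attacks.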

A Study on Efficient AI Model Drift Detection Methods for MLOps (MLOps를 위한 효율적인 AI 모델 드리프트 탐지방안 연구)

  • Ye-eun Lee;Tae-jin Lee
    • Journal of Internet Computing and Services / v.24 no.5 / pp.17-27 / 2023
  • As AI (Artificial Intelligence) technology develops and becomes more practical, it is widely used in many real-world applications. An AI model is trained on the statistical properties of its training data and then deployed. In rapidly changing data environments, however, unexpected changes in the data degrade the model's performance. This matters especially in security, where deployed models must respond to a constant stream of new and unknown attacks. Finding drift signals in deployed models has therefore become important, and the need to manage the entire model lifecycle is growing. Drift can generally be detected through changes in the model's accuracy and error rate (loss). This approach needs actual labels for the model's predictions, which limits its use in practice, and it leaves the point at which drift actually occurs uncertain. The error rate is strongly influenced by external environmental factors, model selection, parameter settings, and new input data. That value alone is therefore not enough to pinpoint when drift in the data actually occurs. This paper instead proposes detecting when actual drift occurs through an anomaly analysis technique based on XAI (eXplainable Artificial Intelligence). The method was tested on a classification model that detects DGA (Domain Generation Algorithm) domains. Anomaly scores were extracted from the SHAP (SHapley Additive exPlanations) values of the data after deployment, and drift points could be detected efficiently.
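The idea of scoring post-deployment data by its SHAP values and looking for the point where scores shift can be sketched with a simple stand-in. The abstract does not say which anomaly technique the authors applied to the SHAP values. This sketch therefore assumes three things: SHAP vectors are already computed (for example with the `shap` package), the anomaly score is the Euclidean distance from the centroid of reference SHAP vectors, and drift is declared at the first sliding window whose mean score exceeds mean + z·stdev of the reference scores. `drift_point`, `window`, and `z` are hypothetical names and parameters.

```python
import math
from statistics import mean, stdev
from typing import List, Optional, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Per-feature mean of the reference SHAP vectors."""
    return [mean(v[i] for v in vectors) for i in range(len(vectors[0]))]


def drift_point(reference: List[Vector], stream: List[Vector],
                window: int = 5, z: float = 3.0) -> Optional[int]:
    """Start index of the first window whose mean anomaly score exceeds
    mean + z * stdev of the reference scores, or None if no window does."""
    center = centroid(reference)
    # Anomaly score: Euclidean distance of a SHAP vector from the reference centroid.
    ref_scores = [math.dist(v, center) for v in reference]
    limit = mean(ref_scores) + z * stdev(ref_scores)
    scores = [math.dist(v, center) for v in stream]
    for start in range(len(scores) - window + 1):
        if mean(scores[start:start + window]) > limit:
            return start
    return None


# Toy 2-feature SHAP vectors: ten "normal" vectors, then ten shifted ones.
reference = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]]
stream = reference * 2 + [[3, 3]] * 10
print(drift_point(reference, stream))
```

Because a window must fill up with shifted samples before its mean crosses the limit, the reported start index is only an approximate location of the drift point. A smaller `window` reacts faster but raises more false alarms.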

A study on Classification of Insider threat using Markov Chain Model

  • Kim, Dong-Wook;Hong, Sung-Sam;Han, Myung-Mook
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.4 / pp.1887-1898 / 2018
  • In this paper, a method to classify insider threat activity is introduced. Insider threat detection looks for anomalous activity in the procedures that users perform in an organization. When a user's behavior deviates from the overall pattern, we treat it as an insider threat and classify the user as a malicious insider. To address this, a Markov Chain Model is employed. In a Markov chain, the next state is a random variable that depends on the previous state. In the same way, a user's current activity can be predicted from the previous activity. We studied a method in which changes between such states are defined by transition probabilities, and insider-threat anomalies are detected from the values of these probability variables. We use the properties of Markov chains to list a user's behavior over time and to classify which state it belongs to. Sequential data sets were generated according to the influence of n occurrences of the Markov property and were classified with machine learning algorithms. The experiment used only 15% of the CERT Insider Threat dataset, and all algorithms except Naive Bayes reached 97% accuracy. The results confirm that the Markov Chain Model can classify insider threats and is well suited to classifying user behavior.
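The transition-probability idea described above can be illustrated with a minimal first-order Markov chain. Transition probabilities are estimated from normal activity sequences, and a new sequence is scored by its average log-likelihood, where low values suggest anomalous behavior. Several choices here are assumptions, not details from the paper: the first-order chain, add-one smoothing, the likelihood score, the `floor` probability for unseen states, and the activity names. The paper itself fed Markov-derived sequence data to machine learning classifiers.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

Transitions = Dict[str, Dict[str, float]]


def fit_transitions(sequences: List[Sequence[str]], smoothing: float = 1.0) -> Transitions:
    """Estimate first-order P(next | current) with add-k smoothing over observed states."""
    states = sorted({s for seq in sequences for s in seq})
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs: Transitions = {}
    for a in states:
        total = sum(counts[a].values()) + smoothing * len(states)
        probs[a] = {b: (counts[a][b] + smoothing) / total for b in states}
    return probs


def mean_log_likelihood(seq: Sequence[str], probs: Transitions, floor: float = 1e-6) -> float:
    """Average log-probability of the observed transitions; low values suggest anomalies.
    Transitions involving states never seen in training get probability `floor`."""
    if len(seq) < 2:
        raise ValueError("need at least two activities to form a transition")
    logs = [math.log(probs.get(a, {}).get(b, floor)) for a, b in zip(seq, seq[1:])]
    return sum(logs) / len(logs)


# Hypothetical daily activity sequences for one user.
normal = [["logon", "email", "web", "logoff"]] * 10
model = fit_transitions(normal)
print(mean_log_likelihood(["logon", "email", "web", "logoff"], model))  # about -0.24
print(mean_log_likelihood(["logon", "usb", "file", "logoff"], model))
```

A threshold on this score, or the transition probabilities themselves used as features for a classifier, turns the chain into an insider-threat detector.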

Feature Selection with PCA based on DNS Query for Malicious Domain Classification (비정상도메인 분류를 위한 DNS 쿼리 기반의 주성분 분석을 이용한 성분추출)

  • Lim, Sun-Hee;Cho, Jaeik;Kim, Jong-Hyun;Lee, Byung Gil
    • KIPS Transactions on Computer and Communication Systems / v.1 no.1 / pp.55-60 / 2012
  • Recent botnets widely use DNS services to connect to their C&C servers in order to evade detection. Research on DNS analysis is therefore needed to counter such DNS-based techniques with anomaly-based detection. This paper studies how to collect DNS traffic as experimental data. It also applies supervised learning to classify malicious domains from DNS traffic, for example domain-name queries sent by zombies to reach C&C servers. In particular, this paper aims to identify the significant features of a DNS-based classification system for malicious domain extraction using Principal Component Analysis (PCA).
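The PCA step mentioned above can be sketched with NumPy. Center a matrix of per-domain DNS features, take its singular value decomposition, and rank the original features by their loading on the leading principal axis. The feature names and toy data below are hypothetical, not from the paper. In practice, features on different scales are usually standardized before PCA.

```python
import numpy as np


def pca(X, k):
    """Return the top-k principal axes (as rows) and their explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    return vt[:k], ratio[:k]


def rank_features(axis, names):
    """Order feature names by absolute loading on one principal axis (sign is arbitrary)."""
    order = np.argsort(np.abs(axis))[::-1]
    return [names[i] for i in order]


# Hypothetical per-domain features; only the first one varies in this toy data.
names = ["query_count", "distinct_subdomains", "avg_ttl"]
X = [[0, 1, 1], [10, 1, 1], [20, 1, 1], [30, 1, 1]]
axes, ratio = pca(X, 1)
print(rank_features(axes[0], names), ratio)
```

Features with large loadings on high-variance components are candidates to keep, and features with near-zero loadings can be dropped before training the classifier.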

Classification of Operating State of Screw Decanter using Video-Based Optical Flow and LSTM Classifier

  • Lee, Sang-Hyeop;Wesonga, Sheilla;Park, Jang-Sik
    • Journal of the Korean Society of Industry Convergence / v.25 no.2_1 / pp.169-176 / 2022
  • Prognostics and health management (PHM) has recently spread throughout industry. One trending issue is detecting abnormal conditions in decanter centrifuges at water treatment facilities. Wastewater treatment produces corrosive gas, which causes failures in attached sensors. This leads to frequent sensor replacement and requires visual inspection by highly qualified managers when important parts such as bearings and screws are replaced. In this paper, we propose detecting anomalies by measuring the vibration of the decanter centrifuge from video camera images. The vibration of the screw decanter is measured with the optical flow technique: the change in movement of the corresponding pixels is measured and fed into an LSTM model. As a result, normal, warning, and dangerous states can be detected with LSTM classification. In future work, we aim to gather more abnormal data to further increase accuracy so that the method can be used in industrial settings.

Performance Comparison of Machine Learning Algorithms for Network Traffic Security in Medical Equipment (의료기기 네트워크 트래픽 보안 관련 머신러닝 알고리즘 성능 비교)

  • Seung Hyoung Ko;Joon Ho Park;Da Woon Wang;Eun Seok Kang;Hyun Wook Han
    • Journal of Information Technology Services / v.22 no.5 / pp.99-108 / 2023
  • As hospital computerization advances, security issues around the data generated by medical devices in hospitals are increasing. Because hospital data contains a variety of personal information, attackers have targeted it continuously. To protect data from external attacks, each hospital has formed an internal team to continuously monitor whether its computer network is safe. However, humans can only do so much to monitor attacks on hospital networks in real time. Recently, artificial intelligence models have shown excellent performance in detecting outliers. In this paper, an experiment was conducted to verify how well an artificial intelligence model classifies normal and abnormal records in network traffic generated by medical devices. Of the several models used for outlier detection, Random Forest and TabNet were selected. TabNet is a deep learning algorithm designed to take structured (tabular) data as input and classify it. Both algorithms were trained on open network traffic data, and their classification accuracy was measured on test data. Random Forest achieved a classification accuracy of 93%, and TabNet 99%. Therefore, most outliers that may occur in a hospital network are expected to be detectable with a high-performing algorithm such as TabNet.

Detection of Ship Movement Anomaly using AIS Data: A Study (AIS 데이터 분석을 통한 이상 거동 선박의 식별에 관한 연구)

  • Oh, Jae-Yong;Kim, Hye-Jin;Park, Se-Kil
    • Journal of Navigation and Port Research / v.42 no.4 / pp.277-282 / 2018
  • Recently, Vessel Traffic Service (VTS) coverage has expanded to include coastal areas as attention to vessel traffic safety has grown. However, this has increased the workload of VTS operators. In some cases, when traffic volume rises sharply during rush hour, the VTS operator may not be aware of the risks. Therefore, in this paper, we propose a new method that automatically recognizes ship movement anomalies to support the VTS operator's decision-making. The proposed method builds a traffic pattern model without any category information using an unsupervised learning algorithm. An anomaly score is then calculated by classifying ship movements with the trained model and comparing them against it. Finally, we reviewed experimental results from a ship-handling simulator and from actual trajectory data to verify the feasibility of the proposed method.

Proposal of a new method for learning of diesel generator sounds and detecting abnormal sounds using an unsupervised deep learning algorithm

  • Hweon-Ki Jo;Song-Hyun Kim;Chang-Lak Kim
    • Nuclear Engineering and Technology / v.55 no.2 / pp.506-515 / 2023
  • This study proposes two things. First, a method for learning the post-start-up engine sound of a diesel generator installed in a nuclear power plant with an unsupervised deep learning algorithm (a CNN autoencoder). Second, a new method that uses it to predict diesel generator failure. The training data was sound recorded before and after the start-up of two diesel generators. Recordings of 20 min and 2 h were cut into 7 s segments, and each segment was converted into a spectrogram image. This produced 1200 spectrogram images from the 20 min recording and 7200 from the 2 h recording. Using two deep learning algorithms (a CNN autoencoder and a binary classifier), we investigated whether the diesel generator's post-start sounds were learned as normal. Post-start sounds were accurately identified as normal and pre-start sounds as abnormal. The algorithm also detected virtual abnormal sounds created by mixing unusual sounds into the post-start sounds. The unsupervised anomaly detection algorithm achieved about 3% higher accuracy than the binary classification algorithm.
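The segment counts reported above are easier to interpret with a quick calculation. Non-overlapping 7 s windows over 20 min (1200 s) would yield only 171 images. The reported 1200 and 7200 images match roughly one image per second of audio, which suggests overlapping windows advanced by about 1 s. This is an inference: the abstract does not state the hop size or any padding. The hypothetical helper below counts full windows for a given hop. With a 1 s hop and no padding it gives 1194 and 7194 windows, close to the reported totals.

```python
from typing import List


def window_starts(duration_s: float, window_s: float = 7.0, hop_s: float = 1.0) -> List[float]:
    """Start times (s) of every full window; a trailing partial window is dropped."""
    if duration_s < window_s:
        return []
    count = int((duration_s - window_s) // hop_s) + 1
    return [i * hop_s for i in range(count)]


print(len(window_starts(20 * 60, hop_s=7.0)))  # non-overlapping: 171
print(len(window_starts(20 * 60)))             # 1 s hop: 1194
print(len(window_starts(2 * 60 * 60)))         # 1 s hop: 7194
```

Overlapping windows multiply the number of training images from a short recording, which helps a CNN autoencoder. The cost is that neighboring spectrograms are highly correlated.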