• Title/Summary/Keyword: Anomaly data detection

Search Result 402, Processing Time 0.021 seconds

Comparison of System Call Sequence Embedding Approaches for Anomaly Detection (이상 탐지를 위한 시스템콜 시퀀스 임베딩 접근 방식 비교)

  • Lee, Keun-Seop;Park, Kyungseon;Kim, Kangseok
    • Journal of Convergence for Information Technology
    • /
    • v.12 no.2
    • /
    • pp.47-53
    • /
    • 2022
  • Recently, with the change of the intelligent security paradigm, study to apply various information generated from various information security systems to AI-based anomaly detection is increasing. Therefore, in this study, in order to convert log-like time series data into a vector, which is a numerical feature, the CBOW and Skip-gram inference methods of deep learning-based Word2Vec model and statistical method based on the coincidence frequency were used to transform the published ADFA system call data. In relation to this, an experiment was carried out through conversion into various embedding vectors considering the dimension of vector, the length of sequence, and the window size. In addition, the performance of the embedding methods used as well as the detection performance were compared and evaluated through GRU-based anomaly detection model using vectors generated by the embedding model as an input. Compared to the statistical model, it was confirmed that the Skip-gram maintains more stable performance without biasing a specific window size or sequence length, and is more effective in making each event of sequence data into an embedding vector.

Comparative Analysis of Machine Learning Techniques for IoT Anomaly Detection Using the NSL-KDD Dataset

  • Zaryn, Good;Waleed, Farag;Xin-Wen, Wu;Soundararajan, Ezekiel;Maria, Balega;Franklin, May;Alicia, Deak
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.1
    • /
    • pp.46-52
    • /
    • 2023
  • With billions of IoT (Internet of Things) devices populating various emerging applications across the world, detecting anomalies on these devices has become incredibly important. Advanced Intrusion Detection Systems (IDS) are trained to detect abnormal network traffic, and Machine Learning (ML) algorithms are used to create detection models. In this paper, the NSL-KDD dataset was adopted to comparatively study the performance and efficiency of IoT anomaly detection models. The dataset was developed for various research purposes and is especially useful for anomaly detection. This data was used with typical machine learning algorithms including eXtreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), and Deep Convolutional Neural Networks (DCNN) to identify and classify any anomalies present within the IoT applications. Our research results show that the XGBoost algorithm outperformed both the SVM and DCNN algorithms achieving the highest accuracy. In our research, each algorithm was assessed based on accuracy, precision, recall, and F1 score. Furthermore, we obtained interesting results on the execution time taken for each algorithm when running the anomaly detection. Precisely, the XGBoost algorithm was 425.53% faster when compared to the SVM algorithm and 2,075.49% faster than the DCNN algorithm. According to our experimental testing, XGBoost is the most accurate and efficient method.

Design and Implementation of Machine Learning System for Fine Dust Anomaly Detection based on Big Data (빅데이터 기반 미세먼지 이상 탐지 머신러닝 시스템 설계 및 구현)

  • Jae-Won Lee;Chi-Ho Lin
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.24 no.1
    • /
    • pp.55-58
    • /
    • 2024
  • In this paper, we propose a design and implementation of big data-based fine dust anomaly detection machine learning system. The proposed is system that classifies the fine dust air quality index through meteorological information composed of fine dust and big data. This system classifies fine dust through the design of an anomaly detection algorithm according to the outliers for each air quality index classification categories based on machine learning. Depth data of the image collected from the camera collects images according to the level of fine dust, and then creates a fine dust visibility mask. And, with a learning-based fingerprinting technique through a mono depth estimation algorithm, the fine dust level is derived by inferring the visibility distance of fine dust collected from the monoscope camera. For experimentation and analysis of this method, after creating learning data by matching the fine dust level data and CCTV image data by region and time, a model is created and tested in a real environment.

Anomaly Detection Performance Analysis of Neural Networks using Soundex Algorithm and N-gram Techniques based on System Calls (시스템 호출 기반의 사운덱스 알고리즘을 이용한 신경망과 N-gram 기법에 대한 이상 탐지 성능 분석)

  • Park, Bong-Goo
    • Journal of Internet Computing and Services
    • /
    • v.6 no.5
    • /
    • pp.45-56
    • /
    • 2005
  • The weak foundation of the computing environment caused information leakage and hacking to be uncontrollable, Therefore, dynamic control of security threats and real-time reaction to identical or similar types of accidents after intrusion are considered to be important, h one of the solutions to solve the problem, studies on intrusion detection systems are actively being conducted. To improve the anomaly IDS using system calls, this study focuses on neural networks learning using the soundex algorithm which is designed to change feature selection and variable length data into a fixed length learning pattern, That Is, by changing variable length sequential system call data into a fixed iength behavior pattern using the soundex algorithm, this study conducted neural networks learning by using a backpropagation algorithm. The backpropagation neural networks technique is applied for anomaly detection of system calls using Sendmail Data of UNM to demonstrate its performance.

  • PDF

A method for concrete crack detection using U-Net based image inpainting technique

  • Kim, Su-Min;Sohn, Jung-Mo;Kim, Do-Soo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.10
    • /
    • pp.35-42
    • /
    • 2020
  • In this study, we propose a crack detection method using limited data with a U-Net based image inpainting technique that is a modified unsupervised anomaly detection method. Concrete cracking occurs due to a variety of causes and is a factor that can cause serious damage to the structure in the long term. In general, crack investigation uses an inspector's visual inspection on the concrete surfaces, which is less objective in judgment and has a high possibility of human error. Therefore, a method with objective and accurate image analysis processing is required. In recent years, the methods using deep learning have been studied to detect cracks quickly and accurately. However, when the amount of crack data on the building or infrastructure to be inspected is small, existing crack detection models using it often show a limited performance. Therefore, in this study, an unsupervised anomaly detection method was used to augment the data on the object to be inspected, and as a result of learning using the data, we confirmed the performance of 98.78% of accuracy and 82.67% of harmonic average (F1_Score).

Developing an Intrusion Detection Framework for High-Speed Big Data Networks: A Comprehensive Approach

  • Siddique, Kamran;Akhtar, Zahid;Khan, Muhammad Ashfaq;Jung, Yong-Hwan;Kim, Yangwoo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.8
    • /
    • pp.4021-4037
    • /
    • 2018
  • In network intrusion detection research, two characteristics are generally considered vital to building efficient intrusion detection systems (IDSs): an optimal feature selection technique and robust classification schemes. However, the emergence of sophisticated network attacks and the advent of big data concepts in intrusion detection domains require two more significant aspects to be addressed: employing an appropriate big data computing framework and utilizing a contemporary dataset to deal with ongoing advancements. As such, we present a comprehensive approach to building an efficient IDS with the aim of strengthening academic anomaly detection research in real-world operational environments. The proposed system has the following four characteristics: (i) it performs optimal feature selection using information gain and branch-and-bound algorithms; (ii) it employs machine learning techniques for classification, namely, Logistic Regression, Naïve Bayes, and Random Forest; (iii) it introduces bulk synchronous parallel processing to handle the computational requirements of large-scale networks; and (iv) it utilizes a real-time contemporary dataset generated by the Information Security Centre of Excellence at the University of Brunswick (ISCX-UNB) to validate its efficacy. Experimental analysis shows the effectiveness of the proposed framework, which is able to achieve high accuracy, low computational cost, and reduced false alarms.

A Study on Anomaly Detection based on User's Command Analysis (사용자 명령어 분석을 통한 비정상 행위 판정에 관한 연구)

  • 윤정혁;오상현;이원석
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.10 no.4
    • /
    • pp.59-71
    • /
    • 2000
  • Due to the advance of computer and communication technology, intrusions or crimes using a computer have been increased rapidly while various information has been provided to users conveniently. As a results, many studies are necessary to detect the activities of intruders effectively. In this paper, a new association algorithm for the anomaly detection model is proposed in the process of generating user\`s normal patterns. It is that more recently observed behavior get more affection on the process of data mining. In addition, by clustering generated normal patterns for each use or a group of similar users, it is possible to identify the usual frequency of programs or command usage for each user or a group of uses. The performance of the proposed anomaly detection system has been tested on various system Parameters in order to identify their practical ranges for maximizing its detection rate.

Imbalanced SVM-Based Anomaly Detection Algorithm for Imbalanced Training Datasets

  • Wang, GuiPing;Yang, JianXi;Li, Ren
    • ETRI Journal
    • /
    • v.39 no.5
    • /
    • pp.621-631
    • /
    • 2017
  • Abnormal samples are usually difficult to obtain in production systems, resulting in imbalanced training sample sets. Namely, the number of positive samples is far less than the number of negative samples. Traditional Support Vector Machine (SVM)-based anomaly detection algorithms perform poorly for highly imbalanced datasets: the learned classification hyperplane skews toward the positive samples, resulting in a high false-negative rate. This article proposes a new imbalanced SVM (termed ImSVM)-based anomaly detection algorithm, which assigns a different weight for each positive support vector in the decision function. ImSVM adjusts the learned classification hyperplane to make the decision function achieve a maximum GMean measure value on the dataset. The above problem is converted into an unconstrained optimization problem to search the optimal weight vector. Experiments are carried out on both Cloud datasets and Knowledge Discovery and Data Mining datasets to evaluate ImSVM. Highly imbalanced training sample sets are constructed. The experimental results show that ImSVM outperforms over-sampling techniques and several existing imbalanced SVM-based techniques.

Anomalous Pattern Analysis of Large-Scale Logs with Spark Cluster Environment

  • Sion Min;Youyang Kim;Byungchul Tak
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.3
    • /
    • pp.127-136
    • /
    • 2024
  • This study explores the correlation between system anomalies and large-scale logs within the Spark cluster environment. While research on anomaly detection using logs is growing, there remains a limitation in adequately leveraging logs from various components of the cluster and considering the relationship between anomalies and the system. Therefore, this paper analyzes the distribution of normal and abnormal logs and explores the potential for anomaly detection based on the occurrence of log templates. By employing Hadoop and Spark, normal and abnormal log data are generated, and through t-SNE and K-means clustering, templates of abnormal logs in anomalous situations are identified to comprehend anomalies. Ultimately, unique log templates occurring only during abnormal situations are identified, thereby presenting the potential for anomaly detection.

Region and Global-Specific PatchCore based Anomaly Detection from Chest X-ray Images

  • Hyunbin Kim;Junchul Chun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.8
    • /
    • pp.2298-2315
    • /
    • 2024
  • This paper introduces a method aimed at diagnosing the presence or absence of lesions by detecting anomalies in Chest X-ray images. The proposed approach is based on the PatchCore anomaly detection method, which extracts a feature vector containing location information of an image patch from normal image data and calculates the anomaly distance from the normal vector. However, applying PatchCore directly to medical image processing presents challenges due to the possibility of diseases occurring only in specific organs and the presence of image noise unrelated to lesions. In this study, we present an image alignment method that utilizes affine transformation parameter prediction to standardize already captured X-ray images into a specific composition. Additionally, we introduce a region-specific abnormality detection method that requires affine-transformed chest X-ray images. Furthermore, we propose a method to enhance application efficiency and performance through feature map hard masking. The experimental results demonstrate that our proposed approach achieved a maximum AUROC (Area Under the Receiver Operating Characteristic) of 0.774. Compared to a previous study conducted on the same dataset, our method shows a 6.9% higher performance and improved accuracy.