• Title/Summary/Keyword: LOF (Local Outlier Factor)


Density-based Outlier Detection for Very Large Data (대용량 자료 분석을 위한 밀도기반 이상치 탐지)

  • Kim, Seung;Cho, Nam-Wook;Kang, Suk-Ho
    • Journal of the Korean Operations Research and Management Science Society / v.35 no.2 / pp.71-88 / 2010
  • Density-based outlier detection methods such as LOF (Local Outlier Factor) find outlying observations by using the density of the space surrounding each observation. Despite the advantages of density-based outlier detection, its computational complexity has been one of the major barriers to its application. In this paper, we present an LOF algorithm that reduces the computation time of density-based outlier detection. The proposed method adopts kd-tree indexing and an approximate k-nearest neighbor (ANN) search algorithm. A set of experiments was conducted to examine the performance of the proposed algorithm. The results show that the proposed method effectively detects local outliers with reduced computation time.
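The LOF computation that this paper accelerates can be sketched in pure Python as follows. This is a minimal sketch, assuming Euclidean distance and distinct points; it uses an exact kd-tree k-nearest-neighbor search rather than the approximate (ANN) search the paper adopts, and the names `build_kdtree`, `kd_knn`, and `lof` are hypothetical. Each neighborhood holds exactly k points, so ties at the k-distance are ignored, which simplifies the original LOF definition.

```python
import heapq
import math

def build_kdtree(points, idx=None, depth=0):
    """Recursively build a kd-tree; each node is (index, axis, left, right)."""
    if idx is None:
        idx = list(range(len(points)))
    if not idx:
        return None
    axis = depth % len(points[0])            # cycle through the coordinates
    idx = sorted(idx, key=lambda i: points[i][axis])
    mid = len(idx) // 2                      # the median point splits the space
    return (idx[mid], axis,
            build_kdtree(points, idx[:mid], depth + 1),
            build_kdtree(points, idx[mid + 1:], depth + 1))

def kd_knn(tree, points, q, k):
    """Exact k nearest neighbours of points[q] (itself excluded) as sorted (dist, index)."""
    heap = []                                # max-heap emulated with (-dist, index)

    def visit(node):
        if node is None:
            return
        i, axis, left, right = node
        if i != q:
            d = math.dist(points[i], points[q])
            if len(heap) < k:
                heapq.heappush(heap, (-d, i))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i))
        diff = points[q][axis] - points[i][axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # Cross the splitting plane only if it may hide a closer point.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(tree)
    return sorted((-nd, i) for nd, i in heap)

def lof(points, k):
    """LOF score of every point; values well above 1 suggest local outliers."""
    tree = build_kdtree(points)
    nbrs = [kd_knn(tree, points, q, k) for q in range(len(points))]
    kdist = [nb[-1][0] for nb in nbrs]       # distance to the k-th neighbour
    # Local reachability density: inverse of the mean reachability distance.
    lrd = [len(nb) / sum(max(kdist[j], d) for d, j in nb) for nb in nbrs]
    return [sum(lrd[j] for _, j in nb) / (len(nb) * lrd[q])
            for q, nb in enumerate(nbrs)]
```

For example, `lof([(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)], 3)` gives scores close to 1 for the five clustered points and a far larger score for the isolated point `(10, 10)`. The kd-tree prunes whole subtrees whose splitting plane lies farther away than the current k-th neighbour, which is where the speed-up over a brute-force scan comes from.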

The Use of Local Outlier Factor(LOF) for Improving Performance of Independent Component Analysis(ICA) based Statistical Process Control(SPC) (LOF를 이용한 ICA 기반 통계적 공정관리의 성능 개선 방법론)

  • Lee, Jae-Shin;Kang, Bok-Young;Kang, Suk-Ho
    • Journal of the Korean Operations Research and Management Science Society / v.36 no.1 / pp.39-55 / 2011
  • Process monitoring has been emphasized for complex systems, such as those in the chemical processing industries, in order to enhance efficiency, manage quality, and improve safety. Recently, ICA (Independent Component Analysis) based MSPC (Multivariate Statistical Process Control) has been widely used in process monitoring, and DICA (Dynamic ICA) has been introduced to account for system dynamics. However, the performance of existing approaches depends strongly on the statistical distributions of the control variables. To overcome this limitation, we propose a novel process monitoring approach that integrates DICA and LOF (Local Outlier Factor), with the aim of improving the fault detection rate. LOF detects local outliers by using the density of the surrounding space, so its performance does not depend on the data distribution. Therefore, the proposed method not only accounts for system dynamics but also ensures robust performance regardless of the statistical distributions of the control variables. Comparison experiments on the widely used Tennessee Eastman (TE) process benchmark dataset showed improved performance over existing approaches.
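The idea of combining a dynamic (time-lagged) representation with LOF-based fault detection can be sketched as follows. This is a minimal sketch under assumptions not taken from the paper: the ICA feature-extraction step of DICA is omitted, so LOF is applied directly to time-lagged observation vectors; the control limit is assumed to be an empirical high quantile of the LOF scores of in-control training data; and `lag_augment` and `LOFMonitor` are hypothetical names. New samples are scored against the training set only, using brute-force neighbor search.

```python
import math

def lag_augment(series, lags):
    """Stack each observation with its `lags` predecessors (the 'dynamic' step)."""
    return [tuple(v for t in range(i - lags, i + 1) for v in series[t])
            for i in range(lags, len(series))]

class LOFMonitor:
    """Scores new samples by LOF against in-control training vectors (hypothetical)."""

    def __init__(self, train, k, quantile=0.99):
        self.train, self.k = train, k
        self.nbrs = [self._knn(x, skip=i) for i, x in enumerate(train)]
        self.kdist = [nb[-1][0] for nb in self.nbrs]   # k-distance of each training point
        self.lrd = [self._lrd(nb) for nb in self.nbrs]
        scores = sorted(self._lof(nb, lrd) for nb, lrd in zip(self.nbrs, self.lrd))
        # Assumed control limit: a high empirical quantile of in-control LOF scores.
        self.limit = scores[min(len(scores) - 1, int(quantile * len(scores)))]

    def _knn(self, x, skip=None):
        """k nearest training vectors of x as sorted (distance, index) pairs."""
        return sorted((math.dist(x, y), j)
                      for j, y in enumerate(self.train) if j != skip)[:self.k]

    def _lrd(self, nb):
        """Local reachability density computed from a neighbour list."""
        return len(nb) / sum(max(self.kdist[j], d) for d, j in nb)

    def _lof(self, nb, lrd_x):
        """Mean neighbour density divided by the point's own density."""
        return sum(self.lrd[j] for _, j in nb) / (len(nb) * lrd_x)

    def score(self, x):
        nb = self._knn(x)
        return self._lof(nb, self._lrd(nb))

    def is_fault(self, x):
        return self.score(x) > self.limit
```

In use, the monitor is fit once on lagged in-control data, for example `LOFMonitor(lag_augment(normal_series, 1), k=5)`, and each new lagged vector is checked with `is_fault`. Because LOF compares local densities rather than assuming a distribution, no normality assumption enters the control limit.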

A study on improvement of Support Vector Machine with Incremental Local Outlier Factor (Incremental Local Outlier Factor를 이용한 Support Vector Machine의 성능 개선에 관한 연구)

  • Kim, Min-Kyu;Son, Su-Il;Yoo, Suk-In
    • Proceedings of the Korean Information Science Society Conference / 2011.06c / pp.354-357 / 2011
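  • Support Vector Machine (SVM) is an algorithm that classifies new data by computing, from the given data, the Support Vectors (SVs) that best represent each class. Because SVM does not consider the entire data distribution, it is less likely to misclassify because of erroneous data. However, accuracy decreases when the SVs themselves are erroneous. In this paper, considering the possibility that SVs are erroneous data, we propose an algorithm that improves the classification accuracy of SVM by using the Local Outlier Factor (LOF) outlier detection algorithm to remove erroneous data from the given data. In addition, we further improve the accuracy of SVM by using Incremental LOF to remove hard-to-classify samples from new data. The proposed method was evaluated on data with two classes.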
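The data-cleaning step that precedes SVM training can be sketched as follows. This is a minimal sketch under assumptions not taken from the paper: LOF is computed separately within each class with brute-force neighbor search, a sample is dropped when its LOF exceeds a fixed threshold (1.5 here, chosen only for illustration), each class must contain more than k samples, and `lof_scores` and `lof_filter` are hypothetical names. The incremental LOF variant and the SVM itself are not shown; the cleaned samples are what would be handed to an SVM trainer.

```python
import math
from collections import defaultdict

def lof_scores(points, k):
    """Brute-force LOF of each point within `points` (exactly k neighbours each)."""
    nbrs = [sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
            for i, p in enumerate(points)]
    kdist = [nb[-1][0] for nb in nbrs]       # distance to the k-th neighbour
    lrd = [len(nb) / sum(max(kdist[j], d) for d, j in nb) for nb in nbrs]
    return [sum(lrd[j] for _, j in nb) / (len(nb) * lrd[i]) for i, nb in enumerate(nbrs)]

def lof_filter(samples, labels, k=3, threshold=1.5):
    """Drop training samples whose within-class LOF exceeds `threshold`."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    kept_x, kept_y = [], []
    for y, xs in by_class.items():
        for x, score in zip(xs, lof_scores(xs, k)):
            if score <= threshold:           # keep samples that sit inside their class
                kept_x.append(x)
                kept_y.append(y)
    return kept_x, kept_y
```

Scoring within each class means that a sample labeled 0 but lying inside the class-1 region looks isolated relative to the other class-0 samples and is removed, so it can no longer become a misleading support vector.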

Plagiarism Detection among Source Codes using Adaptive Methods

  • Lee, Yun-Jung;Lim, Jin-Su;Ji, Jeong-Hoon;Cho, Hwaun-Gue;Woo, Gyun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.6 / pp.1627-1648 / 2012
  • We propose an adaptive method for detecting plagiarized pairs in a large set of source code. The method is adaptive in that it uses an adaptive algorithm and provides an adaptive threshold for determining plagiarism. Conventional algorithms are based on greedy string tiling or on local alignments of two code strings. However, most of them are not adaptive: they do not consider the characteristics of the program set, which causes problems for program sets in which all programs are inherently similar. We propose adaptive local alignment, a variant of local alignment that uses an adaptive similarity matrix. Each entry of this matrix is the logarithm of keyword probabilities estimated from keyword frequencies in the given program set. We also propose an adaptive threshold based on the local outlier factor (LOF), which represents the likelihood of an entity being an outlier. Experimental results indicate that our method is more sensitive than JPlag, which uses greedy string tiling, in detecting plagiarism-suspected code pairs. Furthermore, the LOF-based adaptive threshold proves effective: compared with a fixed threshold, it achieves high sensitivity with negligible loss of specificity.
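Local alignment with frequency-dependent match scores can be sketched as a Smith-Waterman alignment over token sequences. This is a minimal sketch under assumptions not taken from the paper: a matched token scores its negated log-probability, -log p, with p estimated from the token's frequency across the whole program set (negated so that rare shared tokens contribute more than ubiquitous ones), while mismatches and gaps use fixed penalties of -1. The names `token_log_scores` and `local_alignment` are hypothetical, and the LOF-based adaptive threshold is not shown.

```python
import math
from collections import Counter

def token_log_scores(programs):
    """Assumed match reward per token: -log p, with p from frequencies in the program set."""
    counts = Counter(tok for prog in programs for tok in prog)
    total = sum(counts.values())
    return {tok: -math.log(c / total) for tok, c in counts.items()}

def local_alignment(a, b, score, mismatch=-1.0, gap=-1.0):
    """Smith-Waterman local alignment score with a token-dependent match reward."""
    prev = [0.0] * (len(b) + 1)              # row i-1 of the DP table
    best = 0.0
    for i in range(1, len(a) + 1):
        cur = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            reward = score[a[i - 1]] if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below 0 (start a new alignment).
            cur[j] = max(0.0, prev[j - 1] + reward, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

For instance, with the program set `[["for", "i", "in", "x"], ["for", "i", "in", "y"], ["while", "z"]]`, the first two programs share the run `for i in`, so their alignment score is three times the reward for a token with frequency 2/10, while the first and third programs share no token and score 0. Because rewards are derived from the program set itself, tokens that every program contains add little, which is the property that matters when all programs are inherently similar.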

Extended KNN Imputation Based LOF Prediction Algorithm for Real-time Business Process Monitoring Method (실시간 비즈니스 프로세스 모니터링 방법론을 위한 확장 KNN 대체 기반 LOF 예측 알고리즘)

  • Kang, Bok-Young;Kim, Dong-Soo;Kang, Suk-Ho
    • The Journal of Society for e-Business Studies / v.15 no.4 / pp.303-317 / 2010
  • In this paper, we propose a novel approach to fault prediction in real-time business process monitoring, using LOF prediction based on extended KNN imputation. Existing rule-based approaches to process monitoring have limitations, such as late alarms for fault occurrence and a lack of indicators of real-time progress, because some attributes remain unobserved depending on the monitoring phase during process execution. To overcome these limitations, we propose an LOF prediction algorithm that adopts an imputation method to estimate the unobserved attributes. The LOF of an ongoing instance is calculated by assuming its probable progress after the current monitoring phase; this calculation is repeated at every monitoring phase, so that abnormal termination of the ongoing instance can be predicted. By visualizing real-time progress in terms of the probability of abnormal termination, the method enables more proactive responses to opportunities or risks during real-time monitoring.
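The phase-wise prediction step, filling in the attributes an ongoing instance has not reached yet and then scoring the completed vector with LOF against finished instances, can be sketched as follows. This is a minimal sketch assuming plain KNN imputation (the mean of the k nearest completed instances, measured on the observed attributes only) rather than the paper's extended KNN imputation, numeric attributes ordered by monitoring phase, and hypothetical names `knn_impute` and `lof_of_new`.

```python
import math

def knn_impute(prefix, history, k):
    """Fill the unobserved tail of `prefix` with the mean tail of its k nearest
    completed instances, measuring distance on the observed attributes only."""
    m = len(prefix)
    nearest = sorted(history, key=lambda h: math.dist(prefix, h[:m]))[:k]
    tail = [sum(h[c] for h in nearest) / k for c in range(m, len(history[0]))]
    return tuple(prefix) + tuple(tail)

def lof_of_new(x, data, k):
    """LOF of a new point x with respect to the reference set `data` (brute force)."""
    def knn(p, skip=None):
        return sorted((math.dist(p, q), j) for j, q in enumerate(data) if j != skip)[:k]

    nbrs = [knn(q, skip=j) for j, q in enumerate(data)]
    kdist = [nb[-1][0] for nb in nbrs]       # k-distance of each reference point

    def lrd(nb):                             # local reachability density
        return len(nb) / sum(max(kdist[j], d) for d, j in nb)

    lrds = [lrd(nb) for nb in nbrs]
    nb_x = knn(x)
    return sum(lrds[j] for _, j in nb_x) / (len(nb_x) * lrd(nb_x))
```

At each monitoring phase the observed prefix grows, `knn_impute` fills in the rest, and `lof_of_new` gives a predicted LOF for the eventual completed instance; a high predicted value signals a likely abnormal termination before the instance actually finishes.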

An Effective Anomaly Detection Approach based on Hybrid Unsupervised Learning Technologies in NIDS

  • Kangseok Kim
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.2 / pp.494-510 / 2024
  • Internet users are exposed to sophisticated cyberattacks that intrusion detection systems have difficulty detecting. Research is therefore increasing on intrusion detection methods that use artificial intelligence to detect novel cyberattacks, including unsupervised learning-based methods that learn only from normal data and detect abnormal behavior by finding patterns. This study developed an anomaly detection method based on unsupervised machine learning and deep learning for a network intrusion detection system (NIDS). We present a hybrid anomaly detection approach based on unsupervised learning techniques, using the autoencoder (AE), Isolation Forest (IF), and Local Outlier Factor (LOF) algorithms. An oversampling approach that increased the detection rate was also examined. The hybrid approach, which combines deep learning and traditional machine learning algorithms, was highly effective in setting anomaly thresholds without subjective human judgment. Combining two AEs, IF, and LOF achieved a precision of 88.2% and a recall of 92.8%. Using an oversampling approach to learn more unknown normal data further improved detection accuracy, achieving a precision of 88.2% and a recall of 94.6%. Therefore, the proposed approach provides highly reliable cyberattack detection in NIDS.
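One way to turn several detectors into a single decision without hand-tuned thresholds can be sketched as follows. This is a minimal sketch under assumptions not stated in the abstract: each detector (for example the two AEs, IF, and LOF) is assumed to emit a score where higher means more anomalous, each threshold is set to an empirical quantile of that detector's scores on normal training data, and a sample is flagged when at least `min_votes` detectors exceed their thresholds. The detector scores are taken as given, `fit_thresholds` and `flag_anomaly` are hypothetical names, and the paper's actual combination rule may differ.

```python
def fit_thresholds(normal_scores, q=0.99):
    """Per-detector threshold: the empirical q-quantile of its scores on normal data."""
    thresholds = {}
    for name, scores in normal_scores.items():
        s = sorted(scores)
        thresholds[name] = s[min(len(s) - 1, int(q * len(s)))]
    return thresholds

def flag_anomaly(sample_scores, thresholds, min_votes):
    """Vote: anomalous if at least `min_votes` detectors exceed their thresholds."""
    votes = sum(sample_scores[name] > t for name, t in thresholds.items())
    return votes >= min_votes
```

Because every threshold is read off the distribution of scores on normal data, the only free choices are the quantile `q` and the number of required votes, rather than one hand-picked cut-off per detector.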