• Title/Summary/Keyword: outlier detection method

Improvement of Network Intrusion Detection Rate by Using LBG Algorithm Based Data Mining (LBG 알고리즘 기반 데이터마이닝을 이용한 네트워크 침입 탐지율 향상)

  • Park, Seong-Chul;Kim, Jun-Tae
    • Journal of Intelligence and Information Systems, v.15 no.4, pp.23-36, 2009
  • Network intrusion detection has been continuously improved by data mining techniques. Intrusion detection methods based on data mining fall into two kinds: supervised learning with class labels and unsupervised learning without class labels. In this paper, we study how to improve network intrusion detection accuracy with the LBG clustering algorithm, which is an unsupervised learning method. The K-means method, which starts with random initial centroids and clusters based on the Euclidean distance, is vulnerable to noisy data and outliers. The nonuniform binary split algorithm uses binary decomposition without assigning initial values and is relatively fast. We apply to intrusion detection the EM (Expectation Maximization) based LBG algorithm, which combines the strengths of the two algorithms. Experimental results on the KDD Cup dataset show that the LBG algorithm can improve detection accuracy.
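
The splitting idea behind LBG can be sketched as follows. This is a minimal illustration of the textbook Linde-Buzo-Gray procedure (split every centroid in two, then refine with Lloyd iterations), not the EM-based variant evaluated in the paper. The function names, the additive perturbation `eps`, the fixed iteration count, and the toy data are all assumptions made for the example.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def nearest(point, codebook):
    """Index of the codebook vector closest to point (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(point, codebook[i]))


def lloyd(points, codebook, iterations=20):
    """Refine a codebook: assign points to nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for p in points:
            cells[nearest(p, codebook)].append(p)
        # An empty cell keeps its previous centroid.
        codebook = [mean(cell) if cell else c for cell, c in zip(cells, codebook)]
    return codebook


def lbg(points, size, eps=0.01):
    """Grow a codebook by repeated splitting; the result has a power-of-two size >= size."""
    codebook = [mean(points)]  # start from the global centroid, no random initialization
    while len(codebook) < size:
        split = []
        for c in codebook:
            split.append(tuple(x + eps for x in c))
            split.append(tuple(x - eps for x in c))
        codebook = lloyd(points, split)
    return codebook


def anomaly_score(point, codebook):
    """Distance to the nearest centroid; larger values are more outlier-like."""
    return math.dist(point, codebook[nearest(point, codebook)])


# Toy data (assumed): two compact groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
codebook = lbg(points, 2)
```

Because the codebook starts from the global mean and grows by splitting, the result does not depend on random initial centroids, which is the weakness of K-means that the abstract points out.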

Modified Multivariate $T^2$-Chart based on Robust Estimation (로버스트 추정에 근거한 수정된 다변량 $T^2$- 관리도)

  • 성웅현;박동련
    • Journal of Korean Society for Quality Management, v.29 no.1, pp.1-10, 2001
  • We consider the problem of detecting special variations in a multivariate $T^2$-control chart when two or more multivariate outliers are present. Since a multivariate outlier may reflect slippage in mean, variance, or correlation, it can distort the sample mean vector and sample covariance matrix. A distorted sample mean vector and sample covariance matrix make it difficult to identify special variations clearly. An alternative for detecting outliers or special variations is to use robust estimators of the mean vector and covariance matrix that are less sensitive to extreme observations than the standard estimators $\bar{x}$ and $\textbf{S}$. We applied the popular minimum volume ellipsoid (MVE) and minimum covariance determinant (MCD) methods to estimate the mean vector and covariance matrix, and compared the results with the standard $T^2$-control chart using simulated multivariate data with outliers. We found that the modified $T^2$-control charts based on these robust methods were more effective in detecting special variations than the standard $T^2$-control chart.
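
For reference, the substitution described in the abstract can be written out for individual observations. The abstract does not state the formula, so the following is the usual form of the statistic rather than a quotation from the paper. The symbols $\hat{\boldsymbol{\mu}}_R$ and $\hat{\boldsymbol{\Sigma}}_R$ (robust location and scatter estimates, for example from MVE or MCD) are notation assumed here.

```latex
% Standard statistic for observation x_i, using the sample mean and covariance:
T_i^2 = (\mathbf{x}_i - \bar{\mathbf{x}})^{\top} \mathbf{S}^{-1} (\mathbf{x}_i - \bar{\mathbf{x}})

% Modified statistic: the same quadratic form with robust estimates substituted:
T_{R,i}^2 = (\mathbf{x}_i - \hat{\boldsymbol{\mu}}_R)^{\top}
            \hat{\boldsymbol{\Sigma}}_R^{-1}
            (\mathbf{x}_i - \hat{\boldsymbol{\mu}}_R)
```

Outliers inflate $\textbf{S}$ and pull $\bar{x}$ toward themselves, which shrinks their own $T_i^2$ values. The robust version avoids this masking because the estimates are driven by the bulk of the data.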

Background Subtraction for Moving Cameras based on trajectory-controlled segmentation and Label Inference

  • Yin, Xiaoqing;Wang, Bin;Li, Weili;Liu, Yu;Zhang, Maojun
    • KSII Transactions on Internet and Information Systems (TIIS), v.9 no.10, pp.4092-4107, 2015
  • We propose a background subtraction method for moving cameras based on trajectory classification, image segmentation, and label inference. In the trajectory classification process, a PCA-based outlier detection strategy is used to remove outliers from the foreground trajectories. Combining optical flow trajectories with the watershed algorithm, we propose a trajectory-controlled watershed segmentation algorithm that effectively improves edge-preserving performance and prevents over-smoothing. Finally, label inference based on a Markov random field is conducted to label the unlabeled pixels. Experimental results on the motionseg database demonstrate the promising performance of the proposed approach compared with competing methods.

GNSS/Multiple IMUs Based Navigation Strategy Using the Mahalanobis Distance in Partially GNSS-denied Environments (GNSS 부분 음영 지역에서 마할라노비스 거리를 이용한 GNSS/다중 IMU 센서 기반 측위 알고리즘)

  • Kim, Jiyeon;Song, Moogeun;Kim, Jaehoon;Lee, Dongik
    • IEMEK Journal of Embedded Systems and Applications, v.17 no.4, pp.239-247, 2022
  • Existing studies on localization in GNSS (Global Navigation Satellite System) denied environments usually exploit low-cost MEMS IMU (Micro Electro Mechanical Systems Inertial Measurement Unit) sensors to replace the GNSS signals. However, the navigation system still requires GNSS signals in normal environments. This paper presents an integrated GNSS/INS (Inertial Navigation System) navigation system that combines GNSS and multiple IMU sensors using an extended Kalman filter in partially GNSS-denied environments. The position and velocity from the INS and GNSS are used as the inputs to the integrated navigation system. The Mahalanobis distance is used for novelty detection to detect outliers in the GNSS measurements. When an abnormality is detected in the GNSS signals, the GNSS data are excluded from the fusion process. The performance of the proposed method is evaluated using MATLAB/Simulink. The simulation results show that the proposed algorithm achieves higher positioning accuracy in the partially GNSS-denied environment.
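
The Mahalanobis gating step can be illustrated with a small sketch. It assumes a 2-D position residual (GNSS fix minus the position predicted by the filter) and a known 2x2 residual covariance. The gate value 9.21, the 0.99 quantile of a chi-square distribution with 2 degrees of freedom, is an assumed choice and is not taken from the paper. The function names are hypothetical.

```python
# Assumed gate: 0.99 quantile of chi-square with 2 degrees of freedom.
CHI2_99_2DOF = 9.21


def mahalanobis_sq(residual, cov):
    """Squared Mahalanobis distance r^T C^{-1} r for a 2-D residual and 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    rx, ry = residual
    # The inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]].
    return (rx * (d * rx - b * ry) + ry * (-c * rx + a * ry)) / det


def accept_gnss(gnss_pos, predicted_pos, residual_cov, gate=CHI2_99_2DOF):
    """Return True if the GNSS fix is consistent with the prediction, False if it is an outlier."""
    residual = (gnss_pos[0] - predicted_pos[0], gnss_pos[1] - predicted_pos[1])
    return mahalanobis_sq(residual, residual_cov) <= gate


# Hypothetical values: 2 m standard deviation on each axis, no cross-correlation.
cov = ((4.0, 0.0), (0.0, 4.0))
ok = accept_gnss((102.0, 52.0), (100.0, 50.0), cov)        # squared distance 2.0
rejected = accept_gnss((110.0, 50.0), (100.0, 50.0), cov)  # squared distance 25.0
```

A rejected fix would simply be skipped in the measurement update, so the filter keeps propagating on the IMU data alone until the GNSS measurements pass the gate again.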

Development of Healthcare Data Quality Control Algorithm Using Interactive Decision Tree: Focusing on Hypertension in Diabetes Mellitus Patients (대화식 의사결정나무를 이용한 보건의료 데이터 질 관리 알고리즘 개발: 당뇨환자의 고혈압 동반을 중심으로)

  • Hwang, Kyu-Yeon;Lee, Eun-Sook;Kim, Go-Won;Hong, Seong-Ok;Park, Jung-Sun;Kwak, Mi-Sook;Lee, Ye-Jin;Lim, Chae-Hyeok;Park, Tae-Hyun;Park, Jong-Ho;Kang, Sung-Hong
    • The Korean Journal of Health Service Management, v.10 no.3, pp.63-74, 2016
  • Objectives: There is a need to develop a data quality management algorithm to improve the quality of healthcare data using a data quality management system. In this study, we developed a data quality control algorithm for hypertension-related diseases in patients with diabetes mellitus. Methods: To build the data quality algorithm, we extracted the 2011 and 2012 discharge damage survey data for diabetes mellitus patients. Derived variables were created using the primary diagnosis, diagnostic unit, primary surgery and treatment, and minor surgery and treatment items. Results: Significant factors in diabetes mellitus patients with hypertension were sex, age, ischemic heart disease, and diagnostic ultrasound of the heart. Based on the decision tree results, we found four groups with extreme values among diabetes patients with accompanying hypertension. Conclusions: The actual data contained in the outlier (extreme value) groups need to be checked to improve the quality of the data.

A Parameter-Free Approach for Clustering and Outlier Detection in Image Databases (이미지 데이터베이스에서 매개변수를 필요로 하지 않는 클러스터링 및 아웃라이어 검출 방법)

  • Oh, Hyun-Kyo;Yoon, Seok-Ho;Kim, Sang-Wook
    • Journal of the Institute of Electronics Engineers of Korea CI, v.47 no.1, pp.80-91, 2010
  • As the volume of image data increases dramatically, good organization of image data is crucial for efficient image retrieval. Clustering is a typical way of organizing image data. However, traditional clustering methods require the user to provide the number of clusters as a parameter before clustering. In this paper, we discuss an approach for clustering image data that does not require this parameter. The proposed approach is based on Cross-Association, which finds structure or patterns hidden in data using the relationships between individual objects. To apply Cross-Association to the clustering of image data, we first convert the image data into a graph. We then perform Cross-Association on the resulting graph and interpret the results from a clustering perspective. We also propose a hierarchical clustering method and an outlier detection method based on Cross-Association. Through a series of experiments, we verify the effectiveness of the proposed approach. Finally, we discuss how to find a good value of k for the k-nearest neighbor search and compare the clustering results obtained with symmetric and asymmetric ways of building the graph.

Effect of Genetic Correlations on the P Values from Randomization Test and Detection of Significant Gene Groups (유전자 연관성이 랜덤검정 P값과 유의 유전자군의 탐색에 미치는 영향)

  • Yi, Mi-Sung;Song, Hae-Hiang
    • The Korean Journal of Applied Statistics, v.22 no.4, pp.781-792, 2009
  • At an early stage of genomic investigations, a small sample of microarrays is used in gene expression experiments to identify small subsets of candidate genes for further, more accurate investigation. Unlike the statistical analysis methods for a large sample of microarrays, an appropriate statistical method for identifying small subsets is a randomization test that provides exact P values. These exact P values from a randomization test for a small sample of microarrays are discrete. The possible existence of differentially expressed genes in the full set of genes can be tested against the null hypothesis of a uniform distribution. Subsets with smaller P values are of prime interest for further investigation, and these outlier cells can be identified from a multinomial distribution of P values by the M test of Fuchs et al. (1980). Moreover, genome-wide gene expressions in microarrays are correlated, but the majority of statistical methods in microarray analysis assume independence between genes and ignore possibly correlated expression levels. Using simulation studies, we investigated the effect that correlated gene expression levels could have on the randomization test and M test results, and found that the effects are often not negligible.
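
The discreteness of exact P values in a small sample can be seen in a short sketch of a two-sample randomization test. The test statistic (absolute difference in group means), the two-sided counting rule, and the toy expression values are assumptions for illustration. The abstract does not specify the statistic used in the paper.

```python
from itertools import combinations


def randomization_p_value(group_a, group_b):
    """Exact two-sided P value from enumerating every relabeling of the pooled sample."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        n_splits += 1
        # Small tolerance so that ties with the observed statistic are counted.
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits


# Toy expression values (assumed) for one gene: 3 arrays per condition.
p = randomization_p_value([1, 2, 3], [10, 11, 12])
```

With three arrays per condition there are only 20 possible relabelings, so every P value is a multiple of 1/20. Here only the observed split and its mirror image are as extreme as the data, giving 2/20.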

Simulation and Performance Assessment of a Geiger-mode Imaging LADAR System (가이거모드 영상 LADAR 시스템의 시뮬레이션과 성능예측)

  • Kim, Seongjoon;Lee, Impyeong;Lee, Youngcheol
    • Journal of the Korea Institute of Military Science and Technology, v.15 no.5, pp.687-698, 2012
  • LADAR systems can rapidly acquire 3D point clouds by sampling target surfaces with laser pulses. Such point clouds are widely used in diverse applications such as DSM/DTM generation, forest biomass estimation, target detection, and wire avoidance. Many kinds of LADAR systems have been developed for their respective purposes and applications. In particular, Geiger-mode imaging LADAR systems are increasingly used because they are energy efficient thanks to the extremely sensitive detectors incorporated into the systems. The purpose of this research is to assess the performance of a Geiger-mode imaging LADAR system through simulation with real system parameters. We therefore developed a simulation method for such a LADAR system by modeling its geometric, radiometric, optical, and electronic aspects. Based on the simulation, we assessed the performance of a newly designed system to derive the outlier ratio and false alarm rate expected during its operation in a near-real environment with reasonable system parameters. The proposed simulation and performance assessment method can be used for system design and optimization and for test data generation.

User Authentication Based on Keystroke Dynamics of Free Text and One-Class Classifiers (자유로운 문자열의 키스트로크 다이나믹스와 일범주 분류기를 활용한 사용자 인증)

  • Seo, Dongmin;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers, v.42 no.4, pp.280-289, 2016
  • User authentication is an important issue in computer network systems. Most current computer network systems use ID-password string matching as the primary user authentication method. However, in password-based authentication, whoever acquires the password of a valid user can access the system without any restrictions. In this paper, we present keystroke dynamics-based user authentication to address the limitations of password-based authentication. Since most previous studies employed a fixed-length text as input data, we aim to enhance authentication performance by combining four different variable creation methods applied to variable-length free text as input data. Four one-class classifiers are employed as authentication algorithms. We verify the proposed approach through an experiment based on actual keystroke data collected from 100 participants, who provided more than 17,000 keystrokes in both Korean and English. The experimental results show that the proposed method significantly improves authentication performance compared to existing approaches.

Conjugate Point Extraction for High-Resolution Stereo Satellite Images Orientation

  • Oh, Jae Hong;Lee, Chang No
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, v.37 no.2, pp.55-62, 2019
  • Establishing the stereo geometry based on precise sensor modeling is a prerequisite for accurate stereo data processing. Ground control points are generally required for accurate sensor modeling, but they cannot be obtained over areas where accessibility is limited or reference data are not available. For such areas, relative orientation should be carried out to improve the geometric consistency between the stereo data, although it does not improve the absolute positional accuracy. Relative orientation requires conjugate points that are well distributed over the entire image region. Automatic conjugate point extraction is therefore required because manual operation is labor-intensive. In this study, we applied a method consisting of key point extraction, search space minimization based on the epipolar line, and rigorous outlier detection based on RPC (Rational Polynomial Coefficient) bias compensation modeling. We tested different window size parameters on Kompsat-2 across-track stereo data and analyzed the RPC precision after bias compensation, with and without the use of epipolar line information. The experimental results showed that matching outliers were inevitable under the different matching parameterizations, but they were successfully detected and removed with the rigorous method, yielding sub-pixel stereo RPC precision.