• Title/Summary/Keyword: Outlier Detection Method

Adaptive boosting in ensembles for outlier detection: Base learner selection and fusion via local domain competence

  • Bii, Joash Kiprotich;Rimiru, Richard;Mwangi, Ronald Waweru
    • ETRI Journal / v.42 no.6 / pp.886-898 / 2020
  • Unusual data patterns or outliers can be generated because of human errors, incorrect measurements, or malicious activities. Detecting outliers is a difficult task that requires complex ensembles. An ideal outlier detection ensemble should consider the strengths of individual base detectors while carefully combining their outputs to create a strong overall ensemble and achieve unbiased accuracy with minimal variance. Selecting and combining the outputs of dissimilar base learners is a challenging task. This paper proposes a model that utilizes heterogeneous base learners. It adaptively boosts the outcomes of preceding learners in the first phase by assigning weights and identifying high-performing learners based on their local domains, and then carefully fuses their outcomes in the second phase to improve overall accuracy. Experimental results from 10 benchmark datasets are used to train and test the proposed model. To investigate its accuracy in terms of separating outliers from inliers, the proposed model is tested and evaluated using accuracy metrics. The analyzed data are presented as crosstabs and percentages, followed by a descriptive method for synthesis and interpretation.

The Use of Local Outlier Factor(LOF) for Improving Performance of Independent Component Analysis(ICA) based Statistical Process Control(SPC) (LOF를 이용한 ICA 기반 통계적 공정관리의 성능 개선 방법론)

  • Lee, Jae-Shin;Kang, Bok-Young;Kang, Suk-Ho
    • Journal of the Korean Operations Research and Management Science Society / v.36 no.1 / pp.39-55 / 2011
  • Process monitoring has been emphasized for complex systems such as chemical processing industries in order to enhance efficiency, manage quality, and improve safety. Recently, ICA (Independent Component Analysis) based MSPC (Multivariate Statistical Process Control) has been widely used in process monitoring. Moreover, DICA (Dynamic ICA) has been introduced to account for system dynamics. However, the performance of existing approaches depends strongly on the statistical distributions of the control variables. To overcome this limitation, we propose a novel approach for process monitoring that integrates DICA and LOF (Local Outlier Factor), with the aim of improving the fault detection rate. LOF detects local outliers by using the density of the surrounding space, so its performance does not depend on the data distribution. Therefore, the proposed method not only considers the system dynamics but also delivers robust performance regardless of the statistical distributions of the control variables. Comparison experiments on the widely used Tennessee Eastman process (TE process) benchmark dataset showed improved performance over existing approaches.
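
The density-based idea behind LOF can be illustrated with a minimal pure-Python sketch. It is not the authors' DICA and LOF monitoring scheme; it only computes plain LOF scores on hypothetical 2-D points. The function names, the sample data, and `k=3` are assumptions made for illustration, and ties at the k-th neighbour distance are ignored for brevity.

```python
import math


def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]


def local_outlier_factor(points, k=3):
    """Compute a LOF score per point; values well above 1 suggest an outlier."""
    n = len(points)
    neighbours = [knn(points, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neighbours]  # distance to the k-th neighbour

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical data: a tight cluster plus one isolated point at (5, 5).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
scores = local_outlier_factor(points, k=3)
```

Because each score compares a point's density only with that of its own neighbours, no assumption about the overall distribution of the data is needed, which is the property the abstract relies on.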

Outlier Detection from LiDAR Data based on the Relative Density (상대적 밀도를 이용한 LiDAR 데이터의 Outlier 검출)

  • 문지영;이임평;김성준;김경옥
    • Proceedings of the Korean Society of Surveying, Geodesy, Photogrammetry, and Cartography Conference / 2004.11a / pp.507-512 / 2004
  • LiDAR data often include outliers: points that are significantly separated from other points and so appear not to have been measured from physical surfaces. Outliers should be removed before the data are processed further for applications. Many methods have been developed as part of data mining processes for data other than LiDAR data, but their straightforward application to LiDAR data did not provide satisfactory results. In this study, we have therefore modified one such method by considering the properties of LiDAR data and developed a method based on the relative point density. The proposed method has been applied to simulated and real data. The results confirm its promising performance with respect to processing time and detection accuracy.
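
The abstract does not define the relative density it uses, so the following is only a sketch of one plausible reading: count the neighbours of each point within a fixed radius and flag points whose count is far below the median count. The function name, `radius`, `ratio`, and the sample point cloud are all assumptions.

```python
import math
from statistics import median


def relative_density_outliers(points, radius=1.5, ratio=0.5):
    """Flag points whose neighbour count is below `ratio` times the median count."""
    counts = [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]
    reference = median(counts)  # typical density of the whole point cloud
    return [c < ratio * reference for c in counts]


# Hypothetical cloud: a 3 x 3 patch of ground returns plus one point 30 m above it.
cloud = [(float(x), float(y), 0.0) for x in range(3) for y in range(3)]
cloud.append((1.0, 1.0, 30.0))
flags = relative_density_outliers(cloud)
```

The brute-force neighbour search is quadratic in the number of points; for real LiDAR volumes a spatial index would be needed, which may be why the abstract reports processing time alongside accuracy.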

Fault Detection Method for Multivariate Process using Mahalanobis Distance and ICA (마할라노비스 거리와 독립성분분석을 이용한 다변량 공정 고장탐지 방법에 관한 연구)

  • Jung, Seunghwan;Kim, Sungshin
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.14 no.1 / pp.22-28 / 2021
  • Multivariate processes, such as chemical and mechanical processes and power plants, operate with several facilities connected in complex ways, so a fault in one system can have fatal consequences for the entire process. In addition, since process data are measured in an unstable environment, outliers are likely to be included in the data. Therefore, monitoring technology that can remove outliers from measured data and detect failures in advance is essential. In this paper, data obtained from dynamic and multivariate process models were used to detect faults in various types of processes. The dynamic process is a simulation of a process with an autoregressive property, and the multivariate process is a model that describes a situation in which a specific sensor is faulty. Mahalanobis distance was used to remove outliers from the data generated by the dynamic and multivariate process models, and fault detection was performed using ICA. Performance was compared with that of a conventional single ICA method. The proposed fault detection method improves performance by 0.84%p for bias data and 6.82%p for drift data in the dynamic process. In the multivariate process, performance improved by 3.78%p. The proposed method therefore showed better fault detection performance.
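
The outlier-removal step can be sketched as follows, assuming two process variables so that the inverse covariance matrix can be written out by hand. This covers only the Mahalanobis screening, not the ICA fault detection that follows it in the paper. The function name, the sample data, and the cut-off of 2.0 are assumptions.

```python
import math


def mahalanobis_2d(samples):
    """Return the Mahalanobis distance of each 2-D sample from the sample mean."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    # Unbiased sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in samples:
        dx, dy = x - mx, y - my
        # Quadratic form d^T S^-1 d with the 2 x 2 inverse expanded.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances


# Hypothetical measurements: eight normal samples and one gross outlier at (6, 6).
samples = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1), (1, -1), (-1, 1), (6, 6)]
distances = mahalanobis_2d(samples)
threshold = 2.0  # assumed cut-off, not taken from the paper
cleaned = [s for s, d in zip(samples, distances) if d <= threshold]
```

Because the outlier itself inflates the covariance estimate, the distances are compressed, which is why the assumed cut-off is fairly low here. Estimating the mean and covariance from known-good data would avoid that effect.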

Post-Processing for Reducing Corner Outliers (Corner outlier 제거를 위한 후처리 기법)

  • 홍윤표;전병우
    • Proceedings of the IEEK Conference / 2003.11a / pp.11-14 / 2003
  • In block-based lossy video compression, severe quantization causes discontinuities along block boundaries, so that annoying blocking artifacts are visible in decoded video images. These blocking artifacts significantly decrease subjective image quality. Many algorithms have been proposed to reduce blocking artifacts in decoded images. However, studies on so-called corner outliers have been very limited. Corner outliers make image edges look disconnected from those of neighboring blocks where block boundaries cross. To solve this problem, we propose a corner outlier detection and compensation algorithm as post-processing in the spatial domain. The experimental results show that the proposed method provides much improved subjective image quality.

Robust Location Estimation based on TDOA and FDOA using Outlier Detection Algorithm (이상치 검출 알고리즘을 이용한 TDOA와 FDOA 기반 이동 신호원 위치 추정 기법)

  • Yoo, Hogeun;Lee, Jaehoon
    • Journal of Convergence for Information Technology / v.10 no.9 / pp.15-21 / 2020
  • This paper presents an outlier detection algorithm for estimating the location and velocity of a source with a two-step weighted least-squares method using time difference of arrival (TDOA) and frequency difference of arrival (FDOA) data. Since outliers in the TDOA and FDOA data can reduce the accuracy of the estimated location and velocity of a moving source, it is important to detect and remove them. This paper presents a method for finding the minimum inlier data and a method for determining whether TDOA and FDOA data are inliers or outliers. Numerical simulation results show that removing the outliers from the TDOA and FDOA data improves the accuracy of the estimated location and velocity.

A New Forest Fire Detection Algorithm using Outlier Detection Method on Regression Analysis between Surface temperature and NDVI

  • Huh, Yong;Byun, Young-Gi;Son, Jeong-Hoon;Yu, Ki-Yun;Kim, Yong-Il
    • Proceedings of the KSRS Conference / v.2 / pp.574-577 / 2006
  • In this paper, we developed a forest fire detection algorithm that uses a regression function between NDVI and land surface temperature. Previous detection algorithms use the land surface temperature as the main factor for discriminating fire pixels from non-fire pixels. These algorithms assume that the surface temperatures of non-fire pixels are intrinsically similar and follow a Gaussian distribution, regardless of land surface type and condition. The temperature thresholds for detecting fire pixels are then derived from the statistical distribution of non-fire pixel temperatures using heuristic methods. Under this assumption, the temperature distribution of non-fire pixels is very broad and sometimes slightly overlaps that of fire pixels, so omission errors sometimes occur for small fires. To mitigate this problem, we separated non-fire pixels by land cover type using a clustering algorithm and calculated the residual between the temperature of each pixel under examination and the temperature estimated for that pixel from the linear regression between surface temperature and NDVI. As a result, the algorithm could adjust the temperature threshold according to land type and condition, and it showed improved detection accuracy.
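
The residual test described in the abstract can be sketched as below for a single land cover cluster. The clustering step is omitted, and the background pixel values, the function names, and the three-sigma rule (`k=3.0`) are assumptions made for illustration.

```python
import math
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residual_sigma(x, y, slope, intercept):
    """Standard deviation of the fit residuals (n - 2 degrees of freedom)."""
    sq = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    return math.sqrt(sq / (len(x) - 2))


def is_fire(ndvi, temp, slope, intercept, sigma, k=3.0):
    """Flag a pixel whose temperature exceeds the regression estimate by k sigma."""
    return temp - (slope * ndvi + intercept) > k * sigma


# Hypothetical non-fire pixels of one land cover cluster (temperature in kelvin).
ndvi = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
temp = [306.1, 303.9, 302.2, 299.8, 298.1, 295.9]
slope, intercept = fit_line(ndvi, temp)
sigma = residual_sigma(ndvi, temp, slope, intercept)

vegetated_hot = is_fire(0.7, 300.0, slope, intercept, sigma)  # warm for dense vegetation
bare_normal = is_fire(0.2, 306.3, slope, intercept, sigma)    # hotter, but expected for low NDVI
```

In this toy data the 300.0 K pixel is flagged while the 306.3 K pixel is not, which is the behaviour a single temperature threshold cannot reproduce.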

Efficient Outlier Detection of the Water Temperature Monitoring Data (수온 관측 자료의 효율적인 이상 자료 탐지)

  • Cho, Hongyeon;Jeong, Shin Taek;Ko, Dong Hui;Son, Kyeong-Pyo
    • Journal of Korean Society of Coastal and Ocean Engineers / v.26 no.5 / pp.285-291 / 2014
  • The statistical information of coastal water temperature monitoring data can be biased by outliers and missing intervals. Although a number of outlier detection methods have been developed, their application to in-situ monitoring data is very limited because they assume a priori information about the outliers and data with no missing values, and because some methods require excessive computation time. In this study, a practical robust method is developed that can efficiently and effectively detect outliers in big data. The model is composed of two parts: the first constructs the approximate components of the monitoring data using robust smoothing and data re-sampling, and the second is the main iterative outlier detection part, which uses the detailed components of the data estimated from the approximate components. The model is tested using two years of 5-minute interval water temperature data from Lake Saemangeum. The outlier proportion of the data is estimated to be about 1.6-3.7%. Most of the outliers in the data are detected and removed satisfactorily by the model. To detect and remove outliers effectively, outlier detection using long-span smoothing should be applied before detection using short-span smoothing.
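
A rough sketch of the smooth-then-screen idea follows. It swaps in a rolling median as the robust smoother and a MAD-based limit on the residuals, neither of which is confirmed by the abstract, and it omits the re-sampling step. The function names, window sizes, `k`, `floor`, and the sample series are assumptions. The long span is applied before the short span, as the abstract recommends.

```python
from statistics import median


def rolling_median(values, half_window):
    """Median of the non-missing values in a centred window around each index."""
    smooth = []
    for i in range(len(values)):
        window = [
            v
            for v in values[max(0, i - half_window): i + half_window + 1]
            if v is not None
        ]
        smooth.append(median(window) if window else None)
    return smooth


def detect_outliers(values, half_windows=(6, 2), k=5.0, floor=1.0):
    """Iteratively flag points far from the smooth curve, long span first."""
    data = list(values)
    flagged = set()
    for half_window in half_windows:
        smooth = rolling_median(data, half_window)
        residuals = {i: v - smooth[i] for i, v in enumerate(data) if v is not None}
        mad = median(abs(r) for r in residuals.values())
        # 1.4826 scales the MAD to a standard deviation for normal data;
        # `floor` keeps the limit usable when the MAD is close to zero.
        limit = max(k * 1.4826 * mad, floor)
        for i, r in residuals.items():
            if abs(r) > limit:
                flagged.add(i)
                data[i] = None  # treat as missing in the next pass
    return sorted(flagged)


# Hypothetical record: a slow warming trend with two spikes (degrees Celsius).
series = [10.0 + 0.1 * i for i in range(30)]
series[8] = 25.0
series[20] = 0.0
outlier_indices = detect_outliers(series)
```

Flagged values are treated as missing in later passes, so the smoother tolerates gaps in the record, one of the practical constraints the abstract mentions for in-situ data.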

Outlier Detection Method for Mobile Banking with User Input Pattern and E-finance Transaction Pattern (사용자 입력 패턴 및 전자 금융 거래 패턴을 이용한 모바일 뱅킹 이상치 탐지 방법)

  • Min, Hee Yeon;Park, Jin Hyung;Lee, Dong Hoon;Kim, In Seok
    • Journal of Internet Computing and Services / v.15 no.1 / pp.157-170 / 2014
  • As transactions using mobile banking continue to increase, threats to mobile financial security are also increasing. A mobile banking service performs financial transactions through a dedicated application made by a financial corporation, and it provides the same services as an internet banking service. Personal information stored in the mobile banking application, such as a credit card number, can be used for additional attacks after a malicious attack or the loss of the mobile device. Therefore, to cope with mobile financial accidents caused by personal information exposure, this paper suggests an outlier detection method that can judge whether a transaction is conducted by the legitimate user. The detection method utilizes the user's input patterns and transaction patterns when the user uses the banking service on a mobile device. A user's input and transaction pattern data contain information that can be used to identify a particular user. Thus, if these data are utilized appropriately, they can distinguish abnormal transactions from transactions made by the legitimate user. For the experiment, we collect data on users' input patterns on a smartphone, and as transaction pattern data we use the experimental data that a domestic financial corporation uses to detect outliers. A detection experiment based on the collected input and transaction pattern data verifies that our proposal can detect abnormal transactions efficiently.

Developing data quality management algorithm for Hypertension Patients accompanied with Diabetes Mellitus By Data Mining (데이터 마이닝을 이용한 고혈압환자의 당뇨질환 동반에 관한 데이터 질 관리 알고리즘 개발)

  • Hwang, Kyu-Yeon;Lee, Eun-Sook;Kim, Go-Won;Hong, Sung-Ok;Park, Jong-Son;Kwak, Mi-Sook;Lee, Ye-Jin;Im, Chae-Hyuk;Park, Tae-Hyun;Park, Jong-Ho;Kang, Sung-Hong
    • Journal of Digital Convergence / v.14 no.7 / pp.309-319 / 2016
  • There is a need to develop a data quality management algorithm in order to improve the quality of health care data. In this study, we developed a data quality management algorithm for diabetes as an accompanying disease in patients with hypertension. To build the algorithm, we extracted hypertension patients from the 2011 and 2012 discharge damage survey data. The resulting algorithm identified the following significant factors in hypertension patients with diabetes: gender, age, glomerular disorders in diabetes mellitus, diabetic retinopathy, diabetic polyneuropathy, and closed [percutaneous] [needle] biopsy of kidney. Based on the decision tree results, we defined as outliers the groups in which the probability of a hypertension patient also having diabetes was 80% or more, or 20% or less, and found six groups with extreme values for diabetes accompanying hypertension. The actual data contained in these outlier (extreme value) groups therefore need to be checked to improve the quality of the data.