• Title/Summary/Keyword: Outliers detection

Outlier Detection of Autoregressive Models Using Robust Regression Estimators (로버스트 추정법을 이용한 자기상관회귀모형에서의 특이치 검출)

  • Lee Dong-Hee;Park You-Sung;Kim Kee-Whan
    • The Korean Journal of Applied Statistics, v.19 no.2, pp.305-317, 2006
  • Outliers adversely affect model identification, parameter estimation, and forecasting in time series data. In particular, when the outliers form a patch of additive outliers, current outlier detection procedures suffer from masking and swamping effects, which make them inefficient. In this paper, we propose a new outlier detection procedure based on high-breakdown estimators, called dual robust filtering. Empirical and simulation studies with autoregressive models of order p show that the proposed procedure is effective.

Adaptive boosting in ensembles for outlier detection: Base learner selection and fusion via local domain competence

  • Bii, Joash Kiprotich;Rimiru, Richard;Mwangi, Ronald Waweru
    • ETRI Journal, v.42 no.6, pp.886-898, 2020
  • Unusual data patterns, or outliers, can be generated by human errors, incorrect measurements, or malicious activities. Detecting outliers is a difficult task that requires complex ensembles. An ideal outlier detection ensemble should consider the strengths of individual base detectors while carefully combining their outputs to create a strong overall ensemble and achieve unbiased accuracy with minimal variance. Selecting and combining the outputs of dissimilar base learners is a challenging task. This paper proposes a model that utilizes heterogeneous base learners. In the first phase, it adaptively boosts the outcomes of preceding learners by assigning weights and identifying high-performing learners based on their local domains; in the second phase, it carefully fuses their outcomes to improve overall accuracy. Ten benchmark datasets are used to train and test the proposed model. To investigate how well it separates outliers from inliers, the proposed model is tested and evaluated using accuracy metrics. The analyzed data are presented as crosstabs and percentages, followed by a descriptive method for synthesis and interpretation.

Robust Generalized Labeled Multi-Bernoulli Filter and Smoother for Multiple Target Tracking using Variational Bayesian

  • Li, Peng;Wang, Wenhui;Qiu, Junda;You, Congzhe;Shu, Zhenqiu
    • KSII Transactions on Internet and Information Systems (TIIS), v.16 no.3, pp.908-928, 2022
  • Multiple target tracking mainly focuses on tracking an unknown number of targets in a complex environment with clutter and missed detections. The generalized labeled multi-Bernoulli (GLMB) filter has been shown to be an effective approach and has attracted extensive attention. However, in scenarios where the clutter rate is high or measurement outliers occur often, the performance of the GLMB filter declines significantly because the Gaussian-based likelihood function is sensitive to clutter. To solve this problem, this paper presents a robust GLMB filter and smoother that improve tracking performance in scenarios with a high clutter rate, low detection probability, and measurement outliers. First, a Student-t distribution variational Bayesian (TDVB) filtering technique is employed to update the targets' states. Then, the likelihood weight used in the tracking process is re-derived. Finally, a trajectory smoothing method is proposed to improve the overall tracking performance. The proposed method is compared with recent multiple target tracking filters, and the simulation results show that it can effectively improve tracking accuracy in scenarios with a high clutter rate, low detection rate, and measurement outliers. Code is published on GitHub.

Efficient Outlier Detection of the Water Temperature Monitoring Data (수온 관측 자료의 효율적인 이상 자료 탐지)

  • Cho, Hongyeon;Jeong, Shin Taek;Ko, Dong Hui;Son, Kyeong-Pyo
    • Journal of Korean Society of Coastal and Ocean Engineers, v.26 no.5, pp.285-291, 2014
  • Statistics derived from coastal water temperature monitoring data can be biased by outliers and missing intervals. Although a number of outlier detection methods have been developed, their application to in-situ monitoring data is very limited because they assume a priori information about the outliers and data with no missing values, and because some methods require excessive computational time. In this study, a practical robust method is developed that can efficiently and effectively detect outliers in big data. The model consists of two parts: the first constructs the approximate components of the monitoring data using robust smoothing and data re-sampling, and the second is the main iterative outlier detection step, which uses the detailed components of the data estimated from the approximate components. The model is tested on two years of 5-minute-interval water temperature data from Lake Saemangeum. The outlier proportion of the data is estimated to be about 1.6-3.7%. The results show that the model satisfactorily detects and removes most of the outliers in the data. To detect and remove outliers effectively, outlier detection using long-span smoothing should be applied before that using short-span smoothing.
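
The smoothing-then-residual idea in the abstract above can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the paper's method: the smoother is a centered rolling median, a point is flagged when its residual exceeds k times the scaled median absolute deviation (MAD), and the spans (25, 5), k = 5.0, and all function names are hypothetical. The only detail taken from the abstract is the ordering: the long-span pass runs before the short-span pass.

```python
from statistics import median


def rolling_median(values, span):
    """Centered rolling median that skips missing samples (None)."""
    half = span // 2
    smooth = []
    for i in range(len(values)):
        window = [v for v in values[max(0, i - half): i + half + 1] if v is not None]
        smooth.append(median(window) if window else None)
    return smooth


def detect_pass(values, span, k=5.0):
    """Return indices whose residual from the smooth exceeds k * scaled MAD."""
    smooth = rolling_median(values, span)
    # Residuals ("detailed components") for the samples that are present.
    resid = {i: v - smooth[i] for i, v in enumerate(values) if v is not None}
    if not resid:
        return []
    center = median(resid.values())
    mad = median(abs(r - center) for r in resid.values())
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    if scale == 0:
        return []  # degenerate case: no spread to compare against
    return [i for i, r in resid.items() if abs(r - center) > k * scale]


def remove_outliers(values, spans=(25, 5), k=5.0):
    """Apply the long-span pass first, then the short-span pass (assumed spans)."""
    cleaned = list(values)
    flagged = []
    for span in spans:
        for i in detect_pass(cleaned, span, k):
            cleaned[i] = None  # treat a removed outlier as a missing sample
            flagged.append(i)
    return cleaned, sorted(flagged)
```

Flagged samples are set to None so that later passes treat them the same way as gaps already present in the record.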

A study of a new statistic for detection of outliers and/or influential observations in regression diagnostics (회귀진단에서 이상치와 영향관측치를 동시에 발견하는 새로운 통계량에 관한 연구)

  • 강은미
    • The Korean Journal of Applied Statistics, v.6 no.1, pp.67-78, 1993
  • This paper proposes and studies a new diagnostic statistic for detecting outliers and influential observations in linear models. The proposed statistic is a weighted sum of two measures: one detects outliers and the other detects influential observations. The merit of this statistic is that it makes it possible to distinguish outliers from influential observations. Monte Carlo simulations are used to find the probability distribution of the statistic.

Outlier Detection and Replacement for Vertical Wind Speed in the Measurement of Actual Evapotranspiration (실제증발산 측정 시 연직 풍속 이상치 탐색 및 대체)

  • Park, Chun Gun;Rim, Chang-Soo;Lim, Kwang-Suop;Chae, Hyo-Sok
    • KSCE Journal of Civil and Environmental Engineering Research, v.34 no.5, pp.1455-1461, 2014
  • In this study, flux data measured in the Deokgokje reservoir watershed near Deokyu mountain in May, June, and July 2011 were statistically analyzed to detect and replace outliers in vertical wind speed when measuring evapotranspiration with the eddy covariance method. Outliers in vertical wind speed were identified with the boxplot method based on the interquartile range (IQR), and the detected outliers were deleted or replaced with the mean. The measured evapotranspiration was then compared before and after outlier replacement. The results showed a difference between the two, especially on rainy days. Therefore, based on the study results, outliers should be deleted or replaced when measuring evapotranspiration.
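
The boxplot rule mentioned in the abstract above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code: the fence multiplier k = 1.5 is the conventional boxplot default, the quartiles use the standard library's "inclusive" method, and the replacement value is taken to be the mean of the non-outlying samples (the abstract does not say which mean is used). The function names are hypothetical.

```python
from statistics import mean, quantiles


def iqr_bounds(values, k=1.5):
    """Lower and upper boxplot fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def replace_outliers_with_mean(values, k=1.5):
    """Replace samples outside the fences with the mean of the inliers (assumed)."""
    lo, hi = iqr_bounds(values, k)
    inliers = [v for v in values if lo <= v <= hi]
    fill = mean(inliers)
    return [v if lo <= v <= hi else fill for v in values]
```

Computing the fill value from the inliers only keeps the extreme samples from contaminating the replacement; using the mean of all samples instead would be an equally plausible reading of the abstract.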

Robust Location Estimation based on TDOA and FDOA using Outlier Detection Algorithm (이상치 검출 알고리즘을 이용한 TDOA와 FDOA 기반 이동 신호원 위치 추정 기법)

  • Yoo, Hogeun;Lee, Jaehoon
    • Journal of Convergence for Information Technology, v.10 no.9, pp.15-21, 2020
  • This paper presents an outlier detection algorithm for estimating the location and velocity of a source with the two-step weighted least-squares method using time difference of arrival (TDOA) and frequency difference of arrival (FDOA) data. Since outliers in the TDOA and FDOA data can reduce the accuracy of the estimated location and velocity of a moving source, it is important to detect and remove them. The paper presents a method for finding the minimum set of inlier data and a method for determining whether each TDOA and FDOA measurement is an inlier or an outlier. Numerical simulations show that removing the outliers from the TDOA and FDOA data improves the accuracy of the estimated location and velocity.

Outlier Reduction using C-SCGP for Target Localization based on RSS/AOA in Wireless Sensor Networks (무선 센서 네트워크에서 C-SCGP를 이용한 RSS/AOA 이상치 제거 기반 표적 위치추정 기법)

  • Kang, SeYoung;Lee, Jaehoon;Song, JongIn;Chung, Wonzoo
    • Journal of Convergence for Information Technology, v.11 no.11, pp.31-37, 2021
  • In this paper, we propose an outlier detection algorithm called C-SCGP to prevent the degradation of localization performance based on RSS (Received Signal Strength) and AOA (Angle of Arrival) in the presence of outliers in wireless sensor networks. Since the accuracy of target estimation can deteriorate significantly due to various causes of outliers, such as sensor malfunction, jamming, and severe noise, it is important to detect and filter out all outliers. The single cluster graph partitioning (SCGP) algorithm has been widely used to remove such outliers. The proposed continuous-SCGP (C-SCGP) algorithm overcomes a weakness of SCGP, which requires setting a threshold and computing the probability of outliers, both of which are impractical in many applications. Numerical simulations show that C-SCGP, without threshold setting or probability computation, matches the performance of SCGP.

Simultaneous outlier detection and variable selection via difference-based regression model and stochastic search variable selection

  • Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
    • Communications for Statistical Applications and Methods, v.26 no.2, pp.149-161, 2019
  • In this article, we suggest the following approach to simultaneous variable selection and outlier detection. First, we determine possible outlier candidates using properties of an intercept estimator in a difference-based regression model, and the information about the outliers is incorporated into the multiple regression model by adding mean shift parameters. Second, we select the best model from the model that includes the outlier candidates as predictors using stochastic search variable selection. Finally, we evaluate our method using simulations and real data analysis, which yield promising results. In addition, we need to develop our method further to make robust estimates. We will also extend it to the nonparametric regression model for simultaneous outlier detection and variable selection.
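
For reference, a multiple regression model with mean shift parameters is commonly written as below. The notation is an assumption and is not taken from the article: $y_i$ is the response, $x_i$ the predictor vector, $\beta$ the coefficient vector, $\gamma_i$ the mean shift for observation $i$, and $\varepsilon_i$ the error term.

```latex
y_i = x_i^{\top}\beta + \gamma_i + \varepsilon_i, \qquad i = 1, \dots, n,
\qquad \gamma_i \neq 0 \iff \text{observation } i \text{ is an outlier}
```

Under this formulation, deciding which $\gamma_i$ are nonzero is itself a variable selection problem, which is why outlier candidates can be handled alongside the predictors.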

Using Geometry based Anomaly Detection to check the Integrity of IFC classifications in BIM Models (기하정보 기반 이상탐지분석을 이용한 BIM 개별 부재 IFC 분류 무결성 검토에 관한 연구)

  • Koo, Bonsang;Shin, Byungjin
    • Journal of KIBIM, v.7 no.1, pp.18-27, 2017
  • Although Industry Foundation Classes (IFC) provide standards for exchanging Building Information Modeling (BIM) data, authoring tools still require manual mapping between BIM entities and IFC classes. This leads to errors and omissions, resulting in corrupted, unreliable data exchanges that compromise the validity of IFC. This research explored precedent work by Krijnen and Tamke, who suggested ways to automate the mapping of IFC classes using a machine learning technique, namely anomaly detection. The technique uses geometric features of individual components to find outliers among entities in identical IFC classes. This research primarily focused on applying this approach to two architectural BIM models and determining its feasibility and limitations. Results indicated that the approach, while effective, misclassified outliers when an IFC class had several dissimilar entities. Another issue was that some IFC classes had too few entities for the anomaly detection to compare differences. Future research to address these issues includes adding geometric features, using novelty detection, and including a probabilistic graph model to improve classification accuracy.