• Title/Summary/Keyword: outlier detection


Unified methods for variable selection and outlier detection in a linear regression

  • Seo, Han Son
    • Communications for Statistical Applications and Methods / v.26 no.6 / pp.575-582 / 2019
  • The problem of selecting variables in the presence of outliers is considered. Variable selection and outlier detection are not separable problems because each observation affects the fitted regression equation differently and has a different influence on each variable. We suggest a simultaneous method for variable selection and outlier detection in a linear regression model. The suggested procedure uses a sequential method to detect outliers and all possible subset regressions for model selection. A simplified version of the procedure is also proposed to reduce the computational burden. The procedures are compared to other variable selection methods using real data sets known to contain outliers. Examples show that the proposed procedures are effective and superior to robust algorithms in selecting the best model.

A Study on Outlier Detection in Smart Manufacturing Applications

  • Kim, Jeong-Hun;Chuluunsaikhan, Tserenpurev;Nasridinov, Aziz
    • Proceedings of the Korea Information Processing Society Conference / 2019.10a / pp.760-761 / 2019
  • Smart manufacturing is a process of integrating computer-related technologies in production and by doing so, achieving more efficient production management. The recent development of supercomputers has led to the broad utilization of artificial intelligence (AI) and machine learning techniques useful in predicting specific patterns. Despite the usefulness of AI and machine learning techniques in smart manufacturing processes, there are many fundamental issues with the direct deployment of these technologies related to data management. In this paper, we focus on solving the outlier detection issue in smart manufacturing applications. More specifically, we apply a state-of-the-art outlier detection technique, called Elliptic Envelope, to detect anomalies in simulation-based collected data.

Outlier Detection and Replacement for Vertical Wind Speed in the Measurement of Actual Evapotranspiration (실제증발산 측정 시 연직 풍속 이상치 탐색 및 대체)

  • Park, Chun Gun;Rim, Chang-Soo;Lim, Kwang-Suop;Chae, Hyo-Sok
    • KSCE Journal of Civil and Environmental Engineering Research / v.34 no.5 / pp.1455-1461 / 2014
  • In this study, flux data measured in the Deokgokje reservoir watershed near Deokyu mountain in May, June, and July 2011 were analyzed statistically to detect and replace outliers in the vertical wind speed used to measure evapotranspiration with the eddy covariance method. To analyze the outliers of vertical wind speed, the boxplot outlier detection method based on the interquartile range (IQR) was employed, and the detected outliers were deleted or replaced with the mean. The measured evapotranspiration was compared before and after the outlier replacement. The results showed a difference between evapotranspiration before and after outlier replacement, especially on rainy days. Therefore, based on the study results, the outliers should be deleted or replaced in the measurement of evapotranspiration.
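The boxplot rule mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the conventional 1.5 × IQR fences and replacement with the mean of the remaining values; the abstract does not state the multiplier or which mean was used, and the sample wind speeds below are made up for illustration.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return the lower and upper boxplot fences (k = 1.5 is an assumed default)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def replace_outliers_with_mean(values, k=1.5):
    """Replace values outside the fences with the mean of the values inside them."""
    low, high = iqr_fences(values, k)
    inliers = [v for v in values if low <= v <= high]
    fill = statistics.mean(inliers)
    return [v if low <= v <= high else fill for v in values]


# Hypothetical vertical wind speeds (m/s); 5.0 is an injected spike.
w = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 5.0, -0.3, 0.1, 0.0]
cleaned = replace_outliers_with_mean(w)
```

Deleting the flagged values instead of replacing them only requires returning the `inliers` list.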

Improving the Accuracy of Image Matching using Various Outlier Removal Algorithms (다양한 오정합 제거 알고리즘을 이용한 영상정합의 정확도 향상)

  • Lee, Yong-Il;Kim, Jun-Chul;Lee, Young-Ran;Shin, Sung-Woong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.27 no.1 / pp.667-675 / 2009
  • Image matching is widely applied in image application areas, such as remote sensing and GIS. In general, the initial set of matching points includes outliers, which affect the accuracy of image matching. The purpose of this paper is to develop a robust approach for outlier detection and removal in order to maintain accuracy in image matching applications. We use three automatic outlier detection techniques: backward matching, affine transformation, and the RANSAC (RANdom SAmple Consensus) algorithm. Moreover, in the pre-processing steps we calculate the overlap and apply block-based processing for fast and efficient image matching. The suggested approach has been applied to real frame image pairs, and the results have been analyzed in terms of robustness and efficiency.
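The RANSAC idea named in this abstract can be sketched in a few lines of Python. This is a simplified illustration, not the paper's method: a straight line stands in for the affine transformation between image pairs, and the points, tolerance, and iteration count are assumptions chosen for the example.

```python
import random


def fit_line(p, q):
    """Slope and intercept of the line through two points with different x."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def ransac_line(points, n_iter=200, tol=0.5, seed=0):
    """Return the largest consensus set found over n_iter random two-point models."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        p, q = rng.sample(points, 2)
        if p[0] == q[0]:
            continue  # vertical pair: slope undefined, try another sample
        slope, intercept = fit_line(p, q)
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers


# Synthetic data: ten points on y = 2x + 1 plus two gross mismatches.
points = [(x, 2.0 * x + 1.0) for x in range(10)] + [(2, 30.0), (7, -12.0)]
inliers = ransac_line(points)
outliers = [p for p in points if p not in inliers]
```

In an image matching setting, the two-point line model would be replaced by a transformation estimated from a minimal set of matched point pairs, with the residual measured as reprojection distance.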

Realization of an outlier detection algorithm using R (R을 이용한 이상점 탐지 알고리즘의 구현)

  • Song, Gyu-Moon;Moon, Ji-Eun;Park, Cheol-Yong
    • Journal of the Korean Data and Information Science Society / v.22 no.3 / pp.449-458 / 2011
  • Illegal waste dumping is one of the major problems faced by the government agency that monitors water quality. Recently, the agency installed COD (chemical oxygen demand) auto-monitoring machines in rivers. In this article, we provide an outlier detection algorithm, implemented in R and based on the time series intervention model, that detects outlying values among the COD time series generated by an auto-monitoring machine. The R implementation yields an automatic algorithm that needs no manual intervention at any step and that can also be used in simulation studies.

Machine learning application to seismic site classification prediction model using Horizontal-to-Vertical Spectral Ratio (HVSR) of strong-ground motions

  • Francis G. Phi;Bumsu Cho;Jungeun Kim;Hyungik Cho;Yun Wook Choo;Dookie Kim;Inhi Kim
    • Geomechanics and Engineering / v.37 no.6 / pp.539-554 / 2024
  • This study develops a prediction model for seismic site classification by integrating machine learning techniques with horizontal-to-vertical spectral ratio (HVSR) methodologies. To improve model accuracy, the research employs outlier detection methods and the synthetic minority over-sampling technique (SMOTE) for data balancing, and evaluates seven machine learning models on seismic data from KiK-net. Notably, light gradient boosting method (LGBM), gradient boosting, and decision tree models exhibit improved performance when coupled with SMOTE, while multiple linear regression (MLR) and support vector machine (SVM) models show reduced efficacy. Outlier detection techniques significantly enhance accuracy, particularly for LGBM, gradient boosting, and voting boosting. The combination of LGBM with the isolation forest and SMOTE achieves the highest accuracy of 0.91, while LGBM with the local outlier factor yields the highest F1-score of 0.79. Consistently outperforming the other models, LGBM proves most efficient for seismic site classification when supported by appropriate preprocessing procedures. These findings show the significance of outlier detection and data balancing for precise seismic site classification, and highlight the potential of machine learning for improving classification accuracy.

Outlier Detection of the Coastal Water Temperature Monitoring Data Using the Approximate and Detail Components (어림과 나머지 성분을 이용한 연안 수온자료의 이상자료 감지)

  • Cho, Hong-Yeon;Oh, Ji-Hee
    • Journal of the Korean Society for Marine Environment & Energy / v.15 no.2 / pp.156-162 / 2012
  • Outlier detection and treatment is required as the first step in the statistical analysis of monitoring data, because outliers occur frequently in coastal environmental monitoring projects. In this study, an outlier detection method using the approximate and detail (or residual) components of the raw data is suggested. The approximate and detail components of the data can be separated by various filtering and smoothing methods. Here, the decomposition is carried out by harmonic analysis and a local regression curve, respectively. Then, Grubbs' test and the modified z-score method, both widely used to detect outliers, are applied to the detail components of the water temperature data. A new data set is reconstructed after removing the outliers detected by these methods. The suggested process is shown to be successfully applied to the outlier detection of the coastal water temperature monitoring data provided by the Real-time Information System for Aquaculture Environment, National Fisheries Research and Development Institute (NFRDI).
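The decompose-then-test idea in this abstract can be sketched in Python as follows. The sketch makes several assumptions: a moving median stands in for the harmonic analysis and local regression used in the paper, the modified z-score uses the conventional 0.6745 constant and 3.5 cutoff, and the temperature series is made up for illustration.

```python
import statistics


def moving_median(values, window=5):
    """Approximate component: centered moving median (a stand-in smoother)."""
    half = window // 2
    return [statistics.median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def modified_z_scores(values):
    """Modified z-score: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]  # no spread: nothing can be scored
    return [0.6745 * (v - med) / mad for v in values]


# Hypothetical water temperatures (deg C) with one spike at index 5.
temps = [15.0, 15.2, 15.1, 15.3, 15.4, 22.0, 15.5, 15.6, 15.4, 15.7, 15.8, 15.9]
approx = moving_median(temps)
detail = [t - a for t, a in zip(temps, approx)]  # detail (residual) component
scores = modified_z_scores(detail)
flagged = [i for i, s in enumerate(scores) if abs(s) > 3.5]
cleaned = [t for i, t in enumerate(temps) if i not in flagged]
```

Testing the detail component rather than the raw series keeps the slow warming trend in `temps` from inflating the spread estimate.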

Development of a WPAN-based Self-positioning System for Indoor Flying Robots (실내 비행 로봇을 위한 WPAN 기반 자가 측위 시스템 개발)

  • Lim, Jeong-Min;Jeong, Won-Min;Sung, Tae-Kyung
    • Journal of Institute of Control, Robotics and Systems / v.21 no.5 / pp.490-495 / 2015
  • As flying robots become popular, there is an increasing need to use them for such purposes as parcel delivery, serving in restaurants, and stage performances. To control flying robots such as quadcopters, localization is essential. Many techniques for positioning flying robots are in development, including IR (infra-red)-based systems, which track markers on a flying robot so that it can position itself. However, this technique offers only short coverage. Furthermore, localization from inertial sensors diverges as time passes. For this reason, this paper suggests a TWR (two-way ranging) based positioning technique. To address the weaknesses of currently available TWR systems, this paper suggests a self-positioning and outlier detection technique that provides reliable position information with a faster update rate. The self-positioning system sends a shorter message, which reduces wireless traffic. By detecting and removing outlier measurements, a positioning result with better accuracy is acquired. Finally, this paper shows that the suggested system detects outliers sequentially from less than half the number of anchors in the localization system, according to the degree of outlier in the measurement and the noise level. The experimental results show that applying the outlier algorithm improves positioning accuracy.

The Filtering Method to Reduce Corner Outlier Artifacts in HEVC (Corner Outlier Artifacts를 감소시키기 위한 HEVC 필터링 방법)

  • Ko, Kyung-hwan
    • Journal of Broadcast Engineering / v.22 no.3 / pp.313-320 / 2017
  • The in-loop filtering methods applied in the HEVC standard, such as the de-blocking filter and SAO (Sample Adaptive Offset), improve coding efficiency and subjective quality by reducing blocking and ringing artifacts. However, despite the use of in-loop filtering, artifacts called corner outliers, which occur at the corner points of block boundaries, are not removed. In this paper, corner outlier artifacts are reduced through detection, determination, and filtering processes applied to the corner outlier pixels. Experimental results show that the proposed method improves the subjective picture quality and slightly increases the coding efficiency in Inter prediction.

Anomaly Detection in Livestock Environmental Time Series Data Using LSTM Autoencoders: A Comparison of Performance Based on Threshold Settings (LSTM 오토인코더를 활용한 축산 환경 시계열 데이터의 이상치 탐지: 경계값 설정에 따른 성능 비교)

  • Se Yeon Chung;Sang Cheol Kim
    • Smart Media Journal / v.13 no.4 / pp.48-56 / 2024
  • In the livestock industry, detecting environmental outliers and predicting data are crucial tasks. Outliers in livestock environment data, which are typically collected as time series, can signal rapid changes in the environment and potential unexpected epidemics. Prompt detection of and response to these outliers are essential to minimize stress in livestock and to reduce economic losses for farmers through early detection of epidemic conditions. This study compares the performance of two methods for setting the thresholds that define outliers in livestock environment data. The first method is outlier detection using Mean Squared Error (MSE), and the second is outlier detection using a Dynamic Threshold, which analyzes variability against the average value of previous data to identify outliers. The MSE-based method demonstrated a 94.98% accuracy rate, while the Dynamic Threshold method, which uses standard deviation, showed superior performance with 99.66% accuracy.
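The contrast between a fixed threshold and a dynamic one can be sketched in Python. This is an illustration under stated assumptions, not the paper's implementation: the reconstruction errors are made-up numbers standing in for the output of an LSTM autoencoder, and the rule "mean plus k standard deviations of the previous `window` errors" is one plausible reading of the Dynamic Threshold described above, with `window` and `k` chosen arbitrarily.

```python
import statistics


def fixed_threshold_flags(errors, threshold):
    """Flag every reconstruction error above one global cutoff."""
    return [e > threshold for e in errors]


def dynamic_threshold_flags(errors, window=5, k=3.0):
    """Flag an error if it exceeds mean + k * stdev of the preceding window."""
    flags = []
    for i, e in enumerate(errors):
        history = errors[max(0, i - window):i]
        if len(history) < 2:
            flags.append(False)  # not enough history to estimate variability
            continue
        limit = statistics.mean(history) + k * statistics.stdev(history)
        flags.append(e > limit)
    return flags


# Hypothetical per-step reconstruction errors with one anomaly at index 6.
errors = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.90, 0.10, 0.12, 0.11]
fixed_hits = [i for i, f in enumerate(fixed_threshold_flags(errors, 0.5)) if f]
dynamic_hits = [i for i, f in enumerate(dynamic_threshold_flags(errors)) if f]
```

In this sketch a flagged error stays in the history window and temporarily raises the limit for the steps that follow; excluding flagged values from the window is a possible variation.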