• Title/Summary/Keyword: Outlier Data

Outlier Detection from High Sensitive Geiger Mode Imaging LIDAR Data retaining a High Outlier Ratio (높은 이상점 비율을 갖는 고감도 가이거모드 영상 라이다 데이터로부터 이상점 검출)

  • Kim, Seongjoon; Lee, Impyeong; Lee, Youngcheol; Jo, Minsik
    • Korean Journal of Remote Sensing / v.28 no.5 / pp.573-586 / 2012
  • Point clouds acquired by a LIDAR (Light Detection And Ranging, also LADAR) system often contain erroneous points, called outliers, that do not appear to lie on physical surfaces. These points should be carefully detected and eliminated before further processing. In LIDAR systems that employ a high-sensitivity Geiger-mode array detector (GmFPA), the outlier ratio is particularly high, which often causes existing algorithms to fail on such data sets. In this paper, we propose a method to discriminate outliers in a point cloud with a high outlier ratio acquired by a GmFPA LIDAR system. The underlying assumption of the method is that a meaningful target surface occupies at least two adjacent pixels and that the ranges measured at these pixels are similar. We applied the proposed method to simulated LIDAR data with different point densities and outlier ratios and analyzed its performance for different thresholds and data properties. We found that the outlier detection probability is about 99% in most cases. We also confirmed that the proposed method is robust to data properties and relatively insensitive to the thresholds. The method can be used effectively for on-line real-time processing and for post-processing of GmFPA LIDAR data.
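
The adjacency assumption in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one range value per pixel, an 8-pixel neighbourhood, and a hypothetical `max_range_diff` threshold, none of which are specified in the abstract.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]


def flag_outliers(ranges: Grid, max_range_diff: float) -> List[List[bool]]:
    """Mark a pixel as an outlier when no adjacent pixel has a similar range."""
    rows = len(ranges)
    cols = len(ranges[0]) if rows else 0
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            value = ranges[r][c]
            if value is None:  # no return recorded for this pixel
                continue
            supported = False
            # Assumed neighbourhood: the 8 pixels surrounding (r, c).
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        other = ranges[rr][cc]
                        if other is not None and abs(other - value) <= max_range_diff:
                            supported = True
            mask[r][c] = not supported
    return mask


# Hypothetical 3x3 range image in metres; 57.0 and 12.5 have no similar neighbour.
frame = [
    [100.2, 100.4, 57.0],
    [100.1, None, 100.3],
    [12.5, 100.0, 100.2],
]
outlier_mask = flag_outliers(frame, max_range_diff=0.5)
```

Here the two isolated ranges are flagged, while every pixel on the roughly 100 m surface is kept because at least one neighbour agrees with it.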

The Effect of Outliers in Regression Analysis (회귀 분석에서 이상치가 미치는 영향)

  • Kim, Kwang-Soo; Bae, Young-Ju; Lee, Jin-Gue
    • Journal of Korean Society for Quality Management / v.24 no.2 / pp.158-171 / 1996
  • An outlier is an observation that appears to deviate extremely from the other data collected. Handling outliers is therefore important, because they distort the meaning of the data as a whole and reduce the accuracy and validity of fitted models. The aim of this paper is to present some ways of handling outliers in given data and to investigate how the analysis results differ before and after outlier rejection. Among the variety of methods that have been proposed, we select linear regression analysis and two linear programming techniques and compare their results.

Outlier Tests in Sample Surveys

  • Namkyung, Pyong; Lee, Joon Suk
    • Communications for Statistical Applications and Methods / v.7 no.2 / pp.447-456 / 2000
  • In this paper, we consider three methods for outlier identification in sample surveys. First, we study methods of handling and adjusting outliers in a normal population. Second, we study existing methods that use the mean, maximum, and minimum, and propose a test using the median, which reflects the characteristics of the data well regardless of the sampling distribution. Finally, we show through simulation that our median-based test works better than the Dixon test and the mean test.
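
The abstract does not give the proposed test statistic, so the sketch below is only a stand-in that shows why a median-based rule resists contamination. It uses the common modified z-score (median and median absolute deviation) with an assumed cutoff of 3.5; the function name and sample values are hypothetical.

```python
from statistics import median


def median_outliers(values, cutoff=3.5):
    """Flag values whose modified z-score, based on median and MAD, exceeds cutoff."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # more than half the values are identical; the score is undefined
    # 0.6745 rescales the MAD so it is comparable to a standard deviation
    # when the data are normally distributed.
    return [v for v in values if 0.6745 * abs(v - med) / mad > cutoff]


sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 25.0]
flagged = median_outliers(sample)
```

Because the median and MAD are barely moved by the single extreme value, only 25.0 is flagged; a mean-based rule would have its centre and spread inflated by that same value.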

A Data Mining Tool for Massive Trajectory Data (대규모 궤적 데이타를 위한 데이타 마이닝 툴)

  • Lee, Jae-Gil
    • Journal of KIISE: Computing Practices and Letters / v.15 no.3 / pp.145-153 / 2009
  • Trajectory data are ubiquitous in the real world. Recent progress in satellite, sensor, RFID, video, and wireless technologies has made it possible to systematically track object movements and collect huge amounts of trajectory data. Accordingly, there is ever-increasing interest in analyzing trajectory data. In this paper, we develop a data mining tool for massive trajectory data. The tool supports the three most widely used operations: clustering, classification, and outlier detection. Trajectory clustering discovers common movement patterns, trajectory classification predicts the class labels of moving objects based on their trajectories, and trajectory outlier detection finds trajectories that are grossly different from or inconsistent with the remaining set of trajectories. The primary advantage of the tool is that it exploits the information in partial trajectories during data mining. The effectiveness of the tool is shown using various real trajectory data sets. We believe that we have provided practical software for trajectory data mining that can be used in many real applications.

Outlier Detection Based on Discrete Wavelet Transform with Application to Saudi Stock Market Closed Price Series

  • RASHEDI, Khudhayr A.; ISMAIL, Mohd T.; WADI, S. Al; SERROUKH, Abdeslam
    • The Journal of Asian Finance, Economics and Business / v.7 no.12 / pp.1-10 / 2020
  • This study investigates outlier detection based on the discrete wavelet transform for time series data, where the identification and treatment of outliers is an important component of the analysis. An outlier is defined as a data point that deviates markedly from the rest of the observations in a data sample. In this work we focus on applying the traditional method suggested by Tukey (1977) to detect outliers in the closed price series of the Saudi Arabia stock market (Tadawul) between Oct. 2011 and Dec. 2019. The method is applied to the details obtained from the MODWT (Maximal-Overlap Discrete Wavelet Transform) of the original series. The results show that the suggested methodology was successful in detecting all of the outliers in the series. The findings suggest that the volatility of returns can be modeled and forecast from the reconstructed, outlier-free series using GARCH models. The estimated GARCH volatility model was compared with other asymmetric GARCH models using standard forecast error metrics. Among the GARCH specifications considered, the standard GARCH model performed as well as the gjrGARCH model in out-of-sample forecasts of returns.
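
As a rough illustration of applying Tukey's rule to wavelet details, the sketch below computes level-1 Haar MODWT detail coefficients, which reduce to half the first difference, and flags those outside the fences Q1 - 1.5 IQR and Q3 + 1.5 IQR. The Haar filter, the single decomposition level, the dropped boundary coefficient, and the toy prices are all assumptions; the abstract does not state the wavelet or the levels used.

```python
from statistics import quantiles


def haar_modwt_detail(series):
    """Level-1 Haar MODWT details, (x[t] - x[t-1]) / 2, for t = 1 .. n-1.

    The circular boundary coefficient at t = 0 is omitted.
    """
    return [(series[t] - series[t - 1]) / 2.0 for t in range(1, len(series))]


def tukey_fences(values, k=1.5):
    """Return the lower and upper Tukey fences for the given values."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def detect_outliers(series, k=1.5):
    """Return time indices whose detail coefficient falls outside the fences."""
    details = haar_modwt_detail(series)
    low, high = tukey_fences(details, k)
    # details[i] corresponds to time index i + 1 in the original series.
    return [i + 1 for i, d in enumerate(details) if d < low or d > high]


# Hypothetical price series with a single spike at index 5.
prices = [10.0, 10.1, 10.0, 10.2, 10.1, 14.0, 10.2, 10.1, 10.3, 10.2]
flagged = detect_outliers(prices)
```

A single spike produces two large detail coefficients, one on the jump up and one on the return, so both index 5 and index 6 are flagged in this toy series.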

Elimination of Outlier from Technology Growth Curve using M-estimator for Defense Science and Technology Survey (M-추정을 사용한 국방과학기술 수준조사 기술성장모형의 이상치 제거)

  • Kim, Jangheon
    • Journal of the Korea Institute of Military Science and Technology / v.23 no.1 / pp.76-86 / 2020
  • Technology growth curve methodology is commonly used in technology forecasting. A technology growth curve represents the path of product performance in relation to time or R&D investment. It is a useful tool for comparing technological performance between Korea and advanced nations and for describing inflection points, the limits of improvement of a technology, and technology innovation strategies. However, fitting a curve to survey data often leads to model mis-specification, biased parameter estimates, and incorrect results, because expert survey data frequently contain outliers owing to the subjective nature of the responses. This paper proposes a method for eliminating outliers from a technology growth curve using an M-estimator. Experimental results from several pilot tests using real data from Defense Science and Technology Survey reports show an overall improvement in the technology growth curves.
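
To show how an M-estimator limits the influence of an outlying response during curve fitting, here is a minimal sketch of a Huber M-estimate obtained by iteratively reweighted least squares. It fits a straight line rather than a growth curve, and the tuning constant 1.345, the MAD-based scale, and the data are assumptions that do not come from the paper.

```python
from statistics import median


def huber_line_fit(xs, ys, c=1.345, iterations=50):
    """Fit y = intercept + slope * x with Huber weights via IRLS."""
    weights = [1.0] * len(xs)
    slope = intercept = 0.0
    for _ in range(iterations):
        # Weighted least-squares step.
        sw = sum(weights)
        mx = sum(w * x for w, x in zip(weights, xs)) / sw
        my = sum(w * y for w, y in zip(weights, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
        slope = sxy / sxx
        intercept = my - slope * mx
        # Robust scale from the median absolute residual.
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        scale = median([abs(r) for r in residuals]) / 0.6745 or 1e-12
        # Huber weights: 1 inside the band, shrinking as c * scale / |r| outside it.
        weights = [
            1.0 if abs(r) <= c * scale else c * scale / abs(r)
            for r in residuals
        ]
    return slope, intercept, weights


# Hypothetical data near y = 1 + 2x with one outlying response at x = 5.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 30.0, 13.1, 14.9]
slope, intercept, weights = huber_line_fit(xs, ys)
```

The final weights can also serve as a screening signal: a response whose weight has shrunk far below 1 is a candidate outlier, which is one plausible way to connect M-estimation to outlier elimination.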

Machine learning application to seismic site classification prediction model using Horizontal-to-Vertical Spectral Ratio (HVSR) of strong-ground motions

  • Francis G. Phi; Bumsu Cho; Jungeun Kim; Hyungik Cho; Yun Wook Choo; Dookie Kim; Inhi Kim
    • Geomechanics and Engineering / v.37 no.6 / pp.539-554 / 2024
  • This study explores the development of a prediction model for seismic site classification by integrating machine learning techniques with horizontal-to-vertical spectral ratio (HVSR) methodologies. To improve model accuracy, the research employs outlier detection methods and the synthetic minority over-sampling technique (SMOTE) for data balancing, and evaluates seven machine learning models on seismic data from KiK-net. Notably, light gradient boosting method (LGBM), gradient boosting, and decision tree models exhibit improved performance when coupled with SMOTE, while multiple linear regression (MLR) and support vector machine (SVM) models show reduced efficacy. Outlier detection techniques significantly enhance accuracy, particularly for LGBM, gradient boosting, and voting boosting. The combination of LGBM with the isolation forest and SMOTE achieves the highest accuracy of 0.91, while LGBM with the local outlier factor yields the highest F1-score of 0.79. Consistently outperforming the other models, LGBM proves most efficient for seismic site classification when supported by appropriate preprocessing. These findings show the significance of outlier detection and data balancing for precise seismic soil classification prediction and highlight the potential of machine learning for improving site classification accuracy.

Fuzzy Learning Rule Using the Distance between Datum and the Centroids of Clusters (데이터와 클러스터들의 대표값들 사이의 거리를 이용한 퍼지학습법칙)

  • Kim, Yong-Soo
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.4 / pp.472-476 / 2007
  • The learning rule strongly affects the performance of a neural network. This paper proposes a new fuzzy learning rule whose learning rate takes into account the distance between the input vector and the prototypes of the classes. When the learning rule updates the class prototypes, this reduces the effect of outliers on them, because an input vector located near the decision boundary is given a larger effect than an outlier. The rule can therefore prevent an outlier from degrading the decision boundary. The new fuzzy learning rule is integrated into the IAFC (Integrated Adaptive Fuzzy Clustering) fuzzy neural network. The Iris data set is used to compare the performance of the proposed fuzzy neural network with that of other supervised neural networks. The results show that the proposed fuzzy neural network performs better than the other supervised neural networks.

Outlier Filtering and Missing Data Imputation Algorithm using TCS Data (TCS데이터를 이용한 이상치제거 및 결측보정 알고리즘 개발)

  • Do, Myung-Sik; Lee, Hyang-Mee; NamKoong, Seong
    • Journal of Korean Society of Transportation / v.26 no.4 / pp.241-250 / 2008
  • With the ever-growing amount of traffic, there is an increasing need for good-quality travel time information. Various outlier filtering and missing data imputation algorithms using AVI data have been proposed for interrupted and uninterrupted traffic flow. This paper develops an outlier filtering and missing data imputation algorithm that uses Toll Collection System (TCS) data. TCS travel time data collected from August to September 2007 were employed. TCS travel time data consist of records of every passing vehicle and therefore have the potential to provide real-time travel time information. However, the authors found that as the distance between entry and exit tollgates increases, the variance of travel time also increases. In addition, time gaps appeared when the distance between tollgates was long. Finally, after analyzing existing methods, the authors propose a new method for producing representative values once abnormal and "noise" data have been removed. The proposed algorithm is effective.

Structural Health Monitoring Methodology based on Outlier Analysis using Acceleration of Subway Stations (가속도 응답을 이용한 이상치 해석 기반 역사 구조 건전성 평가 기법 개발)

  • Shin, Jeong-Ryol; An, Tae-Ki; Lee, Chang-Gil; Park, Seung-Hee
    • Proceedings of the KSR Conference / 2011.10a / pp.281-286 / 2011
  • Station structures are important infrastructure that has been in operation since the 1970s. They are especially vulnerable even to medium-level earthquakes and can be damaged by long-term internal or external vibrations such as ambient vibrations. Recently, much attention has been paid to real-time monitoring of fatal defects and long-term deterioration in civil infrastructure to ensure safety and adequate performance throughout its life span. In this study, a structural health monitoring methodology using acceleration responses is proposed to evaluate the health state of station structures and to detect damage at an initial stage. A damage index is developed from the acceleration data and applied to outlier analysis, an unsupervised-learning-based pattern recognition method. The threshold value for the outlier analysis is determined from a confidence level of the probability distribution of the acceleration data. The probability distribution is selected according to the features of the collected data.
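
The thresholding step described here can be sketched as follows. The abstract says the distribution is chosen to suit the data; this example simply assumes a normal distribution and a one-sided 99% confidence level, and the damage-index values and names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def outlier_threshold(baseline, confidence=0.99):
    """Upper threshold at the given confidence level of a fitted normal distribution."""
    fitted = NormalDist(mean(baseline), stdev(baseline))
    return fitted.inv_cdf(confidence)


# Hypothetical damage-index values recorded while the structure is healthy.
baseline_index = [0.98, 1.02, 1.01, 0.99, 1.00, 1.03, 0.97, 1.00]
threshold = outlier_threshold(baseline_index)

# New readings above the threshold are treated as outliers (possible damage).
new_readings = [1.01, 1.04, 1.35]
alarms = [value for value in new_readings if value > threshold]
```

With this baseline the threshold falls between 1.04 and 1.05, so only the reading of 1.35 raises an alarm. A skewed damage index would call for a different distribution, which is the selection step the abstract refers to.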
