• Title/Summary/Keyword: 이상치 제거 (outlier removal)

Search Result 417, Processing Time 0.025 seconds

An Outlier Data Analysis using Support Vector Regression (Support Vector Regression을 이용한 이상치 데이터분석)

  • Jun, Sung-Hae
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.6 / pp.876-880 / 2008
  • Outliers are observations that are much larger or smaller than most observations in a given data set. They can arise from various sources, and the results of an analysis may depend heavily on them. In general, data analysis is performed after removing outliers. However, in data mining applications such as fraud detection and intrusion detection, outliers are kept in the training data because they carry crucial information. In regression, simple and multiple regression models need outliers to be eliminated from the training data, using standardized and studentized residuals, in order to build a good model. In this paper, we use support vector regression (SVR), based on statistical learning theory, to analyze data with outliers in regression. We verify the improved performance of our approach through experiments on synthetic data sets.
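
The residual-based screening that the abstract describes as the conventional step before fitting a regression model can be sketched as follows. This is a minimal illustration of internally studentized residuals for simple linear regression, not the SVR method of the paper; the function names and the cutoff of 2.0 are assumptions chosen for the example.

```python
import math

def studentized_residuals(xs, ys):
    """Internally studentized residuals of a simple least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    result = []
    for x, e in zip(xs, residuals):
        # Leverage of each point in simple linear regression.
        leverage = 1 / n + (x - x_bar) ** 2 / sxx
        result.append(e / (s * math.sqrt(1 - leverage)))
    return result

def flag_outliers(xs, ys, cutoff=2.0):
    """Indices whose studentized residual exceeds the (assumed) cutoff."""
    scores = studentized_residuals(xs, ys)
    return [i for i, r in enumerate(scores) if abs(r) > cutoff]

# Synthetic data: a straight line with one shifted observation.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[5] += 20
flagged = flag_outliers(xs, ys)
```

With a single gross outlier the shifted point is the only one flagged; several outliers can mask each other, which is one reason robust alternatives are studied.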

Error Filtering Algorithm for Accurate Travel Speed Measurement Using UTIS (UTIS 구간통행속도 이상치 제거 알고리즘)

  • Ki, Yong-Kul;Ahn, Gye-Hyeong;Kim, Eun-Jeong;Jeong, Jun-Ha;Bae, Kwang-Soo;Lee, Choul-Ki
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.9 no.6 / pp.33-42 / 2010
  • Travel speed is an important parameter in the measurement of road traffic. UTIS (Urban Traffic Information System) was developed as a type of section detector. However, UTIS incurs errors caused by irregular vehicle trajectories, wireless communication range, and so on. This paper suggests a new model that uses an error-filtering algorithm to improve the accuracy of travel speed measurements. In the field test, the variance of the percent errors measured by the new model was reduced. Therefore, it can be concluded that the proposed model significantly improves travel speed measuring accuracy.

Development of Statistical System for Checking Multivariate Normality and Outliers (다변량 정규성과 이상치 검정을 위한 통계 시스템 개발)

  • 최용석;김종건;강명래
    • The Korean Journal of Applied Statistics / v.14 no.2 / pp.223-231 / 2001
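  • Multivariate analysis techniques require the data to satisfy the normality assumption. In this study, we built a system in Visual Basic that performs normality tests, outlier removal, and variable transformation for univariate and multivariate data in a GUI environment, and we introduce it as a more convenient tool for users.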


Robust Location Estimation based on TDOA and FDOA using Outlier Detection Algorithm (이상치 검출 알고리즘을 이용한 TDOA와 FDOA 기반 이동 신호원 위치 추정 기법)

  • Yoo, Hogeun;Lee, Jaehoon
    • Journal of Convergence for Information Technology / v.10 no.9 / pp.15-21 / 2020
  • This paper presents an outlier detection algorithm for estimating the location and velocity of a source with the two-step weighted least-squares method, using time difference of arrival (TDOA) and frequency difference of arrival (FDOA) data. Since outliers in the TDOA and FDOA data can reduce the accuracy of the estimated location and velocity of a moving source, it is important to detect and remove them. This paper presents a method to find the minimum inlier data and a method to determine whether TDOA and FDOA data are inliers or outliers. Numerical simulation results show that removing outliers from the TDOA and FDOA data improves the accuracy of the estimated location and velocity.

Outlier Detection Based on MapReduce for Analyzing Big Data (대용량 데이터 분석을 위한 맵리듀스 기반의 이상치 탐지)

  • Hong, Yejin;Na, Eunhee;Jung, Yonghwan;Kim, Yangwoo
    • Journal of Internet Computing and Services / v.18 no.1 / pp.27-35 / 2017
  • In the near future, IoT data is expected to make up a major portion of Big Data, and sensor data a major portion of IoT data; research on it is being actively carried out. However, processed results cannot be trusted or used if outlier data is included when processing sensor data. This paper therefore studies a method for detecting and deleting such outlier data before processing. We also used Spark, a memory-based distributed processing environment, for fast processing of large volumes of sensor data. The detection and deletion of outlier data consists of four stages, each implemented with Mapper and Reducer operations. The proposed method is compared in three different processing environments, and outlier detection and deletion performance is expected to be best in the distributed Spark environment as data volume increases.

The Quartile Deviation and the Control Chart Model of Improvement Confidence for Link Travel Speed from GPS Probe Data (사분위편차 및 관리도 모형에 의한 GPS 수집기반 구간통행속도 데이터 이상치 제거방안 연구)

  • Han, Won-Sub;Kim, Dong-Hyo;Hyun, Cheol-Seung;Lee, Ho-Won;Oh, Yong-Tae;Lee, Choul-Ki
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.7 no.6 / pp.21-30 / 2008
  • Travel speeds collected by probe cars equipped with GPS suffer from problems of data stability and of identifying a representative travel speed, owing to the influence of traffic signals and other factors in interrupted traffic flow. This study was conducted to develop a method of filtering outliers from data collected by probe cars. Outliers were removed from the serial data collected by the probe cars using a quartile deviation statistical model and a control chart statistical model. The outlier removal rate was 0 to 3.7% with the quartile deviation method and 0.3 to 7.2% with the control chart method. Both methods show a low removal rate at dawn, when traffic is light, and a high removal rate during the daytime. However, both methods have the problem that, when the statistical model is applied as-is, the threshold for removing outliers is set at a low bound. Empirical calibration is therefore required.
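
A quartile-based filter of the kind named in this abstract can be sketched as below. The abstract does not give the exact rule used in the study, so this example assumes the common Tukey fences (quartiles plus or minus 1.5 times the interquartile range); the function name, the multiplier `k`, and the sample speeds are all hypothetical.

```python
import statistics

def quartile_filter(speeds, k=1.5):
    """Keep speeds inside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed default."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(speeds, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in speeds if low <= v <= high]

# Hypothetical link travel speeds (km/h) with one implausible record.
speeds = [42, 45, 44, 43, 46, 44, 45, 43, 44, 120]
kept = quartile_filter(speeds)
removal_rate = 1 - len(kept) / len(speeds)
```

The multiplier `k` plays the role of the threshold that the abstract says needs empirical calibration: a smaller value removes more records, a larger value removes fewer.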


Outlier Filtering and Missing Data Imputation Algorithm using TCS Data (TCS데이터를 이용한 이상치제거 및 결측보정 알고리즘 개발)

  • Do, Myung-Sik;Lee, Hyang-Mee;NamKoong, Seong
    • Journal of Korean Society of Transportation / v.26 no.4 / pp.241-250 / 2008
  • With the ever-growing amount of traffic, there is an increasing need for good quality travel time information. Various existing outlier filtering and missing data imputation algorithms using AVI data for interrupted and uninterrupted traffic flow have been proposed. This paper is devoted to development of an outlier filtering and missing data imputation algorithm by using Toll Collection System (TCS) data. TCS travel time data collected from August to September 2007 were employed. Travel time data from TCS are made out of records of every passing vehicle; these data have potential for providing real-time travel time information. However, the authors found that as the distance between entry tollgates and exit tollgates increases, the variance of travel time also increases. Also, time gaps appeared in the case of long distances between tollgates. Finally, the authors propose a new method for making representative values after removal of abnormal and "noise" data and after analyzing existing methods. The proposed algorithm is effective.

A Performance Improvement on Navigation Applying Measurement Estimation in Urban Weak Signal Environment (도심에서의 측정치 추정을 적용한 항법성능 향상 연구)

  • Park, Sul Gee;Cho, Deuk Jae
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.11 / pp.2745-2752 / 2014
  • In recent years, Transport Demand Management has been conducted for the efficient management of transport. In ITS applications in particular, accurate and reliable positioning is a prerequisite. However, the major problems are satellite signal outage and multipath. This paper proposes that outage and multipath measurements can be detected and estimated using the association between elevation angle and signal-to-noise ratio data in stand-alone GPS. To verify its performance, the proposed method is evaluated in a car test. The test environment yields low-accuracy, unreliable positioning because of signal outage or multipath caused by steep hills and high buildings. In the test, abnormal signals occurred 918 times, and the proposed method was confirmed to improve the horizontal positioning error by 9.48 m (RMS) compared with operation without it.

A Study on Translation-Invariant Wavelet De-Noising with Multi-Thresholding Function (다중 임계치 함수의 TI 웨이브렛 잡음제거 기법)

  • Choi, Jae-Yong
    • The Journal of the Acoustical Society of Korea / v.25 no.7 / pp.333-338 / 2006
  • This paper proposes an improved de-noising method for underwater radiated noise measurement, using a multi-thresholding function based on the translation-invariant (TI) wavelet proposed by Donoho et al. The traditional wavelet thresholding de-noising method causes pseudo-Gibbs phenomena near singularities because of the discrete wavelet transform. To suppress these pseudo-Gibbs phenomena, this paper proposes a de-noising method that combines a multi-thresholding function with the translation-invariant wavelet transform. The multi-thresholding function is a modified soft-thresholding applied to each node according to a discriminated threshold, so as to reject external noise and white Gaussian noise. The method is verified by numerical simulation, and the experimental results are confirmed through a sea trial using multi-single sensors.
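
For reference, the standard soft-thresholding rule that the multi-thresholding function modifies can be sketched as follows. This shows only the textbook rule applied to a list of wavelet coefficients; the paper's per-node thresholds and its modification are not described in the abstract, so the function name and the sample values are assumptions.

```python
import math

def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`; zero out small ones."""
    shrunk = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        # Coefficients at or below the threshold are treated as noise.
        shrunk.append(math.copysign(magnitude, c) if magnitude > 0 else 0.0)
    return shrunk

# Hypothetical detail coefficients from one decomposition level.
coeffs = [3.0, -0.5, 1.2, -4.0]
denoised = soft_threshold(coeffs, 1.0)
```

In a translation-invariant scheme this shrinkage would be applied to the coefficients of several circularly shifted copies of the signal, with the reconstructions shifted back and averaged.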

Study on Enhancement of TRANSGUIDE Outlier Filter Method under Unstable Traffic Flow for Reliable Travel Time Estimation -Focus on Dedicated Short Range Communications Probes- (불안정한 교통류상태에서 TRANSGUIDE 이상치 제거 기법 개선을 통한 교통 통행시간 예측 향상 연구 -DSRC 수집정보를 중심으로-)

  • Khedher, Moataz Bellah Ben;Yun, Duk Geun
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.3 / pp.249-257 / 2017
  • Filtering the travel time records obtained from DSRC probes is essential for better estimation of the link travel time. This study addresses the major deficiency of TRANSGUIDE in removing anomalous data: the algorithm cannot handle unstable traffic flow conditions in time intervals where fluctuations are observed. This study therefore proposes an algorithm capable of overcoming the weaknesses of TRANSGUIDE. If TRANSGUIDE fails to validate a sufficient number of observations within one time interval, another process specifies a new validity range based on the median absolute deviation (MAD), a common statistical approach. The proposed algorithm introduces the parameters $\alpha$ and $\beta$ to account for the maximum allowed outlier within one time interval in response to certain traffic flow conditions. The parameter estimation relies on historical data because it needs to be updated frequently. To test the proposed algorithm, DSRC probe travel time data were collected from a multilane highway road section. The model was calibrated by statistical data analysis using cumulative relative frequency. The qualitative evaluation shows satisfactory performance. The proposed model overcomes the deficiency associated with rapid changes in travel time.
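
A MAD-based validity range of the kind mentioned in this abstract can be sketched as below. The abstract does not state how $\alpha$ and $\beta$ enter the range, so this example uses a single hypothetical multiplier `k` around the median; the function names and the sample travel times are likewise assumptions.

```python
import statistics

def mad_validity_range(travel_times, k=3.0):
    """Return (low, high) = median -/+ k * MAD; k is a hypothetical multiplier."""
    med = statistics.median(travel_times)
    # Median absolute deviation: median of distances from the median.
    mad = statistics.median([abs(t - med) for t in travel_times])
    return med - k * mad, med + k * mad

def filter_interval(travel_times, k=3.0):
    """Keep the travel times of one interval that fall inside the range."""
    low, high = mad_validity_range(travel_times, k)
    return [t for t in travel_times if low <= t <= high]

# Hypothetical probe travel times (seconds) within one time interval.
times = [60, 62, 61, 63, 59, 61, 180]
valid_range = mad_validity_range(times)
valid = filter_interval(times)
```

Because both the center and the spread are medians, one extreme record barely moves the range. If more than half of the values in an interval are identical, the MAD is zero and the range collapses to a single value, so a practical implementation would need a fallback for that case.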