• Title/Abstract/Keywords: Outlier Data

Search results: 412 items; processing time: 0.325 seconds

A Note on Bayesian Prediction Analysis for the Rayleigh Model in the presence of Outliers

  • Ko, Jeong-Hwan;Kim, Yeung-Hoon
    • Korean Data and Information Science Society: Conference Proceedings
    • /
    • Korean Data and Information Science Society 2003 Spring Conference
    • /
    • pp.171-176
    • /
    • 2003
  • This paper deals with the problem of predicting order statistics in samples from a Rayleigh population when an outlier is present. The Bayesian predictive distribution and prediction bounds of the p-th order statistic are obtained when an outlier of type $\theta\delta$ is present. In this connection, some identities are derived.


Study on Lifelog Anomaly Detection using VAE-based Machine Learning Model (VAE(Variational AutoEncoder) 기반 머신러닝 모델을 활용한 체중 라이프로그 이상탐지에 관한 연구)

  • Kim, Jiyong;Park, Minseo
    • The Journal of the Convergence on Culture Technology
    • /
    • Vol. 8, No. 4
    • /
    • pp.91-98
    • /
    • 2022
  • Lifelog data continuously collected through a wearable device may contain many outliers, so outliers must be found and removed to improve data quality. Because outliers are generally far fewer than normal data points, a class imbalance problem occurs. To solve this imbalance problem, we propose a method that applies a Variational AutoEncoder to the outliers. After the outlier data are preprocessed with the proposed method, the result is verified with several machine learning classification models. Verification using body weight data confirmed that performance improved in all classification models. Based on the experimental results, for the analysis of lifelog body weight data we propose applying the LightGBM model, which performed best, after preprocessing the data with the outlier processing method proposed in this study.

A Nonparametric Approach for Noisy Point Data Preprocessing

  • Xi, Yongjian;Duan, Ye;Zhao, Hongkai
    • International Journal of CAD/CAM
    • /
    • Vol. 9, No. 1
    • /
    • pp.31-36
    • /
    • 2010
  • 3D point data acquired from laser scans or stereo vision can be quite noisy. A preprocessing step is often needed before a surface reconstruction algorithm can be applied. In this paper, we propose a nonparametric approach for noisy point data preprocessing. In particular, we propose an anisotropic kernel-based nonparametric density estimation method for outlier removal, and a hill-climbing line search approach for projecting data points onto the real surface boundary. Our approach is simple, robust, and efficient. We demonstrate our method on both real and synthetic point datasets.
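
The density-based outlier removal idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract describes an anisotropic kernel, while this sketch substitutes a plain isotropic Gaussian kernel for simplicity. The function names, the `bandwidth` value, and the `keep_fraction` rule are all assumptions made for illustration.

```python
import math

def gaussian_kde_density(points, bandwidth):
    """Leave-one-out isotropic Gaussian kernel density estimate at each point.

    Assumption: an isotropic kernel stands in for the anisotropic kernel
    mentioned in the abstract. Requires at least two points.
    """
    n = len(points)
    dim = len(points[0])
    norm = (n - 1) * (bandwidth * math.sqrt(2 * math.pi)) ** dim
    densities = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i == j:
                continue  # skip the point itself (leave-one-out)
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        densities.append(total / norm)
    return densities

def remove_low_density_points(points, bandwidth=0.5, keep_fraction=0.9):
    """Keep the highest-density fraction of points, preserving input order.

    The keep_fraction cutoff is a hypothetical rule, not taken from the paper.
    """
    densities = gaussian_kde_density(points, bandwidth)
    n_keep = max(1, int(round(keep_fraction * len(points))))
    ranked = sorted(range(len(points)), key=lambda i: densities[i], reverse=True)
    kept_indices = sorted(ranked[:n_keep])
    return [points[i] for i in kept_indices]
```

Points in sparse regions receive a low estimated density and are dropped first, which is why an isolated scan artifact far from the sampled surface tends to be removed before any point on the surface.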

Signal Compensation of LiDAR Sensors and Noise Filtering (LiDAR 센서 신호 보정 및 노이즈 필터링 기술 개발)

  • Park, Hong-Sun;Choi, Joon-Ho
    • Journal of Sensor Science and Technology
    • /
    • Vol. 28, No. 5
    • /
    • pp.334-339
    • /
    • 2019
  • In this study, we propose a method for compensating noisy raw LiDAR data and filtering noise, for use in signal processing of LiDAR sensors during the development phase. The raw LiDAR data include constant errors generated by delays in transmitting and receiving signals, which can be resolved by LiDAR signal compensation. The signal compensation consists of two stages. The first is LiDAR sensor calibration to compensate for geometric distortion. The second is walk error compensation. LiDAR data also include fluctuation and outlier noise, the latter of which is removed by data filtering. In this study, we compensate for the fluctuation by using the Kalman filter method, and we remove the outlier noise by applying a Gaussian weight function.
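
A rough sketch of how a Kalman filter and a Gaussian weight might be combined on a stream of range readings follows. The abstract does not give the filter model or how the weight is applied, so everything here is an assumption: a scalar constant-value Kalman filter, a Gaussian weight computed from the innovation, and a hypothetical `min_weight` threshold below which a reading is treated as an outlier and skipped.

```python
import math

def filter_ranges(readings, process_var=1e-3, meas_var=0.04,
                  gate_sigma=0.5, min_weight=0.05):
    """Smooth range readings with a scalar Kalman filter and skip outliers.

    All parameter names and defaults are illustrative assumptions.
    """
    estimate = readings[0]   # initialise the state from the first reading
    variance = meas_var
    output = [estimate]
    for z in readings[1:]:
        variance += process_var                  # predict step (constant-value model)
        innovation = z - estimate
        # Gaussian weight: near 1 for small innovations, near 0 for large ones
        weight = math.exp(-innovation ** 2 / (2 * gate_sigma ** 2))
        if weight >= min_weight:                 # accept the reading and update
            gain = variance / (variance + meas_var)
            estimate += gain * innovation
            variance *= (1 - gain)
        output.append(estimate)                  # rejected readings leave the estimate unchanged
    return output
```

With these assumed defaults, a reading that jumps several metres away from the current estimate gets a weight close to zero and is ignored, while small fluctuations are averaged out by the Kalman update.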

Implementation of Bayesian Filter Method and Range Measurement Analysis for Underwater Robot Localization (수중로봇 위치추정을 위한 베이시안 필터 방법의 실현과 거리 측정 특성 분석)

  • Noh, Sung Woo;Ko, Nak Yong;Kim, Tae Gyun
    • The Journal of Korea Robotics Society
    • /
    • Vol. 9, No. 1
    • /
    • pp.28-38
    • /
    • 2014
  • This paper experimentally verifies the performance of the Extended Kalman Filter (EKF) and Monte Carlo Localization (MCL) approaches to localizing an underwater vehicle. In particular, the experiments use an acoustic range sensor whose measurement accuracy and uncertainty have not yet been established. Along with localization, the experiments also reveal uncertainty characteristics of the range measurement, such as bias and variance. The proposed localization method rejects outlier range data, and the experiments show that outlier rejection improves localization performance. As expected, the proposed method does not yield locations as precise as methods that use a high-priced DVL (Doppler Velocity Log), an IMU (Inertial Measurement Unit), and high-accuracy range sensors. However, it is notable that the proposed method achieves accuracy sufficient for correcting accumulated dead-reckoning error, even though it uses only range data of low reliability and accuracy.

Outlier Detection in Growth Curve Model Using Mean-Shift Model (평균이동모형을 이용한 성장곡선모형의 이상점 진단에 관한 연구)

  • Shim, Kyu-Bark
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 10, No. 2
    • /
    • pp.369-385
    • /
    • 1999
  • This paper discusses the problem of detecting outliers in the growth curve model with arbitrary covariance structure, known as the unstructured covariance matrix. To detect outliers in the growth curve model, the likelihood ratio test statistic in the mean-shift model is established and its distribution is derived. After detecting outliers in the growth curve model, we test for homogeneous and/or heterogeneous covariance matrices using the PSR Quasi-Bayes Criterion. For illustration, one numerical example is discussed, which compares the results before and after outlier deletion.


Adaptive boosting in ensembles for outlier detection: Base learner selection and fusion via local domain competence

  • Bii, Joash Kiprotich;Rimiru, Richard;Mwangi, Ronald Waweru
    • ETRI Journal
    • /
    • Vol. 42, No. 6
    • /
    • pp.886-898
    • /
    • 2020
  • Unusual data patterns or outliers can be generated because of human errors, incorrect measurements, or malicious activities. Detecting outliers is a difficult task that requires complex ensembles. An ideal outlier detection ensemble should consider the strengths of individual base detectors while carefully combining their outputs to create a strong overall ensemble and achieve unbiased accuracy with minimal variance. Selecting and combining the outputs of dissimilar base learners is a challenging task. This paper proposes a model that utilizes heterogeneous base learners. It adaptively boosts the outcomes of preceding learners in the first phase by assigning weights and identifying high-performing learners based on their local domains, and then carefully fuses their outcomes in the second phase to improve overall accuracy. Experimental results from 10 benchmark datasets are used to train and test the proposed model. To investigate its accuracy in terms of separating outliers from inliers, the proposed model is tested and evaluated using accuracy metrics. The analyzed data are presented as crosstabs and percentages, followed by a descriptive method for synthesis and interpretation.

Performance Evaluation of Battery Remaining Time Estimation Methods According to Outlier Data Processing Policies in Mobile Devices (모바일 기기에서 이상치 데이터 처리 정책에 따른 배터리 잔여 시간 예측 기법의 평가)

  • Tak, Sungwoo
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • Vol. 26, No. 7
    • /
    • pp.1078-1090
    • /
    • 2022
  • The distribution patterns of battery usage time data per battery level can affect the performance of estimating battery remaining time in mobile devices. Outliers may chiefly affect the estimation performance of statistical regression methods. In this paper, we propose a software framework that detects and processes outliers to improve the estimation performance of statistical regression methods. The proposed framework first detects outliers that degrade the estimation performance and then replaces them with smoothed data. The difference between an outlier and its replacement is distributed across the individual data. Finally, individual data are reinforced to improve the estimation performance. Numerical results from experiments with the proposed framework confirmed that it performs well in estimating battery remaining time.

An Outlier Cluster Detection Technique for Real-time Network Intrusion Detection Systems (실시간 네트워크 침입탐지 시스템을 위한 아웃라이어 클러스터 검출 기법)

  • Chang, Jae-Young;Park, Jong-Myoung;Kim, Han-Joon
    • Journal of Internet Computing and Services
    • /
    • Vol. 8, No. 6
    • /
    • pp.43-53
    • /
    • 2007
  • Intrusion detection systems (IDS) have recently evolved by combining the signature-based detection approach with the anomaly detection approach. Although signature-based IDS tools that utilize machine learning algorithms have been commonly used, they only detect network intrusions with already known patterns. Ideal IDS tools should always keep the signature database of the detection system up to date. The system needs to generate signatures to detect new possible attacks while monitoring and analyzing incoming network data. In this paper, we propose a new outlier cluster detection algorithm with a density (or influence) function. Our method assumes that, in the context of network intrusion, an outlier is a kind of cluster of similar instances rather than a single object. Through extensive experiments using the KDD Cup 1999 intrusion detection dataset, we show that the proposed method outperforms the conventional outlier detection method using the Euclidean distance function, especially when attacks occur frequently.


Improved LTE Fingerprint Positioning Through Clustering-based Repeater Detection and Outlier Removal

  • Kwon, Jae Uk;Chae, Myeong Seok;Cho, Seong Yun
    • Journal of Positioning, Navigation, and Timing
    • /
    • Vol. 11, No. 4
    • /
    • pp.369-379
    • /
    • 2022
  • In the weighted k-nearest neighbor (WkNN)-based fingerprint positioning step, the requested positioning signal is compared with the signal information stored in the fingerprint DB for each reference point. The more base station identifiers match, the more likely the terminal is at the corresponding location, so an additional weight is added to the location in proportion to the number of matching base stations. Conversely, if the number of matching base stations is small, the selection of a candidate reference point depends heavily on the similarity value of the signal. This creates a problem: the positioning signal can be compared with a repeater signal in the signal information stored in the DB, and the corresponding reference point can be selected as a candidate location. Such a reference point is likely to be an outlier, and if a weight is applied to the corresponding location, the error of the estimated location increases. To solve this problem, this paper proposes a WkNN technique that includes an outlier removal function. To this end, it is first determined whether a repeater signal is included in the DB information of the matched base station. If the reference point for the repeater signal is selected as a candidate position, the reference position corresponding to the outlier is removed using a clustering technique. The performance of the proposed technique is verified with data acquired in Seocho 1-dong and 2-dong in Seoul.
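
The WkNN matching step described in this abstract might look roughly like the sketch below. It is a sketch under stated assumptions, not the paper's algorithm: the scoring formula (match count divided by one plus the mean signal difference) is hypothetical, and the clustering-based outlier removal is replaced by a much simpler stand-in that drops candidates farther than `max_spread` from the coordinate-wise median of the candidates. The database layout and all names are illustrative.

```python
import math
from statistics import median

def wknn_position(query, database, k=3, max_spread=None):
    """Estimate a 2D position from a fingerprint database.

    query:    dict mapping base station ID -> received signal strength
    database: list of ((x, y), fingerprint_dict) reference points
    The score and the median-distance outlier rule are assumptions.
    """
    scored = []
    for position, fingerprint in database:
        shared = set(query) & set(fingerprint)
        if not shared:
            continue  # no matching base stations, so not a candidate
        mean_diff = sum(abs(query[c] - fingerprint[c]) for c in shared) / len(shared)
        # More matched base stations and closer signal values give a higher score
        score = len(shared) / (1.0 + mean_diff)
        scored.append((score, position))
    scored.sort(key=lambda item: item[0], reverse=True)
    candidates = scored[:k]
    if not candidates:
        return None
    if max_spread is not None:
        # Stand-in for clustering: drop candidates far from the median candidate
        cx = median(p[0] for _, p in candidates)
        cy = median(p[1] for _, p in candidates)
        kept = [(s, p) for s, p in candidates
                if math.hypot(p[0] - cx, p[1] - cy) <= max_spread]
        candidates = kept or candidates
    total = sum(s for s, _ in candidates)
    x = sum(s * p[0] for s, p in candidates) / total
    y = sum(s * p[1] for s, p in candidates) / total
    return (x, y)
```

A reference point that matches only a single base station but with a near-identical signal value, as a repeater entry could, can still score as highly as nearby genuine matches. Setting `max_spread` removes such a distant candidate before the weighted average is taken.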