• Title/Summary/Keyword: potential outliers

Outlier tests on potential outliers (잠재적 이상치군에 대한 검정)

  • Seo, Han Son
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.159-167 / 2017
  • Observations identified as potential outliers are usually tested to confirm whether they are real outliers; however, some outlier detection methods skip a formal test or perform a test using simulated p-values. To avoid masking and swamping effects, we introduce procedures that test subsets of potential outliers rather than individual observations. Examples illustrating the methods and a Monte Carlo study comparing the power of the various methods are presented.

Temporal and spatial outlier detection in wireless sensor networks

  • Nguyen, Hoc Thai;Thai, Nguyen Huu
    • ETRI Journal / v.41 no.4 / pp.437-451 / 2019
  • Outlier detection techniques play an important role in enhancing the reliability of data communication in wireless sensor networks (WSNs), and many such techniques have been proposed. Unfortunately, most of them still have limitations: (a) a high rate of false positives, (b) high time complexity, and (c) failure to detect outliers online. Moreover, these approaches mainly focus on either temporal outliers or spatial outliers. Therefore, this paper introduces novel algorithms that detect both temporal and spatial outliers. Our contributions are twofold: (i) modifying the Hampel Identifier (HI) algorithm to achieve high identification accuracy in temporal outlier detection, and (ii) combining the Gaussian process (GP) model with a graph-based outlier detection technique to improve performance in spatial outlier detection. The results demonstrate that our techniques outperform the state-of-the-art methods in terms of accuracy and work well with various data types.
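
For orientation, the classic (unmodified) Hampel identifier that the abstract takes as its starting point can be sketched as follows. This is only an illustration: the paper's modification is not described in the abstract and is not reproduced here, and the function name, window size, cutoff, and sensor readings are all assumptions made for the example.

```python
from statistics import median

def hampel_flags(values, half_window=3, n_sigmas=3.0):
    """Mark points flagged by the classic Hampel identifier (hypothetical helper)."""
    scale = 1.4826  # makes the MAD comparable to a standard deviation for normal data
    flags = []
    for i, x in enumerate(values):
        # Sliding window centred on i, truncated at the ends of the series.
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        med = median(window)
        mad = median(abs(v - med) for v in window)
        # Flag the point if it lies more than n_sigmas scaled MADs from the window median.
        flags.append(abs(x - med) > n_sigmas * scale * mad)
    return flags

# Made-up temperature readings with one spike.
readings = [20.1, 20.3, 20.2, 20.4, 35.0, 20.3, 20.2, 20.5, 20.4, 20.3]
flags = hampel_flags(readings)
flagged_indices = [i for i, f in enumerate(flags) if f]
```

Because the window median and MAD are themselves robust, the spike does not mask itself the way it would with a mean and standard deviation.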

Detecting outliers in multivariate data and visualization-R scripts (다변량 자료에서 특이점 검출 및 시각화 - R 스크립트)

  • Kim, Sung-Soo
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.517-528 / 2018
  • We provide R scripts to detect and visualize outliers in multivariate data. Outliers are detected using three approaches: 1) robust Mahalanobis distance, 2) methods for high-dimensional data, and 3) density-based methods. Detected potential outliers are visualized using 1) multidimensional scaling (MDS) and minimal spanning tree (MST) with k-means clustering, 2) MDS with fviz_cluster, and 3) principal component analysis (PCA) with fviz_cluster. As a real data set, we use MLB pitching data, including those of Ryu, Hyun-jin, for 2013 and 2014. The R scripts and an R package can be downloaded at "http://www.knou.ac.kr/~sskim/ddpoutlier.html".

Alternative robust estimation methods for parameters of Gumbel distribution: an application to wind speed data with outliers

  • Aydin, Demet
    • Wind and Structures / v.26 no.6 / pp.383-395 / 2018
  • An accurate determination of the wind speed distribution is the basis for evaluating the wind energy potential required to design a wind turbine, so it is important to estimate the unknown parameters of that distribution. In this paper, the Gumbel distribution is used to model wind speed data, and alternative robust methods for estimating its parameters are considered: least absolute deviation, weighted least absolute deviation, median/MAD, and least median of squares. The performance of these estimators is compared with that of traditional estimation methods (i.e., maximum likelihood and least squares) according to bias, mean square deviation, and total mean square deviation criteria, using a Monte Carlo simulation study for data with and without outliers. The simulation results show that the least median of squares and median/MAD estimators are more efficient than the others for data with outliers in many cases. However, the median/MAD estimator is not consistent for the location parameter of the Gumbel distribution in all cases. In the real data application, it is first demonstrated that the Gumbel distribution fits the daily mean wind speed data well and models the data better than the Weibull distribution with respect to the root mean square error and coefficient of determination criteria. Next, the wind data modified by outliers are analysed, using numerical and graphical methods, to show the performance of the proposed estimators.
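
To make the median/MAD idea concrete, here is a minimal sketch of one common way to build such an estimator for a Gumbel (maximum) distribution with location mu and scale beta. It matches the sample median and the sample median absolute deviation (MAD) to their theoretical values. Whether the paper uses exactly this form is an assumption, and the function names and the synthetic "wind speed" values (mu = 8, beta = 2) are made up for illustration.

```python
import math
from statistics import median

def standard_gumbel_mad(tol=1e-12):
    """MAD of the standard Gumbel: solve F(m0 + d) - F(m0 - d) = 0.5 by bisection."""
    cdf = lambda x: math.exp(-math.exp(-x))
    m0 = -math.log(math.log(2.0))  # median of the standard Gumbel
    lo, hi = 0.0, 5.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(m0 + mid) - cdf(m0 - mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def gumbel_median_mad(sample):
    """Hypothetical helper: estimate (location, scale) from the sample median and MAD."""
    med = median(sample)
    mad = median(abs(x - med) for x in sample)
    scale = mad / standard_gumbel_mad()
    # Theoretical median is mu - beta * ln(ln 2), so mu = median + beta * ln(ln 2).
    location = med + scale * math.log(math.log(2.0))
    return location, scale

# Deterministic synthetic sample: Gumbel quantiles for mu = 8, beta = 2.
n = 2001
clean = [8.0 - 2.0 * math.log(-math.log((i + 0.5) / n)) for i in range(n)]
loc_clean, scale_clean = gumbel_median_mad(clean)

# Add about 2% gross outliers and re-estimate.
contaminated = clean + [60.0] * 40
loc_cont, scale_cont = gumbel_median_mad(contaminated)
```

On this synthetic sample the estimates move only slightly when the outliers are added, which is the kind of resistance the abstract attributes to robust estimators.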

Impact of Outliers on the Statistical Measures of the Environmental Monitoring Data in Busan Coastal Sea (이상자료가 연안 환경자료의 통계 척도에 미치는 영향)

  • Cho, Hong-Yeon;Lee, Ki-Seop;Ahn, Soon-Mo
    • Ocean and Polar Research / v.38 no.2 / pp.149-159 / 2016
  • The statistical measures of coastal environmental data are used in a variety of statistical inferences, hypothesis tests, and data-driven modeling. If the measures are biased, the statistical estimations and models may also be biased, and this potential for bias is great when the data contain outliers, defined as extraordinarily large or small data values. This study aims to suggest more robust statistical measures as alternatives to the commonly used measures and to assess the performance of these robust measures through a quantitative comparison with the typical measures of location, spread, and shape, using environmental monitoring data from the Busan coastal sea. Outliers were detected on the basis of Rosner's test, which identified about 5-10% of the nutrient data as outliers. After removal (zero-weighting) of the outliers, the relative change ratios of the mean and standard deviation between the before- and after-removal conditions were 13% and 33%, respectively. The skewness and kurtosis decreased in magnitude by 1.36 and 8.11, respectively. In contrast, the change ratios for the robust counterparts of the mean and standard deviation are 3.7-10.5%, and the variation magnitudes of the robust skewness and kurtosis are only about 2-4% of those of the non-robust measures. The robust measures can therefore be regarded as outlier-resistant, given their relatively small changes between the before- and after-removal conditions.
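
The before/after comparison described in the abstract can be illustrated with a small sketch. The data below are made up (they are not the Busan monitoring data), the median and MAD stand in for the paper's robust measures as an assumption, and the outlier is removed by hand rather than by Rosner's test.

```python
from statistics import mean, median, stdev

def mad(values):
    """Median absolute deviation from the median."""
    med = median(values)
    return median(abs(v - med) for v in values)

def relative_change(before, after):
    """Relative change ratio in percent, relative to the value before removal."""
    return abs(after - before) / abs(before) * 100.0

# Hypothetical nutrient concentrations; the last value plays the outlier.
with_outlier = [0.21, 0.25, 0.19, 0.23, 0.22, 0.24, 0.20, 0.26, 0.22, 1.90]
without_outlier = with_outlier[:-1]

changes = {
    "mean": relative_change(mean(with_outlier), mean(without_outlier)),
    "stdev": relative_change(stdev(with_outlier), stdev(without_outlier)),
    "median": relative_change(median(with_outlier), median(without_outlier)),
    "MAD": relative_change(mad(with_outlier), mad(without_outlier)),
}
for name, pct in changes.items():
    print(f"{name:>6}: {pct:6.1f}% change after outlier removal")
```

In this toy case the mean and standard deviation change far more than the median and MAD, mirroring the direction of the result reported in the abstract.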

Anomaly Detection in Livestock Environmental Time Series Data Using LSTM Autoencoders: A Comparison of Performance Based on Threshold Settings (LSTM 오토인코더를 활용한 축산 환경 시계열 데이터의 이상치 탐지: 경계값 설정에 따른 성능 비교)

  • Se Yeon Chung;Sang Cheol Kim
    • Smart Media Journal / v.13 no.4 / pp.48-56 / 2024
  • In the livestock industry, detecting environmental outliers and predicting data are crucial tasks. Outliers in livestock environment data, which are typically gathered as time series, can signal rapid changes in the environment and potential unexpected epidemics. Prompt detection of and response to these outliers are essential to minimize stress in livestock and, through early detection of epidemic conditions, to reduce economic losses for farmers. This study compares the performance of two methods for setting the threshold that defines an outlier in livestock environment data. The first method detects outliers using the Mean Squared Error (MSE), and the second uses a Dynamic Threshold, which analyzes variability against the average value of previous data to identify outliers. The MSE-based method demonstrated a 94.98% accuracy rate, while the Dynamic Threshold method, which uses the standard deviation, showed superior performance with 99.66% accuracy.
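
The two thresholding styles can be contrasted in a short sketch. The abstract does not give the exact rules, so everything here is an assumption: the fixed cutoff, the window length, the multiplier k, and the error values (imagined as per-step reconstruction errors from an autoencoder) are all hypothetical.

```python
from statistics import mean, stdev

def fixed_threshold_flags(errors, threshold):
    """Flag every error above one global cutoff."""
    return [e > threshold for e in errors]

def dynamic_threshold_flags(errors, window=5, k=3.0):
    """Flag errors[i] when it exceeds mean + k * stdev of the preceding `window` errors."""
    flags = [False] * len(errors)
    for i in range(window, len(errors)):
        history = errors[i - window:i]
        threshold = mean(history) + k * stdev(history)
        flags[i] = errors[i] > threshold
    return flags

# Made-up reconstruction errors with one spike at index 6.
errors = [0.010, 0.012, 0.011, 0.009, 0.010, 0.011, 0.095, 0.012, 0.010, 0.011]

fixed_hits = [i for i, f in enumerate(fixed_threshold_flags(errors, 0.05)) if f]
dynamic_hits = [i for i, f in enumerate(dynamic_threshold_flags(errors)) if f]
```

One design point worth noting in this simple form: a flagged spike enters the history of the following points and temporarily raises their threshold, so a practical version might exclude flagged values from the window.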

Robust Response Transformation Using Outlier Detection in Regression Model (회귀모형에서 이상치 검색을 이용한 로버스트 변수변환방법)

  • Seo, Han-Son;Lee, Ga-Yoen;Yoon, Min
    • The Korean Journal of Applied Statistics / v.25 no.1 / pp.205-213 / 2012
  • Transforming the response variable is a general tool for adapting data to a linear regression model. However, it is well known that response transformations in linear regression are very sensitive to one or a few outliers. Many methods have been suggested for developing transformations that are not influenced by potential outliers. Recently, Cheng (2005) suggested using a trimmed likelihood estimator based on the idea of the least trimmed squares (LTS) estimator. However, that method requires the number of outliers to be preset and is computationally intensive. A new method is proposed that addresses these problems and improves the robustness of the estimates. The method uses a stepwise procedure, suggested by Hadi and Simonoff (1993), to detect the outliers that determine response transformations.

Leak Detection in a Water Pipe Network Using the Principal Component Analysis (주성분 분석을 이용한 상수도 관망의 누수감지)

  • Park, Suwan;Ha, Jaehong;Kim, Kimin
    • Proceedings of the Korea Water Resources Association Conference / 2018.05a / pp.276-276 / 2018
  • This paper evaluates the potential of the Principal Component Analysis (PCA) technique for detecting leaks in water pipe network blocks. For this purpose, PCA was conducted on the recorded pipe flows of a case study water distribution system, and the relevance of the outliers calculated by the PCA model to the recorded pipe leak incidents was analyzed. The PCA technique was enhanced by applying computational algorithms developed in this study. The algorithms were designed to extract a partial set of flow data from the original 24-hour flow data so that the variability of the flows in the extracted partial data set is minimal. The results showed that the effectiveness of detecting leaks may improve by applying the developed algorithm. However, the analysis suggested that further development of the algorithm is needed to enhance the applicability of PCA to detecting leaks in real-world water pipe networks.

Cultural Tunneling Effect: Conceptual adoption & Application in movie industry

  • Roh, Seungkook
    • Asia Marketing Journal / v.16 no.3 / pp.77-100 / 2014
  • Many researchers have analyzed the relationship between the financial success patterns of a motion picture and many other factors, such as the production cost, marketing, stars, awards, reviews, genre, and rating. Through these studies, many researchers and investors concluded that big budgets to make a blockbuster movie can serve as an insurance policy to meet their ROI; thus the box office is dominated by blockbuster movies. High-budget blockbuster movies are more likely to receive attention because these movies are more recognizable given their high expenses for production and casting. Therefore, audiences choose blockbusters in an effort to reduce the searching cost and to mitigate the possibility of a regrettable choice. This behavior of consumers, in turn, causes distributors to allocate screens for blockbusters, resulting in "concentration of blockbuster consumption." As such, low-budget films cannot easily become popular due to the lack of distribution. Indeed, low-budget films released on a small number of screens often end up becoming dismal failures. However, there are exceptional examples which are contrary to the general idea in the movie industry that a big budget and showings on a large number of screens can guarantee the success of a movie. Although researchers have attempted to analyze the performances of movies with small budgets, such movies are likely to be regarded as outliers and then be entirely discarded, as they are far from the 'three-sigma' range, especially given that previous research methodologies could not explain the financial success of such unique examples. This study attempts to explain the financial success at the box office of low-budget movies by applying the concept of the tunnel effect in quantum mechanics, as the phenomenon found in the movie industry is similar to a particle's movement in quantum physics. 
The tunneling effect is a phenomenon by which a particle without enough energy to pass over a potential barrier tunnels through it. Adopting this analogy, this study uses the Schrödinger equation to derive a tunneling probability function and a cultural constant for forecasting other outliers. Moreover, the study finds that word-of-mouth is what creates this outlier phenomenon in the movie industry.
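
For readers unfamiliar with the physics being borrowed, the standard textbook approximation for tunneling through a wide rectangular barrier is shown below. This is the quantum-mechanical result only; the paper's own tunneling probability function and its cultural constant are not given in the abstract and are not reproduced here. The symbols (particle mass m, energy E, barrier height V_0, barrier width L) are the usual physics notation, not the paper's.

```latex
% Transmission probability for a particle of mass m and energy E < V_0
% through a rectangular barrier of height V_0 and width L (thick-barrier limit):
T \approx e^{-2\kappa L},
\qquad
\kappa = \frac{\sqrt{2m\,(V_0 - E)}}{\hbar}
```

The probability is small but never zero, and it falls off exponentially with the barrier width and with the square root of the energy deficit, which is the property the analogy relies on.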

Fast robust variable selection using VIF regression in large datasets (대형 데이터에서 VIF회귀를 이용한 신속 강건 변수선택법)

  • Seo, Han Son
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.463-473 / 2018
  • Variable selection algorithms for linear regression models of large data sets are considered. Many algorithms have been proposed that focus on speed and robustness. Among them, variance inflation factor (VIF) regression is fast and accurate because it uses a streamwise regression approach. However, VIF regression is susceptible to outliers because it estimates the model by the least squares method. A robust criterion using a weighted estimator has been proposed to make the algorithm robust, and a robust VIF regression has been proposed for the same purpose. In this article, a fast and robust variable selection method is suggested that combines VIF regression with the detection and removal of potential outliers. A simulation study and an analysis of a dataset are conducted to compare the suggested method with other methods.