• Title/Abstract/Keywords: Outlier model

212 search results (processing time: 0.027 s)

MULTIPLE OUTLIER DETECTION IN LOGISTIC REGRESSION BY USING INFLUENCE MATRIX

  • Lee, Gwi-Hyun;Park, Sung-Hyun
    • Journal of the Korean Statistical Society / Vol. 36, No. 4 / pp. 457-469 / 2007
  • Many procedures are available for identifying a single outlier or an isolated influential point in linear and logistic regression. Detecting multiple outliers or influential points is more difficult, however, owing to masking and swamping. Direct procedures for multiple outlier detection in logistic regression have not yet been studied. In this paper we consider direct methods for logistic regression by extending the influence matrix algorithm of Peña and Yohai (1995). We define the influence matrix for logistic regression using Cook's distance, and test for multiple outliers using the mean shift model. To show the accuracy of the proposed algorithm, we simulate artificial data containing multiple outliers subject to masking and swamping.

The Correlation Coefficient between the Smallest and Largest Observations in the Weibull Model in the Presence of an Unidentified Outlier

  • 우정수;이창수
    • Journal of the Korean Data and Information Science Society / Vol. 4 / pp. 131-136 / 1993
  • We consider the trends of the correlation coefficient between the smallest and largest observations in the Weibull model in the presence of an unidentified outlier, and derive the density functions of the order statistics using the theory of permanents.

A study On An Identification of Interactions In A Nonreplicated Two-Way Layout With $L_1$-Estimation

  • Lee, Ki-Hoon
    • Communications for Statistical Applications and Methods / Vol. 7, No. 1 / pp. 119-128 / 2000
  • This paper proposes a method for detecting interactions in a two-way layout with one observation per cell. Interactions in this model are hard to identify because they are confounded with the error terms. Because $L_1$-estimation is robust to y-direction outliers in the linear model, the main effects can be estimated without being affected by interactions. If an observation is classified as an outlier, we conclude that it contains an interaction. An empirical comparison with a classical method is performed.

Outlier Detection and Treatment for the Conversion of Chemical Oxygen Demand to Total Organic Carbon

  • 조범준;조홍연;김성
    • 한국해안·해양공학회논문집 / Vol. 26, No. 4 / pp. 207-216 / 2014
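  • Total organic carbon (TOC) is an important parameter used as a direct biological indicator in research on the marine carbon cycle. Because available TOC data are scarce relative to chemical oxygen demand (COD) data, TOC can be estimated from COD data. When converting COD to TOC, outliers in the observed COD data directly affect the TOC estimate, so their detection and treatment should be carried out rationally and objectively. This study presents an optimal regression model for salinity, COD, and TOC data observed in Korean coastal waters. The model was selected by diagnosing outliers and influential data points with several detection methods and comparing the number of data points, the coefficient of variation, and the RMS error before and after their removal. The combination of Cook's diagnostic and the SIQR boxplot method was found to be the most appropriate. The optimal regression function is $\mathrm{TOC} = 0.44 \cdot \mathrm{COD} + 1.53$ (both in mg/L), with a coefficient of determination of about 0.47 and an RMS error of 0.85 mg/L. The RMS error and the coefficient of variation of the leverage values were greatly reduced, to 31% and 80%, respectively, of their values before outlier removal. Because the proposed method diagnoses and removes the excessive influence of outliers and influential data in the COD and TOC observations, a more appropriate regression curve could be presented.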
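
The Cook's diagnostic mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the common 4/n cutoff, and the COD/TOC values are assumptions introduced only for illustration, and the SIQR boxplot step is omitted.

```python
from statistics import mean

def cooks_distance(x, y):
    """Cook's distance of every point for a simple linear fit y = a + b*x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2  # fitted coefficients: intercept and slope
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    lev = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]  # leverage values
    return [e * e / (p * s2) * h / (1.0 - h) ** 2 for e, h in zip(resid, lev)]

def flag_influential(x, y, threshold=None):
    """Indices whose Cook's distance exceeds the cutoff (4/n by default, an assumed rule of thumb)."""
    d = cooks_distance(x, y)
    cut = 4.0 / len(x) if threshold is None else threshold
    return [i for i, di in enumerate(d) if di > cut]

# Hypothetical COD/TOC pairs in mg/L, invented for this example only.
cod = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
toc = [2.0, 2.2, 2.4, 2.6, 2.9, 3.1, 3.3, 9.0]  # last value is a planted outlier
print(flag_influential(cod, toc))  # -> [7]
```

In practice the flagged points would be inspected, and the regression refitted without them, before comparing RMS error as the abstract describes.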

A sequential outlier detecting method using a clustering algorithm

  • 서한손;윤민
    • 응용통계연구 / Vol. 29, No. 4 / pp. 699-706 / 2016
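  • Outlier detection methods that omit a testing procedure are structurally vulnerable to swamping and masking effects, and therefore sometimes fail to detect multiple outliers correctly. This study addresses a testing procedure that supplements methods which declare the minority group of observations separated by clustering to be outliers. The usual approach is to apply various kinds of t-tests to the individual observations in the detected candidate outlier group. We propose a sequential method that tests the candidate outlier group and then changes the cutting criterion of the cluster tree to search for new outlier groups. The proposed method is compared with existing methods through examples and simulations.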
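
The candidate-selection step described in this abstract can be sketched as follows. This covers only the clustering part, under a simplifying assumption: for one-dimensional values, cutting a single-linkage cluster tree into two clusters amounts to splitting the sorted values at the largest gap. The function name and the residual values are hypothetical, and the testing and sequential re-cutting steps of the paper are not reproduced.

```python
def minority_cluster(values):
    """Split the sorted values at the largest gap and return the indices of the smaller group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [values[order[k + 1]] - values[order[k]] for k in range(len(order) - 1)]
    cut = max(range(len(gaps)), key=lambda k: gaps[k])  # position of the largest gap
    low, high = order[: cut + 1], order[cut + 1:]
    # The smaller group is the outlier candidate; ties go to the upper group.
    return sorted(low if len(low) < len(high) else high)

# Hypothetical regression residuals, invented for this example only.
residuals = [0.2, -0.1, 0.4, 5.1, -0.3, 0.1, 4.8, 0.0]
print(minority_cluster(residuals))  # -> [3, 6]
```

The returned indices are only candidates; the abstract's point is that such a group should still be tested before being declared outlying.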

Outlier detection of main engine data of a ship using ensemble method

  • 김동현;이지환;이상봉;정봉규
    • 수산해양기술연구 / Vol. 56, No. 4 / pp. 384-394 / 2020
  • This paper proposes a machine-learning-based outlier detection model that can diagnose the presence or absence of abnormalities in major engine parts through unsupervised analysis of a ship's main engine big data. Engine data were collected for more than seven months, and expert knowledge and correlation analysis were used to select features closely related to the operation of the main engine. For the unsupervised analysis, an ensemble model, in which many predictive models are strategically combined to increase performance, is used for anomaly detection. The proposed model successfully distinguished anomalous engine status from normal status. To validate the approach, clustering analysis was conducted to find the different patterns among the anomalous points. By examining the distribution of each cluster, we could identify the patterns of anomalies.

Adaptive boosting in ensembles for outlier detection: Base learner selection and fusion via local domain competence

  • Bii, Joash Kiprotich;Rimiru, Richard;Mwangi, Ronald Waweru
    • ETRI Journal / Vol. 42, No. 6 / pp. 886-898 / 2020
  • Unusual data patterns or outliers can be generated because of human errors, incorrect measurements, or malicious activities. Detecting outliers is a difficult task that requires complex ensembles. An ideal outlier detection ensemble should consider the strengths of individual base detectors while carefully combining their outputs to create a strong overall ensemble and achieve unbiased accuracy with minimal variance. Selecting and combining the outputs of dissimilar base learners is a challenging task. This paper proposes a model that utilizes heterogeneous base learners. It adaptively boosts the outcomes of preceding learners in the first phase by assigning weights and identifying high-performing learners based on their local domains, and then carefully fuses their outcomes in the second phase to improve overall accuracy. Experimental results from 10 benchmark datasets are used to train and test the proposed model. To investigate its accuracy in terms of separating outliers from inliers, the proposed model is tested and evaluated using accuracy metrics. The analyzed data are presented as crosstabs and percentages, followed by a descriptive method for synthesis and interpretation.

APPLICATION OF HISTOGRAM OUTLIER ANALYSIS ON THE IMAGE DEGRADATION MODEL FOR BEST FOCAL POINT SELECTION

  • Shin, Hyun-Kyung
    • Journal of applied mathematics & informatics / Vol. 27, No. 1-2 / pp. 175-182 / 2009
  • Microscopic imaging systems often require an algorithm that automatically adjusts the position of the camera lenses at the machine level. The search for the best focal point is naturally interpreted as a mathematical inverse problem [1]. Following Wiener's point of view [2], we interpret the focus level of an image as a quantified factor appearing in the image degradation model $g = f \ast H + \eta$, a standard mathematical model of the signal or image degradation process [3]. In this paper we propose a simple, very fast, and robust method for comparing the degradation parameters of multiple images by introducing outlier analysis of the histogram.

A Note on Bayesian Prediction Analysis for the Rayleigh Model in the presence of Outliers

  • 고정환;김영훈
    • Korean Data and Information Science Society: Conference Proceedings / Korean Data and Information Science Society 2003 Spring Conference / pp. 171-176 / 2003
  • This paper deals with the problem of predicting order statistics in samples from a Rayleigh population when an outlier is present. The Bayesian predictive distribution and prediction bounds of the p-th order statistic are obtained when an outlier of type $\theta\delta$ is present. In this connection, some identities are derived.

Elimination of Outlier from Technology Growth Curve using M-estimator for Defense Science and Technology Survey

  • 김장헌
    • 한국군사과학기술학회지 / Vol. 23, No. 1 / pp. 76-86 / 2020
  • Technology growth curve methodology is commonly used in technology forecasting. A technology growth curve represents the path of product performance in relation to time or R&D investment. It is a useful tool for comparing technological performance between Korea and advanced nations and for describing inflection points, the limit of improvement of a technology, and technology innovation strategies. However, fitting a curve to survey data often leads to model misspecification, biased parameter estimates, and incorrect results, because data from expert surveys frequently contain outliers owing to the subjective nature of the responses. This paper proposes a method for eliminating outliers from a technology growth curve using an M-estimator. Experimental results from several pilot tests on real data from Defense Science and Technology Survey reports show an overall improvement in the technology growth curves.
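
To show how an M-estimator limits the influence of outliers during fitting, here is a minimal sketch of Huber M-estimation by iteratively reweighted least squares. It swaps in a straight line for the paper's growth curve to stay short, and the function names, the tuning constant 1.345, the MAD-based scale, and the data are assumptions, not details taken from the paper.

```python
from statistics import median

def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return yb - slope * xb, slope

def huber_line_fit(x, y, c=1.345, iters=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    w = [1.0] * len(x)
    a, b = weighted_line_fit(x, y, w)  # start from ordinary least squares
    for _ in range(iters):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        scale = median(abs(ri) for ri in r) / 0.6745  # robust residual scale
        if scale == 0:
            break
        # Huber weights: 1 inside the threshold, shrinking with |residual| outside it.
        w = [1.0 if abs(ri) <= c * scale else c * scale / abs(ri) for ri in r]
        a, b = weighted_line_fit(x, y, w)
    return a, b

# Synthetic data near y = 1 + 2x with one planted outlier at the end.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1.1, 2.9, 5.1, 6.9, 9.1, 10.9, 13.1, 14.9, 17.1, 49.0]
print(weighted_line_fit(x, y, [1.0] * len(x)))  # ordinary least squares, pulled by the outlier
print(huber_line_fit(x, y))                     # robust fit, close to the underlying line
```

Points that end with a very small weight are natural candidates for removal, which is one way an M-estimator can support outlier elimination.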