• Title/Abstract/Keywords: influential data point

On Sensitivity Analysis in Principal Component Regression

  • Kim, Soon-Kwi;Park, Sung H.
    • Journal of the Korean Statistical Society / Vol. 20, No. 2 / pp. 177-190 / 1991
  • In this paper, we discuss and review various measures that have been proposed for studying outliers, high-leverage points, and influential observations when principal component regression is adopted. We suggest several diagnostic measures for use with principal component regression. A numerical example is presented. Some individual data points may be flagged as outliers, high-leverage points, or influential points.
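The following is a minimal sketch, not the authors' measures. It fits a principal component regression on the first k components of the centered predictors. It then computes leverages, residuals, and a Cook-type distance from the hat matrix of the reduced regression (intercept plus component scores). The function name `pcr_diagnostics` and the choice of these particular diagnostics are assumptions made for illustration.

```python
import numpy as np


def pcr_diagnostics(X, y, k):
    """Principal component regression on the first k components.

    Returns (coef, leverage, residuals, cook); coef holds the intercept
    followed by the coefficients on the k component scores.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                       # center the predictors
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                             # scores on the k leading components
    D = np.column_stack([np.ones(n), Z])          # intercept plus scores
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    H = D @ np.linalg.solve(D.T @ D, D.T)         # hat matrix of the reduced model
    h = np.diag(H)                                # leverages
    p = D.shape[1]
    s2 = resid @ resid / (n - p)                  # residual variance estimate
    cook = resid**2 * h / (p * s2 * (1.0 - h) ** 2)
    return coef, h, resid, cook


# Usage on simulated data with one planted high-leverage row (made-up numbers).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -0.5, 0.2, 0.0]) + rng.normal(scale=0.3, size=30)
X[0] = 6.0
coef, h, resid, cook = pcr_diagnostics(X, y, k=2)
print("largest leverage at row", int(np.argmax(h)))
print("largest Cook-type distance at row", int(np.argmax(cook)))
```

Because the component scores are centered and mutually orthogonal, the leverages here reduce to 1/n plus the squared standardized scores. Their sum equals the number of fitted parameters (k + 1).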

Graphical Methods for the Sensitivity Analysis in Discriminant Analysis

  • Jang, Dae-Heung;Anderson-Cook, Christine M.;Kim, Youngil
    • Communications for Statistical Applications and Methods / Vol. 22, No. 5 / pp. 475-485 / 2015
  • As in regression, many measures have been developed to detect influential data points in discriminant analysis. Many of them carry the principles behind linear-regression diagnostic measures over to the context of discriminant analysis. Here we focus on the impact on the predicted posterior classification probabilities when a data point is omitted. The new method is more intuitive and easier to interpret than existing methods. We also propose a graphical display that shows the individual movement of the posterior probabilities of the other data points when a specific data point is omitted. This allows the summaries to capture the overall pattern of the change.
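Below is a minimal sketch of the leave-one-out idea. It assumes a Gaussian linear discriminant analysis with pooled covariance and priors equal to the class proportions; the paper's exact classifier and summaries may differ. `lda_posteriors` and `posterior_shift` are hypothetical helper names. Each row of the returned matrix records how far every other case's posterior moves when one case is omitted, which is the kind of quantity such a graphical display would plot.

```python
import numpy as np


def lda_posteriors(X_train, y_train, X_eval):
    """Posterior class probabilities from Gaussian LDA (pooled covariance,
    priors equal to the training class proportions)."""
    classes = np.unique(y_train)
    n = X_train.shape[0]
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y_train == c) for c in classes])
    # Pooled within-class covariance with divisor n - (number of classes).
    S = sum(
        (X_train[y_train == c] - m).T @ (X_train[y_train == c] - m)
        for c, m in zip(classes, means)
    ) / (n - len(classes))
    S_inv = np.linalg.inv(S)
    # Linear discriminant scores: x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log pi_k.
    scores = (X_eval @ S_inv @ means.T
              - 0.5 * np.einsum("kd,de,ke->k", means, S_inv, means)
              + np.log(priors))
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the softmax
    post = np.exp(scores)
    return post / post.sum(axis=1, keepdims=True)


def posterior_shift(X, y):
    """shift[i, j] = max absolute change in case j's posterior vector when
    case i is omitted from the fit. Assumes every class keeps at least two
    cases after a single deletion, so the class set does not change."""
    full = lda_posteriors(X, y, X)
    n = len(y)
    shift = np.zeros((n, n))
    for i in range(n):
        keep = np.arange(n) != i
        reduced = lda_posteriors(X[keep], y[keep], X)
        shift[i] = np.abs(reduced - full).max(axis=1)
        shift[i, i] = 0.0                     # ignore the omitted case itself
    return shift


# Usage on two simulated groups (made-up data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (15, 2)), rng.normal(2.0, 1.0, (15, 2))])
y = np.repeat([0, 1], 15)
shift = posterior_shift(X, y)
print("case whose omission moves other posteriors most:",
      int(shift.max(axis=1).argmax()))
```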

Influential Factors on Quality of Recovery of Patients Undergone Cardiac Surgery

  • 김수연
    • 재활간호학회지 / Vol. 17, No. 2 / pp. 64-71 / 2014
  • Purpose: The purpose of this study was to identify the quality of recovery after cardiac surgery and the factors influencing it. Methods: 198 patients who had undergone cardiac surgery completed a self-report questionnaire on quality of recovery, anxiety, depression, and social support at discharge. The collected data were analyzed using means, standard deviations, correlation, and stepwise multiple regression. Results: The mean quality-of-recovery score at discharge after cardiac surgery was 2.04 on a 3-point scale. The factors influencing quality of recovery after cardiac surgery were depression (p=.001) and anxiety (p=.027), which together explained 44.2% of the variance. Depression was the most influential factor. Conclusion: The factors influencing quality of recovery at discharge after cardiac surgery were depression and anxiety. Further studies are needed on reducing depression and anxiety in patients who have undergone cardiac surgery.

Firework plot as a graphical exploratory data analysis tool for evaluating the impact of outliers in skewness and kurtosis of univariate data

  • 문승호
    • 응용통계연구 / Vol. 29, No. 2 / pp. 355-368 / 2016
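  • Outliers and influential points distort many of the quantitative and descriptive measures used in data analysis. Research on test statistics and graphical tools for detecting outliers in various kinds of data analysis has continued steadily. Jang and Anderson-Cook (2014) introduced a graphical tool they named the firework plot. They presented 3-D firework plots and firework plot matrices to show how outliers or influential points affect univariate/bivariate data analysis and regression analysis. In this study, we show that the firework plot can be used as a graphical exploratory data analysis tool for evaluating the impact of outliers on the skewness and kurtosis of univariate data.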
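As a rough illustration, assume the firework plot traces a statistic while the weight on one observation falls from 1 to 0, as in its regression version. The sketch below computes weighted moment skewness and kurtosis along that path. The weighted-moment definitions and the helper names `weighted_shape` and `shape_trace` are assumptions. Overlaying one trace per observation would give a picture in the spirit of the firework plot; plotting is omitted.

```python
import numpy as np


def weighted_shape(x, w):
    """Moment skewness and (non-excess) kurtosis of x under case weights w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    total = w.sum()
    m = np.sum(w * x) / total                # weighted mean
    d = x - m
    m2 = np.sum(w * d**2) / total            # weighted central moments
    m3 = np.sum(w * d**3) / total
    m4 = np.sum(w * d**4) / total
    return m3 / m2**1.5, m4 / m2**2


def shape_trace(x, i, n_steps=21):
    """Skewness/kurtosis path as the weight on case i falls from 1 to 0.

    At weight 1 this is the ordinary moment statistic of all cases; at
    weight 0 it equals the statistic with case i deleted.
    """
    x = np.asarray(x, dtype=float)
    ts = np.linspace(1.0, 0.0, n_steps)
    path = []
    for t in ts:
        w = np.ones_like(x)
        w[i] = t
        path.append(weighted_shape(x, w))
    return ts, np.array(path)                # path[:, 0] skewness, path[:, 1] kurtosis


# Usage: made-up sample with one large value; one trace per case.
x = np.array([1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 6.0])
for i in range(len(x)):
    ts, path = shape_trace(x, i)
    print(i, np.round(path[-1], 3))          # statistics with case i removed
```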

Bayesian inference for an ordered multiple linear regression with skew normal errors

  • Jeong, Jeongmun;Chung, Younshik
    • Communications for Statistical Applications and Methods / Vol. 27, No. 2 / pp. 189-199 / 2020
  • This paper studies a Bayesian ordered multiple linear regression model with skew-normal errors. In applied regression, the inherent information available often calls for constraints on the coefficients to be estimated. In addition, the assumption of normally distributed errors is sometimes inappropriate for real data. Therefore, to handle such situations more flexibly, we use the skew-normal distribution of Sahu et al. (The Canadian Journal of Statistics, 31, 129-150, 2003) for the error terms; this distribution includes the normal distribution as a special case. For the Bayesian methodology, the Markov chain Monte Carlo method is employed to resolve complicated integration problems. We also show that the associated posterior density is proper under improper priors. The proposed Bayesian model is applied to NZAPB's apple data. To compare the skew-normal error model with the normal error model, we use the Bayes factor and the deviance information criterion of Spiegelhalter et al. (Journal of the Royal Statistical Society Series B (Statistical Methodology), 64, 583-639, 2002). We also consider the problem of detecting an influential point with respect to skewness using Bayes factors. Finally, concluding remarks are given.
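To make the error model concrete, here is a sketch of the univariate skew-normal error as we read Sahu et al. (2003): e = δ|Z0| + σZ1, with Z0 and Z1 independent standard normals. Its density is (2/ω) φ(e/ω) Φ(δe/(σω)), where ω = √(σ² + δ²). This reading of the parametrization and the function names are assumptions. The half-normal latent variable is what typically makes MCMC updates tractable, but the paper's sampler is not reproduced here. Setting δ = 0 recovers the normal error model.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def skew_normal_error_pdf(e, sigma, delta):
    """Density of e = delta * |Z0| + sigma * Z1, Z0 and Z1 iid N(0, 1), sigma > 0."""
    e = np.asarray(e, dtype=float)
    omega = math.sqrt(sigma**2 + delta**2)   # overall scale
    phi = np.exp(-0.5 * (e / omega) ** 2) / math.sqrt(2.0 * math.pi)
    return 2.0 / omega * phi * std_normal_cdf(delta * e / (sigma * omega))


def sample_skew_normal_errors(n, sigma, delta, rng):
    """Simulate errors through the half-normal latent-variable representation."""
    z0 = np.abs(rng.standard_normal(n))      # latent half-normal variable
    z1 = rng.standard_normal(n)
    return delta * z0 + sigma * z1


# Usage: the error mean is delta * sqrt(2 / pi), not zero, unless delta = 0.
rng = np.random.default_rng(0)
e = sample_skew_normal_errors(100_000, sigma=1.0, delta=2.0, rng=rng)
print(e.mean(), 2.0 * math.sqrt(2.0 / math.pi))
```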

A Study on the Relationship between Cyanobacteria and Environmental Factors in Yeongcheon Lake

  • 이현미;신라영;이정호;박종근
    • 한국물환경학회지 / Vol. 35, No. 4 / pp. 352-361 / 2019
  • The purpose of this study is to analyze the characteristics of Yeongcheon Lake and their correlations in order to reduce the occurrence of harmful cyanobacteria. We investigated the water quality and phytoplankton of the lake from May to November 2017. Correlation and data mining analyses were performed to analyze the relationship between water quality and phytoplankton. The water temperature was lowest at the point where water from Imha Lake flows into Yeongcheon Lake and highest at the point where the lake flows out to Angye Lake. The pH was also highest at the outflow point, whereas DO was highest at the midpoint between the inflow and outflow. The main cyanobacteria that appeared during the study period were Oscillatoria limosa, Microcystis aeruginosa and Aphanizomenon flos-aquae. The correlation analysis showed that water temperature, inflow, COD loading and TOC loading at the inflow point of Yeongcheon Lake were the variables related to harmful cyanobacteria. The data mining analysis indicated that TP loading and harmful cyanobacteria at the inflow point of Yeongcheon Lake influenced harmful cyanobacteria at the outflow point. When the TP loading at the inflow site was below 39.0 kg/day, harmful cyanobacteria were expected to remain below 10,000 cells/mL.

Firework Plot as a Graphical Exploratory Data Analysis Tool to Evaluate the Impact of Outliers in a Mixture Experiment

  • 장대흥;안소진;김영일
    • 응용통계연구 / Vol. 27, No. 4 / pp. 629-643 / 2014
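  • When data are analyzed with a regression model, regression diagnostics that test for unusual values such as outliers and influential points have long been established as an essential tool for checking model adequacy. When such points are present, the results of the regression analysis are distorted in interpretation. Jang and Anderson-Cook (2013) introduced a graphical tool they named the firework plot. As the weight assigned to an observation is changed from 1 to 0, the plot draws trace curves in 3-D showing how outliers or influential points affect the regression coefficients and the residual sum of squares (SSE). It also displays these traces in pairs to enhance the visual effect of the analysis. In this study, we go further and carry out sensitivity analyses of various statistics for a mixture experimental design. The aim is to examine how this approach differs from existing methods and whether more interpretation becomes possible by including statistics not covered in the 2013 work. This matters because small mixture-experiment data sets call for a more detailed sensitivity analysis of the statistics.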
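The weight-path idea can be sketched with weighted least squares. For each case, the weight is lowered from 1 to 0, and the coefficient vector and weighted SSE are recorded at each step. This is a minimal sketch under stated assumptions: the helper names are hypothetical, SSE is taken as the weighted residual sum of squares, and 3-D plotting is omitted. The mixture design and responses in the usage lines are made-up numbers fitted with a Scheffé-type linear model without intercept.

```python
import numpy as np


def wls(X, y, w):
    """Weighted least squares: solve (X'WX) b = X'Wy; also return weighted SSE."""
    XtW = X.T * w                           # equals X' W for W = diag(w)
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    resid = y - X @ beta
    return beta, np.sum(w * resid**2)


def firework_traces(X, y, n_steps=11):
    """For every case i, lower its weight from 1 to 0 and record the path.

    Returns ts (n_steps,), betas (n, n_steps, p) and sses (n, n_steps).
    At weight 0 the fit equals ordinary least squares with case i deleted.
    """
    n, p = X.shape
    ts = np.linspace(1.0, 0.0, n_steps)
    betas = np.empty((n, n_steps, p))
    sses = np.empty((n, n_steps))
    for i in range(n):
        for k, t in enumerate(ts):
            w = np.ones(n)
            w[i] = t
            betas[i, k], sses[i, k] = wls(X, y, w)
    return ts, betas, sses


# Hypothetical 3-component mixture design (simplex-centroid points) and
# made-up responses, fitted with a Scheffe linear model (no intercept).
X = np.array([
    [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5],
    [1 / 3, 1 / 3, 1 / 3],
])
y = np.array([11.0, 8.5, 16.0, 9.5, 14.0, 12.0, 13.0])
ts, betas, sses = firework_traces(X, y)
# Total coefficient movement along each case's path, a crude influence summary.
movement = np.abs(betas[:, -1] - betas[:, 0]).sum(axis=1)
print(np.round(movement, 3))
```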

MULTIPLE OUTLIER DETECTION IN LOGISTIC REGRESSION BY USING INFLUENCE MATRIX

  • Lee, Gwi-Hyun;Park, Sung-Hyun
    • Journal of the Korean Statistical Society / Vol. 36, No. 4 / pp. 457-469 / 2007
  • Many procedures are available to identify a single outlier or an isolated influential point in linear regression and logistic regression. However, detecting multiple influential points or outliers is more difficult owing to masking and swamping problems. Direct procedures for multiple outlier detection in logistic regression have not yet been studied. In this paper we consider direct methods for logistic regression by extending the influence matrix algorithm of Peña and Yohai (1995). We define the influence matrix in logistic regression using Cook's distance for logistic regression, and we test for multiple outliers using the mean shift model. To show the accuracy of the proposed multiple outlier detection algorithm, we simulate artificial data containing multiple outliers with masking and swamping.
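For orientation, the linear-regression influence matrix of Peña and Yohai (1995), which the paper extends, can be sketched as M_ij = e_i e_j h_ij / (p s² (1 - h_ii)(1 - h_jj)). Its diagonal equals the Cook's distances. As we understand the idea, large entries in the eigenvectors of M with large eigenvalues point to groups of jointly influential cases. The sketch covers only the linear case; the logistic-regression version built on logistic Cook's distance and the mean shift test is not reproduced. The function names are hypothetical.

```python
import numpy as np


def influence_matrix(X, y):
    """Linear-regression influence matrix in the style of Pena and Yohai (1995).

    M[i, j] = e_i * e_j * h_ij / (p * s2 * (1 - h_ii) * (1 - h_jj)),
    so diag(M) equals Cook's distances.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    e = y - H @ y                            # least-squares residuals
    s2 = e @ e / (n - p)                     # residual variance estimate
    h = np.diag(H)
    u = e / (1.0 - h)                        # e_i / (1 - h_ii)
    return np.outer(u, u) * H / (p * s2)


def leading_eigvector(M):
    """Largest eigenvalue of the symmetric matrix M and its eigenvector."""
    vals, vecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    return vals[-1], vecs[:, -1]


# Usage: simulated data with two planted outliers (made-up numbers).
rng = np.random.default_rng(2)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
y[:2] += 5.0
M = influence_matrix(X, y)
val, vec = leading_eigvector(M)
print("Cook's distances:", np.round(np.diag(M), 3))
print("cases with largest |loading|:", np.argsort(-np.abs(vec))[:3])
```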

The Influential Factors on Premenstrual Syndrome College Female Students

  • 정금숙;오현미;최인령
    • 한국산학기술학회논문지 / Vol. 15, No. 5 / pp. 3025-3036 / 2014
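  • This descriptive survey study identifies the factors influencing premenstrual syndrome (PMS) in female college students. The goal is to provide baseline data for developing and applying prevention and intervention programs for female college students who experience PMS. Data were collected from 330 female college students in G city from April 2 to April 6, 2012. The overall mean stress score was 2.50 ± .74, the mean score for attitude toward menstruation was 3.07 ± .02, and the overall mean PMS score was 2.67 ± .60. Stress showed a significant positive correlation with PMS (r=.36, p<.001), and attitude toward menstruation also showed a significant positive correlation with PMS (r=.34, p<.001). A multiple regression analysis of the factors influencing PMS found that attitude toward menstruation, stress score, smoking, and dysmenorrhea were significant factors, with an explanatory power of 27%. The most important factor was attitude toward menstruation (β=.28, p<.001), followed by stress score (β=.27, p<.001), smoking (β=.20, p<.001), and dysmenorrhea (β=.15, p<.001). Based on these results, we suggest exploring new nursing intervention methods that take psychosocial factors into account. We also suggest qualitative research using a narrative approach to improve quality of life.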

A Study on the Creation of Slope Instability Map Using Geographic Information Systems

  • 유명환
    • 자원환경지질 / Vol. 33, No. 2 / pp. 129-138 / 2000
  • The various types of geohazards, such as landslides resulting from civil construction (e.g. highway construction), must be analysed by systematically considering all possible influential factors. Using GIS, slope stability can be evaluated, and the results can be used as data for further detailed investigation. The aim of this study is therefore to provide data for decision-making in selecting suitable sites for remediation. To analyse slope instability, the landslide mechanism must be understood through appropriate definition and classification. Building a GIS model requires selecting appropriate factors and a rating system for them, which in turn requires understanding the characteristics and mechanism of landslides. Suitable coverages should also be chosen for the model according to the slope conditions. In this study, a field investigation was carried out in the 1st and 2nd sections of the Chung-ang Highway. A GIS model of slope instability was created from the field data using five coverages. As a result, 12 unstable sections were identified, where more detailed investigation is needed.