• Title/Abstract/Keywords: Outlier diagnostic

Search results: 12 items (processing time 0.019 s)

Outlier Detection Diagnostic based on Interpolation Method in Autoregressive Models

  • Cho, Sin-Sup;Ryu, Gui-Yeol;Park, Byeong-Uk;Lee, Jae-June
    • Journal of the Korean Statistical Society / Vol. 22, No. 2 / pp.283-306 / 1993
  • An outlier detection diagnostic for detecting k consecutive atypical observations is considered. The proposed diagnostic is based on an estimate of the innovational variance that uses both the interpolated and the predicted residuals. The interpolation method is used to construct the diagnostic by replacing the atypical observations. The performance of the proposed diagnostic is investigated by simulation, and a real example is presented.

Influence in Testing the Equality of Two Covariance Matrices

  • Myung Geun Kim
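    • 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 7, No. 2 / pp.213-224 / 1994
  • For testing the equality of two covariance matrices, we propose a diagnostic, based on the influence curve approach, that is useful for detecting outliers. The diagnostic generalizes readily to the case of more than two covariance matrices. We consider a sample version of the diagnostic based on the empirical distribution function; it includes the statistic proposed by Wilks for detecting a single outlier, as well as the Wilks statistic generalized to the case of two populations.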

Detection of Multiple Outliers Using a Genetic Algorithm

  • 고영현;이혜선;전치혁
    • 한국통계학회:학술대회논문집 (Proceedings of the Korean Statistical Society Conference) / Proceedings of the Korean Statistical Society 2000 Autumn Conference / pp.173-179 / 2000
  • A genetic algorithm (GA) is applied to the detection of multiple outliers. A GA is a heuristic optimization tool that searches for a near-optimal solution. We compare the performance of the GA with that of other diagnostic measures commonly used for detecting outliers in regression models. The results suggest that the GA performs better than the other measures at detecting multiple outliers.

A Study on Applications of Regression Diagnostic Method to Technometrics, and the Statistical Quality Control

  • Kim, Soon-Kwi
    • 품질경영학회지 (Journal of the Korean Society for Quality Management) / Vol. 21, No. 1 / pp.55-64 / 1993
  • This article is concerned with procedures for detecting one or more outliers or influential observations in a linear regression model. A test procedure based on recursive residuals is proposed and developed. The power of the test procedure to identify one or more outliers, and its dependence on the number and configuration of the outliers, is investigated through simulation.

Outlier Detection and Treatment for the Conversion of Chemical Oxygen Demand to Total Organic Carbon

  • 조범준;조홍연;김성
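    • 한국해안·해양공학회논문집 (Journal of Korean Society of Coastal and Ocean Engineers) / Vol. 26, No. 4 / pp.207-216 / 2014
  • Total organic carbon (TOC) is an important parameter used as a direct biological indicator in marine carbon cycle research. Because available TOC data are scarce relative to chemical oxygen demand (COD) data, TOC can be estimated from COD data. When COD is converted to TOC, outliers in the COD observations directly affect the TOC estimate, so their detection and treatment must be carried out rationally and objectively. This study presents an optimal regression model for salinity, COD, and TOC data observed in Korean coastal waters. The model was selected by diagnosing outliers and influential observations with several detection methods and then comparing the change in the number of data points, the coefficient of variation, and the RMS error before and after their removal. The combination of Cook's diagnostic and the SIQR boxplot method was found to be the most appropriate. The optimal regression function is $\mathrm{TOC\,(mg/L)} = 0.44 \cdot \mathrm{COD\,(mg/L)} + 1.53$, with a coefficient of determination of about 0.47 and an RMS error of 0.85 mg/L. The RMS error and the coefficient of variation of the leverage values were greatly reduced, by 31% and 80% respectively, relative to their values before outlier removal. Because the proposed method diagnoses and removes the undue influence of outliers and influential observations in the COD and TOC data, a more appropriate regression equation could be obtained.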
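
The two kinds of diagnostic named in the abstract above can be illustrated with a minimal sketch. It assumes a simple linear regression, the common 4/n rule of thumb for Cook's distance, and Tukey's 1.5 × IQR boxplot fences; the abstract does not state the paper's cutoffs or how its SIQR boxplot variant is defined, so these choices and the data are hypothetical.

```python
from statistics import mean, quantiles


def cooks_distance(x, y):
    """Cook's distance of each point in a simple linear regression y = a*x + b."""
    n, p = len(x), 2                      # p = number of fitted coefficients
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b = my - a * mx
    resid = [yi - (a * xi + b) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)          # residual variance
    lev = [1 / n + (xi - mx) ** 2 / sxx for xi in x]  # leverage values
    return [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]


def boxplot_flags(values, k=1.5):
    """True for values outside the fences q1 - k*IQR and q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return [v < q1 - k * iqr or v > q3 + k * iqr for v in values]


# Made-up COD (x) and TOC (y) values; the last TOC value is deliberately off.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 2.4, 2.9, 3.3, 3.7, 4.2, 4.6, 9.0]

d = cooks_distance(x, y)
cook_flagged = [i for i, v in enumerate(d) if v > 4 / len(x)]  # assumed cutoff
box_flagged = [i for i, f in enumerate(boxplot_flags(y)) if f]
print(cook_flagged, box_flagged)
```

Taking the union or the intersection of the two flag lists are both plausible ways to combine the diagnostics; which one the paper uses is not stated in the abstract.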

Development of Healthcare Data Quality Control Algorithm Using Interactive Decision Tree: Focusing on Hypertension in Diabetes Mellitus Patients

  • 황규연;이은숙;김고원;홍성옥;박정선;곽미숙;이예진;임채혁;박태현;박종호;강성홍
    • 보건의료산업학회지 (The Korean Journal of Health Service Management) / Vol. 10, No. 3 / pp.63-74 / 2016
  • Objectives: There is a need to develop a data quality management algorithm to improve the quality of healthcare data using a data quality management system. In this study, we developed a data quality control algorithm for diseases related to hypertension in patients with diabetes mellitus. Methods: To build the algorithm, we extracted records of diabetes mellitus patients from the 2011 and 2012 discharge damage survey data. Derived variables were created from the primary diagnosis, diagnostic unit, primary surgery and treatment, and minor surgery and treatment items. Results: Significant factors for hypertension in diabetes mellitus patients were sex, age, ischemic heart disease, and diagnostic ultrasound of the heart. Based on the decision tree results, we found four groups with extreme values among diabetes patients with accompanying hypertension. Conclusions: The actual data contained in the outlier (extreme value) groups need to be checked to improve the quality of the data.

Value of Contrast-Enhanced Ultrasonography in the Differential Diagnosis of Enlarged Lymph Nodes: a Meta-Analysis of Diagnostic Accuracy Studies

  • Jin, Ya;He, Yu-Shuang;Zhang, Ming-Ming;Parajuly, Shyam Sundar;Chen, Shuang;Zhao, Hai-Na;Peng, Yu-Lan
    • Asian Pacific Journal of Cancer Prevention / Vol. 16, No. 6 / pp.2361-2368 / 2015
  • Objective: To evaluate the diagnostic accuracy of contrast-enhanced ultrasonography (CEUS) in differentiating between benign and malignant enlarged lymph nodes using meta-analysis. Materials and Methods: PubMed, Embase, SCI and Cochrane databases were searched for studies (up to September 1, 2014) reporting the diagnostic performance of CEUS in discriminating between benign and malignant lymph nodes. Inclusion criteria were: prospective study; histopathology as the reference standard; and sufficient data to construct $2 \times 2$ contingency tables. Methodological quality was assessed using Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2). Patient clinical characteristics, sensitivity and specificity were extracted. The summary receiver operating characteristic curve was used to examine the accuracy of CEUS. A meta-analysis was performed to evaluate the clinical utility in identification of benign and malignant lymph nodes. Sensitivity analysis was performed after omitting outliers identified in a bivariate boxplot, and publication bias was assessed with Egger's test. Results: The pooled sensitivity, specificity and AUROC were 0.92 (95% CI, 0.85-0.96), 0.91 (95% CI, 0.82-0.95) and 0.97 (95% CI, 0.95-0.98), respectively. After omitting 3 outlier studies, heterogeneity decreased. Sensitivity analysis demonstrated no disproportionate influence of individual studies. Publication bias was not significant. Conclusions: CEUS is a promising diagnostic modality for differentiating between benign and malignant lymph nodes and can potentially reduce unnecessary fine-needle aspiration biopsies of benign nodes.
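
As a small illustration of the per-study quantities extracted from each 2 × 2 contingency table, the sketch below computes sensitivity and specificity from the four cell counts. The counts are hypothetical, and the pooling model behind the summary estimates is not described in the abstract, so no pooling is attempted here.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)   # malignant nodes correctly called malignant
    specificity = tn / (tn + fp)   # benign nodes correctly called benign
    return sensitivity, specificity


# Hypothetical single study: 100 malignant and 100 benign lymph nodes.
sens, spec = sensitivity_specificity(tp=90, fn=10, fp=5, tn=95)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```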

Temperature Effect-free Impedance-based Local Damage Detection

  • 구기영;박승희;이종재;윤정방
    • 한국전산구조공학회:학술대회논문집 (Proceedings of the Computational Structural Engineering Institute of Korea Conference) / Proceedings of the Computational Structural Engineering Institute of Korea 2007 Annual Conference / pp.21-26 / 2007
  • This paper presents an impedance-based structural health monitoring (SHM) technique that accounts for temperature effects. Temperature variation causes significant impedance variation, in particular both horizontal and vertical shifts in the frequency domain, which may lead to erroneous diagnostic results for real structures. A new damage detection strategy is proposed, based on the correlation coefficient (CC) between the reference impedance data and concurrent impedance data to which an effective frequency shift has been applied; the effective shift is defined as the shift that maximizes the correlation. The proposed technique was applied to a lab-sized steel truss bridge member in a temperature-varying environment. The experimental study demonstrated that a narrow cut artificially inflicted on the steel structure was successfully detected using the proposed SHM strategy.
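
The effective-frequency-shift idea can be sketched as a search over integer sample shifts for the one that maximizes the Pearson correlation between the reference and current spectra. This is a minimal illustration under assumed conditions (equally spaced frequency samples, synthetic data, a hypothetical `best_shift` helper), not the authors' implementation.

```python
from math import exp, sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(va * vb)


def best_shift(reference, current, max_shift):
    """Return (shift, cc) maximizing the correlation of reference[i] with current[i + shift]."""
    best = (0, -1.0)
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            ref, cur = reference[:len(reference) - s], current[s:]
        else:
            ref, cur = reference[-s:], current[:len(current) + s]
        cc = pearson(ref, cur)
        if cc > best[1]:
            best = (s, cc)
    return best


# Synthetic spectra: the current one is the reference peak moved 3 samples
# to the right (horizontal shift) and raised by 0.5 (vertical shift).
reference = [exp(-((i - 20) / 4) ** 2) for i in range(50)]
current = [0.5 + exp(-((i - 23) / 4) ** 2) for i in range(50)]

shift, cc = best_shift(reference, current, max_shift=10)
print(shift, round(cc, 6))
```

Because the Pearson correlation is unchanged by adding a constant to either signal, the vertical shift needs no separate compensation in this sketch; a value such as 1 − CC at the best shift could then serve as a damage index, which is an assumption rather than a detail given in the abstract.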

On Feasibility of Ambulatory KDRGs for the Classification of Health Insurance Claims

  • 박하영;박기동;신영수
    • 보건행정학회지 (Korean Journal of Health Policy and Administration) / Vol. 13, No. 1 / pp.98-115 / 2003
  • Concerns about growing health insurance expenditures became a national issue in 2001, when the National Health Insurance went into deficit. Increases in spending on ambulatory care accounted for the largest share of the problem. Methods and systems to control the spending should be developed, and a system to measure the case mix of providers is one of the core components of such a control system. The objective of this article is to examine the feasibility of applying Korean Diagnosis Related Groups (KDRGs) to classify health insurance claims for ambulatory care and to identify problem areas of the classification. A database of 11,586,270 claims for ambulatory care delivered during January 2002 was obtained for the study; after KDRG numbers were assigned and records with an error KDRG were excluded, 8,319,494 claims were analyzed. The unit of analysis was a claim, and resource use was measured by the sum of charges incurred during a month at a department of a hospital or at a clinic. Within-group variance was assessed by the coefficient of variation (CV), and classification accuracy was evaluated by the variance reduction achieved by the KDRG classification. The analyses were performed on both all data and non-outlier data, and on a subset of the database to examine the validity of the study results. Data were assigned to 787 of the 1,244 KDRGs defined in the classification system. For non-outlier data, 77.4% of KDRGs had a CV of charges below 100% for tertiary care hospitals, and 95.43% of KDRGs did for clinics. The variance reduction achieved by the KDRG classification was 40.80% for non-outlier claims from tertiary care hospitals, 51.98% for general hospitals, 40.89% for hospitals, and 54.99% for clinics. Similar results were obtained from the analyses performed on a subset of the study database. The study results indicated that KDRGs developed for the classification of inpatient care could be used for ambulatory care, although there were areas where the classification should be refined. Its power to predict resource utilization showed potential for measuring the case mix of providers in order to monitor and manage the delivery of ambulatory care. The quality of diagnostic information contained in insurance claims remains to be improved, and future studies of other classification systems based on visits or episodes are warranted.
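
The two evaluation measures can be sketched as follows, assuming the usual definitions: CV as the standard deviation divided by the mean, and variance reduction as one minus the ratio of the within-group to the total sum of squares. The abstract does not spell out its formulas, and the group labels and charges below are made up.

```python
from statistics import mean, pstdev


def cv_percent(values):
    """Coefficient of variation in percent (population standard deviation)."""
    return 100 * pstdev(values) / mean(values)


def variance_reduction(groups):
    """Percent of total variance explained by grouping.

    groups: dict mapping a group label to a list of charges.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_within = 0.0
    for values in groups.values():
        group_mean = mean(values)
        ss_within += sum((v - group_mean) ** 2 for v in values)
    return 100 * (1 - ss_within / ss_total)


# Hypothetical monthly charges for two made-up groups.
charges = {"group_A": [10, 12, 14], "group_B": [100, 110, 120]}

for label, values in charges.items():
    print(label, round(cv_percent(values), 1))
print(round(variance_reduction(charges), 1))
```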

Effect of outliers on the variable selection by the regularized regression

  • Jeong, Junho;Kim, Choongrak
    • Communications for Statistical Applications and Methods / Vol. 25, No. 2 / pp.235-243 / 2018
  • Many studies exist on the influence of one or a few observations on estimators in a variety of statistical models under the "large n, small p" setup; however, diagnostic issues in regression models have rarely been studied in a high-dimensional setup. In high-dimensional data, the influence of observations is more serious because the sample size n is significantly smaller than the number of variables p. Here, we investigate the influence of observations on the least absolute shrinkage and selection operator (LASSO) estimates, suggested by Tibshirani (Journal of the Royal Statistical Society, Series B, 58, 267-288, 1996), and the influence of observations on the variables selected by the LASSO in the high-dimensional setup. We also derive an analytic expression for the influence of the kth observation on LASSO estimates in simple linear regression. Numerical studies based on artificial data and real data are presented for illustration. The numerical results show that the influence of observations on the LASSO estimates and on the variables selected by the LASSO is more severe in the high-dimensional setup than in the usual "large n, small p" setup.
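
For intuition about case influence on the LASSO in simple linear regression, the sketch below refits the model with each observation deleted in turn. It is a brute-force stand-in for the paper's analytic expression, which the abstract does not give. It assumes a no-intercept model with pre-centered data and the objective (1/2)Σ(y − βx)² + λ|β|, whose minimizer is the soft-thresholded least-squares numerator divided by Σx². The function names and data are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_slope(x, y, lam):
    """LASSO slope for y = beta*x (no intercept) with penalty lam*|beta|."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return soft_threshold(sxy, lam) / sxx


def deletion_influence(x, y, lam):
    """Change in the LASSO slope when each observation is removed in turn."""
    full = lasso_slope(x, y, lam)
    return [
        full - lasso_slope(x[:k] + x[k + 1:], y[:k] + y[k + 1:], lam)
        for k in range(len(x))
    ]


# Made-up data; the last response is an outlier.
x = [-2, -1, 0, 1, 2]
y = [-2, -1, 0, 1, 10]

influence = deletion_influence(x, y, lam=1.0)
most_influential = max(range(len(x)), key=lambda k: abs(influence[k]))
print(most_influential, [round(v, 3) for v in influence])

# Influence on selection: with a larger penalty the variable is kept only
# because of the outlier, and it is dropped once that point is deleted.
print(lasso_slope(x, y, 10.0), lasso_slope(x[:4], y[:4], 10.0))
```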