• Title/Summary/Keyword: ROC curve

Alternative Optimal Threshold Criteria: MFR (대안적인 분류기준: 오분류율곱)

  • Hong, Chong Sun;Kim, Hyomin Alex;Kim, Dong Kyu
    • The Korean Journal of Applied Statistics / v.27 no.5 / pp.773-786 / 2014
  • We propose the multiplication of false rates (MFR), a classification accuracy criterion that corresponds to the area of a rectangle derived from the ROC curve. The optimal threshold obtained using MFR is compared with those of other criteria in terms of classification performance. Optimal thresholds are also found for various distribution functions, and some properties and advantages of MFR are discussed by comparing the FNR and FPR corresponding to these thresholds. Based on a general cost function, cost ratios of optimal thresholds are computed using various classification criteria. The cost ratios for cost curves are examined to explore the advantages of MFR. Furthermore, the definition of MFR is extended to multi-dimensional ROC analysis, and the relations among classification criteria are also discussed.

Application of Receiver Operating Characteristic (ROC) Curve for Evaluation of Diagnostic Test Performance (진단검사의 특성 평가를 위한 Receiver Operating Characteristic (ROC) 곡선의 활용)

  • Pak, Son-Il;Oh, Tae-Ho
    • Journal of Veterinary Clinics / v.33 no.2 / pp.97-101 / 2016
  • In the field of clinical medicine, diagnostic accuracy studies refer to the degree of agreement between the index test and the reference standard in their ability to identify a target disorder of interest in a patient. The receiver operating characteristic (ROC) curve offers a graphical display of the trade-off between sensitivity and specificity at each cutoff for a diagnostic test and is useful in assigning the best cutoff for clinical use. To this end, ROC curve analysis is a useful tool for estimating and comparing the accuracy of competing diagnostic tests. This paper briefly reviews measures of diagnostic accuracy such as sensitivity, specificity, and the area under the ROC curve (AUC), which is a summary measure of diagnostic accuracy across the spectrum of test results. In addition, the paper discusses, using a hypothetical example as an illustration, how to create an ROC curve for a single diagnostic test with a five-category discrete scale that classifies diseased from healthy individuals, how to interpret the AUC meaningfully, and how ROC methodology is applied in clinical medicine to determine optimal cutoff values.
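The construction described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's own example: the rating counts are invented for the sketch, the AUC uses the trapezoidal rule, and the cutoff is chosen with the Youden index (sensitivity + specificity - 1), which is one common choice and an assumption here. The function names are hypothetical.

```python
# Illustrative rating counts for categories 1..5 (5 = most suspicious).
# These numbers are assumptions for the sketch, not data from the paper.
diseased = [3, 2, 2, 11, 33]
healthy = [33, 6, 6, 11, 2]


def roc_points(diseased, healthy):
    """Return (FPR, TPR) points for the cutoffs 'category >= c', c = k..1."""
    n_d, n_h = sum(diseased), sum(healthy)
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the cutoff one category at a time, accumulating positives.
    for d, h in zip(reversed(diseased), reversed(healthy)):
        tp += d
        fp += h
        points.append((fp / n_h, tp / n_d))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def youden_cutoff(diseased, healthy):
    """Category c maximizing TPR - FPR for the rule 'category >= c'."""
    points = roc_points(diseased, healthy)[1:]
    k = len(diseased)
    best = max(range(k), key=lambda i: points[i][1] - points[i][0])
    return k - best


auc = trapezoid_auc(roc_points(diseased, healthy))
cutoff = youden_cutoff(diseased, healthy)
print(f"AUC = {auc:.3f}, Youden cutoff: category >= {cutoff}")
```

With these illustrative counts, the trapezoidal AUC is about 0.893 and the Youden index selects the rule "category >= 4".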

Odds curve and optimal threshold (오즈 곡선과 최적분류점)

  • Hong, Chong Sun;Oh, Tae Gyu;Oh, Se Hyeon
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.807-822 / 2021
  • Various accuracy measures that can be explained on the odds curve are discussed, and an alternative accuracy measure, the maximum square, is proposed based on the characteristics of the odds curve. Thresholds corresponding to these accuracy measures are obtained by considering various probability distribution functions and an illustrative example. Their characteristics are discussed while comparing many kinds of statistics measuring thresholds. Therefore, we can conclude that optimal thresholds could be explored from the odds curve, similar to the ROC curve, and that the maximum square measure can be used as a good accuracy measure that can improve the performance of the binary classification model.

Standard criterion of hypervolume under the ROC manifold (ROC 다면체 아래 체적의 판단기준)

  • Hong, C.S.;Jung, D.G.
    • Journal of the Korean Data and Information Science Society / v.25 no.3 / pp.473-483 / 2014
  • Even though the ROC manifold in a space of more than three dimensions, which extends the ROC curve and surface, is difficult to represent graphically, the hypervolume under the ROC manifold (HUM) statistic can be defined and obtained based on the AUC and VUS measures for the ROC curve and the ROC surface. Hence, the definition and characteristics of the HUM for four-dimensional space are studied in this work. By extending the standard criterion of AUC for probabilities of default based on Basel II, 13 classes of the standard criterion of HUM are proposed in order to discriminate four classification models, and some application methods are discussed. To explore the standard criterion of HUM, whose values are obtained from various distributions, a ternary plot is used and explained.
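As a rough illustration of the HUM for four ordered classes, the sketch below uses the standard probabilistic interpretation: the HUM is the probability that one score drawn from each class falls in the correct order, so a non-informative model gives 1/4! = 1/24. The estimator simply counts correctly ordered quadruples. The function name and the data are hypothetical, and the paper's 13-class standard criterion is not reproduced here.

```python
from itertools import product


def empirical_hum(c1, c2, c3, c4):
    """Fraction of quadruples (one score per class) that are strictly ordered.

    Estimates P(X1 < X2 < X3 < X4); ties are counted as failures.
    Cost grows with the product of the four sample sizes.
    """
    total = len(c1) * len(c2) * len(c3) * len(c4)
    ordered = sum(1 for a, b, c, d in product(c1, c2, c3, c4) if a < b < c < d)
    return ordered / total


# Hypothetical scores for four ordered classes.
hum = empirical_hum([1, 2], [3, 4], [3.5, 6], [7, 8])
print(hum)  # 12 of the 16 quadruples are correctly ordered
```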

Cost Ratios for Cost and ROC Curves (비용곡선과 ROC곡선에서의 비용비율)

  • Hong, Chong-Sun;Yoo, Hyun-Sang
    • Communications for Statistical Applications and Methods / v.17 no.6 / pp.755-765 / 2010
  • For classification problems on mixture distributions, a threshold based on cost functions is optimal from the viewpoint of minimum expected cost. Assuming that no cost information is available, we propose cost ratios in the expected cost corresponding to the thresholds at which the total accuracy and the true rate are maximized, in order to explain how these cost ratios relate to the one minimizing the expected cost. Other cost ratios are also proposed by comparing the normalized expected costs when classification accuracy is maximized. The values of these cost ratios lie between the two cost ratios for the expected costs based on classification accuracies, and converge to that of the minimum expected cost. This work suggests two cost ratios: one at which the expected cost and the normalized expected cost are minimized, and another in the expected cost and normalized expected cost functions at which classification accuracies are maximized. We discuss their compatibility based on the relation between these cost ratios.
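The idea of a threshold that minimizes expected cost can be sketched for a two-component normal mixture. This is a generic illustration under stated assumptions, not the paper's derivation: scores above the threshold are classified as positive, the expected cost is taken as C_FN · p · FNR(t) + C_FP · (1 − p) · FPR(t), both components share one standard deviation, and all names and parameter values are hypothetical. Under those assumptions the minimizer has the closed form t* = (μ0 + μ1)/2 + σ²/(μ1 − μ0) · ln(C_FP(1 − p) / (C_FN p)), which the grid search should reproduce.

```python
from math import log
from statistics import NormalDist


def expected_cost(t, neg, pos, p_pos, c_fn, c_fp):
    """Expected misclassification cost when scores > t are called positive."""
    fnr = pos.cdf(t)          # positives falling at or below the threshold
    fpr = 1.0 - neg.cdf(t)    # negatives falling above the threshold
    return c_fn * p_pos * fnr + c_fp * (1.0 - p_pos) * fpr


def grid_threshold(neg, pos, p_pos, c_fn, c_fp, lo=-5.0, hi=7.0, step=0.001):
    """Brute-force search for the cost-minimizing threshold on a grid."""
    n = int(round((hi - lo) / step))
    grid = (lo + i * step for i in range(n + 1))
    return min(grid, key=lambda t: expected_cost(t, neg, pos, p_pos, c_fn, c_fp))


def closed_form_threshold(mu0, mu1, sigma, p_pos, c_fn, c_fp):
    """Minimizer for two normals with a common standard deviation (mu1 > mu0)."""
    ratio = c_fp * (1.0 - p_pos) / (c_fn * p_pos)
    return (mu0 + mu1) / 2.0 + sigma ** 2 / (mu1 - mu0) * log(ratio)


# Hypothetical setting: a false negative costs five times a false positive.
neg, pos = NormalDist(0.0, 1.0), NormalDist(2.0, 1.0)
t_grid = grid_threshold(neg, pos, p_pos=0.5, c_fn=5.0, c_fp=1.0)
t_exact = closed_form_threshold(0.0, 2.0, 1.0, p_pos=0.5, c_fn=5.0, c_fp=1.0)
print(round(t_grid, 3), round(t_exact, 3))
```

Raising the cost of false negatives relative to false positives pulls the threshold below the midpoint of the two means, which is the kind of dependence on the cost ratio that the abstract refers to.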

Multivariate Outlier Removing for the Risk Prediction of Gas Leakage based Methane Gas (메탄 가스 기반 가스 누출 위험 예측을 위한 다변량 특이치 제거)

  • Dashdondov, Khongorzul;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.11 no.12 / pp.23-30 / 2020
  • In this study, machine learning algorithms were used to relate natural gas (NG) data to gas-related environmental elements in order to predict the level of gas leakage risk without directly measuring gas leakage data. The study was based on open data provided by a server using the IoT-based remote control Picarro gas sensor specification. When natural gas leaks into the air, it poses a serious problem for air pollution, the environment, and health. The proposed method is a multivariate outlier removal method based on Random Forest (RF) classification for predicting the risk of NG leaks. After unsupervised k-means clustering, the experimental dataset was imbalanced. Therefore, we focused on models that best predict medium and high risk. We compared the receiver operating characteristic (ROC) curve, accuracy, area under the ROC curve (AUC), and mean standard error (MSE) for each classification model. In our experiments, MOL_RF achieved an accuracy of 99.71%, an AUC of 99.57%, and an MSE of 0.0016.

A Comparison of the Interval Estimations for the Difference in Paired Areas under the ROC Curves (대응표본에서 AUC차이에 대한 신뢰구간 추정에 관한 고찰)

  • Kim, Hee-Young
    • Communications for Statistical Applications and Methods / v.17 no.2 / pp.275-292 / 2010
  • Receiver operating characteristic (ROC) curves can be used to assess the accuracy of tests measured on ordinal or continuous scales. The most commonly used measure of the overall accuracy of diagnostic tests is the area under the ROC curve (AUC). When two ROC curves are constructed from two tests performed on the same individuals, statistical analysis of differences between AUCs must take into account the correlated nature of the data. This article focuses on confidence interval estimation of the difference between paired AUCs. We compare nonparametric, maximum likelihood, bootstrap, and generalized pivotal quantity methods, and conduct a Monte Carlo simulation to investigate the coverage probability and expected length of the four methods.
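Of the four approaches named in this abstract, the bootstrap is the simplest to sketch. The version below is a percentile bootstrap, which is an assumption; the paper may use a different bootstrap variant. Subjects are resampled with both of their test scores kept together, so the correlation between the two tests is preserved, and diseased and healthy subjects are resampled separately. The AUC is the Mann-Whitney estimate. Function names and data are hypothetical.

```python
import random


def auc(neg, pos):
    """Mann-Whitney AUC: P(pos score > neg score), ties counted as 1/2."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_auc_diff_ci(neg, pos, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC(test A) - AUC(test B).

    neg, pos: lists of (score_a, score_b) tuples, one tuple per subject.
    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)

    def diff(neg_s, pos_s):
        a = auc([s[0] for s in neg_s], [s[0] for s in pos_s])
        b = auc([s[1] for s in neg_s], [s[1] for s in pos_s])
        return a - b

    # Resample whole subjects within each group to keep the pairing intact.
    diffs = sorted(
        diff(rng.choices(neg, k=len(neg)), rng.choices(pos, k=len(pos)))
        for _ in range(n_boot)
    )
    lower = diffs[int(alpha / 2 * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return diff(neg, pos), lower, upper


# Hypothetical paired scores (test A, test B) for healthy and diseased subjects.
healthy = [(0.1, 0.3), (0.2, 0.6), (0.3, 0.2), (0.4, 0.5), (0.35, 0.4)]
diseased = [(0.6, 0.45), (0.7, 0.7), (0.8, 0.35), (0.5, 0.8), (0.9, 0.55)]
estimate, lower, upper = paired_auc_diff_ci(healthy, diseased, n_boot=500)
print(estimate, lower, upper)
```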

ROC Function Estimation (ROC 함수 추정)

  • Hong, Chong-Sun;Lin, Mei Hua;Hong, Sun-Woo
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.987-994 / 2011
  • From the viewpoint of credit evaluation, in which the population is divided into default and non-default states, two methods are considered for estimating the conditional distribution functions: one assumes that the data follow a normal mixture distribution, and the other uses kernel density estimation. The parameters of the normal mixture are estimated using the EM algorithm. For the kernel density estimation, five well-known kernel functions and four bandwidths are explored. In addition, the corresponding ROC functions are obtained from the estimated distribution functions. The goodness-of-fit of the estimated distribution functions is discussed and the performance of the ROC functions is compared. It is found that the kernel distribution functions show a better fit, while the ROC function obtained under the normal mixture assumption shows better performance.
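The kernel-based route to an ROC function can be sketched as follows. This is a simplified illustration under stated assumptions: only a Gaussian kernel is used, with the normal reference bandwidth 1.06 · s · n^(-1/5), whereas the paper explores five kernels and four bandwidths. Scores above the threshold are treated as the positive class. Function names and data are hypothetical.

```python
from statistics import NormalDist, stdev

_PHI = NormalDist().cdf  # standard normal CDF


def bandwidth(sample):
    """Normal reference rule of thumb for a Gaussian kernel."""
    return 1.06 * stdev(sample) * len(sample) ** (-0.2)


def kernel_cdf(sample, h):
    """Smoothed distribution function: average of Gaussian CDFs at the data."""
    def cdf(x):
        return sum(_PHI((x - xi) / h) for xi in sample) / len(sample)
    return cdf


def kernel_roc(neg, pos, thresholds):
    """(FPR, TPR) pairs from the kernel-smoothed distribution functions."""
    f0 = kernel_cdf(neg, bandwidth(neg))
    f1 = kernel_cdf(pos, bandwidth(pos))
    return [(1.0 - f0(t), 1.0 - f1(t)) for t in thresholds]


def trapezoid_auc(points):
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical scores for the two groups.
neg = [1.0, 1.5, 2.0, 2.2, 2.8, 3.1]
pos = [2.5, 3.0, 3.6, 4.0, 4.4, 5.1]
grid = [i * 0.05 - 2.0 for i in range(221)]  # thresholds from -2 to 9
roc = kernel_roc(neg, pos, grid)
smoothed_auc = trapezoid_auc(roc)
print(round(smoothed_auc, 3))
```

Because the smoothed distribution functions are continuous, the resulting ROC function is a smooth curve rather than the step function given by the empirical estimate.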

Development of Drought Index based on Streamflow for Monitoring Hydrological Drought (수문학적 가뭄감시를 위한 하천유량 기반 가뭄지수 개발)

  • Yoo, Jiyoung;Kim, Tae-Woong;Kim, Jeong-Yup;Moon, Jang-Won
    • KSCE Journal of Civil and Environmental Engineering Research / v.37 no.4 / pp.669-680 / 2017
  • This study evaluated the consistency of various drought indices with the standard flow used to forecast low flow. The data used in this study were streamflow data at the Gurye2 station on the Seomjin River and the Angang station on the Hyeongsan River, as well as rainfall data from nearby weather stations (Namwon and Pohang). Using the streamflow data, the streamflow accumulation drought index (SADI) was developed to represent the hydrological drought condition. For the SADI calculations, the drought threshold was determined by a change-point analysis of the flow pattern, and a reduction factor was estimated based on the kernel density function. The standardized runoff index (SRI) and standardized precipitation index (SPI) were also calculated for comparison with the SADI. SRI and SPI were calculated for 30-, 90-, 180-, and 270-day periods, and an ROC curve analysis was then performed to determine the time period with the highest consistency with the standard flow. The ROC curve analysis indicated that, under the attention stage, consistency with the standard flow was highest in the order SADI_C3, SRI30, SADI_C1, SADI_C2, and SPI90 for the Seomjin River-Gurye2 station, and in the order SADI_C3, SADI_C1, SPI270, SRI30, and SADI_C2 for the Hyeongsan River-Angang station.