• Title/Summary/Keyword: Kolmogorov-smirnov statistic

Estimation of p-values with Two Dimensional Null Distributions from Genomic Data Set

  • Yee, Jaeyong;Park, Mira
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2711-2719 / 2018
  • When an observable is described by a single value, its statistical significance may be estimated by constructing a null distribution through permutation and counting the fraction of that distribution that exceeds the observed value by chance. Genome-wide association studies usually focus on the association between a single genotype, or interacting genotypes, and a single phenotype. However, investigating common genotypes associated simultaneously with multiple phenotypes may involve observables that must be described by multiple numbers. Statistical significance for such an observable requires a null distribution in multiple dimensions. This study seeks an extension of the one-dimensional p-value estimation process that is applicable to the two-dimensional case. To compare the positions of points within the set they form, a positioning parameter inspired by the extension of the Kolmogorov-Smirnov statistic to two dimensions is proposed.
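
The one-dimensional procedure described in this abstract (build a null distribution by permutation, then count the fraction that reaches the observed value) can be sketched as follows. This is a minimal illustration, not the authors' method: the test statistic (difference in group means), the function name, and the add-one correction are assumptions introduced here.

```python
import numpy as np

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """One-sided permutation p-value for mean(x) - mean(y).

    The choice of statistic is an assumption made for illustration.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = x.mean() - y.mean()

    pooled = np.concatenate([x, y])
    n_x = x.size
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle the group labels by permuting the pooled sample.
        shuffled = rng.permutation(pooled)
        null[i] = shuffled[:n_x].mean() - shuffled[n_x:].mean()

    # Fraction of the null distribution at or above the observed value;
    # the add-one correction keeps the estimate strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

observed, p_value = permutation_p_value([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])
```

With a two-dimensional observable, the comparison `null >= observed` no longer has an obvious meaning, which is the gap the positioning parameter in the abstract is meant to address.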

Adjusted ROC and CAP Curves (조정된 ROC와 CAP 곡선)

  • Hong, Chong-Sun;Kim, Ji-Hun;Choi, Jin-Soo
    • The Korean Journal of Applied Statistics / v.22 no.1 / pp.29-39 / 2009
  • Among other tools, ROC and CAP curves are used in credit rating work to explore the discriminatory power between defaults and non-defaults, based on the distribution of the probability of default. ROC and CAP curves are plotted in terms of various ratios of the probability of default. Each point on the ROC and CAP curves is calculated according to a cutting point (score) for classifying between defaults and non-defaults. In this paper, adjusted ROC and CAP curves are proposed by using functions of ratios of the probability of default. It is possible to recognize the score corresponding to a point on these adjusted curves, and we can identify the best score, the one showing the optimal discriminatory power. Moreover, we discuss the relationship between the best score obtained from the adjusted ROC and CAP curves and the score corresponding to the Kolmogorov-Smirnov statistic used to test the homogeneity of the distribution functions of the defaults and non-defaults.

Optimal Threshold from ROC and CAP Curves (ROC와 CAP 곡선에서의 최적 분류점)

  • Hong, Chong-Sun;Choi, Jin-Soo
    • The Korean Journal of Applied Statistics / v.22 no.5 / pp.911-921 / 2009
  • Receiver Operating Characteristic (ROC) and Cumulative Accuracy Profile (CAP) curves are two methods used to assess the discriminatory power of different credit-rating approaches. The points of optimal classification accuracy on an ROC curve and of maximal profit on a CAP curve can be found by using iso-performance tangent lines, which are based on the standard notion of accuracy. In this paper, we offer an alternative accuracy measure called the true rate. Using this rate, one can obtain alternative optimal threshold points on both ROC and CAP curves. For most real populations of borrowers, the number of defaults is much smaller than the number of non-defaults, and in such cases the true rate may be more efficient than the accuracy rate in terms of cost functions. Moreover, it is shown that the alternative scores of optimal classification accuracy and maximal profit are identical, and that this single score coincides with the score corresponding to the Kolmogorov-Smirnov statistic used to test the homogeneity of the distribution functions of the defaults and non-defaults.
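
The "score corresponding to the Kolmogorov-Smirnov statistic" mentioned in this abstract is the cutoff at which the empirical score distributions of defaults and non-defaults are farthest apart. A minimal sketch of that computation, assuming plain NumPy arrays of scores; the function and variable names are hypothetical and the true-rate measure itself is not reproduced here:

```python
import numpy as np

def ks_statistic_and_score(scores_default, scores_nondefault):
    """Return (KS statistic, cutoff score where the CDF gap is largest)."""
    d = np.sort(np.asarray(scores_default, dtype=float))
    g = np.sort(np.asarray(scores_nondefault, dtype=float))

    # Candidate cutoffs: every distinct observed score.
    cutoffs = np.unique(np.concatenate([d, g]))

    # Empirical CDFs of each group evaluated at every cutoff.
    cdf_default = np.searchsorted(d, cutoffs, side="right") / d.size
    cdf_nondefault = np.searchsorted(g, cutoffs, side="right") / g.size

    gaps = np.abs(cdf_default - cdf_nondefault)
    best = int(np.argmax(gaps))  # first cutoff attaining the maximum gap
    return float(gaps[best]), float(cutoffs[best])

ks, score = ks_statistic_and_score([1, 2, 3], [4, 5, 6])
```

In this toy input the two groups are perfectly separated, so the gap reaches its maximum of 1 at the largest default score.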

Building credit scoring models with various types of target variables (목표변수의 형태에 따른 신용평점 모형 구축)

  • Woo, Hyun Seok;Lee, Seok Hyung;Cho, HyungJun
    • Journal of the Korean Data and Information Science Society / v.24 no.1 / pp.85-94 / 2013
  • As the financial market grows, losses increase when credit risk management fails because of poor management of customer information or poor decision-making. Credit risk management therefore becomes more important, and it is essential to develop a credit scoring model, a fundamental tool for minimizing credit risk. Credit scoring models have been studied and developed only for binary target variables. In this paper, we consider other types of target variables, such as ordinal multinomial data or longitudinal binary data, and suggest credit scoring models for them. We then apply the developed models to real data and random data, and investigate their performance using the Kolmogorov-Smirnov statistic.

Two optimal threshold criteria for ROC analysis

  • Cho, Min Ho;Hong, Chong Sun
    • Journal of the Korean Data and Information Science Society / v.26 no.1 / pp.255-260 / 2015
  • Among the many optimal threshold criteria for the ROC curve, the closest-to-(0,1) and amended closest-to-(0,1) criteria are considered. An ROC curve that passes close to the (0,1) point indicates that two models are well classified. In this case, the ROC curve is located far from the (1,0) point. Hence we propose two criteria: the farthest-to-(1,0) and amended farthest-to-(1,0) criteria. These criteria are found to be related to the Kolmogorov-Smirnov statistic as well as to some other optimal threshold criteria. Moreover, we derive a definition of the proposed criteria in more than two dimensions and relate them to multi-dimensional optimal threshold criteria.
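
As a rough illustration of the geometry behind these criteria, the sketch below picks ROC points by Euclidean distance to (0,1) and to (1,0), and by the largest gap TPR - FPR (the Kolmogorov-Smirnov gap on the ROC curve). Reading the criteria as plain Euclidean distances is an assumption made here; the amended variants are not defined in the abstract and are not implemented, and all names are hypothetical.

```python
import numpy as np

def roc_threshold_criteria(fpr, tpr):
    """Index of the ROC point selected by each criterion.

    fpr, tpr: false and true positive rates, one pair per threshold.
    """
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)

    dist_to_01 = np.hypot(fpr, 1.0 - tpr)   # distance to the ideal corner (0,1)
    dist_to_10 = np.hypot(1.0 - fpr, tpr)   # distance to the worst corner (1,0)
    ks_gap = tpr - fpr                      # vertical gap above the diagonal

    return {
        "closest_to_01": int(np.argmin(dist_to_01)),
        "farthest_to_10": int(np.argmax(dist_to_10)),
        "max_ks_gap": int(np.argmax(ks_gap)),
    }

picked = roc_threshold_criteria([0.0, 0.1, 0.5, 1.0], [0.0, 0.8, 0.9, 1.0])
```

On this small example all three criteria select the same point, (0.1, 0.8); in general they need not agree.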

A Study on the Template Matching Methods for Hand Vein Pattern Recognition (손등의 정맥패턴 인식을 위한 원형정합방법의 비교 연구)

  • Choi, Hwan-Soo;Park, Seong-Hyuk;Jung, Dong-Chul
    • Proceedings of the KIEE Conference / 1998.07g / pp.2231-2233 / 1998
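  • This paper presents a performance comparison of three algorithms developed for personal identification using the vein pattern on the back of the hand. The three methods are: a template matching algorithm [1] that binarizes the image using unsharp masking and then applies weights based on the areas of the veins and the hand background; an improved version of a matching algorithm based on the Kolmogorov-Smirnov (KS) statistic [2]; and a method that, after thinning the veins, uses feature vectors such as branch-point coordinates, vein lengths, and the branching angles between vein branches. For preprocessing, the study confirms that when the gray-scale distributions of the vessel and background regions of the raw image overlap, unsharp masking filtering enhances the image better than the other preprocessing methods, and that the weighted matching method outperforms the other matching methods.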

Length-biased Rayleigh distribution: reliability analysis, estimation of the parameter, and applications

  • Kayid, M.;Alshingiti, Arwa M.;Aldossary, H.
    • International Journal of Reliability and Applications / v.14 no.1 / pp.27-39 / 2013
  • In this article, a new model based on the Rayleigh distribution is introduced. This model is useful and practical in physics, reliability, and life testing. The statistical and reliability properties of this model are presented, including moments, the hazard rate, the reversed hazard rate, and mean residual life functions, among others. In addition, it is shown that the distributions of the new model are ordered with respect to the strongest likelihood ratio ordering. Four estimation methods, namely the method of moments, maximum likelihood, Bayes estimation, and uniformly minimum variance unbiased estimation, are used to estimate the parameters of this model. Simulation is used to calculate the estimates and to study their properties. Finally, the appropriateness of this model for real data sets is shown by using the chi-square goodness-of-fit test and the Kolmogorov-Smirnov statistic.

Criterion of Test Statistics for Validation in Credit Rating Model (신용평가모형에서 타당성검증 통계량들의 판단기준)

  • Park, Yong-Seok;Hong, Chong-Sun;Lim, Han-Seung
    • Communications for Statistical Applications and Methods / v.16 no.2 / pp.239-347 / 2009
  • This paper considers four well-known statistics, Kolmogorov-Smirnov, mean difference, AUROC, and AR, that are widely used to evaluate the discriminatory power of credit rating models. Criteria for these statistics are determined by the value of the mean difference under the assumption of normality and equal standard deviations. Alternative criteria are proposed through simulations over various sample sizes, type II error rates, and ratios of bads, and we suggest an interpretation of each statistic on the basis of discriminatory power. Finally, we compare the currently used guidelines with the simulated results.