http://dx.doi.org/10.5351/KJAS.2011.24.3.477

A Study on the Adjustment of Posterior Probability for Oversampling when the Target is Rare  

Kim, U.N. (BC Card)
Lee, S.K. (Department of Statistics, Sungshin Women's University)
Choi, J.H. (Department of Information & Statistics, Korea University)
Publication Information
The Korean Journal of Applied Statistics, v.24, no.3, 2011, pp. 477-484
Abstract
When the event of interest in a target variable is rare, a widespread strategy is to build the model on a sample that disproportionately over-represents the events, that is, an over-sampled data set. Predicted values from a model fitted to over-sampled data are biased, but they can easily be corrected to represent the population. In this study, we investigate the relationship between the proportion of rare events in a data mart and model performance, using real-world data from a Korean credit card company. We also apply two methods for adjusting the posterior probability for over-sampled data, the offset method and the weighted method, and compare their performance on real data sets.
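The abstract names the two adjustments without giving formulas. As a minimal sketch, assuming the standard prior-correction forms for logistic models (the abstract itself does not confirm that the paper uses exactly these), the offset method shifts the fitted log-odds by the sampling-induced term, and the weighted method reweights each class by the ratio of its population share to its sample share. The function names and the rates `pop_rate` (event proportion in the population) and `sample_rate` (event proportion in the over-sampled data) are hypothetical, chosen for illustration only.

```python
import math


def offset_term(pop_rate, sample_rate):
    """Log-odds shift introduced by over-sampling the events (assumed form)."""
    return math.log(
        (sample_rate * (1.0 - pop_rate)) / ((1.0 - sample_rate) * pop_rate)
    )


def adjust_posterior(p_sample, pop_rate, sample_rate):
    """Offset-style correction of a probability predicted on over-sampled data."""
    logit = math.log(p_sample / (1.0 - p_sample))
    # Subtract the sampling shift to return to the population scale.
    corrected = logit - offset_term(pop_rate, sample_rate)
    return 1.0 / (1.0 + math.exp(-corrected))


def sampling_weights(pop_rate, sample_rate):
    """Weighted-method case weights as (event weight, non-event weight)."""
    return pop_rate / sample_rate, (1.0 - pop_rate) / (1.0 - sample_rate)


# Illustrative numbers only: events are 2% of the population, 50% of the sample.
adjusted = adjust_posterior(0.5, pop_rate=0.02, sample_rate=0.5)
event_weight, non_event_weight = sampling_weights(pop_rate=0.02, sample_rate=0.5)
```

With these illustrative rates, a score of 0.5 on the balanced sample maps back to the population base rate, which is the behaviour one would expect from a prior correction. In the weighted variant the weights would be passed to the model at fitting time rather than applied to the scores afterwards.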
Keywords
Over-sampling; adjustment of posterior probability; rare event; offset method; weighted method