Search | Korea Science

Kim, HanYong;Lee, Woojoo
- The Korean Journal of Applied Statistics / v.30 no.5 / pp.681-690 / 2017
Imbalanced binary classification problems arise in many settings, such as fraud detection in banking operations, spam mail detection, and prediction of defective products. Several sampling methods, such as over-sampling, under-sampling, and SMOTE, have been developed to overcome the poor prediction performance of binary classifiers when one group is dominant. In this study, we investigate the prediction performance of logistic regression, Lasso, random forest, boosting, and support vector machines in combination with these sampling methods for imbalanced binary data. Four real data sets are analyzed to see whether there is a substantial improvement in prediction performance. We also emphasize some precautions to take when the sampling methods are implemented.
https://doi.org/10.5351/KJAS.2017.30.5.681
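
The two simplest sampling methods named in the abstract above, random over-sampling and random under-sampling, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the list-based data layout, and the choice to balance the classes exactly are all assumptions made here, and SMOTE is not covered.

```python
import random


def _split_indices(y, minority_label):
    """Return (minority_indices, majority_indices) for the label list y."""
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    return minority, majority


def random_oversample(X, y, minority_label, seed=0):
    """Duplicate randomly chosen minority rows until the classes are equal in size.

    Hypothetical helper: assumes the minority class is non-empty.
    """
    rng = random.Random(seed)
    minority, majority = _split_indices(y, minority_label)
    n_extra = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    idx = majority + minority + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def random_undersample(X, y, minority_label, seed=0):
    """Keep a random subset of majority rows, the same size as the minority class."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y, minority_label)
    kept = rng.sample(majority, min(len(minority), len(majority)))
    idx = kept + minority
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_over, y_over = random_oversample(X, y, minority_label=1)
X_under, y_under = random_undersample(X, y, minority_label=1)
```

Either resampled set would then be passed to a classifier such as logistic regression in place of the original training data.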

Lim, Hwa-Kyung;Song, Seuck-Heun;Song, Ju-Won
- The Korean Journal of Applied Statistics / v.22 no.2 / pp.365-373 / 2009
Since 1990, fetal sex identification and illegal abortion have produced a sex ratio imbalance at birth in Korea, driven by a preference for sons over daughters, socio-economic development, population policy, and other factors. Although many studies, such as time series analyses and regional difference analyses, have monitored this imbalance, they share the shortcoming that time and space could not be included in the analysis simultaneously. This study analyzes the sex ratio imbalance at birth, taking time and region into account at the same time. The analysis considers the numbers of male and female babies born as the third or later child in their families in 2000 and 2001 in 234 Gu/Si/Goon administrative districts. We propose a mixture model of binomial distributions, assuming heterogeneous populations. The location parameters, weights, and correlation coefficient of the mixture model are estimated by the EM algorithm, and the heterogeneity of the regions is displayed graphically using ArcView GIS.
https://doi.org/10.5351/KJAS.2009.22.2.365
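
To illustrate the kind of estimation the abstract above describes, here is a minimal sketch of the EM algorithm for a plain finite mixture of binomial distributions. It is a deliberate simplification, not the paper's model: it treats each district as a single count, ignores the two-year structure, and omits the correlation coefficient. The function names, starting values, and toy counts are all hypothetical.

```python
import math


def binom_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def em_binomial_mixture(successes, totals, probs, weights, n_iter=200):
    """EM for a mixture of binomials; probs and weights are starting values."""
    probs, weights = list(probs), list(weights)
    eps = 1e-9  # keeps probabilities and weights away from 0 and 1
    for _ in range(n_iter):
        # E-step: posterior probability that each unit belongs to each component.
        resp = []
        for k, n in zip(successes, totals):
            logs = [math.log(w) + binom_logpmf(k, n, p)
                    for w, p in zip(weights, probs)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: responsibility-weighted updates of weights and success rates.
        for j in range(len(probs)):
            rj = [r[j] for r in resp]
            weights[j] = max(sum(rj) / len(rj), eps)
            denom = sum(r * n for r, n in zip(rj, totals))
            if denom > 0:
                p = sum(r * k for r, k in zip(rj, successes)) / denom
                probs[j] = min(max(p, eps), 1 - eps)
    return probs, weights


# Toy counts (invented): male births out of 100 births in each of 10 districts.
totals = [100] * 10
males = [50, 52, 49, 51, 50, 70, 69, 71, 70, 72]
probs, weights = em_binomial_mixture(males, totals, probs=[0.4, 0.8], weights=[0.5, 0.5])
```

On well-separated toy data like this, the fitted success rates settle near the two group averages and the weights near the group shares. Real district data would call for several starting values and a convergence check on the log-likelihood.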

Hong, C.S.;Won, C.H.
- The Korean Journal of Applied Statistics / v.29 no.2 / pp.309-319 / 2016
For binary classification models, we consider a risk score that is a function of linear scores and estimate the coefficients of the linear scores. There are two estimation methods: one obtains MLEs using logistic models, and the other estimates the coefficients by maximizing the AUC. Estimates from the AUC approach are better than the logistic-model MLEs in general situations where the logistic assumptions do not hold. This paper considers imbalanced data for credit assessment models, in which the default class contains fewer observations than the non-default class, and applies the AUC approach to such data. Various logit link functions are used to generate the imbalanced data. We find that the coefficients estimated by the AUC approach are equivalent to, or better than, those from logistic models for imbalanced data with low default probability.
https://doi.org/10.5351/KJAS.2016.29.2.309
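
The idea of choosing linear-score coefficients by maximizing the AUC, as described in the abstract above, can be sketched as follows. This is an illustrative assumption, not the paper's procedure: it uses the empirical (pairwise) AUC and a coarse grid search over directions for two covariates, relying on the fact that the AUC is unchanged when the coefficients are multiplied by a positive constant. The function names and toy data are hypothetical.

```python
import math


def empirical_auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_direction_by_auc(X, labels, n_angles=360):
    """Grid search over unit vectors (cos t, sin t) for the highest empirical AUC."""
    best_auc, best_beta = -1.0, None
    for t in range(n_angles):
        theta = 2 * math.pi * t / n_angles
        beta = (math.cos(theta), math.sin(theta))
        scores = [beta[0] * x1 + beta[1] * x2 for x1, x2 in X]
        auc = empirical_auc(scores, labels)
        if auc > best_auc:
            best_auc, best_beta = auc, beta
    return best_auc, best_beta


# Label 1 marks the default class in this toy example.
auc_example = empirical_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])

X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
labels = [0, 0, 1, 1]
best_auc, best_beta = best_direction_by_auc(X, labels)
```

A grid search only scales to a handful of covariates; with more, a smooth surrogate of the AUC or a dedicated optimizer would be needed.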

Lee, Kyungeun;Kim, Kyoung Hee;Shin, Seung Jun
- The Korean Journal of Applied Statistics / v.30 no.5 / pp.793-808 / 2017
We compare various variable screening methods on multiclass classification problems when the data are ultrahigh-dimensional. Two different approaches are considered: (1) pairwise extension from binary classification via one-versus-one or one-versus-rest comparisons and (2) direct classification of multiclass responses. We conduct extensive simulation studies under different conditions: heavy-tailed explanatory variables, correlated signal and noise variables, correlated joint distributions but uncorrelated marginals, and unbalanced response variables. We then analyze real data to examine the performance of the methods. The results show that model-free methods perform better for multiclass classification problems as well as binary ones.
https://doi.org/10.5351/KJAS.2017.30.5.793

Kim, Hyeon-Wook;Lee, Hangyong
- KDI Journal of Economic Policy / v.29 no.1 / pp.177-196 / 2007
This paper investigates the cyclical patterns of buffer capital using unbalanced panel data for banks in 30 OECD countries and 7 non-OECD Asian countries. We test whether the relationship between buffer capital and the business cycle differs systematically across country groups, controlling for other potential determinants of bank capital. We find that the correlation is positive for developed countries and negative for Asian developing countries. These findings suggest that, once Basel II is implemented, developing countries are more likely to observe an increase in output volatility. We then review policy recommendations to mitigate the procyclicality problem of Basel II.
https://doi.org/10.23895/kdijep.2007.29.1.177