• Title/Summary/Keyword: oversampling (과대표집)


A Comparison of Ensemble Methods Combining Resampling Techniques for Class Imbalanced Data (데이터 전처리와 앙상블 기법을 통한 불균형 데이터의 분류모형 비교 연구)

  • Lee, Hee-Jae;Lee, Sungim
    • The Korean Journal of Applied Statistics / v.27 no.3 / pp.357-371 / 2014
  • Many studies address imbalanced data, in which the class distribution is highly skewed. To deal with this problem, previous studies have used resampling techniques that correct the skewness of the class distribution in each sampled subset through under-sampling, over-sampling, or hybrid sampling such as SMOTE. Ensemble methods have also alleviated the class imbalance problem. In this paper, we compare around a dozen algorithms that combine ensemble methods with resampling techniques, using simulated data sets generated by the Backbone model, which allows the imbalance rate to be controlled. Results on various real imbalanced data sets are also presented to compare the effectiveness of the algorithms. Based on these results, we strongly recommend combining resampling techniques with ensemble methods for imbalanced data in which the proportion of the minority class is less than 10%. We also find that each ensemble method has a well-matched sampling technique: algorithms that combine bagging or random forest ensembles with random under-sampling tend to perform well, whereas the boosting ensemble appears to perform better with over-sampling. All ensemble methods combined with SMOTE perform well in most situations.
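The resampling step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`random_undersample`, `random_oversample`, `balanced_bags`) and the toy data are hypothetical, SMOTE and the classifiers themselves are omitted, and only the Python standard library is used.

```python
import random


def _group_by_class(X, y):
    """Collect rows by class label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(X, y, rng):
    """Keep a random subset of each class, sized to the smallest class."""
    groups = _group_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, n_min):  # sampling without replacement
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, rng):
    """Duplicate random rows of each class until it matches the largest class."""
    groups = _group_by_class(X, y)
    n_max = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def balanced_bags(X, y, n_bags, seed=0):
    """Draw one under-sampled, class-balanced subset per ensemble member."""
    rng = random.Random(seed)
    return [random_undersample(X, y, rng) for _ in range(n_bags)]


# Hypothetical toy data: 5% minority class (label 1), 95% majority (label 0).
X = [[i] for i in range(100)]
y = [1] * 5 + [0] * 95
bags = balanced_bags(X, y, n_bags=3)
```

In a bagging-style ensemble, each subset returned by `balanced_bags` would train one base classifier, so every member sees a corrected class distribution while the ensemble as a whole still draws on most of the majority-class rows.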

The Clinical Utility of Korean Bayley Scales of Infant and Toddler Development-III - Focusing on using of the US norm - (베일리영유아발달검사 제3판(Bayley-III)의 미국 규준 적용의 문제: 미숙아 집단을 대상으로)

  • Lim, Yoo Jin;Bang, Hee Jeong;Lee, Soonhang
    • Korean Journal of Psychology: General / v.36 no.1 / pp.81-107 / 2017
  • This study investigates the clinical utility of the Bayley-III in Korea when US norms are applied. A total of 98 preterm infants and 93 term infants were assessed with the K-Bayley-III. The performance pattern of preterm infants was analyzed with a mixed-design ANOVA that examined differences in Bayley-III scaled scores and composite scores between the full-term and preterm groups and within the preterm group. We then investigated the agreement between classifications of delay made using the BSID-II and the Bayley-III. In addition, ROC plots were constructed to identify a Bayley-III cut-off score with optimum diagnostic utility in this sample. The results were as follows. (1) Preterm infants had significantly lower levels of functioning on the 5 scaled scores and 3 developmental indexes than infants born at term. Significant differences among scores within the preterm group were also found. (2) Compared with the K-BSID-II, the Bayley-III yielded higher Mental Development Index and Psychomotor Developmental Index scores and lower rates of developmental delay. (3) All Bayley-III scales (Cognitive, Language, and Motor) showed an appropriate level of discrimination, but the Bayley-III composite cut-off scores had to be set 13 to 28 points higher than 69 to predict delay as defined by the K-BSID-II. This explains the lower rates of developmental delay obtained under the two-standard-deviation criterion. This study provides empirical data showing that scores must be interpreted carefully in clinical applications, identifies the discriminating power of the scales, and proposes more appropriate cut-off scores. Sampling for the construction of Korean norms for the Bayley-III is also discussed. Infants in Korea should preferably be assessed against norms validated in Korea, and the standardization process for obtaining Korean normative data must be carried out carefully.
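The ROC-based search for a cut-off score can be sketched as follows. The abstract does not say which optimality criterion was used, so this example assumes Youden's J (sensitivity + specificity - 1); the function name `best_cutoff` and the scores below are hypothetical and are not data from the study.

```python
def best_cutoff(scores, delayed):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A child is flagged as delayed when score <= cutoff. `delayed` holds the
    reference classification (1 = delayed, 0 = not delayed).
    Youden's J is an assumed criterion, not one stated in the source.
    """
    n_pos = sum(1 for d in delayed if d)
    n_neg = len(delayed) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both delayed and non-delayed cases are required")

    best = None
    for cutoff in sorted(set(scores)):  # every observed score is a candidate
        tp = sum(1 for s, d in zip(scores, delayed) if d and s <= cutoff)
        tn = sum(1 for s, d in zip(scores, delayed) if not d and s > cutoff)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]


# Hypothetical composite scores and reference classifications.
scores = [60, 70, 75, 80, 90, 95, 100, 105]
delayed = [1, 1, 1, 1, 0, 0, 0, 0]
cutoff, sensitivity, specificity = best_cutoff(scores, delayed)
```

With real data the two groups overlap, so the chosen cut-off trades sensitivity against specificity rather than separating the groups perfectly.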