• Title/Summary/Keyword: 과소표집 (under-sampling)

A Comparison of Ensemble Methods Combining Resampling Techniques for Class Imbalanced Data (데이터 전처리와 앙상블 기법을 통한 불균형 데이터의 분류모형 비교 연구)

  • Lee, Hee-Jae;Lee, Sungim
    • The Korean Journal of Applied Statistics / v.27 no.3 / pp.357-371 / 2014
  • Many studies address imbalanced data, in which the class distribution is highly skewed. To address the problem, previous studies have used resampling techniques that correct the skewness of the class distribution in each sampled subset through under-sampling, over-sampling, or hybrid sampling such as SMOTE. Ensemble methods have also alleviated the class imbalance problem. In this paper, we compare around a dozen algorithms that combine ensemble methods with resampling techniques, using simulated data sets generated by the Backbone model, which can handle the imbalance rate. Results on various real imbalanced data sets are also presented to compare the effectiveness of the algorithms. As a result, we highly recommend combining resampling techniques with ensemble methods for imbalanced data in which the proportion of the minority class is less than 10%. We also find that each ensemble method has a well-matched sampling technique. Algorithms that combine bagging or random forest ensembles with random under-sampling tend to perform well, whereas the boosting ensemble appears to perform better with over-sampling. All ensemble methods combined with SMOTE perform well in most situations.
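
The random under-sampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `random_undersample`, the toy data, and the 5% minority rate are assumptions chosen for illustration. In a bagging-style ensemble, a step like this would be applied to each bootstrap subset before fitting a base learner.

```python
import numpy as np

def random_undersample(X, y, rng):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]

# Hypothetical toy data: 95 majority rows and 5 minority rows.
rng = np.random.default_rng(0)
y = np.array([0] * 95 + [1] * 5)
X = rng.normal(size=(100, 2))
X_bal, y_bal = random_undersample(X, y, rng)
```

After the call, `y_bal` holds five rows of each class, so the class distribution in the subset is balanced.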

Systematic Bias of Telephone Surveys: Meta Analysis of 2007 Presidential Election Polls (전화조사의 체계적 편향 - 2007년 대통령선거 여론조사들에 대한 메타분석 -)

  • Kim, Se-Yong;Huh, Myung-Hoe
    • The Korean Journal of Applied Statistics / v.22 no.2 / pp.375-385 / 2009
  • For the 2007 Korean presidential election, most telephone polls indicated that Lee Myung-Bak led the runner-up, Jung Dong-Young, by a certain margin. The margin between the two candidates can be estimated accurately by averaging individual poll results, provided there is no systematic bias in telephone surveys. Most Korean telephone surveys drawn from the telephone directory are based on quota samples, with region, gender, and age band as quota variables. Thus the surveys may carry systematic bias due to unbalanced factors inherent in quota sampling. The aim of this study is to answer the following questions using the analytic methods adopted in Huh et al. (2004). Question 1: Was there systematic bias in the estimates of support rates? Question 2: If so, what was the source of the bias? To answer these questions, we collected eighteen surveys administered during the election campaign period and applied iterative proportional weighting (rim weighting) to the last eleven surveys to balance five factors: region, gender, age, occupation, and education level. We found that the support rate of Lee Myung-Bak was consistently over-estimated by 1.4%P and that of Jung Dong-Young was under-estimated by 0.6%P, resulting in an over-estimation of the margin by 2.0%P. By investigating the Lee Myung-Bak bias with logistic regression models, we conclude that it originated from the under-representation of the less educated class and/or the over-representation of housewives in telephone samples.
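
Rim weighting (iterative proportional fitting) can be sketched for two factors as follows. This is a simplified illustration, not the study's procedure: the study balances five factors, while the sketch uses a 2x2 table, and the function name `rake`, the sample counts, and the target margins are all made up for the example.

```python
import numpy as np

def rake(table, row_targets, col_targets, iters=100, tol=1e-10):
    """Alternately rescale rows and columns until both margins match the targets."""
    w = np.asarray(table, dtype=float).copy()
    for _ in range(iters):
        w *= (row_targets / w.sum(axis=1))[:, None]   # match row margins
        w *= (col_targets / w.sum(axis=0))[None, :]   # match column margins
        if np.abs(w.sum(axis=1) - row_targets).max() < tol:
            break
    return w

# Hypothetical sample counts cross-classified by two factors.
sample = np.array([[30.0, 20.0],
                   [10.0, 40.0]])
row_targets = np.array([50.0, 50.0])   # assumed population margin, factor 1
col_targets = np.array([45.0, 55.0])   # assumed population margin, factor 2
weighted = rake(sample, row_targets, col_targets)
```

Dividing `weighted` by `sample` cell by cell gives the weight each respondent in that cell would receive.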

Sample Distortion in Social Surveys and Effects of Weighting Adjustment: A Study of 18 Cases (사회조사에서 표본의 왜곡과 가중치 보정의 결과: 18개 사례연구)

  • Huh, Myung-Hoe;Yoon, Young-A;Lee, Yong-Goo
    • Survey Research / v.5 no.2 / pp.31-48 / 2004
  • We collected and analyzed 18 social surveys to assess the quality of samples with respect to region, gender, age band, education level, and occupation. We found that highly educated people and housewives are over-represented in our samples, whereas less educated people, self-employed and blue-collar workers, and white-collar workers are under-represented. To correct such sample distortions, we applied iterative proportional weighting (raking) to our samples. We observed sizable changes in survey results. In addition, the effective sample sizes shrank by as much as 20%-40%, which suggests that larger samples are needed to meet the claimed sampling error limits.
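
The loss of effective sample size under weighting can be illustrated with Kish's approximation, n_eff = (Σw)² / Σw². The abstract does not state which formula the authors used, so the choice of Kish's formula, the function name `kish_effective_n`, and the weights below are assumptions.

```python
import numpy as np

def kish_effective_n(weights):
    """Kish's approximate effective sample size for a weighted sample."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

# Hypothetical weights: half the respondents weighted down, half weighted up.
equal = np.ones(1000)
uneven = np.concatenate([np.full(500, 0.5), np.full(500, 1.5)])

n_equal = kish_effective_n(equal)     # 1000.0, no loss with equal weights
n_uneven = kish_effective_n(uneven)   # 800.0, a 20% reduction
```

With these made-up weights the 1,000 interviews carry the information of about 800 equally weighted ones, which is the kind of shrinkage the abstract reports.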

Guaranteed Minimum Accumulated Benefit in Variable Annuities and Jump Risk (변액연금보험의 최저연금적립금보증과 점프리스크)

  • Kwon, Yongjae;Kim, So-Yeun
    • The Journal of the Korea Contents Association / v.20 no.11 / pp.281-291 / 2020
  • This study applied a Gauss-Poisson jump diffusion process to standard assets to estimate the statutory reserves for Variable Annuity (VA) guarantees specified in the Korean bylaw of insurance supervision, and calculated guarantee fees and risks based on the model to see the effect of considering jumps. Financial assets, except KOSPI 200, have fat-tailed return distributions, which is indirect evidence of discontinuous jumps. For a domestic stock index and foreign stock indexes (Korean Won), guarantee fees and risks decrease when jumps are considered in models of the underlying assets. This is explained by decreases in standard deviations after the jump diffusion is considered. On the other hand, for domestic bond indexes and a foreign bond index (Korean Won), guarantee fees and risks tend to increase when jumps are considered. Results from a foreign stock index (US Dollar) and a foreign bond index (US Dollar) were opposite to those from the same kinds of Korean Won indexes. We conclude that VA guarantee fees and risks may be under- or over-estimated when jumps are not considered in models of the underlying assets.
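
The link between jumps and fat tails can be shown with a small simulation. This is a sketch of a generic jump diffusion with Poisson arrivals and Gaussian jump sizes, not the authors' calibrated model: the function names and every parameter value below are assumptions chosen for illustration.

```python
import numpy as np

def simulate_log_returns(n, sigma, lam, jump_mu, jump_sigma, dt, rng, mu=0.0):
    """Per-step log returns: Gaussian diffusion plus a Poisson number of Gaussian jumps."""
    diffusion = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
    n_jumps = rng.poisson(lam * dt, size=n)
    # The sum of k independent N(jump_mu, jump_sigma^2) jumps is N(k*jump_mu, k*jump_sigma^2).
    jumps = n_jumps * jump_mu + np.sqrt(n_jumps) * jump_sigma * rng.standard_normal(n)
    return diffusion + jumps

def excess_kurtosis(x):
    """Sample excess kurtosis; close to 0 for normally distributed data."""
    z = (x - x.mean()) / x.std()
    return (z**4).mean() - 3.0

rng = np.random.default_rng(1)
dt = 1.0 / 252.0  # assumed daily step
no_jumps = simulate_log_returns(100_000, 0.2, 0.0, 0.0, 0.0, dt, rng)
with_jumps = simulate_log_returns(100_000, 0.2, 10.0, -0.01, 0.05, dt, rng)

k_no_jumps = excess_kurtosis(no_jumps)
k_with_jumps = excess_kurtosis(with_jumps)
```

Under these assumed parameters the series without jumps has excess kurtosis near zero, while the series with jumps is clearly fat-tailed.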

The Clinical Utility of Korean Bayley Scales of Infant and Toddler Development-III - Focusing on using of the US norm - (베일리영유아발달검사 제3판(Bayley-III)의 미국 규준 적용의 문제: 미숙아 집단을 대상으로)

  • Lim, Yoo Jin;Bang, Hee Jeong;Lee, Soonhang
    • Korean journal of psychology:General / v.36 no.1 / pp.81-107 / 2017
  • This study investigates the clinical utility of the Bayley-III in Korea when the US norm is used. A total of 98 preterm infants and 93 term infants were assessed with the K-Bayley-III. The performance pattern of preterm infants was analyzed with a mixed-design ANOVA, which examined differences in Bayley-III scaled scores and composite scores between the full-term and preterm infant groups and within the preterm infant group. We then investigated the agreement between classifications of delay made using the BSID-II and the Bayley-III. In addition, ROC plots were constructed to identify a Bayley-III cut-off score with optimum diagnostic utility in this sample. The results were as follows. (1) Preterm infants had significantly lower levels of functioning on 5 scaled scores and 3 developmental indexes than infants born at term. Significant differences among scores within the preterm infant group were also found. (2) The Bayley-III yielded higher scores on the Mental Development Index and Psychomotor Developmental Index than the K-BSID-II, and lower rates of developmental delay. (3) All Bayley-III scales (Cognitive, Language, and Motor) had an appropriate level of discrimination, but the Bayley-III composite cut-off scores had to be set 13-28 points higher than 69 to predict delay as defined by the K-BSID-II. This explains the lower rates of developmental delay under the criterion of two standard deviations. This study provides empirical data showing that scores must be interpreted carefully in clinical applications, identifies the discriminating power of the scales, and proposes more appropriate cut-off scores. In addition, sampling for constructing the Korean norm of the Bayley-III is discussed. It is preferable that infants in Korea be assessed against validated Korean norms. The standardization process for obtaining Korean normative data must be performed carefully.
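
One common way to pick a cut-off from an ROC analysis is to maximize Youden's J (sensitivity + specificity − 1). The abstract does not say which criterion the authors used, so Youden's J is an assumption here, and the function name `best_cutoff`, the scores, and the delay labels are made-up toy values rather than study data.

```python
import numpy as np

def best_cutoff(scores, delayed):
    """Scan candidate cut-offs and return the one that maximizes Youden's J.

    An infant is flagged as delayed when the score is at or below the cut-off.
    """
    scores = np.asarray(scores, dtype=float)
    delayed = np.asarray(delayed, dtype=bool)
    best_c, best_j = None, -1.0
    for c in np.unique(scores):
        flagged = scores <= c
        sensitivity = (flagged & delayed).sum() / delayed.sum()
        specificity = (~flagged & ~delayed).sum() / (~delayed).sum()
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j

# Hypothetical composite scores and reference classifications (1 = delayed).
scores = [60, 65, 70, 75, 80, 85, 90, 95, 100, 105]
delayed = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
cutoff, j = best_cutoff(scores, delayed)
```

For this toy sample the scan selects a cut-off of 75, which sits above the conventional 69, the same direction of adjustment the abstract describes.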