• Title/Summary/Keyword: Over-sampling


Cluster Sampling in Sampling Inspection: Bayes Estimation

  • Juyoung Lee
    • Communications for Statistical Applications and Methods
    • /
    • v.6 no.1
    • /
    • pp.107-116
    • /
    • 1999
  • We propose a sample design that minimizes the Bayes risk for cluster sampling in sampling inspection. We treat the pilot sample size and the additional sample size as random variables. In addition, we compute an appropriate cluster size for handling over-dispersion.


On sampling algorithms for imbalanced binary data: performance comparison and some caveats (불균형적인 이항 자료 분석을 위한 샘플링 알고리즘들: 성능비교 및 주의점)

  • Kim, HanYong;Lee, Woojoo
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.5
    • /
    • pp.681-690
    • /
    • 2017
  • Imbalanced binary classification problems arise in many settings, such as fraud detection in banking operations, spam mail detection, and prediction of defective products. Several sampling methods, such as over-sampling, under-sampling, and SMOTE, have been developed to overcome the poor prediction performance of binary classifiers when one group is dominant. In this study, we investigate the prediction performance of logistic regression, Lasso, random forest, boosting, and support vector machines in combination with these sampling methods for imbalanced binary data. Four real data sets are analyzed to see whether there is a substantial improvement in prediction performance. We also emphasize some precautions to take when the sampling methods are implemented.

A new criterion for determining the sampling rate of digital controller (디지털제어기의 제어주기 결정방법에 관한 연구)

  • 이준화;문홍주;정병근
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2000.10a
    • /
    • pp.360-360
    • /
    • 2000
  • In this paper, a new criterion for determining the sampling rate of a digital controller is proposed. The paper introduces a method for determining an appropriate sampling rate for a digital controller that is to replace a given analog controller, using the phase margin and gain crossover frequency rather than the rise time or bandwidth of the closed-loop system. This method also guarantees the performance of the system. Even without an exact model of the plant, the method can achieve stability and the target performance of the system by working with an abstracted model, and this is verified with suitable plant models.


Complex Bandpass Sampling Technique and Its Generalized Formulae for SDR System (SDR 시스템을 위한 Complex Bandpass Sampling 기법 및 일반화 공식의 유도)

  • Bae, Jung-Hwa;Ha, Won;Park, Jin-Woo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.7C
    • /
    • pp.687-695
    • /
    • 2005
  • A bandpass sampling technique, which directly downconverts a bandpass signal to a baseband or low-IF signal without analog mixers, can be an alternative choice for SDR systems to minimize the RF front-end. In this paper, a complex bandpass sampling technique for two bandpass-filtered signals is proposed. We derive generalized formulae for the available sampling range, the signal's IF, and the minimum sampling frequency, taking into consideration the guard bands for the multiple RF signals. Through simulation experiments, the advantages of complex bandpass sampling over previously reported real bandpass sampling are investigated for applications in SDR design.
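
As background for the "available sampling range" mentioned in the abstract above, the following is a minimal sketch of the classical textbook condition for real, uniform bandpass sampling of a single band [f_low, f_high]. It is not the paper's generalized complex formulae, which the abstract does not state, and it ignores guard bands. The function name `allowed_sampling_ranges` and the example band edges are hypothetical.

```python
def allowed_sampling_ranges(f_low, f_high):
    """Alias-free sampling-rate ranges for real bandpass sampling.

    Classical single-band condition (no guard bands), for each integer n
    with 1 <= n <= floor(f_high / bandwidth):
        2 * f_high / n <= fs <= 2 * f_low / (n - 1)
    For n = 1 there is no upper bound (ordinary Nyquist sampling).
    """
    if f_low <= 0 or f_high <= f_low:
        raise ValueError("require 0 < f_low < f_high")
    bandwidth = f_high - f_low
    n_max = int(f_high // bandwidth)
    ranges = []
    for n in range(1, n_max + 1):
        lower = 2 * f_high / n
        upper = float("inf") if n == 1 else 2 * f_low / (n - 1)
        ranges.append((n, lower, upper))
    return ranges


# Hypothetical example: a band from 20 to 25 (any consistent frequency unit).
for n, lower, upper in allowed_sampling_ranges(20, 25):
    print(f"n={n}: {lower:.3f} <= fs <= {upper:.3f}")
```

In this example the band edges are integer multiples of the bandwidth, so the lowest allowed rate equals twice the bandwidth. For general band positions the minimum rate is somewhat higher.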

$\bar{X}$ Control Charts with Variable Sample Sizes and Variable Sampling Intervals

  • Lee, Jae-Heon
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.3
    • /
    • pp.429-440
    • /
    • 2003
  • Variable sampling rate (VSR) control charts vary the sampling interval and/or the sample size according to the value of the control statistic. It is known that $\bar{X}$ charts with a VSR scheme lead to large improvements in performance over those with a fixed sampling rate (FSR) scheme. In this paper, we study $\bar{X}$ charts with several VSR schemes and compare their statistical performance with each other.


Sampling Study on Environmental Observations: Precipitation, Soil Moisture and Land Cover Information

  • 유철상
    • Journal of Environmental Science International
    • /
    • v.5 no.2
    • /
    • pp.103-112
    • /
    • 1996
  • Observational data are integral to our understanding of the present climate, its natural variability, and any change due to anthropogenic effects. This study incorporates a brief overview of sampling requirements using data from the first ISLSCP Field Experiment (FIFE) in 1987, which was a multi-disciplinary field experiment over a 15 km grid in Konza Prairie, USA. Sampling strategies were designed for precipitation and soil moisture measurements and also for detecting land cover type. It was concluded that up to 8 rain gages would be needed for valuable precipitation measurements covering the whole FIFE catchment, but only one soil moisture station. Results show that as new gages or stations are added to the catchment, the sampling error is reduced, but the improvement in error performance diminishes as the number of gages or stations increases. Sampling from remotely sensed instruments shows different results. It can be seen that the sampling error at larger resolution sizes is small due to competing error contributions from both commission and omission errors.


Folded Ranked Set Sampling for Asymmetric Distributions

  • Bani-Mustafa, Ahmed;Al-Nasser, Amjad D.;Aslam, Muhammad
    • Communications for Statistical Applications and Methods
    • /
    • v.18 no.1
    • /
    • pp.147-153
    • /
    • 2011
  • In this paper, a new sampling procedure for estimating the population mean is introduced. The performance of the new population mean estimator is discussed, along with its properties, and it is shown that the proposed method generates an unbiased estimator. The relative efficiency of the suggested estimator is computed with respect to simple random sampling (SRS), and comparisons are made to the ranked set sampling (RSS) and extreme ranked set sampling (ERSS) estimators used for asymmetric distributions. The results indicate that the proposed estimator is more efficient than the estimators based on ERSS. In addition, the folded ranked set sampling (FRSS) procedure has an advantage over RSS and ERSS in that it reduces the number of unused sampling units.

Ensemble Learning for Solving Data Imbalance in Bankruptcy Prediction (기업부실 예측 데이터의 불균형 문제 해결을 위한 앙상블 학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.15 no.3
    • /
    • pp.1-15
    • /
    • 2009
  • In a classification problem, data imbalance occurs when the instances in one class greatly outnumber the instances in the other class. Such data sets often lead to a default classifier being built because of a skewed decision boundary, which reduces the classification accuracy of the classifier. This paper proposes Geometric Mean-based Boosting (GM-Boost) to resolve the problem of data imbalance. Because GM-Boost introduces the notion of the geometric mean, its learning process can consider both the majority and minority classes and reinforce learning on misclassified data. An empirical study of bankruptcy prediction for Korean companies shows that GM-Boost achieves higher classification accuracy than previous methods used for imbalanced data, including under-sampling, over-sampling, and AdaBoost, and shows robust learning performance regardless of the degree of data imbalance.
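
The abstract above does not state how GM-Boost uses the geometric mean. As a hedged illustration only, the sketch below computes the commonly used G-mean metric (the geometric mean of sensitivity and specificity), which shows why a measure that weighs both classes exposes a default classifier that plain accuracy rewards. The function names and the 95/5 class split are hypothetical.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumption: this is the standard G-mean metric, used here for
    illustration; it is not taken from the GM-Boost paper itself.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical data: 95 majority (0) and 5 minority (1) instances.
y_true = [0] * 95 + [1] * 5
always_majority = [0] * 100  # a "default" classifier

print(accuracy(y_true, always_majority))  # high accuracy
print(g_mean(y_true, always_majority))    # zero: minority class never found
```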


Application of Random Over Sampling Examples (ROSE) for an Effective Bankruptcy Prediction Model (효과적인 기업부도 예측모형을 위한 ROSE 표본추출기법의 적용)

  • Ahn, Cheolhwi;Ahn, Hyunchul
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.8
    • /
    • pp.525-535
    • /
    • 2018
  • If the frequency of a particular class is excessively higher than the frequency of the other classes in a classification problem, data imbalance problems occur, which distort machine learning. Corporate bankruptcy prediction often suffers from data imbalance because the proportion of insolvent companies is generally very low, whereas the proportion of solvent companies is very high. To mitigate these problems, a proper sampling technique must be applied. Until now, oversampling techniques, which adjust the class distribution of a data set by sampling the minority class with replacement, have been widely used. However, they carry a risk of overfitting. Against this background, this study applies the ROSE (Random Over Sampling Examples) technique, proposed by Menardi and Torelli in 2014, to effective corporate bankruptcy prediction. The ROSE technique creates new learning samples by synthesizing the samples used for learning, so it leads to better prediction accuracy of the classifiers while avoiding the risk of overfitting. Specifically, our study proposes combining the ROSE method with SVM (support vector machine), which is known as the best binary classifier. We applied the proposed method to a real-world bankruptcy prediction case from a major Korean bank and compared its performance with other sampling techniques. Experimental results showed that ROSE improved the prediction accuracy of SVM in bankruptcy prediction compared to the other techniques, with statistical significance. These results suggest that ROSE can be a good alternative for resolving data imbalance in prediction problems in social science areas other than bankruptcy prediction.
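
The plain oversampling that the abstract above contrasts with ROSE (sampling the minority class with replacement until the classes are balanced) can be sketched as follows. This is a minimal illustration, not the ROSE procedure, which synthesizes new samples instead of copying existing ones. The function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Balance classes by duplicating randomly chosen minority rows.

    Every class is topped up, by sampling with replacement from its own
    rows, until it matches the size of the largest class. Because the
    added rows are exact copies, a classifier can overfit to them, which
    is the risk noted in the abstract.
    """
    rng = random.Random(seed)
    rows_by_class = {}
    for x, y in zip(features, labels):
        rows_by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in rows_by_class.values())

    out_features, out_labels = [], []
    for y, rows in rows_by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_features.append(x)
            out_labels.append(y)
    return out_features, out_labels


# Hypothetical toy data: 8 solvent (0) and 2 insolvent (1) companies.
X = list(range(10))
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```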

Over-Sampling Rate for Accurate Evaluation of MLFMM Transfer Function (MLFMM의 Transfer 함수의 정확한 계산을 위한 오버샘플링 비율)

  • Lee, Hyunsoo;Rim, Jae-Won;Koh, Il-Suek
    • The Journal of Korean Institute of Electromagnetic Engineering and Science
    • /
    • v.29 no.10
    • /
    • pp.811-816
    • /
    • 2018
  • When applying the MLFMM algorithm to a large scattering problem, the accuracy of the transfer function calculation has a crucial effect on the final simulation results. The numerical accuracy of the double integral on the unit sphere depends strongly on the number of sampling points. As the number of sampling points increases, the overall memory and running time required by the MLFMM simulation also increase. Hence, an optimal over-sampling rate for the number of sampling points is obtained numerically and verified on a real large scattering problem.