• Title/Summary/Keyword: under-sampling


Heterogeneous Ensemble of Classifiers from Under-Sampled and Over-Sampled Data for Imbalanced Data

  • Kang, Dae-Ki; Han, Min-gyu
    • International journal of advanced smart convergence / v.8 no.1 / pp.75-81 / 2019
  • The data imbalance problem is common and causes serious problems in machine learning. Sampling is one effective way to address it. Over-sampling increases the number of instances, so on imbalanced data it is applied to the minority class. Under-sampling reduces the number of instances and is usually applied to the majority class. We apply under-sampling and over-sampling to imbalanced data to generate sampled data sets. From these sampled data sets and the original data set, we construct a heterogeneous ensemble of classifiers, applying five different algorithms to the ensemble. Experimental results on an intrusion detection dataset, used as an imbalanced dataset, show that our approach is effective.
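
The two sampling directions described in this abstract can be sketched as follows. This is a minimal illustration that assumes plain random sampling, a binary label, and balancing to equal class sizes; the abstract does not say which sampling variants the authors used, and the function names and toy labels are hypothetical.

```python
import random
from collections import Counter

def _split_indices(labels, majority):
    """Return (majority indices, minority indices) for a binary label list."""
    major = [i for i, y in enumerate(labels) if y == majority]
    minor = [i for i, y in enumerate(labels) if y != majority]
    return major, minor

def random_under_sample(rows, labels, majority, rng):
    """Drop majority-class rows at random until both classes have equal size."""
    major, minor = _split_indices(labels, majority)
    keep = rng.sample(major, len(minor)) + minor
    return [rows[i] for i in keep], [labels[i] for i in keep]

def random_over_sample(rows, labels, majority, rng):
    """Duplicate minority-class rows at random until both classes have equal size."""
    major, minor = _split_indices(labels, majority)
    extra = rng.choices(minor, k=len(major) - len(minor))  # sampled with replacement
    keep = major + minor + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]

# Toy data (made up): 90 "normal" rows and 10 "attack" rows.
rng = random.Random(0)
rows = list(range(100))
labels = ["normal"] * 90 + ["attack"] * 10
_, y_under = random_under_sample(rows, labels, "normal", rng)
_, y_over = random_over_sample(rows, labels, "normal", rng)
print(Counter(y_under), Counter(y_over))
```

Each resampled set, together with the original data, could then be used to train a separate classifier for an ensemble.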

Bayesian Multiattribute Acceptance Sampling Plans under Curtailed Inspection

  • Lee, Jong Seong
    • Journal of Industrial Technology / v.9 / pp.51-56 / 1989
  • A methodology for determining optimal sampling plans for Bayesian multiattribute curtailed inspection models is proposed, whereby sampling inspection is terminated as soon as the disposition of the inspection lot is determined. An iterative solution procedure is developed for obtaining optimal multiattribute acceptance sampling plans under curtailed sampling inspection.
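
To illustrate only the curtailment idea (stop inspecting once the lot's disposition is determined), here is a minimal sketch for a single attribute with sample size `n` and acceptance number `c`. It is not the Bayesian multiattribute model of the paper; the function name and parameters are assumptions made for the example.

```python
def curtailed_inspection(items, n, c):
    """Inspect up to n items and stop as soon as the decision is fixed.

    items: sequence of booleans, True meaning the item is defective.
    n: planned sample size; c: acceptance number (accept if defects <= c).
    Returns (decision, number of items actually inspected).
    """
    defects = 0
    for inspected, is_defective in enumerate(items[:n], start=1):
        defects += bool(is_defective)
        if defects > c:
            # More than c defectives already found: rejection is certain.
            return "reject", inspected
        if inspected - defects >= n - c:
            # Enough good items seen that defects can no longer exceed c.
            return "accept", inspected
    raise ValueError("fewer than n items were supplied")

print(curtailed_inspection([False] * 10, n=10, c=1))
print(curtailed_inspection([True, True] + [False] * 8, n=10, c=1))
```

With `n=10` and `c=1`, a lot with no defectives is accepted after nine items, because the tenth item cannot change the outcome.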


On the Estimation of Fraction Defectives

  • Kim, Seong-in
    • Journal of Korean Society for Quality Management / v.8 no.2 / pp.3-14 / 1980
  • This paper is concerned with designing an appropriate sampling plan or stopping rule and constructing an estimate of the process or lot fraction defective. Various sampling plans that are well known or have potential applications are unified into a generalized sampling plan. Under this plan, the sufficient statistic, probability distribution, moments, and minimum variance unbiased estimate are obtained. Results for the various sampling plans can be derived as special cases. Then, for given parameter values, the relative efficiencies of the sampling plans are compared with respect to expected sample sizes and variances of the estimates.


A Hybrid Under-sampling Approach for Better Bankruptcy Prediction

  • Kim, Taehoon; Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.173-190 / 2015
  • The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on appropriate data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance problems. The approach proposed in this study is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN can effectively eliminate outliers, while OCSVM contributes to the selection of informative training samples from majority class data. To validate our proposed approach, we have applied it to data from H Bank's non-external auditing companies in Korea, and compared the performances of the classifiers with the proposed under-sampling and random sampling data. The empirical results show that the proposed under-sampling approach generally improves the accuracy of classifiers, such as logistic regression, discriminant analysis, decision tree, and support vector machines. They also show that the proposed under-sampling approach reduces the risk of false negative errors, which lead to higher misclassification costs.

The systematic sampling for inferring the survey indices of Korean groundfish stocks

  • Hyun, Saang-Yoon; Seo, Young IL
    • Fisheries and Aquatic Sciences / v.21 no.8 / pp.24.1-24.9 / 2018
  • The Korean bottom trawl survey has been deployed on a regular basis for about the last decade as part of groundfish stock assessments. Regular here means that groundfish are sampled once per grid cell, whose sides span half a degree of latitude and half a degree of longitude and whose interior is further divided into nine nested grids. Unless there is a special reason (e.g., running into a rocky bottom), the sample location is the center grid of the nine nested grids. Given the data collected by the survey, we show how to appropriately estimate not only the survey index of a fish stock but also its uncertainty. Because of this regularity, we applied systematic sampling theory and compared its results with a reference based on simple random sampling. Using survey data on 11 fish stocks collected by the spring and fall surveys in 2014, the survey indices estimated under systematic sampling were overall more precise than those under simple random sampling. For survey indices in number, the standard errors under systematic sampling were 0.23 to 27.44% lower than those under simple random sampling, while for survey indices in weight they were 0.04 to 31.97% lower. In terms of bias, the systematic sampling estimates were the same as the simple random sampling estimates. Our paper is the first to formally show how to apply systematic sampling theory to the actual data collected by the Korean bottom trawl surveys.

Economic-Statistical Design of Double Sampling $T^2$ Control Chart under Weibull Failure Model

  • Hong, Seong-Ok; Lee, Min-Koo; Lee, Jooho
    • Journal of Korean Society for Quality Management / v.43 no.4 / pp.471-488 / 2015
  • Purpose: The double sampling $T^2$ chart is a useful tool for detecting a relatively small shift in the process mean when the process is controlled by multiple variables. This paper finds the optimal design of the double sampling $T^2$ chart in both an economic and a statistical sense under a Weibull failure model. Methods: The expected cost function is derived mathematically using a recursive equation approach. The optimal designs are found using a genetic algorithm for numerical examples and compared to those of the single sampling $T^2$ chart. Sensitivity analysis is performed to see the parameter effects. Results: The proposed design outperforms the optimal design of the single sampling $T^2$ chart in terms of the expected cost per unit time and Type-I error rate for all the numerical examples considered. Conclusion: The double sampling $T^2$ chart can be designed to satisfy both economic and statistical requirements under a Weibull failure model, and the resulting design is better than its single sampling counterpart.

New Attributes and Variables Control Charts under Repetitive Sampling

  • Aslam, Muhammad; Azam, Muhammad; Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems / v.13 no.1 / pp.101-106 / 2014
  • New control charts under repetitive sampling are proposed, which can be used for variables and attributes quality characteristics. The proposed control charts have inner and outer control limits so that repetitive sampling may be needed if the plotted statistic falls between the two limits. Particularly, the new np and variable X-bar control charts under repetitive sampling are considered in detail. The in-control and out-of-control average run lengths are analyzed according to various process shifts. The performance of the proposed control charts is compared with the existing np and the X-bar control charts in terms of the average run lengths.
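
The decision rule implied by the inner and outer control limits can be sketched as below. This is an illustrative reading of the abstract, assuming symmetric limits of the form center ± k·sigma; the function name, the limit coefficients `k_inner` and `k_outer`, and the example values are hypothetical and not taken from the paper.

```python
def repetitive_sampling_decision(stat, center, sigma, k_inner, k_outer):
    """Classify a plotted statistic against inner and outer control limits.

    Assumes symmetric limits center +/- k * sigma with k_inner < k_outer.
    """
    if k_inner >= k_outer:
        raise ValueError("inner limits must lie strictly inside outer limits")
    if abs(stat - center) > k_outer * sigma:
        return "out of control"      # beyond the outer limits
    if abs(stat - center) <= k_inner * sigma:
        return "in control"          # within the inner limits
    return "resample"                # between the limits: draw a new sample

for x_bar in (10.5, 12.5, 13.5):
    print(x_bar, repetitive_sampling_decision(x_bar, center=10.0, sigma=1.0,
                                              k_inner=2.0, k_outer=3.0))
```

A statistic that lands in the band between the two sets of limits leads to no decision, and sampling is repeated until it falls in one of the other two regions.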

On inference of multivariate means under ranked set sampling

  • Rochani, Haresh; Linder, Daniel F.; Samawi, Hani; Panchal, Viral
    • Communications for Statistical Applications and Methods / v.25 no.1 / pp.1-13 / 2018
  • In many studies, a researcher attempts to describe a population where units are measured for multiple outcomes, or responses. In this paper, we present an efficient procedure based on ranked set sampling to estimate and perform hypothesis testing on a multivariate mean. The method is based on ranking on an auxiliary covariate, which is assumed to be correlated with the multivariate response, in order to improve the efficiency of the estimation. We show that the proposed estimators developed under this sampling scheme are unbiased, have smaller variance in the multivariate sense, and are asymptotically Gaussian. We also demonstrate that the efficiency of the multivariate regression estimator can be improved by using ranked set sampling. A bootstrap routine is developed in the statistical software R to perform inference when the sample size is small. We use a simulation study to investigate the performance of the method under known conditions and apply the method to biomarker data collected in the China Health and Nutrition Survey (CHNS 2009).
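
As a rough sketch of the sampling scheme, one cycle of balanced ranked set sampling with ranking on an auxiliary covariate might look like this. The paper's own routine is in R and is not reproduced here; the function name, the toy population, and the choice of a single cycle are assumptions for illustration.

```python
import random

def ranked_set_sample(population, set_size, rng):
    """Draw one cycle of a balanced ranked set sample.

    population: list of (covariate, response) pairs; the response may be a
    tuple of several outcomes. Only the covariate is used for ranking.
    Returns set_size measured responses.
    """
    sample = []
    for rank in range(set_size):
        candidates = rng.sample(population, set_size)    # a fresh random set
        candidates.sort(key=lambda unit: unit[0])        # rank by covariate only
        sample.append(candidates[rank][1])               # measure the rank-th unit
    return sample

# Toy population (made up): covariate x and a bivariate response correlated with x.
rng = random.Random(42)
population = []
for _ in range(500):
    x = rng.gauss(0, 1)
    population.append((x, (2 * x + rng.gauss(0, 0.5), -x + rng.gauss(0, 0.5))))

print(ranked_set_sample(population, set_size=4, rng=rng))
```

Only one unit per set is measured on the response, which is where the cost saving comes from when ranking on the covariate is cheap.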

Sampling Plans Based on Truncated Life Test for a Generalized Inverted Exponential Distribution

  • Singh, Sukhdev; Tripathi, Yogesh Mani; Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems / v.14 no.2 / pp.183-195 / 2015
  • In this paper, we propose a two-stage group acceptance sampling plan for the generalized inverted exponential distribution under a truncated life test. Median life is taken as the quality parameter. Design parameters are obtained to ensure that the true median life is longer than a specified life at given levels of consumer's risk and producer's risk. We also explore situations in which design parameters based on median lifetime can be used for other percentile points. Tables and specific examples are reported to explain the proposed plans. Finally, a real data set is analyzed to implement the plans in practical situations, and some suggestions are given.

A Study on Determining Job Sequence by Sampling Method (II)

  • 강성수; 노인규
    • Journal of Korean Society of Industrial and Systems Engineering / v.12 no.19 / pp.25-30 / 1989
  • This study concerns a job sequencing method based on the concept of sampling. The sampling technique has not previously been applied to the development of scheduling algorithms. Most job sequencing algorithms have been developed to find the best or a good solution under special conditions. As a result, developing appropriate job schedules that satisfy complex work conditions is not only very difficult but also very time-consuming, and the application areas of these algorithms are very narrow. Under these circumstances, it is desirable to develop a simple job sequencing method that can produce a good solution in a short time under any complex work conditions. This study calls it the sampling job sequencing method. The study examines the selection of a good job sequence from the upper 1%-5% group by the sampling method. The results show that, in the experiment on the $2/n/F/F_{\max}$ problem, there is a group of 0.5%-5% of job sequences with the same performance measure value as the optimal job sequence. This indicates that the sampling job sequencing method is useful for finding the optimal or a good job sequence with little effort and time. The ANOVA results show that two factors, the number of jobs and the range of processing times, are significant for determining the job sequence at $\alpha$=0.01. This study is further extended to job shop problems with 3 or more machines.
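
The sampling idea can be sketched for a small two-machine flow shop as follows: evaluate a number of randomly drawn job sequences, keep the best, and compare it with the exact optimum found by enumeration. This is a minimal illustration, assuming all jobs are available at time zero (so the maximum flow time equals the completion time of the last job); the processing times, sample size, and function names are made up and not from the paper.

```python
import itertools
import random

def max_flow_time(sequence, times):
    """Completion time of the last job on machine 2 in a 2-machine flow shop.

    times[j] is (time on machine 1, time on machine 2) for job j.
    """
    end1 = end2 = 0
    for job in sequence:
        t1, t2 = times[job]
        end1 += t1                     # machine 1 runs jobs back to back
        end2 = max(end1, end2) + t2    # machine 2 waits for the job to arrive
    return end2

def sampled_best_sequence(times, n_samples, rng):
    """Evaluate n_samples random permutations and keep the best one."""
    jobs = list(range(len(times)))
    best_seq, best_val = None, float("inf")
    for _ in range(n_samples):
        seq = rng.sample(jobs, len(jobs))   # a uniformly random permutation
        val = max_flow_time(seq, times)
        if val < best_val:
            best_seq, best_val = seq, val
    return best_seq, best_val

rng = random.Random(1)
times = [(rng.randint(1, 9), rng.randint(1, 9)) for _ in range(6)]  # made-up data
values = [max_flow_time(p, times) for p in itertools.permutations(range(6))]
optimum = min(values)
share_optimal = values.count(optimum) / len(values)  # fraction of optimal sequences
seq, val = sampled_best_sequence(times, 50, rng)
print(f"optimum={optimum}, sampled best={val}, optimal share={share_optimal:.1%}")
```

When a non-trivial share of all sequences attains the optimal value, as the abstract reports, even a modest random sample has a good chance of containing one of them.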
