• Title/Summary/Keyword: under-sampling

An Additive Quantitative Randomized Response Model by Cluster Sampling

  • Lee, Gi-Sung
    • The Korean Journal of Applied Statistics
    • /
    • v.25 no.3
    • /
    • pp.447-456
    • /
    • 2012
  • For a sensitive survey in which the population consists of several clusters with a quantitative attribute, we present an additive quantitative randomized response model by cluster sampling. The model adopts two-stage cluster sampling instead of simple random sampling and is based on Himmelfarb-Edgell's additive quantitative attribute model and Gjestvang-Singh's model. We also derive the optimum number of first-stage clusters and the optimum number of observation units in a second-stage cluster that minimize the variance at a given constant cost. We find that Himmelfarb-Edgell's model is more efficient than Gjestvang-Singh's model under cluster sampling.
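
The additive idea behind this kind of model can be illustrated with a minimal sketch: each respondent reports the sensitive value plus a random scrambling draw whose mean is known, and the analyst subtracts that mean from the average response. The sketch below assumes simple random sampling rather than the two-stage cluster design of the paper, and the function names, the Gaussian scrambling distribution and its parameters are illustrative assumptions, not taken from the abstract.

```python
import random
from statistics import mean


def scrambled_response(true_value, rng, scramble_mean=10.0, scramble_sd=2.0):
    """Report the true value plus a scrambling draw S with a known mean.

    The Gaussian form and its parameters are hypothetical choices.
    """
    return true_value + rng.gauss(scramble_mean, scramble_sd)


def estimate_mean(responses, scramble_mean=10.0):
    """Estimate the sensitive mean as mean(Z) - E[S]."""
    return mean(responses) - scramble_mean


rng = random.Random(0)
# Simulated sensitive values; the interviewer never sees these directly.
true_values = [rng.uniform(0, 100) for _ in range(5000)]
responses = [scrambled_response(x, rng) for x in true_values]
estimate = estimate_mean(responses)
print(round(estimate, 2), round(mean(true_values), 2))
```

Because only the mean of the scrambling variable is removed, individual answers stay masked while the population mean remains estimable.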

Classification of Class-Imbalanced Data: Effect of Over-sampling and Under-sampling of Training Data (계급불균형자료의 분류: 훈련표본 구성방법에 따른 효과)

  • 김지현;정종빈
    • The Korean Journal of Applied Statistics
    • /
    • v.17 no.3
    • /
    • pp.445-457
    • /
    • 2004
  • Given class-imbalanced data in a two-class classification problem, we often over-sample and/or under-sample the training data to make it balanced. We investigate the validity of this practice. We also study its effect on boosting of classification trees. Experiments on twelve real datasets show that keeping the natural distribution of the training data is the best approach if you plan to apply boosting methods to class-imbalanced data.
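
For readers unfamiliar with the two resampling practices named in this abstract, the following is a minimal sketch of random under-sampling and random over-sampling in plain Python. The helper names and the toy data are assumptions made for illustration; the abstract does not say how the authors implemented resampling.

```python
import random
from collections import Counter


def _group_by_label(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_under_sample(rows, labels, rng):
    """Keep a random subset of each class, sized to the smallest class."""
    groups = _group_by_label(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        for row in rng.sample(members, target):  # sampling without replacement
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def random_over_sample(rows, labels, rng):
    """Duplicate random rows of each smaller class up to the largest class size."""
    groups = _group_by_label(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


rng = random.Random(42)
rows = [[float(i)] for i in range(100)]  # toy one-feature data
labels = [0] * 90 + [1] * 10             # 90 majority rows, 10 minority rows
under_rows, under_labels = random_under_sample(rows, labels, rng)
over_rows, over_labels = random_over_sample(rows, labels, rng)
print(Counter(under_labels), Counter(over_labels))
```

Only the training split would normally be resampled this way; the test split keeps its natural class distribution so that reported performance is not distorted.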

Bayesian Estimation of k-Population Weibull Distribution Under Ordered Scale Parameters (순서를 갖는 척도모수들의 사전정보 하에 k-모집단 와이블분포의 베이지안 모수추정)

  • 손영숙;김성욱
    • The Korean Journal of Applied Statistics
    • /
    • v.16 no.2
    • /
    • pp.273-282
    • /
    • 2003
  • The problem of estimating the parameters of k-population Weibull distributions is discussed under a prior with ordered scale parameters. Parameters are estimated by the Gibbs sampling method. Since the conditional posterior distribution of the shape parameter in the Gibbs sampler is not log-concave, the shape parameter is generated by adaptive rejection sampling. Finally, we apply this estimation methodology to the data discussed in Nelson (1970).

Development of Integrated Variable Sampling Interval Engineering Process Control & Statistical Process Control System (가변 샘플링간격 EPC/SPC 결합시스템의 개발)

  • Lee, Sung-Jae;Seo, Sun-Keun
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.32 no.3
    • /
    • pp.210-218
    • /
    • 2006
  • Traditional statistical process control (SPC), applied to the discrete part industry in the form of control charts, can detect and eliminate assignable causes through process monitoring. Engineering process control (EPC), applied to the process industry in the form of feedback control, can keep the process output on target by continual adjustment of an input variable. This study presents controlling and monitoring rules that use a variable sampling interval (VSI) to change sampling intervals in a predetermined fashion based on the predicted process levels under integrated EPC and SPC systems. Twelve rules, classified by EPC scheme (MMSE, constrained PI, bounded or deadband adjustment policy) and type of sampling interval and combined with the EWMA chart of SPC, are proposed under an IMA(1,1) disturbance model and a zero-order (responsive) dynamic system. The properties of the twelve control rules under three patterns of process change (sudden shift, drift and random shift) are evaluated and discussed through simulation, and control rules for integrated VSI EPC and SPC systems are recommended.
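
The monitoring half of such a scheme can be sketched as an EWMA chart whose next sampling interval depends on where the current statistic falls. This is a simplified illustration only: it omits the EPC feedback adjustment and the IMA(1,1) disturbance model described in the abstract, and every parameter (smoothing weight, limits, interval lengths) is an assumed value rather than one from the paper.

```python
def ewma_vsi(observations, target=0.0, lam=0.2, sigma=1.0,
             control_k=3.0, warning_k=1.0,
             short_interval=0.5, long_interval=2.0):
    """Return (ewma, next_interval, out_of_control) for each observation.

    All default values are hypothetical and chosen only for illustration.
    """
    # Asymptotic standard deviation of the EWMA statistic.
    sigma_z = sigma * (lam / (2.0 - lam)) ** 0.5
    z = target
    results = []
    for x in observations:
        z = lam * x + (1.0 - lam) * z           # EWMA update
        deviation = abs(z - target)
        out_of_control = deviation > control_k * sigma_z
        # Sample sooner when the statistic enters the warning region.
        if deviation > warning_k * sigma_z:
            interval = short_interval
        else:
            interval = long_interval
        results.append((z, interval, out_of_control))
    return results


results = ewma_vsi([0.0, 0.0, 0.0, 4.0, 4.0])
for z, interval, alarm in results:
    print(round(z, 3), interval, alarm)
```

The design choice is the usual one for VSI charts: a long interval saves sampling effort while the process looks stable, and a short interval speeds up detection once the statistic drifts toward a control limit.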

Optimal Design of the Adaptive Searching Estimation in Spatial Sampling

  • Pyong Namkung;Byun, Jong-Seok
    • Communications for Statistical Applications and Methods
    • /
    • v.8 no.1
    • /
    • pp.73-85
    • /
    • 2001
  • A spatial population existing in a plane area, such as an animal or aerial population, has certain relationships among regions located within a fixed distance of a selected region. We consider adaptive searching estimation in spatial sampling for such a population. Adaptive searching estimation depends on the values of sample points observed during the survey and on the nature of the surfaces under investigation. In this paper we study estimation by adaptive searching in spatial sampling for the purpose of estimating the area possessing a particular characteristic in a spatial population. From the viewpoint of adaptive searching, we empirically compare systematic sampling with stratified sampling using simulated data.

On sampling algorithms for imbalanced binary data: performance comparison and some caveats (불균형적인 이항 자료 분석을 위한 샘플링 알고리즘들: 성능비교 및 주의점)

  • Kim, HanYong;Lee, Woojoo
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.5
    • /
    • pp.681-690
    • /
    • 2017
  • Imbalanced binary classification problems arise in many settings, such as fraud detection in banking operations, spam mail detection and prediction of defective products. Several sampling methods, such as over-sampling, under-sampling and SMOTE, have been developed to overcome the poor prediction performance of binary classifiers when one group dominates. In this study, we investigate the prediction performance of logistic regression, Lasso, random forest, boosting and support vector machines in combination with these sampling methods for imbalanced binary data. Four real data sets are analyzed to see whether there is a substantial improvement in prediction performance. We also emphasize some precautions to take when the sampling methods are implemented.
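
SMOTE, mentioned in this abstract, creates synthetic minority points by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows only that interpolation step in plain Python; the function name, the toy points and the choice of `k` are assumptions, and this is not the implementation used in the paper.

```python
import math
import random


def smote_like(minority, n_new, k, rng):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (minority-class rows).
    k: number of nearest minority neighbours considered for each base point.
    """
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        others = [p for i, p in enumerate(minority) if i != base_idx]
        # k nearest minority neighbours by Euclidean distance.
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


rng = random.Random(1)
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_new=6, k=2, rng=rng)
print(new_points)
```

Every synthetic point lies on a line segment between two existing minority points, so the method fills in the minority region rather than duplicating rows exactly.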

Comparison of Sampling and Wall-to-Wall Methodologies for Reporting the GHG Inventory of the LULUCF Sector in Korea (LULUCF 부문 산림 온실가스 인벤토리 구축을 위한 Sampling과 Wall-to-Wall 방법론 비교)

  • Park, Eunbeen;Song, Cholho;Ham, Boyoung;Kim, Jiwon;Lee, Jongyeol;Choi, Sol-E;Lee, Woo-Kyun
    • Journal of Climate Change Research
    • /
    • v.9 no.4
    • /
    • pp.385-398
    • /
    • 2018
  • Although the importance of developing a reliable and systematic GHG inventory has increased, GIS/RS-based national-scale analysis of the LULUCF (Land Use, Land-Use Change and Forestry) sector is insufficient in the context of the Paris Agreement. In this study, the change in $CO_2$ storage of forest land due to land use change from 2000 to 2010 is estimated using two GIS/RS methodologies, the Sampling and Wall-to-Wall methods. Various imagery with sampling data is used for the Sampling method, and land cover maps are used for the Wall-to-Wall method. The land use matrices of these methodologies and the national cadastral statistics are classified into six land-use categories (Forest land, Cropland, Grassland, Wetlands, Settlements, and Other land). The difference in area between the results of the Sampling method and the cadastral statistics decreases as the sample plot distance decreases. However, the difference is not significant under a 2 km sample plot. In the 2000s, the Wall-to-Wall method showed similar results to sampling under a 2 km distance except for the Settlements category. With the Wall-to-Wall method, $CO_2$ storage is higher than with the Sampling method. Accordingly, the Wall-to-Wall method would be more advantageous than the Sampling method when sufficient spatial data are available for GHG inventory assessment. These results can contribute to establishing an annual reporting system for the national greenhouse gas inventory in the LULUCF sector.

Output Regulation of Nonlinear Systems and Time-Sampling Effects (비선형 시스템 출력 조절과 샘플링 영향)

  • Chung, Sun-Tae
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.35S no.11
    • /
    • pp.96-105
    • /
    • 1998
  • The effects of time-sampling that must be considered in the digital implementation of a nonlinear output regulator are investigated. It is found that the output regulatability of nonlinear systems is generally not robust with respect to time-sampling, although the output regulatedness of nonlinear systems is preserved under time-sampling. Also, a class of nonlinear systems is identified for which the preservation of output regulatability under time-sampling can be decided without difficulty. These results imply that one needs to seek a better approximate sampled-data nonlinear output regulator, since a digital output regulator obtained by discretizing the continuous-time nonlinear output regulator, designed from the underlying continuous-time nonlinear system model, is only a first-order approximation with respect to the sampling time.

EFFICIENT ESTIMATION OF POPULATION MEAN IN STRATIFIED SAMPLING USING REGRESSION TYPE ESTIMATOR

  • Grover Lovleen Kumar
    • Journal of the Korean Statistical Society
    • /
    • v.35 no.4
    • /
    • pp.441-452
    • /
    • 2006
  • An efficient regression type estimator for a stratified population mean is proposed under the two-phase sampling scheme. In constructing the proposed estimator, it is assumed that the first auxiliary variable x is directly and highly correlated with the study variable y, and that the second auxiliary variable z is directly and highly correlated with the first auxiliary variable x. However, the variable z is not directly correlated with the variable y; the two are correlated only through their direct and high correlation with the variable x. The proposed regression type estimator is found to be always more efficient than the existing estimators defined under the same situation.

Development of Integrated Variable Sampling Interval Engineering Process Control & Statistical Process Control System (가변 샘플링간격 EPC/SPC 결합시스템의 개발)

  • Lee, Seong-Jae;Seo, Sun-Geun
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 2005.05a
    • /
    • pp.723-729
    • /
    • 2005
  • Traditional statistical process control (SPC), applied to the discrete part industry in the form of control charts, can detect and eliminate assignable causes through process monitoring. Engineering process control (EPC), applied to the process industry in the form of feedback control, can keep the process output on target by continual adjustment of an input variable. This study presents controlling and monitoring rules that use a variable sampling interval (VSI) to change sampling intervals in a predetermined fashion based on the predicted process levels for integrated EPC and SPC systems. Twelve rules, classified by EPC scheme (MMSE, constrained PI, bounded or deadband adjustment policy) and type of sampling interval and combined with the EWMA chart of SPC, are proposed under an IMA(1,1) disturbance model and a zero-order (responsive) dynamic system. The properties of the twelve control rules under three patterns of process change (sudden shift, drift and random shift) are evaluated and discussed through simulation, and control rules for integrated VSI EPC and SPC systems are recommended.
