• Title/Summary/Keyword: Stratified Sampling


A Sampling Design of the Agricultural Machine Estimated Sales Survey

  • Park, Jinwoo
    • Communications for Statistical Applications and Methods / v.8 no.2 / pp.375-382 / 2001
  • The agricultural machine estimated sales survey estimates annual sales quantities of eight major agricultural machines, such as tractors and combines. The purpose of this study is to design a multipurpose sample for this survey. The main contributions are an efficient stratification criterion and a reasonable estimation method based on post-stratification.

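The abstract above does not give the estimator itself. As a generic illustration only, a post-stratified estimate of a total multiplies each post-stratum's known population count by the sample mean observed in it. The stratum names, counts, and sales figures below are hypothetical and are not taken from the paper.

```python
def poststratified_total(sample, stratum_sizes):
    """Post-stratified estimate of a population total.

    sample:        {stratum: [observed values]} (units classified after selection)
    stratum_sizes: {stratum: known number of units in the population}
    """
    total = 0.0
    for stratum, population_count in stratum_sizes.items():
        values = sample.get(stratum, [])
        if not values:
            raise ValueError(f"no sampled units in post-stratum {stratum!r}")
        total += population_count * sum(values) / len(values)
    return total


# Hypothetical example: dealers classified by size after sampling.
sample = {"small": [2, 4, 3], "large": [20, 30]}
stratum_sizes = {"small": 300, "large": 40}
estimate = poststratified_total(sample, stratum_sizes)
print(estimate)  # 300 * 3 + 40 * 25 = 1900.0
```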

A Sampling Design for Health Index Survey

  • Ryu, Jea-Bok;Lee, Kay-O;Kim, Young-Won
    • Communications for Statistical Applications and Methods / v.9 no.2 / pp.565-576 / 2002
  • We propose a new sampling design for the 2001 Health Index Survey in Seoul. In this stratified two-stage sampling design, the enumeration district (ED) of the 2000 Population and Housing Census is used as the primary sampling unit, and the Gu is used as the stratification variable in order to obtain sub-domain estimates for the 25 Gus as well as a population estimate for Seoul. To obtain a representative sample, the sample EDs are selected systematically after the EDs are ordered by location and property. Imputation methods for item nonresponse are also suggested.
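
The first-stage selection described above (order the EDs within each Gu, then select systematically) can be sketched roughly as follows. This is a minimal sketch under assumptions: the ED identifiers, stratum sizes, sample size per Gu, and the equal-probability selection are all hypothetical, and the survey's actual procedure may differ.

```python
import random


def systematic_sample(ordered_units, n, rng):
    """Select n units from an already ordered list using a random start
    and a fixed (possibly fractional) sampling interval."""
    population_size = len(ordered_units)
    if not 0 < n <= population_size:
        raise ValueError("n must be between 1 and the number of units")
    step = population_size / n       # sampling interval, always >= 1
    start = rng.random() * step      # random start in [0, step)
    picks = []
    for i in range(n):
        index = min(int(start + i * step), population_size - 1)
        picks.append(ordered_units[index])
    return picks


# Hypothetical strata: EDs listed in their sort order within each Gu.
strata = {
    "Gu-A": [f"A-ED{i:03d}" for i in range(40)],
    "Gu-B": [f"B-ED{i:03d}" for i in range(25)],
}
rng = random.Random(2001)
sample = {gu: systematic_sample(eds, 5, rng) for gu, eds in strata.items()}
```

Because the interval is at least one unit wide, the selected positions are distinct and spread evenly across the ordering, which is what makes the prior sort by location and property useful.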

Understanding Complex Design Features via Design Effect Models (설계효과모형을 통한 설계요소의 유용성 이해)

  • Park, Inho
    • The Korean Journal of Applied Statistics / v.28 no.6 / pp.1217-1225 / 2015
  • In survey research, data are commonly collected through a sample design with complex design features. The relative efficiency of such a design, in terms of the precision of an estimator, can be measured using the design effect, which takes simple random sampling as the reference design. This concept is most useful when the design effect can be expressed as a function of various design features. We propose a design effect formula suitable for stratified multistage sampling by generalizing the approaches of Gabler et al. (1999, 2006) for multistage sampling. It can guide improvements in design efficiency at the design stage or enable evaluation of the adopted design features afterwards.
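
The paper's generalized formula is not reproduced in the abstract. For orientation only, the sketch below computes the well-known Kish-style approximation, in which the overall design effect is the product of a weighting component and a clustering component. The function names and input numbers are illustrative assumptions, not the authors' formula.

```python
def deff_weighting(weights):
    """Design effect due to unequal weights: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def deff_clustering(mean_cluster_size, icc):
    """Design effect due to clustering: 1 + (b - 1) * rho, where b is the
    mean cluster size and rho is the intraclass correlation."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def deff_overall(weights, mean_cluster_size, icc):
    """Approximate overall design effect as the product of both components."""
    return deff_weighting(weights) * deff_clustering(mean_cluster_size, icc)


# Illustrative inputs: equal weights, clusters of 10 units, rho = 0.05.
print(deff_overall([1.0] * 50, 10, 0.05))  # about 1.45
```

A value above 1 means the design needs a larger sample than simple random sampling to reach the same precision.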

Logistic Regression for Retrospective Studies

  • Shin, Mi-Young
    • Journal of Korean Society for Quality Management / v.22 no.4 / pp.111-119 / 1994
  • We consider logistic models based on retrospective, case-control data with stratified samples and study the Weighted Exogenous Sampling Maximum Likelihood (WESML) estimator. We develop a consistent estimator of the asymptotic covariance matrix of the WESML estimator.


Establishment of a statistically reliable sampling method and size for serological surveillance of classical swine fever (CSF) in Korea (우리나라 돼지콜레라 항체 수준 측정을 위한 표본감사의 통계학적 기준 설정)

  • Yoon, Hachung;Nam, Hyang-Mi;Park, Choi-Kyu;Kim, Byoung-han;Park, Jee-Yong;Song, Jae-Young;Hyeon, Bang-Hun;Wee, Sung-Hwan
    • Korean Journal of Veterinary Research / v.47 no.1 / pp.51-57 / 2007
  • To establish a statistically reliable sampling strategy for serological surveillance of classical swine fever (CSF) in Korea, antibody test data from CSF surveillance conducted during 2005 were analyzed. The most appropriate sampling method was determined to be a stratified multi-stage random sampling strategy, in which the primary sampling unit is a pig farm and the secondary units are the pigs, stratified into breeders and finishers within the selected farm. The optimum sample size was 5 to 19, including 1 to 2 breeders, according to the number of pigs in the farm. The optimum sampling strategy demonstrated in this study was very [...] Findings of our study provide practical guidelines for surveillance of herd immunity level to CSF in Korea.

How Should We Randomly Sample Marine Fish Landed at Korea Ports to Represent a Length Frequency Distribution of Those Fish? (한국 연근해 어업에서 수집되는 어류 개체군 체장자료의 표집(sampling) 방법 제안)

  • Park, Min Gyou;Hyun, Saang-Yoon
    • Korean Journal of Fisheries and Aquatic Sciences / v.54 no.1 / pp.80-89 / 2021
  • In Korea, marine fish landed at ports are randomly sampled on a periodic basis (e.g., daily or weekly), and body sizes (e.g., lengths and weights) of the sampled fish are measured. The question motivating our study is whether such measurements reflect the size distribution, especially the length distribution, of the fish landed (the population), because such length measurements are key data for a length-based assessment model. The current method samples fish landed at ports by body size group (e.g., very small, small, medium, large, very large), using the number of boxes per body size group as the sampling weights. In this study, we showed that length composition data from fish sampled by the current method did not represent the length frequency distribution of the fish landed, and we suggest an alternative sampling method that uses the number of fish landed per body size group as the sampling weights. We also introduce a method for determining an appropriate sample size.
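
To illustrate why the choice of weights matters, the toy example below combines per-group length samples into one length frequency distribution, first weighting groups by number of boxes and then by number of fish. All numbers are hypothetical, and the assumption that a box of small fish holds more individuals than a box of large fish is ours, not a figure from the paper.

```python
from collections import Counter


def weighted_length_distribution(samples, weights):
    """Combine per-group length samples into one frequency distribution.

    samples: {group: [length_cm, ...]} measured fish per body size group
    weights: {group: weight} relative size of each group in the landing
    """
    total_weight = sum(weights[group] for group in samples)
    distribution = Counter()
    for group, lengths in samples.items():
        group_share = weights[group] / total_weight
        for length in lengths:
            # Each measured fish represents an equal part of its group's share.
            distribution[length] += group_share / len(lengths)
    return dict(distribution)


# Hypothetical landing: equal numbers of boxes, unequal numbers of fish.
samples = {"small": [20, 20, 22], "large": [40, 42]}
boxes = {"small": 10, "large": 10}
fish_per_box = {"small": 100, "large": 25}      # assumed, for illustration
fish = {g: boxes[g] * fish_per_box[g] for g in boxes}

by_boxes = weighted_length_distribution(samples, boxes)
by_fish = weighted_length_distribution(samples, fish)
```

With box weights the small group receives half of the total mass, whereas with fish-count weights it receives 80 percent, so the box-weighted distribution understates small fish in this toy landing.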

Empirical Analysis on Rao-Scott First Order Adjustment for Two Population Homogeneity test Based on Stratified Three-Stage Cluster Sampling with PPS

  • Heo, Sunyeong
    • Journal of Integrative Natural Science / v.7 no.3 / pp.208-213 / 2014
  • Nationwide and/or large-scale sample surveys generally use complex sample designs. The traditional Pearson chi-square test is not appropriate for categorical data from complex samples. Rao and Scott suggested an adjustment to the Pearson chi-square test that uses the average of the eigenvalues of the design matrix of cell probabilities. This study compares the efficiency of the Rao-Scott first-order adjusted test with that of the Wald test for homogeneity between two populations, using data from the 2009 Gyeongnam regional education offices' customer satisfaction survey (2009 GREOCSS). The 2009 GREOCSS data were collected by stratified three-stage cluster sampling with probability proportional to size. The empirical results show that, for the 2009 GREOCSS data, the Rao-Scott adjusted test statistic, which uses only the variances of cell probabilities, is very close to the Wald test statistic, which uses the covariance matrix of cell probabilities. However, caution is needed when using the Rao-Scott first-order adjusted test statistic in place of the Wald test, because its efficiency decreases as the relative variance of the eigenvalues of the design matrix of cell probabilities increases, especially when the number of degrees of freedom is small.
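
For reference, the first-order adjustment mentioned above is commonly written as follows. The notation is ours (the abstract gives no formulas): $X^2$ is the Pearson statistic, $\nu$ the degrees of freedom, and $\delta_1, \dots, \delta_\nu$ the eigenvalues of the design matrix.

```latex
\[
  X^2_{RS} = \frac{X^2}{\bar{\delta}},
  \qquad
  \bar{\delta} = \frac{1}{\nu} \sum_{i=1}^{\nu} \delta_i ,
\]
% Under the null hypothesis, X^2_{RS} is referred to a chi-square
% distribution with \nu degrees of freedom. The approximation is best
% when the \delta_i are nearly equal and degrades as they become more
% dispersed.
```

This is consistent with the caution in the abstract: the more the eigenvalues vary around their mean, the less a single averaged correction can capture.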

Using weighted Support Vector Machine to address the imbalanced classes problem of Intrusion Detection System

  • Alabdallah, Alaeddin;Awad, Mohammed
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.10 / pp.5143-5158 / 2018
  • Improving intrusion detection systems (IDS) is a pressing need in cyber security. As computer networks grow, new attacks appear daily. Machine learning (ML) is one of the most important fields contributing to intrusion detection. One open issue is the imbalance among the diverse classes of network traffic: training an ML algorithm on imbalanced classes leads to the accuracy paradox. Most previous efforts concentrated on improving the overall accuracy of these models, which is important; however, even when they improved the total accuracy of the system, they fell into the accuracy paradox. The seriousness of the threat posed by the minority classes, and the shortcomings of previous efforts to address it, motivate this work. In this paper, we combined stratified sampling, a cost function, and the weighted Support Vector Machine (WSVM) method to address the accuracy paradox in the intrusion detection problem. The model achieved good total accuracy and superior results on the small classes, such as the User-To-Remote and Remote-To-Local attacks, using NSL-KDD, the improved version of the KDDCup99 benchmark dataset.
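
The accuracy paradox can be shown with a toy example: a classifier that always predicts the majority class scores high accuracy while detecting no attacks. The class counts below are made up, and the inverse-frequency class weights are a common heuristic shown for illustration, not the cost function used in the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of true `label` cases that were predicted as `label`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in relevant) / len(relevant)


def inverse_frequency_weights(y_true):
    """Class weights n / (k * n_c): rarer classes get larger weights."""
    counts = Counter(y_true)
    n, k = len(y_true), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Hypothetical traffic: 990 normal connections, 10 attacks.
y_true = ["normal"] * 990 + ["attack"] * 10
y_pred = ["normal"] * 1000            # always predict the majority class

print(accuracy(y_true, y_pred))           # 0.99
print(recall(y_true, y_pred, "attack"))   # 0.0
print(inverse_frequency_weights(y_true))
```

Weights of this kind make each misclassified minority-class example cost more during training, which is the general idea behind a weighted SVM.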

Integrity Assessment of Sharp Flaw in CANDU Pressure Tube Using Probabilistic Fracture Mechanics (확률론적 파괴역학을 도입한 CANDU 압력관의 예리한 결함에 대한 건전성평가)

  • Lee, Jun-Seong;Gwak, Sang-Rok;Kim, Yeong-Jin;Park, Yun-Won
    • Transactions of the Korean Society of Mechanical Engineers A / v.26 no.4 / pp.653-659 / 2002
  • This paper describes a probabilistic fracture mechanics (PFM) analysis based on Monte Carlo (MC) simulation. In the analysis of a CANDU pressure tube, the depth and aspect ratio of an initial semi-elliptical surface crack, the fracture toughness, and the delayed hydride cracking (DHC) velocity are assumed to be probabilistic variables. As an example, failure probabilities of piping and a CANDU pressure tube are calculated using the MC method with stratified sampling under normal operating conditions. In the stratified MC simulation, the sampling space of the probabilistic variables is divided into a number of small cells. To verify the analysis results, they were compared with a PFM analysis using another commercial code, and good agreement was observed.
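
The idea of dividing the sampling space into cells can be sketched in one dimension. The example below estimates a small failure probability by drawing the same number of points in each equal-width cell of the unit interval. The exponential crack-depth model, its scale, and the critical depth are hypothetical stand-ins for illustration; the paper's analysis involves several probabilistic variables and a fracture mechanics model.

```python
import math
import random


def stratified_mc(indicator, n_cells, n_per_cell, rng):
    """Estimate E[indicator(U)] for U uniform on (0, 1] by sampling
    n_per_cell points in each of n_cells equal-width cells."""
    estimate = 0.0
    for cell in range(n_cells):
        hits = 0
        for _ in range(n_per_cell):
            # 1.0 - rng.random() lies in (0, 1], so u is never zero.
            u = (cell + 1.0 - rng.random()) / n_cells
            hits += indicator(u)
        # Each cell carries probability 1 / n_cells.
        estimate += hits / n_per_cell / n_cells
    return estimate


def fails(u, scale=1.0, critical_depth=3.0):
    """Toy failure rule: an exponentially distributed crack depth,
    generated by inverse-CDF sampling, exceeds a critical depth."""
    depth = -scale * math.log(u)
    return depth > critical_depth


rng = random.Random(0)
p_fail = stratified_mc(fails, n_cells=100, n_per_cell=20, rng=rng)
exact = math.exp(-3.0)   # analytic answer for this toy model
```

In this toy case only one cell straddles the failure boundary, so all of the sampling noise comes from that single cell, which is why stratification helps most for small failure probabilities.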

Quality Enhancement of MIROS Wave Radar Data at Ieodo Ocean Research Station Using ANN

  • Donghyun Park;Kideok Do;Miyoung Yun;Jin-Yong Jeong
    • Journal of Ocean Engineering and Technology / v.38 no.3 / pp.103-114 / 2024
  • Remote sensing wave observation data are crucial when analyzing ocean waves, the main external force of coastal disasters. Nevertheless, such data have limited accuracy in low-wind environments. Therefore, this study collected the raw data from MIROS Wave and Current Radar (MWR) and wave radar at the Ieodo Ocean Research Station (IORS) and applied the optimal filter by combining filters provided by MIROS software. The data were validated by comparison with South Jeju ocean buoy data. The filtered data maintained accuracy for significant wave height, but errors were observed in significant wave periods and extreme waves. Hence, this study used an artificial neural network (ANN) to reduce these errors. The ANN was generalized by separating the data into training and test datasets through stratified sampling, and the optimal model structure was derived by adjusting the hyperparameters. The application of the ANN effectively improved the accuracy in significant wave periods and high wave conditions. Consequently, this study reproduced past wave data by enhancing the reliability of the MWR, contributing to understanding wave generation and propagation in storm conditions, and improving the accuracy of wave prediction. On the other hand, errors persisted under high wave conditions because of wave shadow effects, necessitating more data collection and future research.
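
A stratified train/test split of the kind mentioned above can be sketched with the standard library alone. The abstract does not say which variable defined the strata, so the wave-height bins, thresholds, and test fraction below are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(records, stratum_of, test_fraction, rng):
    """Split records into train and test sets so that each stratum
    contributes roughly the same fraction to the test set."""
    groups = defaultdict(list)
    for record in records:
        groups[stratum_of(record)].append(record)
    train, test = [], []
    for stratum in sorted(groups):
        members = groups[stratum][:]      # copy before shuffling
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def height_bin(height_m):
    """Hypothetical strata based on significant wave height in metres."""
    if height_m < 1.0:
        return "low"
    if height_m < 4.0:
        return "mid"
    return "high"


# Toy data: many calm records, few high-wave records.
heights = [0.5] * 80 + [2.5] * 16 + [5.0] * 4
train, test = stratified_split(heights, height_bin, 0.25, random.Random(38))
```

Splitting within each bin keeps rare high-wave records present in both sets, which a plain random split of a small dataset would not guarantee.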