• Title/Summary/Keyword: stratified sampling


A Sampling Design of the Agricultural Machine Estimated Sales Survey

  • Park, Jinwoo
    • Communications for Statistical Applications and Methods
    • /
    • v.8 no.2
    • /
    • pp.375-382
    • /
    • 2001
  • The agricultural machine estimated sales survey estimates the annual sales quantities of eight major agricultural machines, such as tractors and combines. The purpose of this study is to design a multipurpose sample for the survey. The main contributions are an efficient stratification criterion and a reasonable estimation method based on post-stratification.


A Sampling Design for Health Index Survey

  • Ryu, Jea-Bok;Lee, Kay-O;Kim, Young-Won
    • Communications for Statistical Applications and Methods
    • /
    • v.9 no.2
    • /
    • pp.565-576
    • /
    • 2002
  • We propose a new sampling design for the 2001 Health Index Survey in Seoul. In this stratified two-stage sampling design, the enumeration district (ED) of the 2000 Population and Housing Census is the primary sampling unit, and the Gu is the stratification variable, so that sub-domain estimates for the 25 Gu can be obtained as well as a population estimate for Seoul. To obtain a representative sample, the sample EDs are selected systematically after the EDs are ordered by location and property. Imputation methods for item nonresponse are also suggested.
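
The ordering-then-systematic-selection step can be sketched as follows. This is a minimal illustration, not the authors' procedure: the frame fields (`ed_id`, a location code, a property code), the stratum size of 200 EDs, and the sample size of 20 are all hypothetical.

```python
import random

def systematic_sample(units, n, rng):
    """Select n units at a fixed interval from an ordered list, using a random start."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and len(units)")
    step = len(units) / n            # sampling interval (may be fractional)
    start = rng.random() * step      # random start in [0, step)
    return [units[int(start + i * step)] for i in range(n)]

# Hypothetical frame for one Gu (stratum): (ed_id, location_code, property_code).
rng = random.Random(0)
frame = [(i, rng.randrange(5), rng.choice("AB")) for i in range(200)]

# Sorting before selection spreads the sample across locations and property types.
ordered = sorted(frame, key=lambda ed: (ed[1], ed[2]))
sample = systematic_sample(ordered, 20, rng)
```

Because the interval is at least 1, the selected positions are distinct and evenly spaced through the sorted frame, which is what gives the ordering its implicit stratification effect.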

Understanding Complex Design Features via Design Effect Models (설계효과모형을 통한 설계요소의 유용성 이해)

  • Park, Inho
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.6
    • /
    • pp.1217-1225
    • /
    • 2015
  • In survey research, data are commonly collected through a sample design with complex design features. The relative efficiency of such a design, in terms of the precision of an estimator, can be measured by the design effect, which takes simple random sampling as the reference design. This concept is most useful when the design effect can be expressed as a function of various design features. We propose a design effect formula suitable for stratified multistage sampling by generalizing the approaches of Gabler et al. (1999, 2006) for multistage sampling. It can either guide improvements in design efficiency at the design stage or enable evaluation of the adopted design features afterwards.

Logistic Regression for Retrospective Studies

  • Shin, Mi-Young
    • Journal of Korean Society for Quality Management
    • /
    • v.22 no.4
    • /
    • pp.111-119
    • /
    • 1994
  • We consider logistic models based on retrospective, case-control data with stratified samples and study the Weighted Exogenous Sampling Maximum Likelihood (WESML) estimator. We develop a consistent estimator of the asymptotic covariance matrix of the WESML estimator.


Establishment of a statistically reliable sampling method and size for serological surveillance of classical swine fever (CSF) in Korea (우리나라 돼지콜레라 항체 수준 측정을 위한 표본감사의 통계학적 기준 설정)

  • Yoon, Hachung;Nam, Hyang-Mi;Park, Choi-Kyu;Kim, Byoung-han;Park, Jee-Yong;Song, Jae-Young;Hyeon, Bang-Hun;Wee, Sung-Hwan
    • Korean Journal of Veterinary Research
    • /
    • v.47 no.1
    • /
    • pp.51-57
    • /
    • 2007
  • To establish a statistically reliable sampling strategy for serological surveillance of classical swine fever (CSF) in Korea, antibody test data from CSF surveillance conducted during 2005 were analyzed. The most appropriate sampling method was determined to be a stratified multi-stage random sampling strategy, in which the primary sampling unit is a pig farm and the secondary units are the pigs, stratified into breeders and finishers, in the selected farm. The optimum sample size was 5 to 19, including 1 to 2 breeders, depending on the number of pigs in the farm. The optimum sampling strategy demonstrated in this study was very [...] Findings of our study provide practical guidelines for surveillance of the herd immunity level to CSF in Korea.

How Should We Randomly Sample Marine Fish Landed at Korea Ports to Represent a Length Frequency Distribution of Those Fish? (한국 연근해 어업에서 수집되는 어류 개체군 체장자료의 표집(sampling) 방법 제안)

  • Park, Min Gyou;Hyun, Saang-Yoon
    • Korean Journal of Fisheries and Aquatic Sciences
    • /
    • v.54 no.1
    • /
    • pp.80-89
    • /
    • 2021
  • In Korea, marine fish landed at ports are randomly sampled on a periodic basis (e.g., daily or weekly), and body sizes (e.g., lengths and weights) of the sampled fish are measured. Our study asks whether such measurements reflect the size distribution, especially the length distribution, of the fish landed (the population), because length measurements are key data for a length-based assessment model. The current method samples fish landed at ports by body size group (e.g., very small, small, medium, large, very large), using the number of boxes in each body size group as the sampling weights. In this study, we showed that length composition data from fish sampled by the current method did not represent the length frequency distribution of the fish landed, and we suggested an alternative sampling method that uses the number of fish landed in each body size group as the sampling weights. We also introduced a method for determining an appropriate sample size.
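
The difference between the two weighting schemes can be illustrated with a small sketch. All numbers here are hypothetical (two size groups, equal box counts, more fish per box in the small group), and the function name `weighted_length_distribution` is an illustrative assumption rather than anything from the paper.

```python
from collections import Counter

def weighted_length_distribution(samples, weights):
    """Combine per-group length samples into one relative frequency distribution.

    samples: {group: [length, ...]} lengths measured in each size group
    weights: {group: weight} relative size of each group in the landing
    """
    total_w = sum(weights[g] for g in samples)
    dist = Counter()
    for g, lengths in samples.items():
        share = weights[g] / total_w              # group's share of the population
        for length in lengths:
            dist[length] += share / len(lengths)  # spread the share over the group's sample
    return dict(dist)

# Hypothetical landing: small fish pack more individuals per box.
samples = {"small": [20, 20, 22, 22], "large": [40, 40, 42, 42]}
boxes = {"small": 10, "large": 10}
fish_per_box = {"small": 100, "large": 20}
fish = {g: boxes[g] * fish_per_box[g] for g in boxes}

by_boxes = weighted_length_distribution(samples, boxes)  # weights = number of boxes
by_fish = weighted_length_distribution(samples, fish)    # weights = number of fish
```

With equal box counts, box weighting gives both groups half of the distribution, while fish-count weighting gives the small group five sixths of it, since it holds 1,000 of the 1,200 fish in this made-up landing.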

Empirical Analysis on Rao-Scott First Order Adjustment for Two Population Homogeneity test Based on Stratified Three-Stage Cluster Sampling with PPS

  • Heo, Sunyeong
    • Journal of Integrative Natural Science
    • /
    • v.7 no.3
    • /
    • pp.208-213
    • /
    • 2014
  • Nationwide and/or large-scale sample surveys generally use a complex sample design. The traditional Pearson chi-square test is not appropriate for categorical data from a complex sample. Rao and Scott suggested an adjustment to the Pearson chi-square test that uses the average of the eigenvalues of the design matrix of cell probabilities. This study compares the efficiency of the Rao-Scott first-order adjusted test with that of the Wald test for homogeneity between two populations, using data from the 2009 Gyeongnam regional education offices' customer satisfaction survey (2009 GREOCSS). The 2009 GREOCSS data were collected by stratified three-stage cluster sampling with probability proportional to size. The empirical results show that, for the 2009 GREOCSS data, the Rao-Scott adjusted test statistic, which uses only the variances of the cell probabilities, is very close to the Wald test statistic, which uses the covariance matrix of the cell probabilities. However, caution is needed when using the Rao-Scott first-order adjusted test statistic in place of the Wald test, because its efficiency decreases as the relative variance of the eigenvalues of the design matrix of cell probabilities increases, especially when the number of degrees of freedom is small.
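
The first-order adjustment described in the abstract divides the Pearson statistic by the mean eigenvalue. A minimal sketch, assuming the Pearson statistic and the eigenvalues have already been computed elsewhere; the input values are made up for illustration, and the function names are hypothetical.

```python
def rao_scott_first_order(chi_square, eigenvalues):
    """Divide a Pearson chi-square statistic by the mean eigenvalue (first-order adjustment)."""
    mean_eig = sum(eigenvalues) / len(eigenvalues)
    return chi_square / mean_eig

def relative_variance(eigenvalues):
    """Variance of the eigenvalues divided by their squared mean."""
    k = len(eigenvalues)
    mean_eig = sum(eigenvalues) / k
    return sum((e - mean_eig) ** 2 for e in eigenvalues) / (k * mean_eig ** 2)

# Hypothetical inputs: an unadjusted Pearson statistic and three eigenvalues.
pearson = 12.0
eigs = [1.5, 2.0, 2.5]

adjusted = rao_scott_first_order(pearson, eigs)  # 12.0 / 2.0
spread = relative_variance(eigs)                 # small spread favors the first-order test
```

The second function corresponds to the caution in the abstract: when the eigenvalues vary widely relative to their mean, dividing by a single average is a cruder correction.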

Using weighted Support Vector Machine to address the imbalanced classes problem of Intrusion Detection System

  • Alabdallah, Alaeddin;Awad, Mohammed
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.10
    • /
    • pp.5143-5158
    • /
    • 2018
  • Improving intrusion detection systems (IDS) is a pressing need in the cyber security world. As computer networks grow, new attacks appear daily. Machine Learning (ML) is one of the most important fields contributing to intrusion detection. One open issue is the imbalance among the classes of network traffic: training an ML algorithm on imbalanced classes leads to the accuracy paradox. Most previous efforts focus on improving the overall accuracy of these models, which is certainly important. However, even when they improved the total accuracy of the system, they fell into the accuracy paradox. The seriousness of the threat posed by the minority classes, and the pitfalls of previous efforts to address this issue, motivate this work. In this paper, we combine stratified sampling, a cost function, and the weighted Support Vector Machine (WSVM) method to address the accuracy paradox in intrusion detection. The model achieved good total accuracy and superior results on small classes such as User-to-Root (U2R) and Remote-to-Local (R2L) attacks, using NSL-KDD, the improved version of the benchmark dataset KDDCup99.
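
Two of the ingredients named in the abstract, stratified sampling and per-class weights, can be sketched without any ML library. This is an assumption-laden illustration, not the paper's pipeline: the class counts are invented, and the weight formula `n_samples / (n_classes * class_count)` is the common "balanced" heuristic, which the paper may or may not use.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx), sampling the same fraction from every class."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))  # keep at least one per class
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {y: n / (k * c) for y, c in counts.items()}

# Hypothetical imbalanced traffic labels.
labels = ["normal"] * 900 + ["dos"] * 90 + ["u2r"] * 10
train_idx, test_idx = stratified_split(labels, 0.2, random.Random(0))
weights = balanced_class_weights([labels[i] for i in train_idx])
```

The split keeps the rare class in both partitions, and the resulting weights grow as a class gets rarer, so a weighted classifier penalizes errors on minority classes more heavily.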

Integrity Assessment of Sharp Flaw in CANDU Pressure Tube Using Probabilistic Fracture Mechanics (확률론적 파괴역학을 도입한 CANDU 압력관의 예리한 결함에 대한 건전성평가)

  • Lee, Jun-Seong;Gwak, Sang-Rok;Kim, Yeong-Jin;Park, Yun-Won
    • Transactions of the Korean Society of Mechanical Engineers A
    • /
    • v.26 no.4
    • /
    • pp.653-659
    • /
    • 2002
  • This paper describes a probabilistic fracture mechanics (PFM) analysis based on Monte Carlo (MC) simulation. In the analysis of a CANDU pressure tube, the depth and aspect ratio of an initial semi-elliptical surface crack, the fracture toughness, and the delayed hydride cracking (DHC) velocity are treated as probabilistic variables. As an example, failure probabilities of piping and of a CANDU pressure tube are calculated by the MC method with stratified sampling, under normal operating conditions. In the stratified MC simulation, the sampling space of the probabilistic variables is divided into a number of small cells. To verify the results, the PFM analysis was compared with that of another commercial code, and good agreement was observed.
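
The cell-based idea can be sketched on a toy problem. This is a generic illustration of stratified Monte Carlo, not the paper's PFM model: the two variables are assumed uniform on [0, 1), and the failure rule `u1 + u2 > 1.8` is invented so that the true probability (0.02) is known.

```python
import random

def stratified_mc_probability(fails, cells_per_axis, samples_per_cell, rng):
    """Estimate P(fails(u1, u2)) for u1, u2 ~ Uniform(0, 1) using equal-size cells."""
    m = cells_per_axis
    total = 0.0
    for i in range(m):
        for j in range(m):
            hits = 0
            for _ in range(samples_per_cell):
                u1 = (i + rng.random()) / m   # uniform inside cell i along axis 1
                u2 = (j + rng.random()) / m   # uniform inside cell j along axis 2
                hits += fails(u1, u2)
            total += hits / samples_per_cell  # failure fraction within this cell
    return total / (m * m)                    # every cell has probability 1/m^2

# Toy failure region: a small corner of the unit square with area 0.02.
estimate = stratified_mc_probability(lambda a, b: a + b > 1.8, 10, 20, random.Random(0))
```

Only the cells that straddle the failure boundary contribute sampling noise; cells entirely inside or outside the region contribute exactly 1 or 0, which is why stratification helps for small failure probabilities.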

A Sample Design for National Nutrition Survey (국민영양조사(國民營養調査)를 위한 표본설계(標本設計) 소고(小考))

  • Jun, Tae-Yoon;Chung, Kee-Hey
    • Journal of Nutrition and Health
    • /
    • v.17 no.3
    • /
    • pp.236-241
    • /
    • 1984
  • To clarify the relationship between sample design and sample surveys in the community, we studied the sample design for the 1983 National Nutrition Survey. The analysis is based on The Report of a Settled Population, 1981, conducted by the National Bureau of Statistics, Economic Planning Board. The design was basically stratified two-stage sampling with systematic sampling of the Ban or Li as the administrative unit. The population is the whole nation excluding Jeju-do, which was left out for budget reasons. The sampling units were selected as follows. 1) Stratify the nationwide area into 20 sections according to administrative districts. 2) Determine the sample size in each section at an equal proportional rate (1/8040), for about 1,000 households in the sample. 3) Select the 25 sampling units by section in proportion to households. 4) Select 10 households at random, with equal probability, from each Ban or Li as the final sampling units. Using this procedure, 1,000 households were sampled for the 1983 National Nutrition Survey.
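
Steps 2) and 4) of the procedure can be sketched as proportional allocation followed by a random draw of households. The household counts, the three section names, the allocation of 10 units, and the Ban/Li size of 60 are hypothetical; the largest-remainder rounding is an assumption, since the abstract does not say how fractional allocations were handled.

```python
import random

def proportional_allocation(stratum_sizes, total):
    """Split `total` across strata in proportion to size (largest-remainder rounding)."""
    grand = sum(stratum_sizes.values())
    exact = {s: total * n / grand for s, n in stratum_sizes.items()}
    alloc = {s: int(x) for s, x in exact.items()}
    leftover = total - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    for s in sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc

# Hypothetical household counts for three sections (the survey used 20).
households = {"A": 500_000, "B": 300_000, "C": 200_000}
units_per_section = proportional_allocation(households, 10)

# Second stage: 10 households drawn at random from one selected Ban or Li of 60 households.
rng = random.Random(0)
selected_households = rng.sample(range(60), 10)
```

Allocating in proportion to households keeps the overall sampling rate roughly equal across sections, which matches the equal proportional rate described in step 2).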
