• Title/Summary/Keyword: data sampling

A review of analysis methods for secondary outcomes in case-control studies

  • Schifano, Elizabeth D.
    • Communications for Statistical Applications and Methods / v.26 no.2 / pp.103-129 / 2019
  • The main goal of a case-control study is to learn the association between various risk factors and a primary outcome (e.g., disease status). It has recently become quite common to also perform secondary analyses of the case-control data in order to understand associations among the risk factors of the primary outcome. It has been repeatedly documented that, with case-control data, association studies of the risk factors that ignore the case-control sampling scheme can produce highly biased estimates of the population effects. In this article, we review the issues with naive secondary analyses that do not account for the biased sampling scheme, as well as the various methods that have been proposed to account for the case-control ascertainment. We additionally compare the results of many of the discussed methods in an example examining the association of a particular genetic variant with smoking behavior, where the data were obtained from a lung cancer case-control study.
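As a minimal illustration of the kind of correction this review surveys, the sketch below simulates case-control sampling and contrasts a naive regression of a secondary outcome on a risk factor with an inverse-probability-weighted (IPW) estimate. The simulation setup, variable names, and sampling fractions are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch (not the paper's analysis): how ignoring case-control
# sampling can bias a secondary analysis, and how inverse-probability
# weighting, one of the commonly reviewed corrections, can fix it.
import numpy as np

rng = np.random.default_rng(0)

# --- Simulate a population: risk factor X, secondary outcome Y, disease D ---
N = 200_000
x = rng.binomial(1, 0.3, N)                  # e.g., a genetic variant
y = 0.5 * x + rng.normal(0, 1, N)            # secondary outcome (e.g., smoking score)
logit_d = -4.0 + 1.0 * x + 1.0 * y           # disease depends on both X and Y
d = rng.binomial(1, 1 / (1 + np.exp(-logit_d)))

# --- Case-control sampling: all cases plus an equal number of controls ---
cases = np.flatnonzero(d == 1)
controls = rng.choice(np.flatnonzero(d == 0), size=cases.size, replace=False)
idx = np.concatenate([cases, controls])

# Sampling probabilities (known by design in this toy example)
p_case = 1.0
p_control = cases.size / (d == 0).sum()
w = np.where(d[idx] == 1, 1 / p_case, 1 / p_control)

def slope(xs, ys, weights=None):
    """(Weighted) least-squares slope of ys on xs."""
    if weights is None:
        weights = np.ones_like(ys, dtype=float)
    X = np.column_stack([np.ones_like(xs, dtype=float), xs])
    WX = X * weights[:, None]
    return np.linalg.solve(X.T @ WX, WX.T @ ys)[1]

print("population X->Y effect :", slope(x, y))               # ~0.5 (truth)
print("naive case-control est.:", slope(x[idx], y[idx]))     # biased
print("IPW-corrected estimate :", slope(x[idx], y[idx], w))  # ~0.5 again
```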

CHAID Algorithm by Cube-based Sampling

  • Park, Hee-Chang; Cho, Kwang-Hyun
    • Korean Data and Information Science Society Conference Proceedings / 2003.10a / pp.239-247 / 2003
  • Decision tree algorithms are used extensively for data mining in many domains such as retail target marketing, fraud detection, data reduction, and variable screening. CHAID (Chi-square Automatic Interaction Detector) is an exploratory method used to study the relationship between a dependent variable and a series of predictor variables. In this paper we propose a CHAID algorithm based on cube-based sampling and examine its accuracy and speed as the number of variables varies.
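A minimal sketch of the chi-square split selection at the heart of CHAID, run on a subsample of the data. A plain random subsample stands in here for the paper's cube-based sampling scheme, which is not reproduced, and the data set and variable names are made up for illustration.

```python
# Sketch: pick the first CHAID split as the predictor with the smallest
# chi-square p-value against the target, using only a subsample of the data.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)

# Toy categorical data set: target 'buy' plus three predictors
n = 5_000
df = pd.DataFrame({
    "age_band": rng.choice(["<30", "30-50", ">50"], n),
    "region":   rng.choice(["N", "S", "E", "W"], n),
    "gender":   rng.choice(["F", "M"], n),
})
p_buy = 0.2 + 0.2 * (df["age_band"] == ">50")      # only age actually matters
df["buy"] = rng.binomial(1, p_buy)

# Sampling step: work on a subsample instead of the full data
sample = df.sample(n=1_000, random_state=1)

def best_split(data, target, predictors):
    """Return the predictor with the smallest chi-square p-value vs. target."""
    results = {}
    for col in predictors:
        table = pd.crosstab(data[col], data[target])
        chi2, p, dof, _ = chi2_contingency(table)
        results[col] = (p, chi2)
    return min(results.items(), key=lambda kv: kv[1][0])

var, (p, chi2) = best_split(sample, "buy", ["age_band", "region", "gender"])
print(f"first CHAID split on '{var}' (chi2={chi2:.1f}, p={p:.3g})")
```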

Deciding a sampling length for estimating the parameters in Geometric Brownian Motion

  • Song, Jun-Mo
    • Journal of the Korean Data and Information Science Society / v.22 no.3 / pp.549-553 / 2011
  • In this paper, we deal with the problem of deciding the length of data to use for estimating the parameters of a geometric Brownian motion. As an approach to this problem, we consider a change point test and introduce a simple test statistic based on the cumulative sum of squares (CUSUM) test. A real data analysis is performed for illustration.
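A sketch of the idea, under the assumption that the change-point statistic is the classical Inclán-Tiao cumulative sum of squares applied to the log-returns of the GBM; the paper's exact statistic and critical values are not given in the abstract.

```python
# Sketch: detect a variance change point in GBM log-returns with a
# cumulative-sum-of-squares statistic, then estimate parameters only from
# the data after the change point.
import numpy as np

rng = np.random.default_rng(2)

# Simulate a GBM price path whose volatility shifts halfway through
n, dt = 1_000, 1 / 250
sigma = np.where(np.arange(n) < n // 2, 0.15, 0.35)      # volatility change point
increments = 0.05 * dt + sigma * np.sqrt(dt) * rng.normal(size=n)
prices = 100 * np.exp(np.cumsum(increments))

log_ret = np.diff(np.log(prices))

def cusum_of_squares(r):
    """Inclan-Tiao statistic and the index at which it is attained."""
    k = np.arange(1, r.size + 1)
    d = np.cumsum(r**2) / np.sum(r**2) - k / r.size
    stat = np.sqrt(r.size / 2.0) * np.max(np.abs(d))
    return stat, int(np.argmax(np.abs(d)))

stat, k_hat = cusum_of_squares(log_ret)
print(f"CUSUM-of-squares statistic: {stat:.2f} (asymptotic 5% critical value ~1.36)")
print(f"estimated change point near observation {k_hat}")
print("-> use only the data after the change point to estimate the GBM parameters")
```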

A Generalized Mixed-Effects Model for Vaccination Data

  • Choi, Jae-Sung
    • Journal of the Korean Data and Information Science Society / v.15 no.2 / pp.379-386 / 2004
  • This paper deals with a mixed logit model for vaccination data. The effect of a newly developed vaccine for a certain chicken disease can be evaluated by the noninfection rate after injecting chickens with the vaccine. However, many factors might affect the noninfection rate; some are fixed and others are random. Random factors sometimes arise from the sampling scheme used to choose experimental units. This paper suggests a mixed model for the case where some fixed factors require different experimental sizes by design, and illustrates how to estimate the parameters of the suggested model.
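The abstract does not spell out the model, so the sketch below simply simulates binary noninfection outcomes with a fixed vaccine effect, a random batch effect, and unequal group sizes, and fits a mixed logit with statsmodels' Bayesian mixed GLM as one possible estimation route; all factor names and effect sizes are hypothetical, not the paper's.

```python
# Illustrative mixed logit for vaccination-type data: fixed vaccine effect
# plus a random effect for the batch of chickens sampled. The use of
# statsmodels' BinomialBayesMixedGLM is one convenient choice, not the
# paper's estimation method.
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(3)

rows = []
for batch in range(20):                      # random factor: sampled batches
    u = rng.normal(0, 0.8)                   # batch-level random effect
    for vaccine in (0, 1):                   # fixed factor: control vs. vaccine
        n_birds = rng.integers(20, 60)       # unequal group sizes by design
        eta = -0.5 + 1.5 * vaccine + u
        p_noninf = 1 / (1 + np.exp(-eta))
        for yi in rng.binomial(1, p_noninf, n_birds):
            rows.append({"noninfected": yi, "vaccine": vaccine, "batch": batch})

data = pd.DataFrame(rows)

# Mixed logit: fixed effect for vaccine, variance component for batch
model = BinomialBayesMixedGLM.from_formula(
    "noninfected ~ vaccine", {"batch": "0 + C(batch)"}, data)
result = model.fit_vb()                      # variational Bayes fit
print(result.summary())
```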

An Expert System for Reliability Management (신뢰성 관리 전문가 시스템)

  • Kim, Seong-in; Chang, Hong S.
    • Journal of Korean Society for Quality Management / v.22 no.3 / pp.152-160 / 1994
  • This paper concerns an expert system for reliability management. The system covers database management, life data analysis, life-testing sampling plans, and system operation. PROLOG is used as the main language, with dBASE III+ for the database management system and C for calculations and graphics. The system, which analyzes the data and selects an appropriate sampling plan, can be implemented on an IBM PC 386 or a higher-level machine.

Development of Sequential Sampling Plans for Tetranychus urticae in Strawberry Greenhouses (딸기 온실에서 점박이응애의 축차표본조사법 개발)

  • Choe, Hojeong; Kang, Juwan; Jung, Hyojin; Choi, Sira; Park, Jung-Joon
    • Korean Journal of Environmental Biology / v.35 no.4 / pp.427-436 / 2017
  • A fixed-precision-level sampling plan was developed to establish control of the two-spotted spider mite, Tetranychus urticae, in two strawberry greenhouses (a conventional plot and a natural enemy plot). T. urticae was sampled by taking a three-leaflet leaf (1 stalk) from each plant (3 three-leaflet leaves) at each sampling position. Each sample was divided into three different units (1-leaflet, 2-leaflet, and 3-leaflet units) to compare relative net precision (RNP) values for selection of the appropriate sampling unit. The relative net precision values indicated that the 1-leaflet unit was more precise and cost-efficient than the other units. The spatial distribution analysis was performed using Taylor's power law (TPL), and homogeneity of the TPL parameters across the two greenhouses was evaluated using analysis of covariance (ANCOVA). A fixed-precision-level sequential sampling plan was developed using the TPL parameters estimated from the combined data of the conventional and natural enemy plots with the 1-leaflet sampling unit. Sequential classification sampling plans were also developed using action thresholds of 3 and 10 mites for the pooled data. Using independently collected data, simulated validation of the developed sampling plan with Resampling Validation for Sampling Plans (RVSP) indicated a reasonable level of precision.
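One common way to turn TPL parameters into a fixed-precision sequential plan is Green's stop line, sketched below; the TPL coefficients, precision level, and mite density used here are hypothetical, not the values estimated in the paper.

```python
# Sketch of a fixed-precision sequential sampling plan built from Taylor's
# power law (variance = a * mean^b) via Green's stop line. All numbers below
# are illustrative assumptions.
import numpy as np

a, b = 3.0, 1.6        # hypothetical TPL coefficients
D = 0.25               # fixed precision level (SE / mean)

def green_stop_line(n, a, b, D):
    """Cumulative count T_n at which sampling can stop after n units.

    From D^2 = a * m^(b-2) / n with m = T_n / n:
        T_n = (D^2 * n^(b-1) / a) ** (1 / (b - 2))
    """
    return (D**2 * n**(b - 1) / a) ** (1.0 / (b - 2))

rng = np.random.default_rng(4)
true_mean = 4.0        # mites per 1-leaflet sampling unit (hypothetical)

cum = 0.0
for n in range(1, 201):
    # negative binomial counts with TPL-consistent variance a * m^b
    var = a * true_mean**b
    p = true_mean / var
    r = true_mean * p / (1 - p)
    cum += rng.negative_binomial(r, p)
    if cum >= green_stop_line(n, a, b, D):
        print(f"stop after {n} units; cumulative count = {cum:.0f}, "
              f"estimated mean = {cum / n:.2f} mites per unit")
        break
```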

Pedagogical Significance and Students' Informal Knowledge of Sample and Sampling (표본 개념의 교육적 의의와 인식 특성 연구)

  • Lee Kyung Hwa; Ji Eun Jeung
    • Journal of Educational Research in Mathematics / v.15 no.2 / pp.177-196 / 2005
  • In the Korean curriculum, students learn the concepts of sample and sampling, along with other related concepts, when they reach the 10th grade of high school. Before the 10th grade, however, they already carry out activities involving data collection, data analysis, and the formulation of conclusions. We therefore investigated and analyzed students' informal knowledge before they receive formal instruction. The results allowed us to identify, for each question, the responses that students most often agreed or disagreed with. In particular, responses did not converge on how to consider the characteristics of the population in the process of sampling, and students accepted a sampling process that ignored the characteristics of the population or the components that constitute it. The results showed that 5th grade students did not examine the data connected with sampling and did not understand the validity of the sample survey process. In contrast, 6th grade students broadly understood sample size, the sampling process, and the reliability of data acquired through a sample survey as a basis for judgment, although in detail they held misconceptions or remained at the level of subjective judgment. A significant point is that many high school students did not adequately understand sample size in relation to sampling. Though statistics instruction has traditionally been delayed until upper secondary education, this inquiry convinced us that the delay is unnecessary.

Application of compressive sensing and variance considered machine to condition monitoring

  • Lee, Myung Jun; Jun, Jun Young; Park, Gyuhae; Kang, To; Han, Soon Woo
    • Smart Structures and Systems / v.22 no.2 / pp.231-237 / 2018
  • A significant data-volume problem is encountered in condition monitoring because the sensors need to measure vibration data continuously, sometimes at a high sampling rate. In this study, compressive sensing approaches for condition monitoring are proposed to demonstrate their efficiency in handling a large amount of data and to improve the damage detection capability of the current condition monitoring process. Compressive sensing is a novel sensing/sampling paradigm that acquires far fewer data than traditional data sampling methods. This sensing paradigm is applied to condition monitoring together with an improved machine learning algorithm. For the experiments, a lab-built rotating system was used, and all data were compressively sampled to obtain compressed data. The optimal signal features were then selected without a signal reconstruction process. For damage classification, we used the Variance Considered Machine, operating only on the compressed data. The experimental results show that the proposed compressive sensing method can effectively improve the data processing speed and the accuracy of condition monitoring of rotating systems.
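A rough sketch of the compressive-sensing step: a random Gaussian projection shortens each vibration record, and condition classes are separated directly in the compressed domain. A nearest-centroid rule stands in for the paper's Variance Considered Machine, and the signals, matrix size, and fault signature are synthetic assumptions.

```python
# Sketch: compress synthetic vibration signals with a random projection and
# classify machine condition from the compressed measurements only.
import numpy as np

rng = np.random.default_rng(5)

fs, n = 2048, 2048                      # sampling rate and raw signal length
m = 128                                 # number of compressed measurements (~6%)
phi = rng.normal(0, 1 / np.sqrt(m), (m, n))   # random Gaussian measurement matrix

def vibration(fault, size=n):
    """Synthetic vibration: shaft tone plus an optional fault harmonic and noise."""
    t = np.arange(size) / fs
    sig = np.sin(2 * np.pi * 30 * t)
    if fault:
        sig += 0.5 * np.sin(2 * np.pi * 120 * t)   # fault-related component
    return sig + 0.3 * rng.normal(size=size)

# Compressed training data for two conditions (0 = healthy, 1 = faulty)
train = {lab: np.array([phi @ vibration(lab) for _ in range(30)]) for lab in (0, 1)}
centroids = {lab: x.mean(axis=0) for lab, x in train.items()}

def classify(y_compressed):
    """Nearest-centroid rule in the compressed domain (stand-in classifier)."""
    return min(centroids, key=lambda lab: np.linalg.norm(y_compressed - centroids[lab]))

test = [(lab, phi @ vibration(lab)) for lab in (0, 1) for _ in range(20)]
acc = np.mean([classify(y) == lab for lab, y in test])
print(f"compression ratio {m}/{n}, classification accuracy {acc:.2f}")
```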

A Low Bit Rate Speech Coder Based on the Inflection Point Detection

  • Iem, Byeong-Gwan
    • International Journal of Fuzzy Logic and Intelligent Systems / v.15 no.4 / pp.300-304 / 2015
  • A low bit rate speech coder based on a non-uniform sampling technique is proposed. The non-uniform sampling relies on the detection of inflection points (IPs). A speech block is processed by the IP detector, and the detected IP pattern is compared with the entries of an IP database. The address of the closest member of the database is transmitted together with the energy of the speech block. In the receiver, the decoder reconstructs the speech block using the received address and the energy information of the block. As a result, the coder achieves a fixed data rate, in contrast to existing speech coders based on non-uniform sampling. Computer simulation demonstrates the usefulness of the proposed technique: the SNR performance of the proposed method is approximately 5.27 dB at a data rate of 1.5 kbps.
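A toy sketch of the encoder side: inflection points are marked where the second difference of the block changes sign, the binary pattern is matched against a database, and only the database address plus the block energy is sent. The block length, database size, and bit budget below are assumptions, not the paper's values.

```python
# Sketch of inflection-point (IP) pattern coding for one speech block.
import numpy as np

rng = np.random.default_rng(6)
BLOCK = 80                               # 10 ms at 8 kHz (assumed framing)

def ip_pattern(block):
    """Binary mask of inflection points (sign changes of the 2nd difference)."""
    d2 = np.diff(block, n=2)
    mask = np.zeros(block.size, dtype=np.uint8)
    idx = np.where(np.sign(d2[:-1]) != np.sign(d2[1:]))[0] + 2
    mask[idx] = 1
    return mask

# Stand-in IP database: in practice this would be trained on speech
DB_BITS = 10                             # 2**10 = 1024 entries
database = rng.integers(0, 2, (2**DB_BITS, BLOCK)).astype(np.uint8)

def encode(block):
    """Return (database address of closest IP pattern, block energy)."""
    pat = ip_pattern(block)
    address = int(np.argmin((database != pat).sum(axis=1)))   # nearest pattern
    energy = float(np.sum(block**2))
    return address, energy

# Encode a toy "speech" block and estimate the bit rate
block = np.sin(2 * np.pi * 200 * np.arange(BLOCK) / 8000) + 0.1 * rng.normal(size=BLOCK)
address, energy = encode(block)
bits_per_block = DB_BITS + 6             # index + coarsely quantized energy (assumed)
print(f"address={address}, energy={energy:.2f}")
print(f"rate = {bits_per_block * 8000 / BLOCK / 1000:.1f} kbps")
```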

A Fixed Rate Speech Coder Based on the Filter Bank Method and the Inflection Point Detection

  • Iem, Byeong-Gwan
    • International Journal of Fuzzy Logic and Intelligent Systems / v.16 no.4 / pp.276-280 / 2016
  • A fixed rate speech coder based on a filter bank and a non-uniform sampling technique is proposed. The non-uniform sampling is achieved by the detection of inflection points (IPs). A speech block is band-passed by the filter bank, the subband signals are processed by the IP detector, and the detected IP patterns are compared with the entries of an IP database. For each subband signal, the address of the closest member of the database and the energy of the IP pattern are transmitted over the channel. In the receiver, the decoder recovers the subband signals using the received addresses and the energy information, and reconstructs the speech via filter bank summation. As a result, the coder achieves a fixed data rate, in contrast to existing speech coders based on non-uniform sampling. Computer simulation confirms the usefulness of the proposed technique: the signal-to-noise ratio (SNR) performance of the proposed method is comparable to that of uniformly sampled pulse code modulation (PCM) at data rates below 20 kbps.
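A sketch of the filter-bank front end added in this variant: each block is split into subbands, each subband would then be IP-coded as in the previous entry, and the decoder reconstructs by filter bank summation. The band edges and filter design below are assumptions, not taken from the paper.

```python
# Sketch of the analysis/synthesis filter bank around the per-subband coder.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 8000
bands = [(100, 800), (800, 1800), (1800, 3400)]      # assumed subband edges

def analysis(x):
    """Split a signal into bandpass subband signals."""
    subbands = []
    for lo, hi in bands:
        b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
        subbands.append(filtfilt(b, a, x))
    return subbands

def synthesis(subbands):
    """Filter bank summation: add the (possibly re-coded) subbands back up."""
    return np.sum(subbands, axis=0)

t = np.arange(0, 0.02, 1 / fs)
x = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 2500 * t)

subbands = analysis(x)
# ... per-subband IP pattern matching and energy coding would happen here ...
x_hat = synthesis(subbands)
snr = 10 * np.log10(np.sum(x**2) / np.sum((x - x_hat)**2))
print(f"{len(bands)} subbands, reconstruction SNR without coding: {snr:.1f} dB")
```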