• Title/Abstract/Keywords: Simple random sample


Software Packages for Survey Data Analysis

  • 성내경
    • 한국조사연구학회지:조사연구 / Vol. 1, No. 1 / pp. 109-123 / 2000
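  • To analyze data collected under a complex probability sampling design validly, statistical inference must take the survey design into account. If the design is ignored in the analysis, the variances of the various estimators are underestimated, and as a result the probability of a Type I error becomes very high. This article introduces software packages specialized for survey data analysis and, in particular, summarizes the analytic capabilities of SUDAAN version 7.5 and SAS version 8.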
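
The abstract's central claim, that ignoring a clustered design understates variances, can be seen in a few lines. The sketch below is a hypothetical illustration in Python (neither SUDAAN nor SAS is used): the data are invented, and the design-based estimator is the simple "ultimate cluster" approximation for equal-size clusters, not the exact procedure of any package.

```python
from statistics import mean, variance

# Hypothetical data: 4 sampled clusters (PSUs) of 5 units each.
# Units within a cluster are similar, so most variation lies between clusters.
clusters = [
    [10, 11, 12, 11, 10],
    [20, 21, 19, 20, 22],
    [15, 14, 16, 15, 15],
    [30, 29, 31, 30, 31],
]

def naive_var_of_mean(values):
    """Variance of the sample mean if the units were an SRS (design ignored, no fpc)."""
    return variance(values) / len(values)

def cluster_var_of_mean(clusters):
    """Ultimate-cluster variance of the mean for equal-size clusters
    (with-replacement approximation, no fpc): Var(cluster means) / #clusters."""
    cluster_means = [mean(c) for c in clusters]
    return variance(cluster_means) / len(cluster_means)

pooled = [y for c in clusters for y in c]
v_naive = naive_var_of_mean(pooled)
v_design = cluster_var_of_mean(clusters)
print(f"naive SE  = {v_naive ** 0.5:.3f}")
print(f"design SE = {v_design ** 0.5:.3f}")
print(f"design effect = {v_design / v_naive:.2f}")
```

Because the naive standard error is too small, test statistics built on it are too large, which is the mechanism behind the inflated Type I error rate.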

New Construction of (2,n) Visual Cryptography for Multiple Secret Sharing

  • 김문수
    • 정보보호학회논문지 / Vol. 10, No. 3 / pp. 37-48 / 2000
  • A visual cryptography scheme is a simple method in which secret information can be decoded directly by the human visual system, without performing any cryptographic computation. It is a kind of secret sharing scheme: a secret image is scattered across n random images (slides), and when any threshold number (or more) of them are stacked together, the original image becomes visible. In this paper we consider the (2, n) visual cryptography scheme and propose a new construction method that reduces the number of expanded pixels by using a sample matrix. The proposed scheme can furthermore distribute multiple secret images to each group according to differences in relative contrast.
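
For orientation, the classic (2, n) construction that such work builds on fits in a short Python sketch. This is the textbook basis-matrix scheme with pixel expansion n (each secret pixel becomes n subpixels), not the paper's reduced-expansion construction based on a sample matrix; the function names are hypothetical.

```python
import random

def basis_matrices(n):
    """Basis matrices of a textbook (2, n) scheme with pixel expansion n.
    Rows are shares, columns are subpixels (1 = black)."""
    # White pixel: every share has its single black subpixel in the same column.
    white = [[1 if j == 0 else 0 for j in range(n)] for _ in range(n)]
    # Black pixel: share i has its black subpixel in column i (all different).
    black = [[1 if j == i else 0 for j in range(n)] for i in range(n)]
    return white, black

def share_pixel(bit, n, rng):
    """Return one n-subpixel block per share for a secret pixel (1 = black)."""
    white, black = basis_matrices(n)
    base = black if bit else white
    perm = list(range(n))
    rng.shuffle(perm)  # the same random column permutation for every share
    return [[row[p] for p in perm] for row in base]

def share_image(bits, n, seed=0):
    """Split a flat list of secret pixels into n shares."""
    rng = random.Random(seed)
    shares = [[] for _ in range(n)]
    for bit in bits:
        block = share_pixel(bit, n, rng)
        for i in range(n):
            shares[i].append(block[i])
    return shares

def stack(block_a, block_b):
    """Overlaying two transparencies acts as OR on subpixels."""
    return [x | y for x, y in zip(block_a, block_b)]

secret = [0, 1, 1, 0, 1]
shares = share_image(secret, n=4, seed=42)
# Stacking shares 0 and 2: white pixels show 1 black subpixel, black pixels show 2.
print([sum(stack(a, b)) for a, b in zip(shares[0], shares[2])])  # -> [1, 2, 2, 1, 2]
```

A single share always shows exactly one black subpixel per block, so on its own it reveals nothing; the contrast between stacked white (1 of n) and black (2 of n) blocks is what the eye decodes. The n-fold expansion is the cost that the paper's construction aims to reduce.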

Unrelated question model with quantitative attribute by simple cluster sampling

  • 이기성;홍기학
    • 응용통계연구 / Vol. 11, No. 1 / pp. 141-150 / 1998
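  • In this paper, for highly sensitive surveys in which the population consists of several clusters with a quantitative attribute, we apply the unrelated question model for quantitative attributes to simple cluster sampling, which uses the cluster as the sampling unit. We then find the optimal cluster size and number of sample clusters that minimize the variance under a fixed cost, and derive the form of the minimum variance. We also compare the efficiency of the proposed unrelated question model under simple cluster sampling with that of the unrelated question model under simple random sampling.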
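
The moment identity behind a quantitative unrelated question model can be sketched as follows. As simplifying assumptions, the unrelated variable's mean mu_y is taken as known and all clusters have equal size; the paper's estimator, cost function, and optimal cluster size are not reproduced here, and the function name and data are hypothetical.

```python
from statistics import mean, variance

def uqm_cluster_estimate(clusters, p, mu_y):
    """Estimate the mean of a sensitive quantitative variable X from
    randomized responses Z collected in equal-size sampled clusters.

    Each respondent answers the sensitive question with probability p and the
    unrelated question (known mean mu_y) otherwise, so E[Z] = p*mu_x + (1-p)*mu_y.
    """
    cluster_means = [mean(c) for c in clusters]
    z_bar = mean(cluster_means)  # equal cluster sizes: mean of cluster means
    mu_x_hat = (z_bar - (1 - p) * mu_y) / p
    # Between-cluster variance estimate (clusters as the units, no fpc)
    var_hat = variance(cluster_means) / (len(cluster_means) * p ** 2)
    return mu_x_hat, var_hat

# Hypothetical randomized responses from two sampled clusters
est, var_est = uqm_cluster_estimate([[36, 40], [42, 42]], p=0.75, mu_y=10)
print(est, var_est)
```

Treating whole clusters as the units is what makes the variance depend on between-cluster variation, which is where cluster size and the number of sampled clusters enter the cost-variance trade-off.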

RELATIONS OF DAGUM DISTRIBUTION BASED ON DUAL GENERALIZED ORDER STATISTICS

  • KUMAR, DEVENDRA
    • Journal of applied mathematics & informatics / Vol. 35, Nos. 5-6 / pp. 477-493 / 2017
  • Dual generalized order statistics form a unified model that contains well-known decreasingly ordered random variables such as order statistics and lower record values. Using this definition, we give simple expressions for the single and product moments of dual generalized order statistics from the Dagum distribution. The results for order statistics and lower records are deduced from the derived relations, and some computational work is also carried out. Further, a characterization of this distribution using the conditional moment of dual generalized order statistics is discussed. These recurrence relations enable the means, variances and covariances of all order statistics to be computed for all sample sizes in a simple and efficient manner. Using these relations, we tabulate the means, variances, skewness and kurtosis of the order statistics and record values of the Dagum distribution.
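
The means of order statistics that the paper tabulates via recurrence relations can be cross-checked numerically. The Python sketch below uses the Dagum CDF F(x) = (1 + (x/b)^(-a))^(-p), its closed-form quantile function, and the standard integral E[X_{r:n}] = r C(n, r) ∫ Q(u) u^(r-1) (1-u)^(n-r) du evaluated by a midpoint rule. It is a brute-force check, not the paper's recurrence method, and it treats ordinary (increasing) order statistics rather than dual generalized order statistics; the mean exists only for a > 1.

```python
import math

def dagum_quantile(u, a, b, p):
    """Inverse of the Dagum CDF F(x) = (1 + (x/b)**(-a))**(-p), x > 0."""
    return b * (u ** (-1.0 / p) - 1.0) ** (-1.0 / a)

def dagum_mean(a, b, p):
    """Closed-form mean, finite only when a > 1."""
    return b * math.gamma(p + 1 / a) * math.gamma(1 - 1 / a) / math.gamma(p)

def order_stat_mean(r, n, a, b, p, steps=100_000):
    """E[X_{r:n}] via the quantile integral, midpoint rule on (0, 1)."""
    coef = r * math.comb(n, r)  # equals n! / ((r-1)! (n-r)!)
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += dagum_quantile(u, a, b, p) * u ** (r - 1) * (1 - u) ** (n - r)
    return coef * total * h

a, b, p, n = 4.0, 1.0, 2.0, 4
means = [order_stat_mean(r, n, a, b, p) for r in range(1, n + 1)]
print([round(m, 4) for m in means])
print(round(sum(means), 4), round(n * dagum_mean(a, b, p), 4))
```

The identity sum over r of E[X_{r:n}] = n E[X] gives a quick sanity check against the closed-form mean b Γ(p + 1/a) Γ(1 - 1/a) / Γ(p).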

On Quantiles Estimation Using Ranked Samples with Some Applications

  • Samawi, Hani-M.
    • Journal of the Korean Statistical Society / Vol. 30, No. 4 / pp. 667-678 / 2001
  • The asymptotic behavior and distribution of quantile estimators using ranked samples are introduced. Applications of quantile estimation to finding normal ranges (the 2.5th and 97.5th percentiles) and the median of some medical characteristics, and to finding the Hodges-Lehmann estimate, are discussed. The conclusion of this study is that, whenever perfect ranking is possible, the relative efficiency of quantile estimation using ranked samples relative to simple random sampling (SRS) is high. This may translate into large savings in cost and time. This conclusion also holds even if the ranking is not perfect. Computer simulation results are given, and real data from the Iowa 65+ study are used to illustrate the method.
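
Ranked set sampling (RSS) is easiest to see in a small simulation. The Python sketch below assumes perfect ranking, a Uniform(0, 1) population, and the plain sample median as the quantile estimator; it compares Monte Carlo mean squared errors for the same number of measured units and is not the estimator or the asymptotic analysis from the paper.

```python
import random
from statistics import median

def rss_sample(k, cycles, draw, rng):
    """Ranked set sample of size k * cycles under perfect ranking:
    from the i-th set of k units, keep only the i-th smallest."""
    out = []
    for _ in range(cycles):
        for i in range(k):
            ranked = sorted(draw(rng) for _ in range(k))
            out.append(ranked[i])
    return out

def srs_sample(n, draw, rng):
    """Simple random sample of n independent draws."""
    return [draw(rng) for _ in range(n)]

def mse_of_median(sampler, true_median, reps, rng):
    """Monte Carlo mean squared error of the sample median."""
    return sum((median(sampler(rng)) - true_median) ** 2 for _ in range(reps)) / reps

def draw(rng):
    return rng.random()  # Uniform(0, 1), true median 0.5

rng = random.Random(2001)
k, cycles = 3, 3  # both designs measure n = 9 units
mse_rss = mse_of_median(lambda r: rss_sample(k, cycles, draw, r), 0.5, 2000, rng)
mse_srs = mse_of_median(lambda r: srs_sample(k * cycles, draw, r), 0.5, 2000, rng)
print(f"relative efficiency (SRS MSE / RSS MSE) = {mse_srs / mse_rss:.2f}")
```

RSS draws k² units per cycle but measures only k of them, so the cost savings depend on ranking being cheap relative to measurement.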

Comparative Analysis of Unweighted Sample Design and Complex Sample Design Related to the Exploration of Potential Risk Factors of Dysphonia

  • 변해원
    • 한국산학기술학회논문지 / Vol. 13, No. 5 / pp. 2251-2258 / 2012
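  • This study compared three ways of exploring potential risk factors: simple random sampling analysis (unweighted sample design), single-sample analysis with frequency weights (frequency weighted sample design), and complex sample analysis applying weights together with stratification (complex sample design), and examined whether their results differed statistically. The data source was the otolaryngology examination data of the 2009 Korea National Health and Nutrition Examination Survey. Pearson's chi-square test and the Rao-Scott chi-square test were used. With the frequency-weighted single-sample analysis, every variable was overestimated as a significant risk factor, and the unweighted simple random sampling analysis and the complex sample analysis differed in significance levels and results. When using national statistical data, complex sample analysis that applies weights using the stratification and cluster variables is needed for the findings to be representative of the entire population. Furthermore, particular caution is required when applying frequency weights alone, because the findings are then highly likely to be overinterpreted.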
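
Why frequency weights alone overstate significance can be seen from the Pearson statistic: multiplying every cell by a constant multiplies the statistic by that constant, while the degrees of freedom stay the same. The Python sketch below uses an invented 2x2 table and a constant weight. Real survey weights vary between respondents, and the Rao-Scott test proper estimates design effects from the stratum and cluster structure rather than using a fixed number, so the last step is only a schematic first-order correction with a hypothetical design effect.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic for a two-way contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    """Upper-tail p-value of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical 2x2 table: exposure (rows) by dysphonia (columns), unweighted counts
counts = [[10, 20], [30, 40]]
# Inflating every cell by a constant frequency weight (here 1000) mimics
# treating weighted population totals as if they were observed counts.
weighted = [[1000 * c for c in row] for row in counts]

chi_u, chi_w = pearson_chi2(counts), pearson_chi2(weighted)
print(round(chi_u, 3), round(p_value_df1(chi_u), 3))
print(round(chi_w, 1), p_value_df1(chi_w))

# Schematic first-order Rao-Scott style correction: divide by a mean design effect.
# The value 1.8 is a hypothetical placeholder, not taken from the study.
mean_deff = 1.8
chi_rs = chi_u / mean_deff
print(round(chi_rs, 3), round(p_value_df1(chi_rs), 3))
```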

Two-stage Sampling for Estimation of Prevalence of Bovine Tuberculosis

  • 박선일
    • 한국임상수의학회지 / Vol. 28, No. 4 / pp. 422-426 / 2011
  • For a national survey targeting a wide geographic region or an entire country, a multi-stage sampling approach is widely used to overcome the problems of simple random sampling, to consider both herd- and animal-level factors associated with disease occurrence, and to adjust for the clustering of disease in the population when calculating sample size. The aim of this study was to establish the sample size for estimating the prevalence of bovine tuberculosis (TB) in Korea using a stratified two-stage sampling design. The sample size was determined by taking into account the possible clustering of TB-infected animals within individual herds, to increase the reliability of the survey results. In this study, the country was stratified into nine provinces (administrative units), and the herd, the primary sampling unit, was treated as a cluster. For all analyses, a design effect of 2, a between-cluster prevalence of 50% (to yield the maximum sample size), and a mean herd size of 65 were assumed because of the lack of available information. Under the two-stage sampling scheme, the number of cattle sampled per herd was 65, regardless of the confidence level, prevalence, and mean herd size examined. The number of clusters to be sampled at a 95% confidence level was estimated to be 296, 74, 33, 19, 12, and 9 for a desired precision of 0.01, 0.02, 0.03, 0.04, 0.05, and 0.06, respectively. Therefore, the total sample size at a 95% confidence level was 172,872, 43,218, 19,224, 10,818, 6,930, and 4,806 for desired precision ranging from 0.01 to 0.06. The sample size increased as the desired precision became tighter and as the design effect increased. When the number of cattle sampled per herd was fixed at values from 5 to 40 in 5-head intervals, the total sample size at a 95% confidence level was estimated to be 6,480, 10,080, 13,770, 17,280, 20,925, 24,570, 28,350, and 31,680, respectively.
The percent increase in total sample size from using an intra-cluster correlation coefficient of 0.3 instead of 0.2 was 22.2, 32.1, 36.3, 39.6, 41.9, 42.9, 42.2, and 44.3%, respectively.
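
The first set of reported figures can be reproduced with the usual formula for estimating a proportion, inflated by the design effect. The Python sketch below takes z = 1.96, P = 0.5, a design effect of 2 and 65 head per herd from the abstract, and additionally assumes that the total sample size is nine times the per-province size (one stratum per province). Under those assumptions it matches the reported herd counts and totals, but it is a reconstruction, not the author's published computation.

```python
import math

def srs_size(prevalence, precision, z=1.96):
    """n0 = z^2 P (1 - P) / d^2, rounded up.
    round() guards against float noise such as 9604.000000000002."""
    return math.ceil(round(z ** 2 * prevalence * (1 - prevalence) / precision ** 2, 6))

def two_stage_size(prevalence, precision, deff, herd_size, z=1.96):
    """Per-stratum sample size inflated by the design effect, and the number
    of herds (clusters) needed if herd_size animals are tested per herd."""
    n = srs_size(prevalence, precision, z) * deff
    return n, math.ceil(n / herd_size)

def deff_from_icc(m, rho):
    """Design effect 1 + (m - 1) * rho for m animals sampled per herd."""
    return 1 + (m - 1) * rho

precisions = (0.01, 0.02, 0.03, 0.04, 0.05, 0.06)
results = [two_stage_size(0.5, d, deff=2, herd_size=65) for d in precisions]
for d, (n, herds) in zip(precisions, results):
    # Assumption: total = 9 provinces x per-province size
    print(d, herds, 9 * n)
```

The helper deff_from_icc gives the familiar 1 + (m - 1)ρ design effect; the ratio for ρ = 0.3 versus ρ = 0.2 reproduces several of the reported increases (22.2% at 5 head, 32.1% at 10, 39.6% at 20, 44.3% at 40). The fixed-herd-size totals are not reproduced here because the precision used for them is not stated.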

Sample size using response rate on repeated surveys

  • 박현아;나성룡
    • 응용통계연구 / Vol. 31, No. 5 / pp. 587-597 / 2018
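  • Obtaining sample data that fit the purpose of a survey requires steps such as choosing the sampling and survey methods and drafting the questionnaire, and one of the key decisions is which sample size formula to apply. A sample size formula is determined by setting the target error and total cost for the chosen sampling method. In this paper, for simple random sampling with a given target error and expected response rate, we propose a sample size formula that uses the population variability at past and current time points together with the estimation error and response rate of past data. In practice, surveys use estimators that combine several weights in addition to the design weight; here we derive the sample size formula for an estimator that uses the design weight and a nonresponse adjustment factor, which can reflect the differences in response rates that arise when the survey mode changes between time points. We also examine various applications of the proposed formula by comparing it with existing sample size formulas through simulation.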
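
For reference, the baseline that such formulas extend is the simple random sampling size for a target margin of error, divided by the expected response rate. The Python sketch below is only that textbook baseline with hypothetical inputs; it does not include the past-survey variability, past estimation error, or nonresponse-adjusted estimator used in the paper's proposed formula.

```python
import math

def srs_sample_size(s2, margin, z=1.96, N=None, response_rate=1.0):
    """Textbook SRS sample size for estimating a mean (or a proportion, s2 = P(1-P)).

    n0 = z^2 * S^2 / d^2, optionally corrected for a finite population of size N,
    then inflated by 1 / response_rate so the expected number of respondents is n.
    """
    n = z ** 2 * s2 / margin ** 2
    if N is not None:
        n = n / (1 + n / N)  # finite population correction
    return math.ceil(n / response_rate)

# Hypothetical inputs: P = 0.5, margin 0.05, N = 10,000, expected response rate 70%
print(srs_sample_size(0.25, 0.05, N=10_000, response_rate=0.7))  # -> 529
```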

Exploring modern machine learning methods to improve causal-effect estimation

  • Kim, Yeji;Choi, Taehwa;Choi, Sangbum
    • Communications for Statistical Applications and Methods / Vol. 29, No. 2 / pp. 177-191 / 2022
  • This paper addresses the use of machine learning methods for causal estimation of treatment effects from observational data. Although randomized experimental trials are the gold standard for revealing potential causal relationships, observational studies are another rich source for investigating exposure effects, for example in research on the comparative effectiveness and safety of treatments, where the causal effect can be identified if the covariates contain all confounding variables. In this context, statistical regression models for the expected outcome and for the probability of treatment are often imposed, and these can be combined in a clever way to yield more efficient and robust causal estimators. Recently, targeted maximum likelihood estimation and causal random forests have been proposed and extensively studied for data-adaptive regression in the estimation of causal parameters. Machine learning methods are a natural choice in these settings to improve the quality of the final estimate of the treatment effect. We explore how the design and training of several machine learning algorithms can be adapted for causal inference, and study their finite-sample performance through simulation experiments under various scenarios. Application to percutaneous coronary intervention (PCI) data shows that these adaptations can improve on simple linear regression-based methods.
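
One standard way of combining an outcome model with a treatment-probability model is the augmented inverse probability weighting (AIPW) estimator, a doubly robust relative of the TMLE approach the abstract names. The Python sketch below assumes the propensity scores and outcome predictions have already been produced by some fitted model (machine learning or otherwise); the fitting step, TMLE's targeting step, and causal forests are not shown, and the data are made up.

```python
def aipw_ate(y, a, e, mu1, mu0):
    """Augmented inverse probability weighting (AIPW) estimate of the
    average treatment effect.

    y: outcomes; a: treatment indicators (0/1); e: propensity scores P(A=1|X);
    mu1, mu0: predicted outcomes under treatment and under control.
    """
    n = len(y)
    total = 0.0
    for yi, ai, ei, m1, m0 in zip(y, a, e, mu1, mu0):
        # Outcome-model contrast plus inverse-probability-weighted residual corrections
        total += (m1 - m0
                  + ai * (yi - m1) / ei
                  - (1 - ai) * (yi - m0) / (1 - ei))
    return total / n

# Made-up example in which the outcome predictions match the observed outcomes exactly
y = [3.0, 1.0, 4.0, 2.0]
a = [1, 0, 1, 0]
e = [0.5, 0.4, 0.6, 0.5]
mu1 = [3.0, 2.0, 4.0, 3.0]
mu0 = [1.5, 1.0, 2.5, 2.0]
print(aipw_ate(y, a, e, mu1, mu0))  # -> 1.25
```

If the outcome predictions were all zero, the expression would reduce to plain inverse probability weighting; if they were exact, the residual terms would vanish. The estimate remains consistent when either model is correctly specified, which is why flexible learners can be plugged in for both.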

Use of Protective Gloves in Nail Salons in Manhattan, New York City

  • Basch, Corey;Yarborough, Christina;Trusty, Stephanie;Basch, Charles
    • Journal of Preventive Medicine and Public Health / Vol. 49, No. 4 / pp. 249-251 / 2016
  • Objectives: Nail salon owners in New York City (NYC) are required to provide their workers with gloves, and it is their responsibility to maintain healthy, safe working spaces for their employees. The purpose of this study was to determine how frequently nail salon workers wear protective gloves. Methods: A Freedom of Information Law request was submitted to the New York Department of State's Division of Licensing Services for a full list of nail salons in Manhattan, NYC. A population of 800 nail salons was identified, and a simple random sample (without replacement) of 30% (n=240) was selected using a random number generator. Researchers visited each nail salon from October to December 2015, posing as potential customers, to determine whether nail salon workers were wearing gloves. Results: Among the 169 salons in which one or more workers were observed providing services, a total of 562 workers were observed. In 149 of these salons, none of the workers were wearing gloves. In contrast, in six of the salons observed, all of the workers providing services (1 in 2 sites, 2 in 1 site, 3 in 2 sites, and 4 in 1 site) were wearing gloves. Almost three-quarters of all workers observed (n=415, 73.8%) were not wearing gloves. Conclusions: The findings of this study indicate that, despite recent media attention and legislation, the majority of nail salon workers we observed were not wearing protective gloves when providing services.
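
Drawing a simple random sample without replacement from a listed frame takes one call in Python. In the sketch below the frame is a hypothetical list of 800 salon IDs, the seed is arbitrary, and random.sample stands in for whatever random number generator the authors used; the last lines simply recompute the reported 73.8% from 415 of 562 workers.

```python
import random

# Hypothetical frame: salon IDs 0..799 standing in for the 800 listed salons
frame = list(range(800))
rng = random.Random(2015)  # arbitrary seed, for reproducibility only
sample = rng.sample(frame, k=240)  # SRS without replacement, 30% of the frame

# Share of observed workers not wearing gloves, from the reported counts
not_gloved, observed = 415, 562
print(len(sample), f"{100 * not_gloved / observed:.1f}%")  # -> 240 73.8%
```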