• Title/Summary/Keyword: Statistical data analyses

Inappropriate Survey Design Analysis of the Korean National Health and Nutrition Examination Survey May Produce Biased Results

  • Kim, Yangho;Park, Sunmin;Kim, Nam-Soo;Lee, Byung-Kook
    • Journal of Preventive Medicine and Public Health / v.46 no.2 / pp.96-104 / 2013
  • Objectives: The inherent nature of the Korean National Health and Nutrition Examination Survey (KNHANES) design requires special analysis by incorporating sample weights, stratification, and clustering not used in ordinary statistical procedures. Methods: This study investigated the proportion of research papers that have used an appropriate statistical methodology out of the research papers analyzing the KNHANES cited in the PubMed online system from 2007 to 2012. We also compared differences in mean and regression estimates between the ordinary statistical data analyses without sampling weight and design-based data analyses using the KNHANES 2008 to 2010. Results: Of the 247 research articles cited in PubMed, only 19.8% of all articles used survey design analysis, compared with 80.2% of articles that used ordinary statistical analysis, treating KNHANES data as if it were collected using a simple random sampling method. Means and standard errors differed between the ordinary statistical data analyses and design-based analyses, and the standard errors in the design-based analyses tended to be larger than those in the ordinary statistical data analyses. Conclusions: Ignoring complex survey design can result in biased estimates and overstated significance levels. Sample weights, stratification, and clustering of the design must be incorporated into analyses to ensure the development of appropriate estimates and standard errors of these estimates.
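
The contrast this abstract draws between ordinary and design-based analysis can be illustrated with a minimal sketch. It assumes a stratified cluster sample whose primary sampling units (PSUs) are treated as drawn with replacement within strata, and it uses Taylor linearization for the standard error. The function names, the toy data, and the weights are hypothetical; this is neither KNHANES data nor the authors' procedure.

```python
import math
from collections import defaultdict


def srs_mean(y):
    """Ordinary mean and standard error, as if y were a simple random sample."""
    n = len(y)
    mean = sum(y) / n
    var = sum((yi - mean) ** 2 for yi in y) / (n - 1)
    return mean, math.sqrt(var / n)


def design_based_mean(y, w, strata, psu):
    """Weighted mean with a Taylor-linearized standard error.

    Assumption: PSUs are sampled with replacement within strata and
    finite population corrections are ignored.
    """
    total_w = sum(w)
    mean_w = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Linearized contribution of each observation, summed to PSU totals.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, c in zip(y, w, strata, psu):
        psu_totals[h][c] += wi * (yi - mean_w) / total_w

    variance = 0.0
    for totals_by_psu in psu_totals.values():
        totals = list(totals_by_psu.values())
        n_h = len(totals)
        if n_h < 2:
            continue  # a single-PSU stratum adds nothing in this sketch
        center = sum(totals) / n_h
        variance += n_h / (n_h - 1) * sum((t - center) ** 2 for t in totals)
    return mean_w, math.sqrt(variance)


# Hypothetical toy sample: two strata, two PSUs per stratum, unequal weights.
y = [5.1, 5.3, 6.0, 6.2, 7.4, 7.1, 8.0, 8.3]
w = [100, 100, 100, 100, 400, 400, 400, 400]
strata = [1, 1, 1, 1, 2, 2, 2, 2]
psu = [1, 1, 2, 2, 3, 3, 4, 4]

print("ordinary (mean, SE):    ", srs_mean(y))
print("design-based (mean, SE):", design_based_mean(y, w, strata, psu))
```

For real survey data, a dedicated survey package is the safer choice, since it also handles single-PSU strata, subpopulation estimates, and regression models.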

Statistical analyses on the damage consequences of occupational accidents in construction work (건설공사 노동재해의 피해강도 및 규모특성에 관한 통계분석)

  • 최기봉
    • Journal of the Korean Society of Safety / v.13 no.1 / pp.104-111 / 1998
  • Statistical analyses of occupational accidents associated with construction work were carried out to explore the basic statistical characteristics of their damage consequences. Emphasis was placed on probabilistic and statistical analyses to clarify, in particular, the relationship between the frequency of labour accidents and their damage consequences. Damage consequences were classified into two categories: the number of workdays lost due to accidents and the number of injured workers involved in one accident. Two types of accident data were collected for the analyses. The analyses showed that the relation between damage due to accidents and their frequencies can be represented by a simple power function, which indicates a log-log linear relation. Using this relationship, various probabilistic evaluations were conducted, such as estimation of the mean time periods between accidents, expected damage consequences, and the expected damage ratio between different mean time periods of accidents.
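
A power function f = a · d^b becomes a straight line after taking logarithms, log f = log a + b · log d, so it can be fitted by ordinary least squares on the log-log scale. The sketch below shows that idea only. The function name and the damage and frequency values are hypothetical and are not taken from the paper.

```python
import math


def fit_power_law(damage, frequency):
    """Fit frequency = a * damage**b by least squares on log10-log10 axes.

    Returns (a, b). All inputs must be strictly positive.
    """
    xs = [math.log10(d) for d in damage]
    ys = [math.log10(f) for f in frequency]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope of the log-log line (the exponent)
    a = 10 ** (y_bar - b * x_bar)    # intercept, transformed back
    return a, b


# Hypothetical data: workdays lost per accident vs. accidents per year.
damage = [1, 10, 100, 1000]
frequency = [200.0, 40.0, 9.0, 1.5]
a, b = fit_power_law(damage, frequency)
print(f"frequency ~ {a:.1f} * damage^{b:.2f}")
```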

Statistical Methods for Gene Expression Data

  • Kim, Choongrak
    • Communications for Statistical Applications and Methods / v.11 no.1 / pp.59-77 / 2004
  • Since the introduction of the DNA microarray, a revolutionary high-throughput biological technology, many papers have been published on the analysis of gene expression data from microarrays. In this paper we review most papers relevant to cDNA microarray data, classify them from the standpoint of statistical methods, and present some statistical methods deserving consideration and future study.

Resistant Singular Value Decomposition and Its Statistical Applications

  • Park, Yong-Seok;Huh, Myung-Hoe
    • Journal of the Korean Statistical Society / v.25 no.1 / pp.49-66 / 1996
  • The singular value decomposition is one of the most useful methods in the area of matrix computation. It gives dimension reduction, which is the central idea in many multivariate analyses. But this method is not resistant, i.e., it is very sensitive to small changes in the input data. In this article, we derive a resistant version of the singular value decomposition for principal component analysis. We also give its statistical applications to the biplot, which is similar to principal component analysis in that it reduces the dimension of an n x p data matrix. We thus derive the resistant principal component analysis and biplot based on the resistant singular value decomposition. They provide graphical multivariate data analyses that are relatively little influenced by outlying observations.

A guideline for the statistical analysis of compositional data in immunology

  • Yoo, Jinkyung;Sun, Zequn;Greenacre, Michael;Ma, Qin;Chung, Dongjun;Kim, Young Min
    • Communications for Statistical Applications and Methods / v.29 no.4 / pp.453-469 / 2022
  • The study of immune cellular composition has been of great scientific interest in immunology because of the generation of multiple large-scale data. From the statistical point of view, such immune cellular data should be treated as compositional. In compositional data, each element is positive, and all the elements sum to a constant, which can be set to one in general. Standard statistical methods are not directly applicable for the analysis of compositional data because they do not appropriately handle correlations between the compositional elements. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio transformations and the alternative approach using Dirichlet regression analysis, discuss their theoretical foundations, and illustrate their applications with immune cellular fraction data generated from colorectal cancer patients.
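
One commonly used log-ratio transformation is the centered log-ratio (CLR), which maps a composition x to log x_i minus the mean of log x over its parts. The sketch below assumes strictly positive parts and uses made-up cell fractions. The abstract does not say which log-ratio transformation the paper uses, so CLR is shown here only as a representative example.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to one."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centered log-ratio transform of one composition.

    Zeros must be handled (for example, replaced) before calling this,
    because the logarithm is undefined at zero.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("CLR requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical immune cell fractions for one sample.
fractions = closure([0.42, 0.31, 0.15, 0.08, 0.04])
print([round(v, 3) for v in clr(fractions)])
```

CLR coordinates sum to zero and do not depend on the constant the composition sums to, which is why they can be passed to standard regression tools more safely than raw fractions.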

A Study of Statistical Methods in the Water Environmental Research of Han and Nakdong River Basins

  • Lee, Sang-Bock;Kim, Mal-Suk
    • Journal of the Korean Data and Information Science Society / v.14 no.1 / pp.23-35 / 2003
  • This paper provides a checklist of statistical methods used in water environmental research on the Han and Nakdong river basins in South Korea. Many errors in the adoption of statistical methods in this research are pointed out; for example, basic statistical assumptions are overlooked for t-tests or regression analyses. Some new ideas are proposed for better research on the river basins in Korea.

Nonparametric Analysis of Warranty Data on Engine : Case Study (엔진에 대한 품질보증데이터의 비모수적 분석 사례연구)

  • Baik, Jai-Wook;Jo, Jin-Nam
    • Journal of Korean Society for Quality Management / v.34 no.1 / pp.40-47 / 2006
  • Claim history data covering a rather long period were collected for reliability and warranty cost analyses. The data were appropriately organized for use in further statistical analyses. For each critical component, a nonparametric statistical method was applied to obtain a reliability plot. Hazard plots of the components at the subsystem or system level were also obtained. A competing risk model was assumed to obtain the performance at the subsystem or system level.
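
The abstract does not name the nonparametric method used. A standard choice for a reliability curve from claim data with censoring is the Kaplan-Meier product-limit estimator, sketched below under that assumption. The function name and the engine ages are hypothetical.

```python
def kaplan_meier(times, failed):
    """Kaplan-Meier reliability estimate.

    times  : age at failure or at end of observation for each unit
    failed : True if the unit failed at that age, False if censored
    Returns a list of (failure_time, estimated_reliability) pairs.
    """
    reliability = 1.0
    curve = []
    failure_times = sorted({t for t, f in zip(times, failed) if f})
    for t in failure_times:
        at_risk = sum(1 for u in times if u >= t)
        failures = sum(1 for u, f in zip(times, failed) if f and u == t)
        reliability *= 1 - failures / at_risk
        curve.append((t, reliability))
    return curve


# Hypothetical engine ages in months; False marks units still running.
ages = [3, 7, 7, 12, 15, 20, 24, 24]
failed = [True, True, False, True, False, True, False, False]
for t, r in kaplan_meier(ages, failed):
    print(f"t = {t:>2} months  R(t) = {r:.3f}")
```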

Methodology of data analyses under presence of outliers for estimating construction cost (공사비 예측시 이상값 존재하에서 데이터 처리 분석 방안)

  • O, Se-Dae;Huh, Young-Ki
    • KIEAE Journal / v.7 no.3 / pp.31-37 / 2007
  • Statistical analyses of actual data have been used in estimating construction cost for many years, but collected data can include factors that distort analytical results, namely outliers. To enhance reliability in predicting construction cost, a methodology that can identify outliers and determine how to manage them is needed. Actual costs obtained from 22 construction projects were studied. A substantial disparity was found between results that consider outliers and results that do not. Therefore, it is necessary to identify outliers and apply an optimum process when actual data are used in statistical analysis to estimate construction cost.
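
The abstract does not state which outlier rule the authors applied. As one possible illustration, the sketch below flags values outside Tukey's fences (1.5 times the interquartile range beyond the quartiles) and compares the mean with and without the flagged values. The function name and the unit costs are hypothetical and are not the 22 projects from the paper.

```python
import statistics


def tukey_outlier_mask(values, k=1.5):
    """Return a list of booleans, True where a value lies outside Tukey's fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Hypothetical unit costs (currency units per square metre).
unit_costs = [980, 1010, 1025, 1040, 1055, 1070, 1100, 1950]
mask = tukey_outlier_mask(unit_costs)
kept = [c for c, is_outlier in zip(unit_costs, mask) if not is_outlier]

print("flagged:", [c for c, is_outlier in zip(unit_costs, mask) if is_outlier])
print("mean with all values:  ", statistics.mean(unit_costs))
print("mean without outliers: ", statistics.mean(kept))
```

Flagging is only the first step. Whether a flagged project is removed, corrected, or kept should depend on why its cost is unusual.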

A review of analysis methods for secondary outcomes in case-control studies

  • Schifano, Elizabeth D.
    • Communications for Statistical Applications and Methods / v.26 no.2 / pp.103-129 / 2019
  • The main goal of a case-control study is to learn the association between various risk factors and a primary outcome (e.g., disease status). Recently, it has also become quite common to perform secondary analyses of case-control data in order to understand certain associations between the risk factors of the primary outcome. It has been repeatedly documented that, with case-control data, association studies of the risk factors that ignore the case-control sampling scheme can produce highly biased estimates of the population effects. In this article, we review the issues with naive secondary analyses that do not account for the biased sampling scheme, as well as the various methods that have been proposed to account for the case-control ascertainment. We additionally compare the results of many of the discussed methods in an example examining the association of a particular genetic variant with smoking behavior, where the data were obtained from a lung cancer case-control study.

Explanatory Analysis for South Korea's Political Website Linking - Statistical Aspects

  • Choi, Kyoung-Ho;Park, Han-Woo
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.899-911 / 2005
  • This paper conducts an explanatory analysis of the web sphere produced by National Assemblymen in South Korea, using several statistical methods. First, some descriptive metrics were employed. Next, two traditional methods of multivariate analysis, multidimensional scaling and correspondence analysis, were applied to the data. Finally, cross-sectional data were compared to examine change over time.
