• Title/Abstract/Keywords: Statistical error analysis

724 search results, processing time 0.025 s

Linear regression under log-concave and Gaussian scale mixture errors: comparative study

  • Kim, Sunyul;Seo, Byungtae
    • Communications for Statistical Applications and Methods / Vol. 25, No. 6 / pp. 633-645 / 2018
  • Gaussian error distributions are a common choice in traditional regression models fitted by the maximum likelihood (ML) method. However, this distributional assumption is often questionable, especially when the error distribution is skewed or has heavy tails. In both cases, the ML method under normality can break down or lose efficiency. In this paper, we consider log-concave and Gaussian scale mixture distributions for the errors. For log-concave errors, we propose using a smoothed maximum likelihood estimator for stable and faster computation. Based on this, we perform comparative simulation studies to assess the performance of the coefficient estimates under normal, Gaussian scale mixture, and log-concave errors. In addition, we analyze real data: the stack loss plant data and the Korean Labor and Income Panel data.

Restricted maximum likelihood estimation of a censored random effects panel regression model

  • Lee, Minah;Lee, Seung-Chun
    • Communications for Statistical Applications and Methods / Vol. 26, No. 4 / pp. 371-383 / 2019
  • Panel data sets have been developed in various areas, and many recent studies have analyzed panel, or longitudinal, data sets. Maximum likelihood (ML) may be the most common statistical method for analyzing panel data models; however, inference based on the ML estimate will have an inflated Type I error because the ML method tends to give a downwardly biased estimate of the variance components when the sample size is small. The underestimation can be severe when the data are incomplete. This paper proposes the restricted maximum likelihood (REML) method for a random effects panel data model with a censored dependent variable. The likelihood function of the model is complex in that it includes a multidimensional integral. Many authors have proposed integral approximation methods for computing the likelihood function; however, it is well known that integral approximation methods are inadequate for high-dimensional integrals in practice. This paper proposes using the moments of a truncated multivariate normal random vector to calculate the multidimensional integral. In addition, a proper asymptotic standard error of the REML estimate is given.
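
The downward bias of the ML variance estimate can be seen in the simplest possible setting: an i.i.d. normal sample with unknown mean, where the ML estimate divides the sum of squares by n and the REML estimate divides by n - 1. The sketch below is only a toy illustration of that bias, not the censored random effects panel model of the paper, and the data values are made up.

```python
import statistics

# Hypothetical small sample (values made up for illustration).
y = [4.1, 5.3, 3.8, 6.0, 5.1]
n = len(y)

# ML estimate of the variance: divides the sum of squares by n.
var_ml = statistics.pvariance(y)

# REML estimate in this simple model: divides by n - 1, because one
# degree of freedom is spent on estimating the mean.
var_reml = statistics.variance(y)

# The ML estimate is smaller by exactly the factor (n - 1) / n,
# so the bias is most visible when n is small.
ratio = var_ml / var_reml
print(f"ML: {var_ml:.4f}  REML: {var_reml:.4f}  ratio: {ratio:.2f}")
```

Standard errors built from the smaller ML variance are too narrow, which is one way to read the inflated Type I error described in the abstract.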

Analysis of the Statistical Errors in Articles of The Korean Journal of Meridian and Acupuncture (2007-2011)

  • 이민희;강경원;김정은;최선미;이상훈
    • Korean Journal of Acupuncture / Vol. 29, No. 4 / pp. 573-580 / 2012
  • Objectives: This study investigated the statistical validity and trends of previously reported papers that used statistical techniques such as the t-test and analysis of variance. Methods: To analyze the statistical procedures, 54 original articles using those statistical methods were selected from The Korean Journal of Acupuncture published from 2007 to 2011. Results: The t-test and analysis of variance were used in 23 (25.27%) and 18 (19.78%) papers out of 54 papers, respectively. Seven articles (12.96%) did not report alpha values, and 26 (48.15%) of the 54 studies were not tested for normal distribution. One paper (1.85%) misused the t-test, and 7 papers (38.89%) did not carry out a multiple comparison. Conclusions: To improve the quality of KJA, statistician involvement in research design is necessary to reduce errors in statistical methods and in the interpretation of results.
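
To show why omitting the multiple comparison step matters: with m tests each run at level alpha, the chance of at least one false positive is 1 - (1 - alpha)^m. The sketch below assumes independent tests and uses a Bonferroni adjustment as one possible remedy; the abstract does not say which procedure the reviewed papers should have used, and the function names are illustrative.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive in m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test level that keeps the familywise error rate at or below alpha."""
    return alpha / m


# Three groups compared pairwise give m = 3 tests.
m = 3
alpha = 0.05
unadjusted = familywise_error_rate(alpha, m)
adjusted = familywise_error_rate(bonferroni_alpha(alpha, m), m)
print(f"unadjusted: {unadjusted:.4f}, Bonferroni-adjusted: {adjusted:.4f}")
```

With only three pairwise tests the unadjusted familywise rate is already well above the nominal 0.05.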

Efficiency of Aggregate Data in Non-linear Regression

  • Huh, Jib
    • Communications for Statistical Applications and Methods / Vol. 8, No. 2 / pp. 327-336 / 2001
  • This work concerns estimating a non-linear regression function using aggregate data. In much empirical research, data are aggregated for various reasons before statistical analysis. In a traditional parametric approach, a linear estimation of the non-linear function with aggregate data can result in unstable estimators of the parameters. A more serious consequence is the bias in the estimation of the non-linear function. The approach we employ is kernel regression smoothing. We describe the conditions under which aggregate data can be used to estimate the regression function efficiently. Numerical examples illustrate our findings.
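
Kernel regression smoothing estimates the regression function at a point as a locally weighted average of the responses. The sketch below assumes the Nadaraya-Watson form with a Gaussian kernel; the abstract does not state which kernel estimator or bandwidth the paper uses, and the data are synthetic. It illustrates only the smoother itself, not the treatment of aggregate data.

```python
import math


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys, with weights centered at x0."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Synthetic non-linear relationship y = x^2 on a regular grid (no noise).
xs = [i / 10 for i in range(0, 31)]
ys = [x * x for x in xs]

estimate = nadaraya_watson(1.5, xs, ys, bandwidth=0.2)
constant_fit = nadaraya_watson(0.7, xs, [3.0] * len(xs), bandwidth=0.2)
print(f"estimate at x = 1.5: {estimate:.3f} (true value 2.25)")
```

The estimate sits slightly above the true value because averaging a convex function over a window introduces smoothing bias, which shrinks as the bandwidth decreases.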

Tree-structured Classification based on Variable Splitting

  • Ahn, Sung-Jin
    • Communications for Statistical Applications and Methods / Vol. 2, No. 1 / pp. 74-88 / 1995
  • This article introduces a unified method of choosing the most explanatory and significant multiway partitions for classification tree design and analysis. The method is derived from the impurity reduction (IR) measure of divergence, which is proposed as an extension of the proportional-reduction-in-error (PRE) measure in the decision-theory context. To derive the method, the IR measure is analyzed to characterize its statistical properties, which are then used to handle consistently the feature formation, feature selection, and feature deletion required in constructing the associated classification tree. A numerical example illustrates the proposed approach.
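
As a rough illustration of impurity reduction for a multiway partition, the sketch below uses the Gini index as the impurity function. This is an assumption made for the example: the abstract does not define the paper's IR measure, and the labels and splits are made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_reduction(parent, children):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * gini(child) for child in children)
    return gini(parent) - weighted


parent = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

# A three-way split that separates the classes perfectly.
perfect_split = [["a"] * 4, ["b"] * 4, ["c"] * 4]
# A two-way split whose children keep the parent's class proportions.
useless_split = [["a", "a", "b", "b", "c", "c"], ["a", "a", "b", "b", "c", "c"]]

ir_perfect = impurity_reduction(parent, perfect_split)
ir_useless = impurity_reduction(parent, useless_split)
print(f"perfect split: {ir_perfect:.4f}, useless split: {ir_useless:.4f}")
```

A split is more explanatory the closer its reduction comes to the parent impurity itself.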

Performance Analysis of Single Bluetooth Piconet in Error-Prone Environments

  • Shin, Soo-Young;Park, Hong-Seong;Kim, Dong-Sung;Kwon, Wook-Hyun
    • Journal of Communications and Networks / Vol. 9, No. 3 / pp. 229-235 / 2007
  • This paper analyzes the performance of a Bluetooth piconet in error-prone environments. Statistical characterizations of the waiting time, the end-to-end delay, and the goodput are derived analytically in terms of the arrival rates, the number of slaves, and the packet error rate (PER). For simplicity, a half-symmetric piconet is assumed in this analysis. Both exhaustive and limited scheduling are considered. The analytic results are validated by simulations.

A Comparative Study on Spatial Lattice Data Analysis - A Case Where Outlier Exists -

  • 김수정;최승배;강창완;조장식
    • Communications for Statistical Applications and Methods / Vol. 17, No. 2 / pp. 193-204 / 2010
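  • Recently, researchers in many fields that require spatial analysis have become increasingly interested in spatial statistics. Within statistics as well, much research is under way, driven by the argument that data collected over space should be analyzed spatially when spatial autocorrelation is present. Among the data types handled in spatial statistics, spatial lattice data analysis proceeds through the following steps: (1) defining spatial neighbors, (2) defining spatial neighbor weights, and (3) applying a spatial model. In this study, we show that when spatial lattice data containing outliers are analyzed using the trimmed mean squared error, the spatial statistical method is superior to the general statistical method in terms of prediction. To demonstrate the validity of this result, we compared the spatial statistical method with the general statistical method through simulation. We also examined the usefulness of the spatial statistical method with the trimmed mean squared error through an application to real crime data from Busanjin-gu.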
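
A trimmed mean squared error limits the influence of outliers on a prediction comparison by discarding the largest squared errors before averaging. The sketch below assumes one-sided trimming of the largest squared errors; the abstract does not give the exact definition or trimming rate used in the paper, and the numbers are made up.

```python
def trimmed_mse(actual, predicted, trim_fraction=0.1):
    """Mean of the squared errors after dropping the largest trim_fraction."""
    if not 0.0 <= trim_fraction < 1.0:
        raise ValueError("trim_fraction must be in [0, 1)")
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sorted((a - p) ** 2 for a, p in zip(actual, predicted))
    # Number of squared errors kept after removing the largest ones.
    keep = len(squared) - int(len(squared) * trim_fraction)
    return sum(squared[:keep]) / keep


# Made-up observations with one outlier (50.0) that the model misses badly.
actual = [1.0, 2.0, 3.0, 4.0, 50.0]
predicted = [1.5, 2.5, 2.5, 4.5, 5.0]

plain = trimmed_mse(actual, predicted, trim_fraction=0.0)
trimmed = trimmed_mse(actual, predicted, trim_fraction=0.2)
print(f"ordinary MSE: {plain:.2f}, 20% trimmed MSE: {trimmed:.2f}")
```

With no trimming the single outlier dominates the criterion; after trimming, the comparison reflects how well the model predicts the bulk of the data.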

Analysis of Statistical Methods Currently used in Toxicology Journals

  • Na, Jihye;Yang, Hyeri;Bae, SeungJin;Lim, Kyung-Min
    • Toxicological Research / Vol. 30, No. 3 / pp. 185-191 / 2014
  • Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and on sound statistical grounds. The purpose of this paper is to describe the statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Sciences and described the methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had a sample size of less than 10, with the median being 6 and the modes being 3 and 6. The mean (105/113, 93%) was predominantly used to measure central tendency, and the standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies justified why these methods were selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being the most popular (52/93, 56%), yet few studies conducted either a normality or an equal variance test. These results suggest that more consistent and appropriate use of statistical methods is necessary, which may enhance the role of toxicology in public health.
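
The choice between the standard deviation (SD) and the standard error of the mean (SEM) matters because they answer different questions: the SD describes the spread of the observations, while the SEM (the SD divided by the square root of n) describes the uncertainty of the mean. A minimal sketch with made-up values and n = 6, the median sample size reported above:

```python
import math
import statistics

# Hypothetical measurements for one endpoint (values made up), n = 6.
values = [4.2, 5.1, 3.9, 4.8, 5.5, 4.5]
n = len(values)

mean = statistics.mean(values)
sd = statistics.stdev(values)   # sample standard deviation (n - 1 divisor)
sem = sd / math.sqrt(n)         # standard error of the mean

print(f"mean = {mean:.3f}, SD = {sd:.3f}, SEM = {sem:.3f}")
```

Because the SEM is always smaller than the SD for n > 1, error bars based on the SEM look tighter, which is why a paper should state which of the two it reports.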

UNCERTAINTIES IN THE STAR-COUNT ANALYSIS

  • Hong, Seung-Soo;Lee, See-Woo
    • Journal of the Korean Astronomical Society / Vol. 21, No. 2 / pp. 155-171 / 1988
  • We have examined how sensitively the extinction value determined by the star-count method depends on such factors as the plate limit, the size of the counting reseau, the non-linearity in the number distribution of stars with magnitude, and the angular resolution demanded by the given problem. We let the Poisson distribution portray the statistical nature of the counts, and chose the region containing the globule Barnard 361 as an example field. Uncertainties due to various combinations of the factors are presented in graphic form: (1) The dynamic range in the extinction measurements is evaluated as a function of reseau size for varying plate limits. (2) Statistical errors involved in the star count are analyzed in terms of the signal-to-noise ratio, the plate limit, and the reseau size. (3) Systematic errors due to the non-linearity in the number distribution are thoroughly analyzed. (4) Finally, a methodology is presented for correcting the systematic error in the observed radial density gradient. These graphs are meant to be used in selecting the proper reseau size and in estimating the errors inherent in star-count analysis.
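
Under the Poisson assumption, a reseau square expected to contain N stars has a standard deviation of sqrt(N), so the signal-to-noise ratio of the count is N / sqrt(N) = sqrt(N). The sketch below illustrates only this basic relation; the counts are made up, and the paper's full error analysis (plate limit, non-linearity corrections) is not reproduced.

```python
import math


def poisson_snr(expected_count: float) -> float:
    """Signal-to-noise ratio of a Poisson count: mean / standard deviation."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    return expected_count / math.sqrt(expected_count)


def relative_error(expected_count: float) -> float:
    """Fractional statistical uncertainty of the count, 1 / sqrt(N)."""
    return 1.0 / poisson_snr(expected_count)


# Hypothetical expected star counts per reseau square.
for count in (4, 25, 100):
    print(f"N = {count:3d}: S/N = {poisson_snr(count):.1f}, "
          f"relative error = {relative_error(count):.0%}")
```

A larger reseau collects more stars and so raises the signal-to-noise ratio, but at the cost of angular resolution, which is the trade-off the abstract describes.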

Selection of Canonical Factors in Second Order Response Surface Models

  • Park, Sung H.;Han, Seong K.
    • Journal of the Korean Statistical Society / Vol. 30, No. 4 / pp. 585-595 / 2001
  • A second-order response surface model is often used to approximate the relationship between a response factor and a set of explanatory factors. In this article, we deal with canonical analysis in response surface models. For interpreting the geometry of a second-order response surface model, standard errors and confidence intervals for the eigenvalues of the second-order coefficient matrix play an important role. If the confidence interval for some eigenvalue includes 0, or the estimate of some eigenvalue is very small (close to 0) relative to the other eigenvalues, then the corresponding canonical factor can be deleted. We propose a criterion for selecting canonical factors. This criterion is based on the integrated mean squared error (IMSE). As a result of this method, we may approximately write the canonical factors as a set of some important explanatory factors.
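
In canonical analysis, the signs and sizes of the eigenvalues of the second-order coefficient matrix B determine the shape of the fitted surface. The sketch below handles only the two-factor case with a symmetric 2x2 matrix and made-up coefficients. The fixed tolerance for a "near-zero" eigenvalue is a stand-in chosen for illustration, not the confidence-interval rule or the IMSE-based criterion of the paper.

```python
import math


def symmetric_2x2_eigenvalues(b11, b12, b22):
    """Eigenvalues of [[b11, b12], [b12, b22]], smallest first."""
    center = (b11 + b22) / 2.0
    radius = math.hypot((b11 - b22) / 2.0, b12)
    return center - radius, center + radius


def classify_stationary_point(eigenvalues, tol=1e-8):
    """Describe the surface shape from the eigenvalue signs."""
    if any(abs(ev) <= tol for ev in eigenvalues):
        return "ridge (near-zero eigenvalue)"
    if all(ev < 0 for ev in eigenvalues):
        return "maximum"
    if all(ev > 0 for ev in eigenvalues):
        return "minimum"
    return "saddle"


# Hypothetical fitted second-order coefficients (made up).
lam = symmetric_2x2_eigenvalues(-2.0, 0.5, -1.0)
kind = classify_stationary_point(lam)
print(f"eigenvalues: {lam[0]:.4f}, {lam[1]:.4f} -> {kind}")
```

An eigenvalue that is small relative to the others marks a direction along which the response barely changes, which is the situation in which the abstract suggests deleting the corresponding canonical factor.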
