• Title/Abstract/Keywords: empirical type I error

A comparison of tests for homoscedasticity using simulation and empirical data

  • Anastasios Katsileros;Nikolaos Antonetsis;Paschalis Mouzaidis;Eleni Tani;Penelope J. Bebeli;Alex Karagrigoriou
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 31, No. 1
    • /
    • pp.1-35
    • /
    • 2024
  • The assumption of homoscedasticity is one of the most crucial assumptions for many parametric tests used in the biological sciences. The aim of this paper is to compare the empirical probability of type I error and the power of ten parametric and two non-parametric tests for homoscedasticity with simulations under different types of distributions, number of groups, number of samples per group, variance ratio and significance levels, as well as through empirical data from an agricultural experiment. According to the findings of the simulation study, when there is no violation of the assumption of normality and the groups have equal variances and equal number of samples, the Bhandary-Dai, Cochran's C, Hartley's Fmax, Levene (trimmed mean) and Bartlett tests are considered robust. The Levene (absolute and square deviations) tests show a high probability of type I error in a small number of samples, which increases as the number of groups rises. When data groups display a nonnormal distribution, researchers should utilize the Levene (trimmed mean), O'Brien and Brown-Forsythe tests. On the other hand, if the assumption of normality is not violated but diagnostic plots indicate unequal variances between groups, researchers are advised to use the Bartlett, Z-variance, Bhandary-Dai and Levene (trimmed mean) tests. Assessing the tests being considered, the test that stands out as the most well-rounded choice is the Levene's test (trimmed mean), which provides satisfactory type I error control and relatively high power. According to the findings of the study and for the scenarios considered, the two non-parametric tests are not recommended. In conclusion, it is suggested to initially check for normality and consider the number of samples per group before choosing the most appropriate test for homoscedasticity.
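An empirical type I error rate of the kind compared in this abstract can be estimated by simulating groups with equal variances and counting how often a test rejects. The following is a minimal sketch and not the paper's simulation design: it assumes three groups of ten observations, uses only Bartlett's test, and relies on the fact that a chi-squared variable with 2 degrees of freedom has survival function exp(-x/2). The function names, group sizes, replication count, and seed are illustrative choices.

```python
import math
import random
import statistics


def bartlett_statistic(groups):
    """Bartlett's test statistic for equality of variances across groups."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]  # unbiased sample variances
    n_total = sum(sizes)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction


def empirical_type1_error(draw, n_per_group=10, n_reps=2000, alpha=0.05, seed=1):
    """Share of rejections when all three groups share one distribution (null is true)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        groups = [[draw(rng) for _ in range(n_per_group)] for _ in range(3)]
        # With k = 3 groups the reference distribution is chi-squared with 2 df,
        # whose survival function is exp(-x / 2).
        p_value = math.exp(-bartlett_statistic(groups) / 2)
        rejections += p_value < alpha
    return rejections / n_reps


# Equal variances in both cases; only the shape of the distribution differs.
normal_rate = empirical_type1_error(lambda rng: rng.gauss(0.0, 1.0))
skewed_rate = empirical_type1_error(lambda rng: rng.expovariate(1.0))
print(f"normal data:      {normal_rate:.3f}")
print(f"exponential data: {skewed_rate:.3f}")
```

Under normal data the rejection rate should sit near the nominal 5% level, while under skewed data Bartlett's test tends to reject far too often, which is in line with the abstract's advice to check normality before choosing a test.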

Comprehensive comparison of normality tests: Empirical study using many different types of data

  • Lee, Chanmi;Park, Suhwi;Jeong, Jaesik
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 27, No. 5
    • /
    • pp.1399-1412
    • /
    • 2016
  • We compare many normality tests that draw on different sources of information extracted from the given data: the Anderson-Darling test, Kolmogorov-Smirnov test, Cramér-von Mises test, Shapiro-Wilk test, Shapiro-Francia test, Lilliefors test, Jarque-Bera test, D'Agostino's D, Doornik-Hansen test, Energy test and Martinez-Iglewicz test. For the purpose of comparison, these tests are applied to various types of data generated from skewed distributions, asymmetric distributions, and distributions with different lengths of support. We then summarize the comparison results in terms of two criteria: type I error control and power. The selection of the best test depends on the shape of the distribution of the data, implying that no single test is the most powerful for all distributions.

MEASURING THE INFLUENCE OF TASK COMPLEXITY ON HUMAN ERROR PROBABILITY: AN EMPIRICAL EVALUATION

  • Podofillini, Luca;Park, Jinkyun;Dang, Vinh N.
    • Nuclear Engineering and Technology
    • /
    • Vol. 45, No. 2
    • /
    • pp.151-164
    • /
    • 2013
  • A key input for the assessment of Human Error Probabilities (HEPs) with Human Reliability Analysis (HRA) methods is the evaluation of the factors influencing the human performance (often referred to as Performance Shaping Factors, PSFs). In general, the definition of these factors and the supporting guidance are such that their evaluation involves significant subjectivity. This affects the repeatability of HRA results as well as the collection of HRA data for model construction and verification. In this context, the present paper considers the TAsk COMplexity (TACOM) measure, developed by one of the authors to quantify the complexity of procedure-guided tasks (by the operating crew of nuclear power plants in emergency situations), and evaluates its use to represent (objectively and quantitatively) task complexity issues relevant to HRA methods. In particular, TACOM scores are calculated for five Human Failure Events (HFEs) for which empirical evidence on the HEPs (albeit with large uncertainty) and influencing factors are available - from the International HRA Empirical Study. The empirical evaluation has shown promising results. The TACOM score increases as the empirical HEP of the selected HFEs increases. Except for one case, TACOM scores are well distinguished if related to different difficulty categories (e.g., "easy" vs. "somewhat difficult"), while values corresponding to tasks within the same category are very close. Despite some important limitations related to the small number of HFEs investigated and the large uncertainty in their HEPs, this paper presents one of few attempts to empirically study the effect of a performance shaping factor on the human error probability. 
This type of study is important to enhance the empirical basis of HRA methods, to make sure that 1) the definitions of the PSFs cover the influences important for HRA (i.e., influencing the error probability), and 2) the quantitative relationships among PSFs and error probability are adequately represented.

Tests for equivalence/non-inferiority based on odds ratio in matched-pair design

  • 고혜정;이재원
    • 한국통계학회:학술대회논문집
    • /
    • Proceedings of the Korean Statistical Society 2003 Autumn Conference
    • /
    • pp.319-324
    • /
    • 2003
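  • This paper considers tests of equivalence/non-inferiority between two treatments in a matched-pair design. Unlike earlier work, which tested equivalence/non-inferiority in terms of the difference in proportions or the risk ratio, this paper derives two test statistics based on the odds ratio: (1) a Fieller-type statistic using the constrained maximum likelihood estimator (MLE) and (2) a Wald-type statistic using the unconstrained MLE. A simulation study compares the performance of these two statistics with existing methods based on the difference in proportions or the risk ratio: (3) a score-type statistic and (4) a Wald-type statistic based on the difference in proportions, and (5) a Fieller-type statistic and (6) a Wald-type statistic based on the risk ratio. The simulation results show that the proposed Fieller-type statistic using the constrained MLE gives very satisfactory results in terms of empirical type I error and, in particular, remains stable even when the off-diagonal cell probabilities become small.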

A Jarque-Bera type test for multivariate normality based on second-power skewness and kurtosis

  • Kim, Namhyun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 28, No. 5
    • /
    • pp.463-475
    • /
    • 2021
  • Desgagné and de Micheaux (2018) proposed an alternative univariate normality test to the Jarque-Bera test. The proposed statistic is based on the sample second power skewness and kurtosis while the Jarque-Bera statistic uses sample Pearson's skewness and kurtosis that are the third and fourth standardized sample moments, respectively. In this paper, we generalize their statistic to a multivariate version based on orthogonalization or an empirical standardization of data. The proposed multivariate statistic follows chi-squared distribution approximately. A simulation study shows that the proposed statistic has good control of type I error even for a very small sample size when critical values from the approximate distribution are used. It has comparable power to the multivariate version of the Jarque-Bera test with exactly the same idea of the orthogonalization. It also shows much better power for some mixed normal alternatives.
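For reference, the classical univariate Jarque-Bera statistic that this abstract contrasts with can be computed directly from the third and fourth standardized sample moments. The sketch below implements only that classical statistic, JB = n/6 · (S² + (K − 3)²/4), and not the second-power or multivariate statistic proposed in the paper. The function name and the sample values are illustrative, and the p-value uses the asymptotic chi-squared approximation with 2 degrees of freedom.

```python
import math


def jarque_bera(sample):
    """Classical Jarque-Bera statistic and its asymptotic chi-squared(2) p-value."""
    n = len(sample)
    mean = sum(sample) / n
    # Central sample moments (divisor n, as in the standard JB definition).
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5   # third standardized moment
    kurtosis = m4 / m2 ** 2     # fourth standardized moment (3 for a normal law)
    statistic = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # A chi-squared variable with 2 df has survival function exp(-x / 2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


jb_stat, jb_p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(f"JB = {jb_stat:.4f}, asymptotic p-value = {jb_p:.4f}")
```

The asymptotic approximation is known to be poor for small samples, which is one reason simulation studies like the one described above report empirical type I error rather than relying on the nominal level.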

A Study on the Development of a Simulation Model for Predicting Soil Moisture Content and Scheduling Irrigation (I)

  • 김철회;고재군
    • 한국농공학회지
    • /
    • Vol. 19, No. 1
    • /
    • pp.4279-4295
    • /
    • 1977
  • Two types of model were established to predict the soil moisture content, from which information on irrigation could be obtained. Model-I represents the soil moisture depletion and was established based on the concept of water balance in a given soil profile. Model-II is a mathematical model derived from the analysis of soil moisture variation curves drawn from the observed data. In establishing Model-I, the method and procedure for estimating the parameters that determine variables such as evapotranspiration, effective rainfall, and drainage amount were discussed. Empirical equations representing soil moisture variation curves were derived from the observed data as Model-II. The procedure for forecasting the timing and amount of irrigation under a given soil moisture content was discussed. The established models were checked by comparing the observed data with those predicted by the models. The results are summarized as follows: 1. As a water balance model of a given soil profile, the soil moisture depletion, $D$, could be represented by equation (2). 2. Among the various empirical formulae for potential evapotranspiration ($E_{tp}$), Penman's formula best fit the data observed with evaporation pans and tanks in the Suweon area. A high degree of positive correlation between Penman's predicted data and the data observed with a large evaporation pan was confirmed, and the regression equation was $Y = 0.7436X + 17.2918$, where $Y$ is the evaporation rate from the large evaporation pan in mm/10 days and $X$ is the potential evapotranspiration rate estimated with Penman's formula. 3. Evapotranspiration, $E_t$, could be estimated from the potential evapotranspiration, $E_{tp}$, by introducing the consumptive use coefficient, $K_c$, which was represented by the following relationship: $K_c = K_{co} \cdot K_a + K_s$ (Eq. 6), where $K_{co}$ is the crop coefficient, $K_a$ is a coefficient depending on the soil moisture content, and $K_s$ is a correction coefficient. a. Crop coefficient, $K_{co}$: crop coefficients of barley, bean, and wheat for each growth stage were found to be dependent on the crop. b. Coefficient depending on the soil moisture content, $K_a$: the values of $K_a$ for clay loam, sandy loam, and loamy sand revealed a tendency similar to those of the Pierce type. c. Correction coefficient, $K_s$: the following relationships were established to estimate $K_s$ values: $K_s = K_c - K_{co} \cdot K_a$, where $K_s = 0$ if $K_{co} \cdot K_a \geq 1.0$, otherwise $K_s = 1 - K_{co} \cdot K_a$. 4. Effective rainfall, $R_e$, was estimated using the following relationships: $R_e = D$ if $R - D \geq 0$, otherwise $R_e = R$. 5. The difference between rainfall, $R$, and the soil moisture depletion, $D$, was taken as the drainage amount, $W_d$: $D = \sum_{i=1}^{n} (E_t - R_e - I + W_d)$ if $W_d = 0$, otherwise $D = \sum_{i=t_f}^{n} (E_t - R_e - I + W_d)$, where $t_f$ = 2 to 3 days. 6. The curves and their corresponding empirical equations for the variation of soil moisture depending on soil type and soil depth are shown in Fig. 8 (a, b, c, d). The general mathematical model of soil moisture variation depending on season, weather, and soil type was as follows: $SMC = \sum \left( C_i \exp(-\lambda_i t_i) + R_{e,i} - \mathrm{Excess}_i \right)$, where $SMC$ is the soil moisture content, $C$ is a constant depending on the initial soil moisture content, $\lambda$ is a constant depending on season, $t$ is time, $R_e$ is the effective rainfall, and $\mathrm{Excess}$ is drainage and excess soil moisture other than drainage. The values of $\lambda$ are shown in Table 1. 7. The timing and amount of irrigation could be predicted by equations (9-a) and (9-b, c), respectively. 8. Under the given conditions, the model for scheduling irrigation was completed. Fig. 9 shows computer flow charts of the model. a. To estimate potential evapotranspiration, Penman's equation was used if complete observed meteorological data were available, and Jensen-Haise's equation was used if forecasted meteorological data were available; if neither observed nor forecasted data were available, equation (15) was used. b. As input time data, a crop calendar was used, which was based on the time when the growth stage of the crop shows its maximum effective leaf coverage. 9. To validate the models, observed soil moisture content under various conditions from May 1975 to July 1975 was compared with the values predicted by Model-I and Model-II. Model-I shows a relative error of 4.6 to 14.3 percent, which is an acceptable range of error for engineering purposes. Model-II shows a relative error of 3 to 16.7 percent, which is a little larger than that of Model-I. 10. Comparing the two models, the following is concluded: Model-I, established on a theoretical background, can predict with satisfactory reliability for practical use provided that forecasted meteorological data are available. On the other hand, Model-II is superior to Model-I in its simplicity, but it needs a long period and wide scope of observed data to predict soil moisture content acceptably. Further studies are needed on Model-II to make it acceptable for practical use.
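Several of the relationships quoted in this abstract are simple enough to express as code. The sketch below covers only the pan-evaporation regression, the consumptive use coefficient (Eq. 6), and the effective rainfall rule as they are stated above. The function names are hypothetical, the numeric inputs are made up for illustration, and the multiplicative form Et = Kc · Etp is an assumption: the abstract says only that Et is estimated from Etp "by introducing" Kc.

```python
def pan_evaporation(penman_etp):
    """Regression quoted in the abstract: Y = 0.7436 X + 17.2918 (Y in mm/10 days)."""
    return 0.7436 * penman_etp + 17.2918


def consumptive_use_coefficient(kco, ka, ks=0.0):
    """Eq. 6 of the abstract: Kc = Kco * Ka + Ks."""
    return kco * ka + ks


def effective_rainfall(rainfall, depletion):
    """Rule quoted in the abstract: Re = D if R - D >= 0, otherwise Re = R."""
    return depletion if rainfall - depletion >= 0 else rainfall


def actual_evapotranspiration(etp, kc):
    # ASSUMPTION: Et = Kc * Etp. The abstract does not state this form explicitly.
    return kc * etp


# Illustrative values only; none of these numbers come from the paper.
kc = consumptive_use_coefficient(kco=0.8, ka=0.5, ks=0.1)
print("Kc:", kc)
print("Et:", actual_evapotranspiration(etp=40.0, kc=kc))
print("Re when rain exceeds depletion:", effective_rainfall(rainfall=30.0, depletion=20.0))
print("Re when rain is below depletion:", effective_rainfall(rainfall=10.0, depletion=20.0))
print("Pan evaporation for X = 50:", pan_evaporation(50.0))
```

The effective rainfall rule caps the usable rain at the current depletion, so any rainfall beyond what the soil profile can hold is treated as drainage rather than stored moisture.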

Empirical Bayesian Misclassification Analysis on Categorical Data

  • 임한승;홍종선;서문섭
    • 응용통계연구
    • /
    • Vol. 14, No. 1
    • /
    • pp.39-57
    • /
    • 2001
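  • In categorical data, misclassification can arise during data collection. If misclassified data are analyzed as though they were correct, the estimates are biased and power is reduced; if correctly classified data are judged to be misclassified, time and cost are wasted on unnecessary corrections. Deciding whether a sample is correctly classified or misclassified is therefore a very important step that should precede data analysis. This paper considers categorical data given as an $I \times J$ contingency table in which misclassification occurs in only one of the two variables. To test for misclassification, the marginal totals of the variable that cannot be misclassified are fixed, and the marginal totals of the variable that may be misclassified are reclassified by extending the concepts of Bound and Collapse (the latter expressed through external information) proposed by Sebastiani and Ramoni (1997), together with Bayesian methods, while varying the model fitted to the data and the prior parameters that reflect prior information. To obtain information about misclassification, the double sampling scheme studied by Tenenbein (1970) is used to propose a new statistic for testing misclassification, and the distribution of the proposed statistic is studied through a variety of simulations.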

An Intelligent Intrusion Detection Model Based on Support Vector Machines and the Classification Threshold Optimization for Considering the Asymmetric Error Cost

  • 이현욱;안현철
    • 지능정보연구
    • /
    • Vol. 17, No. 4
    • /
    • pp.157-173
    • /
    • 2011
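  • With the recent growth in Internet use, malicious hacking and intrusion into networked systems occur frequently, and the government agencies, public offices, and companies that operate such systems are exposed to potentially critical damage. Interest in and demand for intrusion detection systems, which detect and identify unauthorized or abnormal activities and respond appropriately, are therefore rising, and research on improving their prediction performance is also active. This study likewise proposes a new intelligent intrusion detection model to improve the prediction performance of intrusion detection systems. The proposed model is based on the Support Vector Machine (SVM), which is known for relatively high predictive power together with good generalization ability, and additionally applies classification threshold optimization that accounts for asymmetric error costs so that intrusions can be blocked effectively. To verify the model, its results were compared with those of existing techniques (logistic regression, decision trees, and artificial neural networks), and the proposed SVM model was found to perform relatively better than the other techniques.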
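Classification threshold optimization under asymmetric error costs can be illustrated with a small sweep over candidate cutoffs. This is a sketch of the general idea and not the authors' procedure: the scores stand in for the output of any classifier (for example, an SVM decision value mapped to [0, 1]), and the scores, labels, and cost values are hypothetical.

```python
def best_threshold(scores, labels, cost_fn, cost_fp):
    """Return (threshold, total_cost) minimizing cost_fn * FN + cost_fp * FP.

    A case is flagged as an intrusion when its score is >= threshold.
    Candidate thresholds are the observed scores; ties keep the lowest threshold.
    """
    def total_cost(threshold):
        false_negatives = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
        false_positives = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
        return cost_fn * false_negatives + cost_fp * false_positives

    candidates = sorted(set(scores))
    best = min(candidates, key=total_cost)
    return best, total_cost(best)


# Hypothetical classifier scores and true labels (1 = intrusion, 0 = normal traffic).
scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

symmetric = best_threshold(scores, labels, cost_fn=1, cost_fp=1)
asymmetric = best_threshold(scores, labels, cost_fn=5, cost_fp=1)
print("equal costs:            ", symmetric)
print("missed intrusion costs 5x:", asymmetric)
```

When a missed intrusion is weighted more heavily than a false alarm, the cost-minimizing threshold moves lower, so the detector accepts more false alarms in exchange for fewer missed intrusions.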

Analysis of Stress Distribution around a Central Crack Tip in a Tensile Plate Using Phase-Shifting Photoelasticity and a Power Series Stress Function

  • 백태현
    • 비파괴검사학회지
    • /
    • Vol. 29, No. 1
    • /
    • pp.1-9
    • /
    • 2009
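  • In this study, the stress field around a crack tip is analyzed using phase-shifting photoelasticity and a power-series conformal mapping function, with data taken along a straight line far from the crack tip. The analyzed photoelastic stress field is compared with the actual photoelastic fringes. To ease qualitative comparison, the isochromatic fringe patterns are doubled by digital image processing, and the multiplied fringes are then thinned and compared with each other. For quantitative analysis, the percent errors between the photoelastic measurement data and the calculated fringes, and the standard deviations of the percent errors according to the number of terms in the power-series stress function, are compared. The standard deviation is computed as the number of terms of the stress function is varied. The analysis shows that the mode I stress intensity factor agrees to within 2% with values calculated by the finite element method and by an empirical equation.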

Multiple Regression Analysis for Piercing Punch Profile Optimization to Prevent Tearing During Tee Pipe Burring

  • 이영섭;김준영;강정식;홍석무
    • 소성∙가공
    • /
    • Vol. 26, No. 5
    • /
    • pp.271-276
    • /
    • 2017
  • A tee is the most common pipe fitting used to combine or divide fluid flow. Tees can connect pipes of different diameters or change the direction of a pipe run. To manufacture tee-type stainless steel pipes, combinations of punch piercing and burr forming have been widely used in industry. However, this method requires considerable time for the empirical work needed to find process conditions that prevent tearing at the upper end of the tee product and meet the target tee height. Numerous experiments have shown that the piercing profile is the main cause of these defects. Furthermore, the mold is designed through trial and error according to pipe diameters and changes in requirements. Thus, the objective of this study was to analyze the piercing and burring process via finite element analysis using DYNAFORM in order to resolve these problems. An optimization design method was used to determine the piercing punch profile. Three radii of the piercing punch (i.e., large, small, and joined radii) were selected as design variables to minimize thinning of the tee pipe. Based on the results of correlation and multiple regression analyses, we developed a predictive approximation model that satisfies the requirements for both thickness reduction and target height. The new piercing punch profile, obtained from the developed prediction equation, was then applied to actual tee forming. Model results were found to be in good agreement with experimental results.