• Title/Summary/Keyword: regression statistics

Firework plot for evaluating the impact of influential observations in multi-response surface methodology (다반응 반응표면분석에서 특이값의 영향을 평가하기 위한 불꽃그림)

  • Kim, Sang Ik;Jang, Dae-Heung
    • The Korean Journal of Applied Statistics / v.31 no.1 / pp.97-108 / 2018
  • It is routine practice in regression analysis to check the validity of the assumed model using regression diagnostic tools. Outliers and influential observations often distort the regression output in undesired ways. Jang and Anderson-Cook (Quality and Reliability Engineering International, 30, 1409-1425, 2014) proposed a graphical method, called a firework plot, for exploratory visualization of the trace of the impact of possible outliers and influential observations on individual regression coefficients and on the overall residual sum of squares. This paper extends that graphical approach to multi-response surface methodology problems.
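The trace described in this abstract can be illustrated with a rough sketch. It assumes the trace is obtained by shrinking the case weight of one observation from 1 to 0 and refitting by weighted least squares at each step; the single-response straight-line model, the function names `wls_line` and `weight_trace`, and the toy data are all hypothetical and are not taken from the paper.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x; returns (b0, b1, weighted RSS)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    return b0, b1, rss


def weight_trace(x, y, index, steps=5):
    """Refit while the weight of observation `index` goes from 1 down to 0."""
    trace = []
    for k in range(steps + 1):
        weight = 1.0 - k / steps
        w = [1.0] * len(x)
        w[index] = weight
        trace.append((weight,) + wls_line(x, y, w))
    return trace


# Toy data (made up): five points on y = 2x plus one gross outlier at the end.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0]
trace = weight_trace(x, y, index=5)
for weight, b0, b1, rss in trace:
    print(f"weight={weight:.1f}  b0={b0:.3f}  b1={b1:.3f}  rss={rss:.3f}")
```

Plotting each coefficient and the residual sum of squares against the weight, one curve per observation, would give a picture in the spirit of the plot described; observations whose curves move far from the full-data fit are the candidates for closer inspection.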

Asymptotic Test for Dimensionality in Sliced Inverse Regression (분할 역회귀모형에서 차원결정을 위한 점근검정법)

  • Park, Chang-Sun;Kwak, Jae-Guen
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.381-393 / 2005
  • Sliced Inverse Regression (SIR), a promising technique for dimension reduction in regression analysis, and an associated chi-square test for dimensionality were introduced by Li (1991). However, Li's test assumes normally distributed predictors and has been found to depend heavily on the number of slices. We provide a unified asymptotic test for determining the dimensionality of the SIR model that is based on probabilistic principal component analysis and is free of the normality assumption on the predictors. Illustrative results with simulated and real examples are also provided.

A Study on the effects of air pollution on circulatory health using spatial data (공간 자료를 이용한 대기오염이 순환기계 건강에 미치는 영향 분석)

  • Park, Jin-Ok;Choi, Ilsu;Na, Myung Hwan
    • Journal of Korean Society for Quality Management / v.44 no.3 / pp.677-688 / 2016
  • Purpose: In this study, we examine the effect of air pollution on circulatory disease mortality in South Korea from 2005 to 2013 using the air pollution index. Methods: We identify clusters of regions with high mortality risk using SaTScan™ 9.3.1 and compare the result with the regional distribution of air pollution. To account for the spatial heterogeneity of data collected by administrative district, we estimate the model with Geographically Weighted Regression (GWR). GWR is a spatial analysis technique that uses spatial information: a regression model is estimated for each region on the assumption that the regression coefficients differ by region. Results: When the model was estimated on the air pollution index and circulatory disease mortality data combined with spatial information, GWR was found to resolve the problem of spatial autocorrelation and to fit better than the OLS regression model. Conclusion: GWR is used to select the air pollution measures affecting the disease in each year, and K-means cluster analysis reveals the characteristics of the regional distribution of air pollution.
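The idea of estimating a separate regression for each region can be sketched as follows. This is a minimal illustration, not the authors' model: it assumes a single predictor, a Gaussian distance kernel with a fixed bandwidth, and made-up coordinates and data. The names `wls_line` and `gwr_slopes` are hypothetical.

```python
import math


def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def gwr_slopes(coords, x, y, bandwidth):
    """Fit one kernel-weighted regression per location; return the local slopes."""
    slopes = []
    for u0, v0 in coords:
        # Gaussian kernel: nearby regions get weights near 1, distant ones near 0.
        w = [math.exp(-0.5 * (math.hypot(u - u0, v - v0) / bandwidth) ** 2)
             for u, v in coords]
        slopes.append(wls_line(x, y, w)[1])
    return slopes


# Toy data (made up): a western cluster where y = x and an eastern one where y = 3x.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, 4.0, 3.0, 6.0, 9.0, 12.0]

global_slope = wls_line(x, y, [1.0] * len(x))[1]   # one slope for all regions
local_slopes = gwr_slopes(coords, x, y, bandwidth=2.0)
print(global_slope, [round(s, 3) for s in local_slopes])
```

On this toy data the single global fit averages the two regimes, while the local fits recover a different slope in each cluster, which is the kind of spatial heterogeneity the abstract refers to. In practice the bandwidth is usually chosen by cross-validation or an information criterion rather than fixed by hand.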

Bayesian inference for an ordered multiple linear regression with skew normal errors

  • Jeong, Jeongmun;Chung, Younshik
    • Communications for Statistical Applications and Methods / v.27 no.2 / pp.189-199 / 2020
  • This paper studies a Bayesian ordered multiple linear regression model with skew normal errors. In applied regression, the inherent information available often calls for constraints on the coefficients to be estimated. In addition, the assumption of normal errors is sometimes inappropriate for real data. To handle such situations more flexibly, we use for the error terms the skew-normal distribution of Sahu et al. (The Canadian Journal of Statistics, 31, 129-150, 2003), which includes the normal distribution. For the Bayesian methodology, the Markov chain Monte Carlo method is employed to resolve complicated integration problems. The propriety of the associated posterior density under improper priors is also shown. Our proposed Bayesian model is applied to NZAPB's apple data. For model comparison between the skew normal error model and the normal error model, we use the Bayes factor and the deviance information criterion of Spiegelhalter et al. (Journal of the Royal Statistical Society Series B (Statistical Methodology), 64, 583-639, 2002). We also consider the problem of detecting an influential point concerning skewness using Bayes factors. Finally, concluding remarks are given.

Influence Assessment in Robust Regression

  • Sohn, Bang-Yong;Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods / v.4 no.1 / pp.21-32 / 1997
  • Robust regression based on an M-estimator reduces and/or bounds the influence of outliers in the y-direction only. Therefore, when several influential observations exist, diagnostics for the robust regression are required in order to detect them. In this paper, we propose influence diagnostics for robust regression based on the M-estimator and its one-step version. Noting that the M-estimator can be obtained through iterative weighted least squares regression using internal weights, we apply weighted least squares (WLS) regression diagnostics to robust regression.
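The remark that an M-estimator can be computed by iterative weighted least squares can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it uses Huber weights with the conventional tuning constant 1.345, a simplified scale estimate (median absolute residual divided by 0.6745), a straight-line model, and made-up data. The names `wls_line` and `huber_irls` are hypothetical.

```python
from statistics import median


def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def huber_irls(x, y, c=1.345, iters=50):
    """Huber M-estimate of a line by repeated WLS; returns (b0, b1, final weights)."""
    w = [1.0] * len(x)
    for _ in range(iters):
        b0, b1 = wls_line(x, y, w)
        resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        scale = median(abs(r) for r in resid) / 0.6745  # simplified robust scale
        if scale == 0:
            break
        # Huber weights: 1 inside the threshold, shrinking as 1/|r| outside it.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
    return b0, b1, w


# Toy data (made up): roughly y = 2x with one outlier in the y-direction at x = 5.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 25.0, 12.1, 13.9, 16.0]

b0_ols, b1_ols = wls_line(x, y, [1.0] * len(x))
b0_rob, b1_rob, weights = huber_irls(x, y)
print(b0_ols, b1_ols)
print(b0_rob, b1_rob, [round(wi, 3) for wi in weights])
```

The final `weights` play the role of the internal weights mentioned in the abstract: once they are in hand, the robust fit is an ordinary WLS fit, so standard WLS diagnostics can be computed from it.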

On a Nonparametric Test for Parallelism against Ordered Alternatives

  • Song, Moon Sup;Kim, Jaehee;Jean, Jong Woo;Park, Changsoon
    • Journal of Korean Society for Quality Management / v.17 no.2 / pp.70-80 / 1989
  • A nonparametric test of the parallelism of regression lines against ordered alternatives is proposed. The proposed test statistic is based on a linear combination of robust slope estimators. It is a modified version of Adichie's test statistic based on scores. A small-sample Monte Carlo study shows that the proposed test is compatible with Adichie's test.

Suggestion of batter ability index in Korea baseball - focusing on the sabermetrics statistics WAR (한국프로야구에서 타자능력지수 제안 - 대체선수대비승수(WAR)을 중심으로)

  • Lee, Jea-Young;Kim, Hyeon-Gyu
    • The Korean Journal of Applied Statistics / v.29 no.7 / pp.1271-1281 / 2016
  • Wins above replacement (WAR) is one of the most widely used sabermetrics statistics for measuring the ability of a batter in baseball. WAR has the great advantage of representing a player's offensive power, base running ability, and defensive ability as a single value. In this study, we propose a batter ability index, built from sabermetrics statistics, that can replace WAR, based on Korea Baseball Record Data of the last three years (2013-2015). First, we calculated the batter ability index by the arithmetic mean method, the weighted average method, and principal component regression, and selected the method that had a high correlation with WAR.

Improvement of Genetic Programming Based Nonlinear Regression Using ADF and Application for Prediction MOS of Wind Speed (ADF를 사용한 유전프로그래밍 기반 비선형 회귀분석 기법 개선 및 풍속 예보 보정 응용)

  • Oh, Seungchul;Seo, Kisung
    • The Transactions of The Korean Institute of Electrical Engineers / v.64 no.12 / pp.1748-1755 / 2015
  • Linear regression is widely used for prediction problems, but it struggles to handle the irregular nature of nonlinear systems. Although nonlinear regression methods have been adopted, most of them suit only small, structurally limited problems with few independent variables. Real-world problems such as weather prediction, however, require complex nonlinear regression with a large number of variables. A GP (Genetic Programming) based evolutionary nonlinear regression method is an efficient approach to attack such challenging problems. This paper introduces an improvement of a GP-based nonlinear regression method using ADFs (Automatically Defined Functions). ADFs are believed to allow the evolution of modular solutions and, consequently, to improve the performance of the GP technique. The suggested ADF-based GP nonlinear regression methods are compared with UM, MLR, and a previous GP method for three-day wind speed prediction using MOS (Model Output Statistics) for some South Korean regions. UM and KLAPS data from 2007-2009 and 2011-2013 are used in the experiments.

Alternative Tests for the Nested Error Component Regression Model

  • Song, Seuck-Heun;Jung, Byoung-Cheol
    • Journal of the Korean Statistical Society / v.29 no.1 / pp.63-80 / 2000
  • We consider the panel data regression model with nested error components. In this paper, several Lagrange multiplier (LM) tests for the nested error component model are derived. These tests extend the earlier work of Honda (1985), Moulton and Randolph (1989), Baltagi et al. (1992), and King and Wu (1997) to the nested error component case. Monte Carlo experiments are conducted to study the performance of these LM tests.

Design and Weighting Effects in Small Firm Survey in Korea

  • Lee, Keejae;Lepkowski, James M.
    • Communications for Statistical Applications and Methods / v.9 no.3 / pp.775-786 / 2002
  • In this paper, we conducted an empirical study to investigate design and weighting effects on descriptive and analytic statistics. The design and weighting effects were calculated for estimates produced from the 1998 small firm survey data. We also considered the design and weighting effects on the coefficient estimates of a regression model using the design-based approach and the GEE approach.
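As a point of reference for the term "weighting effect", one widely used approximation is Kish's factor, n Σw² / (Σw)², which equals 1 plus the squared coefficient of variation of the weights. The abstract does not say which formula the authors used, so the sketch below is an assumption; the function names and the weights are made up.

```python
def kish_weighting_effect(weights):
    """Kish's approximate variance inflation due to unequal weights."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def effective_sample_size(weights):
    """Nominal sample size divided by the weighting effect."""
    return len(weights) / kish_weighting_effect(weights)


equal = [2.0, 2.0, 2.0, 2.0]      # equal weights: no inflation
unequal = [1.0, 1.0, 3.0, 3.0]    # unequal weights: variance is inflated
print(kish_weighting_effect(equal), kish_weighting_effect(unequal))
print(effective_sample_size(unequal))
```

A value above 1 indicates that unequal weights inflate the variance of a weighted estimate relative to an equally weighted sample of the same size. This approximation covers only the weighting component; clustering and stratification contribute separately to the overall design effect.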