• Title/Summary/Keyword: regression statistics

Steal Success Model for 2007 Korean Professional Baseball Games (2007년 한국프로야구에서 도루성공모형)

  • Hong, Chong-Sun;Choi, Jeong-Min
    • The Korean Journal of Applied Statistics / v.21 no.3 / pp.455-468 / 2008
  • Extensive baseball game records indicate that the steal plays an important role in game outcomes. To study the success or failure of steal attempts, logistic regression models are developed from 2007 Korean professional baseball game data. The logistic regression analyses are compared with those of discriminant models, and logistic regression is found to perform more efficiently than discriminant analysis. We also consider an alternative logistic regression model based on categorical data transformed from continuous data that are difficult to obtain.

Variable selection in partial linear regression using the least angle regression (부분선형모형에서 LARS를 이용한 변수선택)

  • Seo, Han Son;Yoon, Min;Lee, Hakbae
    • The Korean Journal of Applied Statistics / v.34 no.6 / pp.937-944 / 2021
  • The problem of selecting variables is addressed in partial linear regression. Model selection for partial linear models is not easy, since it involves nonparametric estimation, such as smoothing parameter selection, as well as estimation for the linear explanatory variables. In this work, several approaches to variable selection are proposed using a fast forward selection algorithm, least angle regression (LARS). The proposed procedures apply t-tests, all-possible-regressions comparisons, or a stepwise selection process to the variables selected by LARS. An example based on real data and a simulation study of the performance of the suggested procedures are presented.

Introduction to variational Bayes for high-dimensional linear and logistic regression models (고차원 선형 및 로지스틱 회귀모형에 대한 변분 베이즈 방법 소개)

  • Jang, Insong;Lee, Kyoungjae
    • The Korean Journal of Applied Statistics / v.35 no.3 / pp.445-455 / 2022
  • In this paper, we introduce existing Bayesian methods for high-dimensional sparse regression models and compare their performance in various simulation scenarios. In particular, we focus on the variational Bayes approach proposed by Ray and Szabó (2021), which enables scalable and accurate Bayesian inference. Using simulated data sets from sparse high-dimensional linear regression models, we compare the variational Bayes approach with other Bayesian and frequentist methods. To check the practical performance of variational Bayes in logistic regression models, a real data analysis is conducted using a leukemia data set.

Application of Crossover Analysis-logistic Regression in the Assessment of Gene-environmental Interactions for Colorectal Cancer

  • Wu, Ya-Zhou;Yang, Huan;Zhang, Ling;Zhang, Yan-Qi;Liu, Ling;Yi, Dong;Cao, Jia
    • Asian Pacific Journal of Cancer Prevention / v.13 no.5 / pp.2031-2037 / 2012
  • Background: Analysis of gene-gene and gene-environment interactions in complex multifactorial human disease poses challenges for statistical methodology. One major difficulty stems partly from the limitations of parametric statistical methods in detecting gene effects that depend solely or partially on interactions with other genes or environmental exposures. In our previous case-control study in Chongqing, China, we found an increased risk of colorectal cancer in individuals carrying a novel homozygous TT at locus rs1329149 and a known homozygous AA at locus rs671. Methods: In this study, we propose a statistical method, crossover analysis combined with a logistic regression model, to further analyze our data and assess gene-environment interactions for colorectal cancer. Results: The crossover analysis indicated possible multiplicative interactions of loci rs671 and rs1329149 with alcohol consumption. Multifactorial logistic regression analysis confirmed that loci rs671 and rs1329149 each exhibited a multiplicative interaction with alcohol consumption. The crossover analysis also revealed additive interactions between every pair of the four risk factors (gene loci rs671 and rs1329149, age, and alcohol consumption), which were not evident in the logistic regression. Conclusions: The method based on crossover analysis-logistic regression is successful in assessing additive and multiplicative gene-environment interactions, and in revealing synergistic effects of gene loci rs671 and rs1329149 with alcohol consumption in the pathogenesis and development of colorectal cancer.

Bayesian ordinal probit semiparametric regression models: KNHANES 2016 data analysis of the relationship between smoking behavior and coffee intake (베이지안 순서형 프로빗 준모수 회귀 모형 : 국민건강영양조사 2016 자료를 통한 흡연양태와 커피섭취 간의 관계 분석)

  • Lee, Dasom;Lee, Eunji;Jo, Seogil;Choi, Taeryeon
    • The Korean Journal of Applied Statistics / v.33 no.1 / pp.25-46 / 2020
  • This paper presents ordinal probit semiparametric regression models using the Bayesian Spectral Analysis Regression (BSAR) method. Ordinal probit regression models ordinal responses, usually with more than two categories, by linking the probability of falling into each category to a combination of available covariates through a probit link (the inverse of the normal cumulative distribution function). The Bayesian probit model facilitates posterior sampling by introducing a normally distributed latent variable; the responses are then categorized by cut-off points according to the values of the latent variable. In this paper, we extend the latent variable approach to a semiparametric model for Bayesian ordinal probit regression with nonparametric functions, using the BSAR method, which is based on a spectral representation of Gaussian processes. The latent variable is decomposed into a parametric component and a nonparametric component, with or without a shape constraint, so that ordinal responses can be modeled and outcomes predicted more flexibly. We illustrate the proposed methods with simulation studies comparing them with existing methods, and with a real data analysis of the Korean National Health and Nutrition Examination Survey (KNHANES) 2016 investigating the nonparametric relationship between smoking behavior and coffee intake.
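
The latent variable construction in the abstract above can be illustrated with a minimal sketch. It is not the BSAR method itself: it only shows how, in a plain ordinal probit model, category probabilities follow from a linear predictor and a set of cut-off points. The function names, the predictor value `eta`, and the cut-offs are all hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordinal_probit_probs(eta, cutoffs):
    """Return P(y = k) for each of K categories.

    eta:     value of the predictor for one observation (hypothetical)
    cutoffs: increasing interior cut-off points c_1 < ... < c_{K-1}

    The latent variable is z = eta + e with e ~ N(0, 1); the response
    falls in category k when c_{k-1} < z <= c_k.
    """
    bounds = [-math.inf] + list(cutoffs) + [math.inf]
    cdf = [normal_cdf(b - eta) for b in bounds]
    return [cdf[k + 1] - cdf[k] for k in range(len(bounds) - 1)]

# Four categories defined by three hypothetical cut-off points.
probs = ordinal_probit_probs(eta=0.3, cutoffs=[-1.0, 0.0, 1.5])
```

In a semiparametric model of the kind the abstract describes, `eta` would be the sum of a parametric part and an estimated nonparametric function; here it is simply a fixed number.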

A Study on Change of Logistics in the region of Seoul, Incheon, Kyunggi (물류예측모형에 관한 연구 -수도권 물동량 예측을 중심으로-)

  • Roh Kyung-Ho
    • Management & Information Systems Review / v.7 / pp.427-450 / 2001
  • This research proposes an estimation methodology for logistics. The paper sets out the main problems associated with estimation in the regression model. We review the methods for estimating the parameters of the model and introduce a modified procedure in which all models are fitted and then combined to construct a combination of estimates. The resulting estimators are found to be as efficient as the maximum likelihood (ML) estimators in various cases. Our method requires more computation but has an advantage for large data sets. It also makes it possible to detect particular features in the data structure. Examples with real data are used to illustrate the properties of the estimators. The motivation for estimating the logistic regression model is the growing importance of the logistics environment today. In the first phase, we conduct an exploratory study to discuss nine independent variables. In the second phase, we try to find the best-fitting logistic regression model. In the third phase, we calculate the logistics estimate using the logistic regression model. The parameters of the logistic regression model were estimated using ordinary least squares (OLS) regression, and the standard assumptions of OLS estimation were tested. The calculated value of the F-statistic for the logistic regression model is significant at the 5% level. The logistic regression model also explains a significant amount of variance in the dependent variable. The parameter estimates of the logistic regression model, with t-statistics in parentheses, are presented in a table. The objective of this paper is to find the best logistic regression model for estimating logistics with comparative accuracy.

Applications on p-values of Chi-Square Distribution

  • Hong, Chong Sun;Hong, Sung Sick
    • Communications for Statistical Applications and Methods / v.9 no.3 / pp.877-887 / 2002
  • In this paper, the behavior and properties of p-values for goodness-of-fit tests are investigated. Building on some findings about the p-values, we consider applications to determining the sample size of a survey using a regression equation based on pilot study data. Regression equations are obtained by the well-known least squares method, and we find that, alternatively, regression lines can be formulated from only two data points. In further studies, this work might be extended to t distributions for testing hypotheses about a population mean in order to determine the sample size of a prospective study. Similar arguments could also be explored for F test statistics.

Influence Diagnostic Measure for Spline Estimator

  • Lee, In-Suk;Cho, Gyo-Young;Jung, Won-Tae
    • Journal of Korean Society for Quality Management / v.23 no.4 / pp.58-63 / 1995
  • To assess the quality of a fit to a set of data, it is always useful to conduct an a posteriori analysis involving the examination of residuals, detection of influential data values, etc. Smoothing splines are a type of nonparametric regression estimator considered for this diagnostic problem, and the leverage value, Cook's distance, and DFFITS are used for detecting influential data. Since high leverage points will always have small residuals, new diagnostic measures that incorporate properties of both leverage and residuals are needed. In this paper, we propose a version of FVARATIO as a diagnostic measure in nonparametric regression. We also consider a rough bound by analogy with the linear regression case.
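
As a point of reference for the diagnostics named in the abstract above, the sketch below computes leverage values and Cook's distance for the ordinary linear regression case that the abstract uses as its analogy. It does not implement the spline-based measures or FVARATIO; the function name and the data are hypothetical.

```python
def influence_measures(x, y):
    """Leverage and Cook's distance for the least-squares line y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted coefficients (intercept and slope)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)  # residual variance estimate
    # Leverage: diagonal of the hat matrix for simple linear regression.
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's distance combines the residual size with the leverage.
    cooks = [e ** 2 * h / (p * s2 * (1.0 - h) ** 2)
             for e, h in zip(resid, leverage)]
    return leverage, cooks

# Hypothetical data: the last point lies far from the others in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 9.0]
leverage, cooks = influence_measures(x, y)
```

The leverages sum to the number of fitted coefficients, which gives a quick sanity check on any implementation.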

Permutation Predictor Tests in Linear Regression

  • Ryu, Hye Min;Woo, Min Ah;Lee, Kyungjin;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods / v.20 no.2 / pp.147-155 / 2013
  • To determine whether each coefficient is equal to zero, the usual $t$-tests are a popular choice among practitioners in linear regression, because all statistical packages provide the statistics and their corresponding $p$-values. With small samples (especially with non-normal errors), the tests often fail to detect statistical significance correctly. We propose a permutation approach that adopts a sufficient dimension reduction methodology to overcome this deficiency. Numerical studies confirm that the proposed method has potential advantages over the $t$-tests. A data analysis is also presented.
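
The general idea of a permutation test for a regression coefficient can be sketched as follows. This is a plain permutation test of the slope in simple linear regression, not the sufficient-dimension-reduction procedure the abstract proposes; the function names, the number of permutations, and the data are hypothetical.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for H0: slope = 0.

    Under H0 the responses are exchangeable with respect to x, so
    shuffling y gives draws from the null distribution of the slope.
    """
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_perm = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(slope(x, y_perm)) >= observed:
            count += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (count + 1) / (n_perm + 1)

# Hypothetical small sample with a strong linear trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
p_value = permutation_pvalue(x, y)
```

Because the null distribution is built from the data themselves, the test does not rely on normal errors, which is the small-sample setting the abstract highlights.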