• Title/Summary/Keyword: statistical coefficient of determination

Information Theoretic Standardized Logistic Regression Coefficients with Various Coefficients of Determination

  • Hong Chong-Sun;Ryu Hyeon-Sang
    • Communications for Statistical Applications and Methods / v.13 no.1 / pp.49-60 / 2006
  • There are six approaches to constructing standardized coefficients for logistic regression. The standardized coefficient based on Kruskal's information theory is known to be the best from a conceptual standpoint. To calculate this standardized coefficient, the coefficient of determination based on entropy loss is used from among the many coefficients of determination available for logistic regression. In this paper, this standardized coefficient is obtained using four kinds of coefficients of determination that have the most intuitively reasonable interpretation as proportional-reduction-in-error measures for logistic regression. These four versions of the sixth standardized coefficient are compared with the other kinds of standardized coefficients.
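As a rough illustration of an entropy-based coefficient of determination, the sketch below computes the likelihood-ratio (McFadden-type) $R^2 = 1 - \ln L_M / \ln L_0$ for a binary logistic model from observed 0/1 outcomes and fitted probabilities. We assume this corresponds to the entropy-loss measure mentioned above; the function names and data are hypothetical, and the sketch does not compute the information-theoretic standardized coefficient itself.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p (0 < p < 1)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))

def entropy_r2(y, p_hat):
    """Likelihood-ratio R^2 = 1 - lnL(model) / lnL(null).

    The null model predicts the sample proportion of 1s for every case
    (assumed strictly between 0 and 1), so -lnL(null)/n is the entropy of the
    marginal outcome distribution and the ratio measures the proportional
    reduction in that entropy achieved by the fitted probabilities.
    """
    ybar = sum(y) / len(y)
    ll_null = log_likelihood(y, [ybar] * len(y))
    ll_model = log_likelihood(y, p_hat)
    return 1 - ll_model / ll_null

y = [1, 0, 1, 1]
print(entropy_r2(y, [0.9, 0.2, 0.8, 0.7]))  # ≈ 0.596
print(entropy_r2(y, [0.75] * 4))            # null model: 0.0
```

A model that does no better than the overall proportion scores 0, and the value approaches 1 as the fitted probabilities approach the observed outcomes.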

A Study on the Coefficient of Determination in Linear Regression Analysis

  • S. H. Park;Sung-im Lee
    • Communications for Statistical Applications and Methods / v.2 no.1 / pp.32-47 / 1995
  • The coefficient of determination $R^2$, the proportion of the variation in $y$ explained by a set of independent variables $x_1, x_2, \ldots, x_k$ through a linear regression model, is a very useful tool in linear regression analysis. Suppose $R^2_{yx_i}$ is the coefficient of determination when $y$ is regressed on $x_i$ alone. If the independent variables are correlated, the sum $R^2_{yx_1} + R^2_{yx_2} + \cdots + R^2_{yx_k}$ is not equal to $R^2_{yx_1x_2\cdots x_k}$, the coefficient of determination when $y$ is regressed simultaneously on $x_1, x_2, \ldots, x_k$. In this paper we discuss the conditions under which the sum is greater than, equal to, or less than $R^2_{yx_1x_2\cdots x_k}$, and give proofs of these conditions. Illustrative examples are also provided. In addition, we discuss the inequality between $R^2_{yx_1x_2\cdots x_k}$ and the sum $R^2_{yx_1} + R^2_{yx_2} + \cdots + R^2_{yx_k}$.
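For two predictors the comparison can be made concrete with the standard identity $R^2_{yx_1x_2} = (r_{y1}^2 + r_{y2}^2 - 2r_{y1}r_{y2}r_{12})/(1 - r_{12}^2)$, where $r_{y1}, r_{y2}$ are the simple correlations of $y$ with $x_1, x_2$ (so $R^2_{yx_i} = r_{yi}^2$) and $r_{12}$ is the correlation between the predictors. The sketch below uses hypothetical helper names and toy data chosen for illustration; it shows the sum equal to the joint $R^2$ when the predictors are uncorrelated and exceeding it when they are redundant, and it does not reproduce the paper's general $k$-variable conditions.

```python
import math

def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def r2_parts(y, x1, x2):
    """Return (R^2_{yx1}, R^2_{yx2}, R^2_{yx1x2}); requires |r12| < 1."""
    ry1, ry2, r12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    joint = (ry1**2 + ry2**2 - 2 * ry1 * ry2 * r12) / (1 - r12**2)
    return ry1**2, ry2**2, joint

# Uncorrelated predictors: the simple R^2 values add up to the joint R^2.
x1 = [1, -1, 1, -1]
x2 = [1, 1, -1, -1]
y = [3, -1, 1, -3]          # y = 2*x1 + x2
a, b, joint = r2_parts(y, x1, x2)
print(a + b, joint)         # both ≈ 1.0

# Redundant (highly correlated) predictors: the sum exceeds the joint R^2.
x1 = [1, 2, 3, 4, 5]
x2 = [1, 2, 3, 4, 6]
y = [1, 2, 3, 4, 5]         # y = x1 exactly
a, b, joint = r2_parts(y, x1, x2)
print(a + b, joint)         # sum > 1, joint ≈ 1.0
```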

Graphical Descriptions for Hierarchical Log Linear Models

  • Hyun Jip Choi;Chong Sun Hong
    • Communications for Statistical Applications and Methods / v.2 no.2 / pp.310-319 / 1995
  • We graphically represent the relationships among hierarchical log-linear models by regarding the values of the likelihood ratio statistics as the squared norms of the corresponding vectors. Right-angled triangles, tetrahedrons, and modified polyhedrons are used for the graphical description. We find that the angle between two vectors depends on the coefficient of determination and the partial coefficient of determination. These graphical descriptions could be applied to model selection.
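The link between an angle and a coefficient of determination has a familiar analogue in ordinary linear regression, sketched below under that substitution; it is not the authors' log-linear construction, which works with likelihood ratio statistics. The centered response, centered fitted, and residual vectors form a right triangle, so $SST = SSR + SSE$, and the angle $\theta$ between the centered response and centered fitted vectors satisfies $\cos^2\theta = R^2$. The function name and data are hypothetical.

```python
import math

def ols_geometry(x, y):
    """Simple linear regression viewed as a right triangle of centered vectors.

    Returns (SST, SSR, SSE, R^2, theta_deg), where theta is the angle between
    the centered response y - ybar and the centered fitted vector yhat - ybar.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    fitted = [b0 + b1 * a for a in x]
    sst = sum((b - my) ** 2 for b in y)                 # squared hypotenuse
    ssr = sum((f - my) ** 2 for f in fitted)            # squared leg: explained part
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))  # squared leg: residual part
    r2 = ssr / sst
    # cos(theta) = sqrt(R^2); clamp to guard against rounding just above 1.
    theta = math.degrees(math.acos(min(1.0, math.sqrt(r2))))
    return sst, ssr, sse, r2, theta

sst, ssr, sse, r2, theta = ols_geometry([1, 2, 3, 4], [1, 3, 2, 4])
print(r2, theta)  # R^2 ≈ 0.64, theta ≈ 36.87 degrees
```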

A Technique to Improve the Fit of Linear Regression Models for Successive Sets of Data

  • Park, Sung H.
    • Journal of the Korean Statistical Society / v.5 no.1 / pp.19-28 / 1976
  • In empirical studies that fit a multiple linear regression model to successive cross-sectional data observed on the same set of independent variables over several time periods, one often faces the problem of a poor $R^2$, the multiple coefficient of determination, which provides a standard measure of how well a specified regression line fits the sample data.

Variable Selection Criteria in Regression

  • Kim, Choong-Rak
    • Journal of the Korean Statistical Society / v.23 no.2 / pp.293-301 / 1994
  • In this paper we propose a variable selection criterion that minimizes the influence curve in regression, and compare it with other criteria such as $C_p$ (Mallows, 1973) and the adjusted coefficient of determination. Examples and an extension to generalized linear models are given.
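For reference, the two comparison criteria named above have standard closed forms: the adjusted $R^2_a = 1 - (1 - R^2)(n - 1)/(n - k - 1)$ for $k$ predictors, and Mallows' $C_p = SSE_p/\hat\sigma^2 - (n - 2p)$, where $p$ counts the parameters (including the intercept) of a candidate subset and $\hat\sigma^2$ is the residual mean square of the full model. The sketch below implements only these two; the influence-curve criterion proposed in the paper is not reproduced, and the function names and numbers are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of determination for k predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def mallows_cp(sse_subset, mse_full, n, p):
    """Mallows' C_p for a subset model with p parameters (intercept included).

    mse_full is the residual mean square of the model containing all candidate
    predictors; subsets with C_p close to p are considered to have little bias.
    """
    return sse_subset / mse_full - (n - 2 * p)

# For the full model itself, SSE/MSE = n - p, so C_p equals p exactly.
n, p_full, sse_full = 12, 2, 20.0
mse_full = sse_full / (n - p_full)
print(mallows_cp(sse_full, mse_full, n, p_full))  # 2.0
print(adjusted_r2(0.5, 11, 1))                    # ≈ 0.444
```

Unlike $R^2$, both criteria penalize extra parameters, which is what makes them usable for comparing subsets of different sizes.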

Case influence diagnostics for the significance of the linear regression model

  • Bae, Whasoo;Noh, Soyoung;Kim, Choongrak
    • Communications for Statistical Applications and Methods / v.24 no.2 / pp.155-162 / 2017
  • In this paper we propose influence measures, based on the deletion method, for two basic goodness-of-fit statistics in the linear regression model: the coefficient of determination $R^2$ and the $F$ test statistic. Some useful lemmas are provided. We also express the influence measures in terms of basic building blocks such as residuals, leverage, and deviation, which shows that they are increasing functions of the residuals and decreasing functions of the deviation. Further, the proposed measures reduce the computational burden from O(n) to O(1). As illustrative examples, we apply the proposed measures to the stackloss data set and verify that deleting one or a few influential observations can result in large changes in $R^2$ and the $F$ statistic.
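The quantity being measured can be illustrated with a brute-force sketch: delete each case in turn, refit, and record the change in $R^2$. This assumes a simple (one-predictor) regression and toy data, uses hypothetical function names, and refits once per case, which is exactly the computation the paper's closed-form measures are designed to avoid; it does not reproduce those measures or the $F$ statistic version.

```python
def r2_simple(x, y):
    """R^2 of a simple linear regression of y on x (the squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def deletion_r2_changes(x, y):
    """Change in R^2 when each case i is deleted, computed by refitting without it."""
    full = r2_simple(x, y)
    changes = []
    for i in range(len(x)):
        xs = x[:i] + x[i + 1:]
        ys = y[:i] + y[i + 1:]
        changes.append(r2_simple(xs, ys) - full)
    return changes

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 20]   # the last case departs from y = 2x
print([round(c, 3) for c in deletion_r2_changes(x, y)])
# → [0.006, -0.022, 0.021, 0.114, 0.2]
```

Deleting the last case lifts $R^2$ from 0.8 to 1, the largest change, while deleting any other case moves it much less.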

Sample Size Determination Using the Stratification Algorithms with the Occurrence of Stratum Jumpers

  • Hong, Taekyong;Ahn, Jihun;Namkung, Pyong
    • Communications for Statistical Applications and Methods / v.11 no.2 / pp.297-311 / 2004
  • In sample surveys of highly skewed populations, stratum jumpers often occur. Stratum jumpers are units with large discrepancies between the stratification variable and the study variable. We propose two models for stratum jumpers: a multiplicative model and a random replacement model. We also modify the L-H stratification algorithm by applying these models when determining the sample sizes and the stratum boundaries. We evaluate the performance of the new stratification algorithms using real data. The results show that the L-H algorithm under the random replacement model outperforms the other algorithms, since its estimator has the smallest coefficient of variation.
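The comparison criterion above, the coefficient of variation (CV) of the estimator, can be computed for a given stratification using the textbook variance of the stratified estimator of a population total under stratified simple random sampling without replacement, $\mathrm{Var}(\hat Y) = \sum_h N_h^2 (1 - n_h/N_h) S_h^2 / n_h$. The sketch below assumes known stratum sizes, sample sizes, standard deviations, and means (all toy values, hypothetical function name); it does not implement the L-H algorithm or the stratum jumper models.

```python
import math

def stratified_total_cv(N, n, S, means):
    """CV of the stratified estimator of a population total.

    N, n: population and sample sizes per stratum; S: stratum standard
    deviations of the study variable; means: stratum means.
    """
    total = sum(Nh * m for Nh, m in zip(N, means))
    var = sum(Nh**2 * (1 - nh / Nh) * Sh**2 / nh for Nh, nh, Sh in zip(N, n, S))
    return math.sqrt(var) / total

# Two strata: a sampled stratum and a take-all stratum of large units (n_h = N_h),
# which contributes no sampling variance.
cv = stratified_total_cv(N=[100, 10], n=[10, 10], S=[2.0, 50.0], means=[5.0, 200.0])
print(cv)  # ≈ 0.024
```

A stratum jumper would show up here as a unit whose study-variable value inflates $S_h$ in a sampled stratum, which is why it can sharply raise the CV.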

Collapsibility and Suppression for Cumulative Logistic Model

  • Hong, Chong-Sun;Kim, Kil-Tae
    • Communications for Statistical Applications and Methods / v.12 no.2 / pp.313-322 / 2005
  • In this paper, we discuss suppression for the logistic regression model. Suppression for the linear regression model has been defined in terms of the relationships among regression sums of squares as well as the correlation coefficients of the variables. Since simple correlation coefficients are not usually obtained for the binary response variable of a logistic model, we consider cumulative logistic models with multinomial and ordinal response variables rather than the usual logistic model. We find that suppression in these models changes as the categories of the response variable in the cumulative logistic model are collapsed into a binary response. These suppression results for cumulative logistic models are discussed and compared with those for the linear model.
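In the linear model, one common way to state suppression through regression sums of squares is $R^2_{yx_1x_2} > R^2_{yx_1} + R^2_{yx_2}$: the predictors jointly explain more than the sum of what each explains alone. The toy sketch below (hypothetical names and data) shows the classical case, where $x_2$ is uncorrelated with $y$ yet raises the joint $R^2$ because it removes irrelevant variation from $x_1$. It illustrates only the linear-model benchmark, not the cumulative logistic analysis, and the paper's exact definition may differ.

```python
import math

def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

def is_suppression(y, x1, x2):
    """Compare the joint R^2 of y on (x1, x2) with the sum of the simple R^2 values.

    Uses R^2_{y.12} = (ry1^2 + ry2^2 - 2*ry1*ry2*r12) / (1 - r12^2); requires |r12| < 1.
    """
    ry1, ry2, r12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    joint = (ry1**2 + ry2**2 - 2 * ry1 * ry2 * r12) / (1 - r12**2)
    return joint > ry1**2 + ry2**2, ry1**2, ry2**2, joint

y = [1, -1, 1, -1]
x2 = [1, 1, -1, -1]   # uncorrelated with y
x1 = [2, 0, 0, -2]    # x1 = y + x2, so y = x1 - x2 exactly
print(is_suppression(y, x1, x2))
# suppression flag True; simple R^2 about 0.5 and 0.0; joint R^2 about 1.0
```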

Determination of a Homogeneous Segment for Short-term Traffic Count Efficiency Using a Statistical Approach (통계적인 기법을 활용한 동질성구간에 따른 교통량 수시조사 효율화 연구)

  • Jung, YooSeok;Oh, JuSam
    • International Journal of Highway Engineering / v.17 no.4 / pp.135-141 / 2015
  • PURPOSES: This study determines and integrates homogeneous segments to improve the efficiency of short-term traffic counts. We also attempt to reduce the traffic monitoring budget. METHODS: Homogeneous segments within the same road section are determined using a statistical approach. Statistical analyses using the t-test, the mean difference, and the correlation coefficient are carried out on ten years (2004-2013) of short-term traffic count data, and the MAPE on new data (2014) is evaluated. The correlation coefficient represents the trend in traffic counts, while the mean difference and t-score represent the average traffic count. RESULTS: The statistical analysis suggests that the number of target segments varies with the criteria. More than 30% of adjacent segments have a correlation coefficient higher than 0.8. Among adjacent segments, 36.2% have a mean difference below 20% and 19.5% have a t-score below 2.8. According to the effectiveness analysis, the mean-difference integration criterion is more effective than the t-score criterion; thus, the mean difference represents traffic volume similarity. CONCLUSIONS: Integrating 47 of the 882 adjacent road segments yields a MAPE of 8.87%, which is within an acceptable range. This can reduce the traffic monitoring budget and allow the number of counts to be increased, improving the accuracy of traffic volume estimation.
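The screening criteria above can be sketched as follows, under stated assumptions: the abstract does not define the mean difference or the t-score precisely, so we assume a percentage difference relative to one segment's mean and Welch's two-sample t-statistic. The thresholds (20%, 2.8, 0.8) are the ones reported above; the function names and count data are hypothetical. MAPE is the usual mean absolute percentage error.

```python
import math
from statistics import mean, stdev

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

def homogeneity_checks(a, b, max_mean_diff_pct=20.0, max_t=2.8, min_corr=0.8):
    """Apply the three screening criteria to two adjacent segments' yearly counts.

    Assumptions: the mean difference is taken relative to segment a's mean, and
    the t-score is Welch's two-sample statistic; the paper may define both differently.
    """
    mean_diff_pct = 100 * abs(mean(a) - mean(b)) / mean(a)
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    t_score = abs(mean(a) - mean(b)) / se
    return {
        "mean_diff": mean_diff_pct < max_mean_diff_pct,
        "t_score": t_score < max_t,
        "correlation": pearson(a, b) > min_corr,
    }

a = [1000, 1100, 1200, 1300]         # hypothetical yearly counts, segment A
b = [1010, 1120, 1190, 1310]         # hypothetical yearly counts, segment B
print(homogeneity_checks(a, b))      # all three criteria satisfied
print(mape([100, 200], [110, 180]))  # 10.0
```

The criteria are returned separately rather than combined, since the abstract evaluates the mean-difference and t-score criteria against each other.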