• Title/Abstract/Keywords: Multicollinearity

Search results: 174 items (processing time: 0.022 s)

3지와 4지 회전교차로의 사고분석 (Accident Analysis of 3-legged and 4-legged Roundabouts)

  • 박민규;박병호
    • 한국안전학회지 / Vol. 27, No. 3 / pp. 161-166 / 2012
  • This study deals with roundabout accidents. The objective is to analyze the traffic accidents that occurred at 3-legged and 4-legged roundabouts using the developed models. In developing the multiple linear regression models, this study uses the number of traffic accidents as the dependent variable and geometric structure, traffic characteristics, and other variables as the independent variables. The correlation and multicollinearity of the variables were analyzed using SPSS 17.0. The main results are as follows. First, the R-square values of the developed models were 0.851 (3-legged) and 0.689 (4-legged), respectively. Second, the independent variables in the 3-legged roundabout accident model were traffic volume and the number of crosswalks, and those in the 4-legged roundabout model were traffic volume and signal. Finally, the paired t-test shows that the predicted and observed values are not statistically different.
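
The multicollinearity check mentioned in this abstract can be illustrated with the variance inflation factor (VIF). The abstract does not say which diagnostic was run in SPSS, so the following is only a minimal sketch, assuming a model with exactly two predictors, in which case both predictors share the value VIF = 1 / (1 - r^2). The function names and the data values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Made-up illustrative values, not data from the study.
traffic_volume = [120, 150, 180, 210, 260, 300]
crosswalks = [2, 3, 2, 4, 3, 4]

r = pearson_r(traffic_volume, crosswalks)
vif = vif_two_predictors(traffic_volume, crosswalks)
print(f"r = {r:.3f}, VIF = {vif:.2f}")
```

A VIF of 1 means the two predictors are uncorrelated; a common rule of thumb treats values above roughly 10 as a warning sign. With more than two predictors, each VIF would instead come from regressing one predictor on all the others.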

Analysis of Success Factors for Mobile Commerce using Text Mining and PLS Regression

  • Kim, Yong-Hwan;Kim, Ja-Hee;Park, Ji hoon;Lee, Seung-Jun
    • 한국컴퓨터정보학회논문지 / Vol. 21, No. 11 / pp. 127-134 / 2016
  • In this paper, we propose factors that influence mobile commerce satisfaction, identified through data mining and PLS regression analysis. We extracted the most frequent words from mobile application reviews, which contain a large number of user requests. We employed content analysis to condense the large volume of text. We conducted a survey using the categories into which the data were condensed and specified them as factors that influence mobile commerce satisfaction. To avoid multicollinearity, we employed PLS regression analysis instead of multiple regression analysis. Because the discovered factors, which are potential consequences of customer satisfaction, come from direct requests by customers, the result may be an appropriate indicator for the mobile commerce market to improve its services.

Estimation of error variance in nonparametric regression under a finite sample using ridge regression

  • Park, Chun-Gun
    • Journal of the Korean Data and Information Science Society / Vol. 22, No. 6 / pp. 1223-1232 / 2011
  • Tong and Wang's estimator (2005) is a new approach to estimating the error variance using the least squares method, in which a simple linear regression is asymptotically derived from Rice's lag- estimator (1984). Their estimator depends heavily on the choice of regressor and weights when the sample size is small. In this article, we propose a new approach that sets the regressors via a local quadratic approximation in the small-sample case. We estimate the error variance as the intercept using ridge regression because the regressors suffer from multicollinearity. In a small simulation study, our approach performs better than some existing methods in small-sample cases and comparably in large-sample cases. More research is required on unequally spaced points.
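
The reason ridge regression helps when regressors are multicollinear can be sketched with the closed form (X'X + lambda*I) b = X'y. This is not the estimator proposed in the article: it is a minimal illustration that assumes two centered regressors and no intercept, whereas the article estimates the error variance as the intercept. The function name, the penalty value, and the data are hypothetical.

```python
import math


def ridge_two_regressors(x1, x2, y, lam):
    """Solve (X'X + lam*I) b = X'y for two centered regressors, no intercept."""
    a11 = sum(a * a for a in x1) + lam
    a22 = sum(b * b for b in x2) + lam
    a12 = sum(a * b for a, b in zip(x1, x2))
    c1 = sum(a * t for a, t in zip(x1, y))
    c2 = sum(b * t for b, t in zip(x2, y))
    det = a11 * a22 - a12 * a12  # close to zero when lam = 0 and x1, x2 are nearly collinear
    return ((a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det)


# Made-up, nearly collinear regressors (each already sums to zero).
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.0, 1.1, 1.9]
y = [-4.0, -2.1, 0.1, 1.9, 4.1]

ols = ridge_two_regressors(x1, x2, y, lam=0.0)    # ordinary least squares
ridge = ridge_two_regressors(x1, x2, y, lam=1.0)  # penalized fit
print("OLS coefficients:  ", ols)
print("ridge coefficients:", ridge)
print("coefficient norms: ", math.hypot(*ols), math.hypot(*ridge))
```

Adding lambda to the diagonal moves the determinant away from zero, so the coefficients stop swinging with small changes in the data, at the cost of some shrinkage bias.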

국내 원형교차로 사고모형 (Accident Models of Circular Intersections in Korea)

  • 이승주;박민규;박병호
    • 한국안전학회지 / Vol. 29, No. 1 / pp. 54-58 / 2014
  • This study deals with accidents at circular intersections in Korea. The goal is to develop accident models for 94 circular intersections. To that end, this study pays particular attention to collecting data on geometric structure and accidents, and to comparatively analyzing models such as Poisson regression, NB regression, and multiple regression using SPSS 17.0 and LIMDEP 3.0. The main results are as follows. First, the negative binomial model was found to be the most appropriate among the models. Second, three independent variables were adopted in the model, and these variables were found to have a positive relation to the accident rate. Finally, a reduced width of the circulatory roadway, removal of parking lots within the circulatory roadway, and appropriate levels of approach lanes were found to be required to improve the safety of circular intersections.

Least absolute deviation estimator based consistent model selection in regression

  • Shende, K.S.;Kashid, D.N.
    • Communications for Statistical Applications and Methods / Vol. 26, No. 3 / pp. 273-293 / 2019
  • We consider the problem of model selection in multiple linear regression with outliers and non-normal error distributions. In this article, a robust model selection criterion is proposed based on the least absolute deviation (LAD) estimation method. The proposed criterion is shown to be consistent. We suggest algorithms based on the proposed criterion that are suitable for a large number of predictors in the model. These algorithms select only the relevant predictor variables with probability one for large sample sizes. An exhaustive simulation study shows that the criterion performs well. In addition, the proposed criterion is applied to a real data set to examine its applicability. The simulation results show the proficiency of the algorithms in the presence of outliers, non-normal distributions, and multicollinearity.

Machine learning-based regression analysis for estimating Cerchar abrasivity index

  • Kwak, No-Sang;Ko, Tae Young
    • Geomechanics and Engineering / Vol. 29, No. 3 / pp. 219-228 / 2022
  • The most widely used parameter to represent rock abrasiveness is the Cerchar abrasivity index (CAI). The CAI value can be applied to predict wear in TBM cutters. It has been extensively demonstrated that the CAI is affected significantly by cementation degree, strength, and amount of abrasive minerals, i.e., the quartz content or equivalent quartz content in rocks. The relationship between the properties of rocks and the CAI is investigated in this study. A database comprising 223 observations that includes rock types, uniaxial compressive strengths, Brazilian tensile strengths, equivalent quartz contents, quartz contents, brittleness indices, and CAIs is constructed. A linear model is developed by selecting independent variables while considering multicollinearity after performing multiple regression analyses. Machine learning-based regression methods including support vector regression, regression tree regression, k-nearest neighbors regression, random forest regression, and artificial neural network regression are used in addition to multiple linear regression. The results of the random forest regression model show that it yields the best prediction performance.

Prediction of extreme PM2.5 concentrations via extreme quantile regression

  • Lee, SangHyuk;Park, Seoncheol;Lim, Yaeji
    • Communications for Statistical Applications and Methods / Vol. 29, No. 3 / pp. 319-331 / 2022
  • In this paper, we develop a new statistical model to forecast the PM2.5 level in Seoul, South Korea. The proposed model is based on the extreme quantile regression model with lasso penalty. Various meteorological variables and air pollution variables are considered as predictors in the regression model, and the lasso quantile regression performs variable selection and solves the multicollinearity problem. The final prediction model is obtained by combining various extreme lasso quantile regression estimators and we construct a binary classifier based on the model. Prediction performance is evaluated through the statistical measures of the performance of a binary classification test. We observe that the proposed method works better compared to the other classification methods, and predicts 'very bad' cases of the PM2.5 level well.
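
The lasso-penalized quantile regression named in this abstract minimizes a check (pinball) loss plus an L1 penalty on the coefficients. The abstract does not give the exact objective, so the following is only a sketch of the standard form, assuming the loss is averaged over observations and the intercept is left unpenalized. The function names and the toy numbers are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check loss at quantile level tau: tau*u if u >= 0, else (tau - 1)*u."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def lasso_quantile_objective(X, y, beta, intercept, tau, lam):
    """Mean pinball loss plus lam times the L1 norm of beta (intercept unpenalized)."""
    total = 0.0
    for row, target in zip(X, y):
        pred = intercept + sum(b * v for b, v in zip(beta, row))
        total += pinball_loss(target - pred, tau)
    return total / len(y) + lam * sum(abs(b) for b in beta)


# Toy values only: at tau = 0.9, under-prediction costs nine times more than over-prediction.
print(pinball_loss(2.0, 0.9), pinball_loss(-2.0, 0.9))
print(lasso_quantile_objective([[1.0], [2.0]], [1.5, 2.5], [1.0], 0.0, 0.9, 0.1))
```

The asymmetry of the loss at high tau is what targets extreme quantiles, while the L1 term can set some coefficients exactly to zero, which is how the penalty performs variable selection among correlated predictors.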

Comparison of tree-based ensemble models for regression

  • Park, Sangho;Kim, Chanmin
    • Communications for Statistical Applications and Methods / Vol. 29, No. 5 / pp. 561-589 / 2022
  • When multiple classification and regression trees are combined, tree-based ensemble models, such as random forest (RF) and Bayesian additive regression trees (BART), are produced. We compare the model structures and performances of various ensemble models for regression settings in this study. RF learns from bootstrapped samples and selects a splitting variable from the predictors gathered at each node. The BART model is specified as a sum of trees and is fitted using the Bayesian backfitting algorithm. Through extensive simulation studies, the strengths and drawbacks of the two methods in the presence of missing data, high-dimensional data, or highly correlated data are investigated. In the presence of missing data, BART performs well in general, whereas RF provides adequate coverage. BART outperforms RF on high-dimensional, highly correlated data. However, in all of the scenarios considered, RF has a shorter computation time. The performance of the two methods is also compared using two real data sets that represent the aforementioned situations, and the same conclusion is reached.

THE DEVELOPMENT OF AN OBESITY INDEX MODEL AS A COMPLEMENT TO BMI FOR ADULT: USING THE BLOOD DATA OF KNHANES

  • Ko, Kwanghee;Oh, Chunyoung
    • 호남수학학술지 / Vol. 43, No. 4 / pp. 717-739 / 2021
  • We used blood data to predict obesity as a complement to BMI-based risk, because some blood factors are significantly associated with obesity. The data consist of sixteen blood factors collected by the Korea National Health and Nutrition Examination Survey (KNHANES), which uses a two-stage stratified cluster sampling method. We find that the number of blood factors effective for predicting obesity in the final model is 6 to 8, differing somewhat by age and gender. Also, the coefficient of determination, which represents the predictive power for obesity in the regression model, is highest for both men and women aged 19 and in their 20s and 30s, and the predictive power decreases with increasing age.

Bayesian inference of the cumulative logistic principal component regression models

  • Kyung, Minjung
    • Communications for Statistical Applications and Methods / Vol. 29, No. 2 / pp. 203-223 / 2022
  • We propose a Bayesian approach to the cumulative logistic regression model for ordinal responses, based on orthogonal principal components obtained via singular value decomposition, to account for multicollinearity among predictors. The advantage of the suggested method is that it performs dimension reduction and parameter estimation simultaneously. To evaluate the performance of the proposed model, we conduct a simulation study with a high-dimensional and highly correlated explanatory matrix. We also fit the suggested method to a real data set concerning sprout- and scab-damaged kernels of wheat and compare it to the EM-based proportional-odds logistic regression model. Compared to EM-based methods, we argue that the proposed model works better for highly correlated, high-dimensional data, providing parameter estimates as well as good predictions.