• Title/Summary/Keyword: linear regression analysis


Estimation model of shear strength of soil layer using linear regression analysis (선형회귀분석에 의한 토층의 전단강도 산정모델)

  • Lee, Moon-Se;Kim, Kyeong-Su
    • Proceedings of the Korean Geotechnical Society Conference
    • /
    • 2009.09a
    • /
    • pp.1065-1078
    • /
    • 2009
  • Shear strength is an important factor in soil mechanics. A shear strength estimation model was developed to evaluate shear strength from only a few soil properties using linear regression analysis, a statistical method. Shear strength is divided into two parts: the internal friction angle ($\Phi$) and the cohesion (c). Valid soil factors were therefore selected from the soil test results through correlation analysis in SPSS, and the model was then formulated by linear regression analysis based on the relationships between the factors. The developed model was also compared with direct shear test results to verify its validity. The analysis of the relationship between soil properties and shear strength shows that the internal friction angle is strongly influenced by the void ratio and the dry unit weight, while the cohesion is mainly influenced by the void ratio, the dry unit weight, and the plasticity index. The shear strength estimated by the developed model is similar to that from the direct shear test. The developed model may therefore be used to estimate the shear strength of soils under the same conditions as the study area.


Price Determinant Factors of Artworks and Prediction Model Based on Machine Learning (작품 가격 추정을 위한 기계 학습 기법의 응용 및 가격 결정 요인 분석)

  • Jang, Dongryul;Park, Minjae
    • Journal of Korean Society for Quality Management
    • /
    • v.47 no.4
    • /
    • pp.687-700
    • /
    • 2019
  • Purpose: The purpose of this study is to investigate the interaction effects between price determinants of artworks. We expand the methodology used in the art market by applying machine learning techniques to estimate the price of artworks, and we compare linear regression and machine learning in terms of prediction accuracy. Methods: Moderated regression analysis was performed to verify the interaction effects of artistic characteristics on price. The moderating effects were studied by checking the significance level of the interaction terms in the derived regression equation. To derive the price estimation model, we use multiple linear regression analysis, which is a parametric statistical technique, and k-nearest neighbor (kNN) regression, which is a nonparametric machine learning technique. Results: For the most part, the influence of the price determinants of art differs according to the auction type and the artist's reputation. However, the auction type did not moderate the influence of the genre of the work on the price. In terms of prediction accuracy, kNN regression was superior to linear regression analysis. Conclusion: This study provides a theoretical basis for the complexity that exists among the price determinants of artworks. In addition, nonparametric models and machine learning techniques, as well as existing parametric models, are implemented to estimate the price of artworks.
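As a rough illustration of the nonparametric technique named in this abstract, kNN regression predicts a value by averaging the targets of the k closest training points. The sketch below is a minimal version under stated assumptions: the function name `knn_predict`, the Euclidean distance, and the toy data are all hypothetical and are not taken from the paper.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Hypothetical toy data: one feature per artwork and one price-like target.
train_X = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [1.0, 2.0, 3.0, 10.0]

# The two nearest neighbours of 2.1 are the points at 2.0 and 3.0.
estimate = knn_predict(train_X, train_y, (2.1,), k=2)
print(estimate)
```

Unlike multiple linear regression, this makes no assumption about the functional form linking features to price, which is one plausible reason it can predict better on data with strong interactions.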

Statistical notes for clinical researchers: simple linear regression 3 - residual analysis

  • Kim, Hae-Young
    • Restorative Dentistry and Endodontics
    • /
    • v.44 no.1
    • /
    • pp.11.1-11.8
    • /
    • 2019
  • In the previous sections, simple linear regression (SLR) 1 and 2, we developed an SLR model and evaluated its predictability. To obtain the best-fitting line, the intercept and slope were calculated using the least squares method. Predictability of the model was assessed by the proportion of explained variability in the total variation of the response variable. In this section, we discuss the four basic assumptions of regression models that justify the estimated regression model, and the residual analysis used to check them.
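The starting point for residual analysis is the set of residuals from the least squares fit. The sketch below fits a simple linear regression and computes them. The function names and the data are hypothetical, and the checks shown are only the algebraic properties of least squares residuals; in practice the assumptions are usually examined with residual plots.

```python
def fit_slr(x, y):
    """Return (intercept, slope) of the least squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(x, y, intercept, slope):
    """Observed minus fitted value for each data point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_slr(x, y)
res = residuals(x, y, b0, b1)

# Least squares residuals sum to zero and are uncorrelated with x by construction,
# so any visible pattern against x or the fitted values hints at a violated assumption.
print(b0, b1, res)
```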

Prediction of New Confirmed Cases of COVID-19 based on Multiple Linear Regression and Random Forest (다중 선형 회귀와 랜덤 포레스트 기반의 코로나19 신규 확진자 예측)

  • Kim, Jun Su;Choi, Byung-Jae
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.17 no.4
    • /
    • pp.249-255
    • /
    • 2022
  • The COVID-19 virus appeared in 2019 and is extremely contagious; because it is so infectious, it has a huge impact on people's mobility. In this paper, multiple linear regression and random forest models are used to predict the number of COVID-19 cases from COVID-19 infection status data (open data provided by the Ministry of Health and Welfare) and Google Mobility Data, which tracks movement across various place categories. The data were divided into two sets. The first dataset consists of the COVID-19 infection status data and all six variables of the Google Mobility Data. The second dataset consists of the COVID-19 infection status data and only two variables of the Google Mobility Data: (1) retail stores and leisure facilities and (2) grocery stores and pharmacies. The models' performance was compared using the mean absolute error. We also perform a correlation analysis of the random forest model and the multiple linear regression model.

Orographic Precipitation Analysis with Regional Frequency Analysis and Multiple Linear Regression (지역빈도해석 및 다중회귀분석을 이용한 산악형 강수해석)

  • Yun, Hye-Seon;Um, Myoung-Jin;Cho, Won-Cheol;Heo, Jun-Haeng
    • Journal of Korea Water Resources Association
    • /
    • v.42 no.6
    • /
    • pp.465-480
    • /
    • 2009
  • In this study, simple and multiple linear regression models were used to derive the relationship between precipitation and altitude, latitude, and longitude on Jejudo. The simple linear regression analysis examined whether an orographic effect exists on Jejudo based on annual average precipitation, and the multiple linear regression analysis examined whether the orographic effect applies to the quantiles for each duration and return period obtained from regional frequency analysis by the index flood method. The regression analysis shows that the linear relationship between altitude and precipitation becomes stronger as the duration and return period increase. The multiple linear regression precipitation estimates (which used altitude, latitude, and longitude information) were found to be more reasonable than estimates obtained using altitude only, altitude and latitude, or altitude and longitude. In particular, spatial distribution analysis by kriging in GIS gave realistic precipitation estimates, with precipitation concentrated in the southeast region, consistent with the actual climate of Jejudo. However, the accuracy of the regression model decreased for short-duration precipitation and for high-altitude regions even at long durations. Consequently, other factors that cause the orographic effect need to be considered to improve the accuracy of precipitation estimates.

Support Vector Machine for Interval Regression

  • Hong Dug Hun;Hwang Changha
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2004.11a
    • /
    • pp.67-72
    • /
    • 2004
  • Support vector machine (SVM) has been very successful in pattern recognition and function estimation problems for crisp data. This paper proposes a new method to evaluate interval linear and nonlinear regression models by combining the possibility and necessity estimation formulation with the principle of SVM. For data sets with crisp inputs and interval outputs, possibility and necessity models have recently been utilized; these are based on a quadratic programming approach, which gives more diverse spread coefficients than a linear programming one. SVM also uses a quadratic programming approach, which has the further advantage in interval regression analysis of integrating the central tendency property of least squares with the possibilistic property of fuzzy regression. However, this is not a computationally expensive approach. SVM allows us to perform interval nonlinear regression analysis by constructing an interval linear regression function in a high-dimensional feature space. In particular, SVM is a very attractive approach for modeling nonlinear interval data. The proposed algorithm is a model-free method in the sense that we do not have to assume the underlying model function for an interval nonlinear regression model with crisp inputs and interval outputs. Experimental results are then presented that indicate the performance of this algorithm.


THE USE OF MATHEMATICAL PROGRAMMING FOR LINEAR REGRESSION PROBLEMS

  • Park, Sung-Hyun
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.3 no.1
    • /
    • pp.75-79
    • /
    • 1978
  • The use of three mathematical programming techniques (quadratic programming, integer quadratic programming, and linear programming) is discussed to solve some problems in linear regression analysis. When the criterion is the minimization of the sum of squared deviations and the parameters are linearly constrained, the problem may be formulated as a quadratic programming problem. For the selection of variables to find the "best" regression equation in statistics, the technique of integer quadratic programming is proposed and found to be a very useful tool. When the criterion for fitting a linear regression is the minimization of the sum of absolute deviations from the regression function, the problem may be reduced to a linear programming problem and can be solved reasonably well.
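Two of the formulations mentioned in this abstract can be written out in standard textbook form. The notation below is an assumption introduced for illustration and is not taken from the paper: $y$ is the response vector, $X$ the design matrix with rows $x_i^{\top}$, $\beta$ the coefficient vector, $A\beta \le b$ a generic set of linear constraints, and $u_i$, $v_i$ the positive and negative parts of the $i$-th residual.

```latex
% Linearly constrained least squares as a quadratic program
\min_{\beta}\; (y - X\beta)^{\top}(y - X\beta)
\quad \text{subject to} \quad A\beta \le b

% Least absolute deviations as a linear program
\min_{\beta,\,u,\,v}\; \sum_{i=1}^{n} (u_i + v_i)
\quad \text{subject to} \quad
y_i - x_i^{\top}\beta = u_i - v_i,\quad
u_i \ge 0,\; v_i \ge 0,\quad i = 1,\dots,n
```

At the optimum of the second problem at most one of $u_i$ and $v_i$ is nonzero, so $u_i + v_i$ equals the absolute residual $|y_i - x_i^{\top}\beta|$ and the objective is the sum of absolute deviations.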


Correlation and Simple Linear Regression (상관성과 단순선형회귀분석)

  • Pak, Son-Il;Oh, Tae-Ho
    • Journal of Veterinary Clinics
    • /
    • v.27 no.4
    • /
    • pp.427-434
    • /
    • 2010
  • Correlation is a technique used to measure the strength or degree of closeness of the linear association between two quantitative variables. Common misuses of this technique are highlighted. Linear regression is a technique used to express the relationship between two continuous variables as a mathematical equation, which can be used for comparison or estimation purposes. Specifically, regression analysis can answer questions such as how much one variable changes for a given change in the other, and how accurately the value of one variable can be predicted from knowledge of the other. Regression does not give any indication of how good the association is, while correlation provides a measure of how well a least squares regression line fits the given set of data. The better the correlation, the closer the data points are to the regression line. In this tutorial article, the process of obtaining a linear regression relationship for a given set of bivariate data is described. The least squares method, which finds the line that minimizes the total error between the data points and the regression line, is employed and illustrated. The coefficient of determination, the ratio of the explained variation of the values of the dependent variable to the total variation, is described. Finally, the process of calculating confidence and prediction intervals is reviewed and demonstrated.
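The link between correlation and the coefficient of determination can be checked numerically: in simple linear regression, the squared Pearson correlation equals the share of variation in the response explained by the fitted line. The sketch below is illustrative only, with hypothetical function names and made-up data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared(x, y):
    """Coefficient of determination of the least squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation (SSE) and total variation (SST) of the response.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - sse / sst

# Hypothetical bivariate data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.3, 2.9, 4.4, 4.8, 6.1]

r = pearson_r(x, y)
r2 = r_squared(x, y)
print(r, r2)  # r squared and r2 agree up to rounding error
```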

Development of the Index for Estimating the Arc Status in the Short-circuiting Transfer Region of GMA Welding (GMA용접의 단락이행영역에 있어서 아크 상태 평가를 위한 모델 개발)

  • 강문진;이세헌;엄기원
    • Journal of Welding and Joining
    • /
    • v.17 no.4
    • /
    • pp.85-92
    • /
    • 1999
  • In GMAW, spatter is generated because of variation in the arc state. If the arc state can be assessed quantitatively, a control method to reduce spatter can be developed. This study attempted to develop an optimal model that could estimate the arc state quantitatively. To do this, the generated spatter was captured under limited welding conditions, and the waveforms of the arc voltage and welding current were collected. From the collected waveforms, the waveform factors and their standard deviations were calculated, and linear and nonlinear regression models built from these factors and their standard deviations are proposed to estimate the arc state. The proposed models were then performance-tested. The results obtained are as follows. In the correlation analysis between the factors and the amount of generated spatter, the standard deviations of the waveform factors had higher multiple regression coefficients than the waveform factors themselves. Because the correlation coefficients between T and $T_{a}$, and between s[T] and s[$T_{a}$], were nearly one, these factors were found to have the same effect on spatter generation. In the regression models for estimating the arc state, the linear and nonlinear models were found to consist of similar factors. In addition, the linear regression model was judged to be the optimal model for estimating the arc state because the variance of the data was small and its multiple regression coefficient was the highest among the models. However, under welding conditions in which the amount of generated spatter was small, the nonlinear regression model was found to estimate spatter generation better than the linear model.


Pre-processing and Bias Correction for AMSU-A Radiance Data Based on Statistical Methods (통계적 방법에 근거한 AMSU-A 복사자료의 전처리 및 편향보정)

  • Lee, Sihye;Kim, Sangil;Chun, Hyoung-Wook;Kim, Ju-Hye;Kang, Jeon-Ho
    • Atmosphere
    • /
    • v.24 no.4
    • /
    • pp.491-502
    • /
    • 2014
  • As a part of the KIAPS (Korea Institute of Atmospheric Prediction Systems) Package for Observation Processing (KPOP), we have developed modules for Advanced Microwave Sounding Unit-A (AMSU-A) pre-processing and bias correction. The KPOP system calculates the airmass bias correction coefficients by multiple linear regression, in which the scan-corrected innovation and the thicknesses of 850~300, 200~50, 50~5, and 10~1 hPa are used as the dependent and independent variables, respectively. Multicollinearity among the four airmass predictors was shown by the Variance Inflation Factor (VIF), which quantifies the severity of multicollinearity in a least squares regression. To resolve the multicollinearity, we adopted simple linear regression and Principal Component Regression (PCR) to calculate the airmass bias correction coefficients and compared the results with those from the multiple linear regression. The analysis shows that the order of performance, from best to worst, is multiple linear, principal component, and simple linear regression. For bias correction of AMSU-A channel 4, which is the most sensitive to the lower troposphere, the multiple linear regression with all four airmass predictors is superior to the simple linear regression with the single airmass predictor of 850~300 hPa. The PCR retaining components that account for 95% of the accumulated variance gave results similar to those of the multiple linear regression.
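To make the VIF diagnostic concrete: for predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The sketch below handles only the two-predictor case, where R_j^2 reduces to the squared correlation between the two predictors. This simplification, the function names, and the data are assumptions for illustration; this is not the KPOP implementation, which uses four predictors.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor regression.

    With only two predictors, regressing one on the other gives
    R^2 = r^2, so VIF = 1 / (1 - r^2).
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors: x2 nearly duplicates x1, x3 is uncorrelated with x1.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.1, 1.9, 3.2, 3.9]
x3 = [1.0, -1.0, -1.0, 1.0]

vif_collinear = vif_two_predictors(x1, x2)
vif_independent = vif_two_predictors(x1, x3)
print(vif_collinear, vif_independent)
```

A VIF near 1 indicates no inflation of the coefficient variance, while large values (a common rule of thumb is above 10) indicate severe multicollinearity.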