• Title/Summary/Keyword: linear regression models


Residuals Plots for Repeated Measures Data

  • PARK TAESUNG
    • Proceedings of the Korean Statistical Society Conference / 2000.11a / pp.187-191 / 2000
  • In the analysis of repeated measurements, multivariate regression models that account for the correlations among observations from the same subject are widely used. Like the usual univariate regression models, these multivariate regression models also need model diagnostic procedures. In this paper, we propose a simple graphical method to detect outliers and to investigate the goodness of model fit in repeated measures data. The graphical method is based on quantile-quantile (Q-Q) plots of the $\chi^2$ distribution and the standard normal distribution. We also propose diagnostic measures to detect influential observations. The proposed method is illustrated using two examples.
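The chi-square Q-Q idea in this abstract can be sketched as follows. This is only an illustration, not the authors' procedure. It assumes exactly two repeated measurements per subject, so the reference distribution is $\chi^2$ with 2 degrees of freedom, whose quantile function has the closed form $-2\ln(1-p)$. The function name `chi2_qq_points` and the residual values are hypothetical.

```python
import math

def chi2_qq_points(residuals):
    """Return (theoretical, observed) pairs for a chi-square(2) Q-Q plot.

    residuals: list of (r1, r2) residual pairs, one pair per subject.
    """
    n = len(residuals)
    m1 = sum(r[0] for r in residuals) / n
    m2 = sum(r[1] for r in residuals) / n
    # Entries of the 2 x 2 sample covariance matrix of the residual pairs.
    s11 = sum((r[0] - m1) ** 2 for r in residuals) / (n - 1)
    s22 = sum((r[1] - m2) ** 2 for r in residuals) / (n - 1)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in residuals) / (n - 1)
    det = s11 * s22 - s12 ** 2
    # Squared Mahalanobis distance of each subject's residual vector.
    d2 = []
    for r1, r2 in residuals:
        a, b = r1 - m1, r2 - m2
        d2.append((s22 * a * a - 2.0 * s12 * a * b + s11 * b * b) / det)
    observed = sorted(d2)
    # Chi-square(2) quantiles at plotting positions (i - 0.5) / n.
    theoretical = [-2.0 * math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

# Hypothetical residuals; the last subject is deliberately extreme.
residuals = [(0.1, -0.2), (0.3, 0.1), (-0.4, 0.2), (0.0, -0.1),
             (0.2, 0.3), (-0.2, -0.3), (3.0, -2.5)]
points = chi2_qq_points(residuals)
```

Plotting the observed distances against the theoretical quantiles gives the Q-Q plot. A point lying far above the line y = x at the upper end flags a subject worth checking as a possible outlier.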


Drawbead Model for 3-Dimensional Finite Element Analysis of Sheet Metal Forming Processes (3차원 박판형성 공정 유한요소해석용 드로우비드 모델)

  • 금영탁;김준환;차지혜
    • Transactions of Materials Processing / v.11 no.5 / pp.394-404 / 2002
  • A drawbead model for three-dimensional finite element analysis of sheet metal forming processes is developed. Mathematical models of basic drawbeads, such as the circular drawbead, stepped drawbead, and squared drawbead, are first derived using bending theory, the belt-pulley equation, and the Coulomb friction law. Next, experiments are performed to find the drawing characteristics of the drawbeads. Based on the mathematical models and the drawing test results, expert models of the basic drawbeads are then developed using a linear multiple regression method. The expert models of combined drawbeads, such as the double circular drawbead, double stepped drawbead, and circular-and-stepped drawbead, are obtained by summing those of the basic drawbeads. Finally, to verify the expert models, the drawing characteristics calculated by the expert models of the double circular drawbead and the circular-and-stepped drawbead are compared with those obtained from experiments. The predictions of the expert models agree well with the experimental measurements.

Development of Regression Models Resolving High-Dimensional Data and Multicollinearity Problem for Heavy Rain Damage Data (호우피해자료에서의 고차원 자료 및 다중공선성 문제를 해소한 회귀모형 개발)

  • Kim, Jeonghwan;Park, Jihyun;Choi, Changhyun;Kim, Hung Soo
    • KSCE Journal of Civil and Environmental Engineering Research / v.38 no.6 / pp.801-808 / 2018
  • Training of a linear regression model is stable under the assumption that the sample size is sufficiently larger than the number of explanatory variables and that there is no serious multicollinearity among the explanatory variables. In this study, we investigated the difficulty of model training when this assumption is violated by analyzing real heavy rain damage data, and we proposed using a principal component regression model or a ridge regression model after integrating the data to overcome the difficulty. We evaluated the predictive performance of the proposed models on test data independent of the training data, and confirmed that the proposed methods showed better predictive performance than the linear regression model.
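To illustrate why ridge regression helps under multicollinearity, here is a minimal sketch. It is not the authors' model: it assumes only two centered predictors and no intercept, so the ridge system $(X^\top X + \lambda I)\beta = X^\top y$ can be solved by hand as a 2 x 2 system. The function name `ridge_two_predictors` and the data are hypothetical.

```python
def ridge_two_predictors(x1, x2, y, lam):
    """Solve (X'X + lam * I) b = X'y for two centered predictors."""
    a11 = sum(u * u for u in x1) + lam
    a22 = sum(v * v for v in x2) + lam
    a12 = sum(u * v for u, v in zip(x1, x2))
    c1 = sum(u * t for u, t in zip(x1, y))
    c2 = sum(v * t for v, t in zip(x2, y))
    det = a11 * a22 - a12 * a12
    # Cramer's rule for the 2 x 2 linear system.
    return ((a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det)

# Two nearly collinear, centered predictors (hypothetical values).
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.9, 2.0]
y = [2.0 * u + 3.0 * v for u, v in zip(x1, x2)]

b_ols = ridge_two_predictors(x1, x2, y, lam=0.0)    # ordinary least squares
b_ridge = ridge_two_predictors(x1, x2, y, lam=1.0)  # shrunken coefficients
```

With `lam=0.0` the function reduces to ordinary least squares. A positive `lam` moves the system away from singularity and shrinks the coefficient vector, which is what stabilizes the fit when predictors are strongly correlated.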

Evaluation of applicability of pan coefficient estimation method by multiple linear regression analysis (다변량 선형회귀분석을 이용한 증발접시계수 산정방법 적용성 검토)

  • Rim, Chang-Soo
    • Journal of Korea Water Resources Association / v.55 no.3 / pp.229-243 / 2022
  • The effects of monthly meteorological data measured at 11 stations in South Korea on the pan coefficient were analyzed to develop four types of multiple linear regression models for estimating pan coefficients. To evaluate the applicability of the developed models, they were compared with six previous models. Pan coefficients were most affected by air temperature in January, February, March, July, November, and December, and by solar radiation in the other months. Overall, across the 12 months of the year, the effects of wind speed and relative humidity on the pan coefficient were less significant than those of air temperature and solar radiation. For all meteorological stations and months, the model developed by applying five independent variables (wind speed, relative humidity, air temperature, ratio of sunshine duration to daylight duration, and solar radiation) for each station was the most effective for evaporation estimation. The model validation results indicate that the multiple linear regression models can be applied to some particular stations and months.

Prediction of Future Sea Surface Temperature around the Korean Peninsula based on Statistical Downscaling (통계적 축소법을 이용한 한반도 인근해역의 미래 표층수온 추정)

  • Ham, Hee-Jung;Kim, Sang-Su;Yoon, Woo-Seok
    • Journal of Industrial Technology / v.31 no.B / pp.107-112 / 2011
  • Recently, climate change around the world due to global warming has become an important issue, and the damage caused by climate change adversely affects human life. Changes in sea surface temperature (SST) are associated with natural disasters such as typhoons and El Niño. We therefore predicted future daily SST using a statistical downscaling method and the CGCM 3.1 A1B scenario. Nine points around the Korean Peninsula were selected for predicting future SST, and a regression model was built using multiple linear regression. CGCM 3.1 was simulated with the regression model, and a comparison of probability density functions, box plots, and statistical data, carried out to evaluate the suitability of the regression models, confirmed that the models were built properly.


Introduction to variational Bayes for high-dimensional linear and logistic regression models (고차원 선형 및 로지스틱 회귀모형에 대한 변분 베이즈 방법 소개)

  • Jang, Insong;Lee, Kyoungjae
    • The Korean Journal of Applied Statistics / v.35 no.3 / pp.445-455 / 2022
  • In this paper, we introduce existing Bayesian methods for high-dimensional sparse regression models and compare their performance in various simulation scenarios. In particular, we focus on the variational Bayes approach proposed by Ray and Szabó (2021), which enables scalable and accurate Bayesian inference. Based on simulated data sets from sparse high-dimensional linear regression models, we compare the variational Bayes approach with other Bayesian and frequentist methods. To check the practical performance of variational Bayes in logistic regression models, a real data analysis is conducted using a leukemia data set.

Price Determinant Factors of Artworks and Prediction Model Based on Machine Learning (작품 가격 추정을 위한 기계 학습 기법의 응용 및 가격 결정 요인 분석)

  • Jang, Dongryul;Park, Minjae
    • Journal of Korean Society for Quality Management / v.47 no.4 / pp.687-700 / 2019
  • Purpose: The purpose of this study is to investigate the interaction effects between price determinants of artworks. We expand the methodology used in the art market by applying machine learning techniques to estimate the price of artworks, and we compare linear regression and machine learning in terms of prediction accuracy. Methods: Moderated regression analysis was performed to verify the interaction effects of artistic characteristics on price. The moderating effects were studied by checking the significance level of the interaction terms of the derived regression equation. To derive a price estimation model, we use multiple linear regression analysis, which is a parametric statistical technique, and k-nearest neighbor (kNN) regression, which is a nonparametric statistical technique among machine learning methods. Results: For the most part, the influences of the price determinants of art differ according to the auction type and the artist's reputation. However, the auction type did not control the influence of the genre of the work on the price. In the analysis, kNN regression was superior to linear regression in terms of prediction accuracy. Conclusion: This study provides a theoretical basis for the complexity that exists among the price determinants of artworks. In addition, nonparametric models and machine learning techniques, as well as existing parametric models, are implemented to estimate the price of artworks.
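The kNN regression mentioned in this abstract can be sketched in a few lines. This is a generic illustration, not the authors' model: it assumes Euclidean distance and an unweighted average of the k nearest targets, and the function name `knn_predict` and the feature and price values are hypothetical.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k training points nearest to query."""
    # Rank training points by Euclidean distance to the query point.
    ranked = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical one-feature data: (canvas size,) -> price.
train_x = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [10.0, 20.0, 30.0, 100.0]
estimate = knn_predict(train_x, train_y, (2.0,), k=3)
```

Unlike multiple linear regression, this estimator fits no coefficients. The prediction depends only on the nearby training examples, which is why it is classed as nonparametric.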

Comparison of different post-processing techniques in real-time forecast skill improvement

  • Jabbari, Aida;Bae, Deg-Hyo
    • Proceedings of the Korea Water Resources Association Conference / 2018.05a / pp.150-150 / 2018
  • Numerical Weather Prediction (NWP) models provide information for weather forecasts. The highly nonlinear and complex interactions in the atmosphere are simplified in meteorological models through approximations and parameterization. These simplifications may lead to biases and errors in model results. Although the models have improved over time, their biased outputs are still a matter of concern in meteorological and hydrological studies. Thus, bias removal is an essential step prior to using the outputs of atmospheric models. The main idea of statistical bias correction methods is to develop a statistical relationship between modeled and observed variables over the same historical period. Model Output Statistics (MOS) would be desirable to better match the real-time forecast data with observation records. Statistical post-processing methods relate model outputs to the observed values at the sites of interest. In this study, three methods are used to remove the possible biases of the real-time outputs of the Weather Research and Forecasting (WRF) model in the Imjin basin (North and South Korea). The post-processing techniques include the Linear Regression (LR), Linear Scaling (LS), and Power Scaling (PS) methods. The MOS techniques used in this study include three main steps: preprocessing of the historical data in the training set, development of the equations, and application of the equations to the validation set. The expected results show the accuracy improvement of the real-time forecast data before and after bias correction. The comparison of the different methods will clarify the best method for forecast skill enhancement in a real-time case study.
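Two of the post-processing methods named in this abstract can be sketched as below. The abstract gives no formulas, so the forms here are assumptions: linear scaling is taken as a multiplicative factor (mean of observations divided by mean of model output over the training period), and the linear regression correction is a simple least squares fit of observations on model output. All function names and values are hypothetical.

```python
def linear_scaling(sim_train, obs_train, forecast):
    """Scale forecasts by the ratio of observed to simulated training means."""
    factor = (sum(obs_train) / len(obs_train)) / (sum(sim_train) / len(sim_train))
    return [factor * f for f in forecast]

def linear_regression_correction(sim_train, obs_train, forecast):
    """Fit obs = intercept + slope * sim on training data, then apply it."""
    n = len(sim_train)
    mean_sim = sum(sim_train) / n
    mean_obs = sum(obs_train) / n
    sxx = sum((s - mean_sim) ** 2 for s in sim_train)
    sxy = sum((s - mean_sim) * (o - mean_obs) for s, o in zip(sim_train, obs_train))
    slope = sxy / sxx
    intercept = mean_obs - slope * mean_sim
    return [intercept + slope * f for f in forecast]

# Hypothetical training period: the model output is biased low by 0.5.
sim_train = [1.0, 2.0, 3.0, 4.0]
obs_train = [1.5, 2.5, 3.5, 4.5]
ls_corrected = linear_scaling(sim_train, obs_train, [5.0])
lr_corrected = linear_regression_correction(sim_train, obs_train, [5.0])
```

Both functions follow the three steps listed in the abstract: the training data are prepared, the correction equation is derived from them, and the equation is then applied to forecasts outside the training set.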


Quasi-Likelihood Approach for Linear Models with Censored Data

  • Ha, Il-Do;Cho, Geon-Ho
    • Journal of the Korean Data and Information Science Society / v.9 no.2 / pp.219-225 / 1998
  • The parameters in linear models with censored normal responses are usually estimated by iterative maximum likelihood and least squares methods. However, the iterative least squares method, although simple, has little theoretical justification, and the iterative maximum likelihood estimating equations are complicated to derive. In this paper, we justify these methods via the quasi-likelihood approach of Wedderburn (1974). This provides an explicit justification for the iterative least squares method and also directly justifies the iterative maximum likelihood method for estimating the regression coefficients.


Determination of Regression Model for Estimating Root Fresh Weight Using Maximum Leaf Length and Width of Root Vegetables Grown in Reclaimed Land (간척지 재배 근채류의 최대 엽장과 엽폭을 이용한 지하부 생체중 추정용 회귀 모델 결정)

  • Jung, Dae Ho;Yi, Pyoung Ho;Lee, In-Bog
    • Korean Journal of Environmental Agriculture / v.39 no.3 / pp.204-213 / 2020
  • BACKGROUND: Since the number of crops cultivated in reclaimed land is huge, it is very difficult to quantify total crop production. Therefore, a non-destructive method for predicting crop production is needed. Salt-tolerant root vegetables such as red beets and sugar beets are suitable for cultivation in reclaimed land. If their underground biomass can be predicted, this helps to estimate crop productivity. The objectives of this study are to investigate maximum leaf length and weight of red beets, sugar beets, and turnips grown in reclaimed land, and to determine the optimal model by regression analysis of linear and allometric growth models. METHODS AND RESULTS: Maximum leaf length, maximum leaf width, and root fresh weight of red beets, sugar beets, and turnips were measured. Ten linear models and six allometric growth models were selected for estimation of root fresh weight, and non-linear regression analysis was conducted. The allometric growth model, which has a variable multiplied by the square of maximum leaf length and maximum leaf width, showed the highest $R^2$ values of 0.67, 0.70, and 0.49 for red beets, sugar beets, and turnips, respectively. Validation of the models for red beets and sugar beets gave $R^2$ values of 0.63 and 0.65, respectively, whereas the model for turnips gave an $R^2$ value of 0.48. The allometric growth model was suitable for estimating the root fresh weight of red beets and sugar beets, but the accuracy for turnips was relatively low. CONCLUSION: The regression models established in this study may be useful for estimating the total production of root vegetables cultivated in reclaimed land, and they can serve as a non-destructive method for predicting crop information.
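A rough sketch of fitting an allometric model of this kind follows. The abstract does not give the exact functional form, so this assumes root fresh weight $= a\,(L^2 W)^b$, where $L$ is maximum leaf length and $W$ is maximum leaf width. It also swaps in ordinary least squares on log-transformed data in place of the non-linear regression the abstract describes. The function name `fit_power_law` and all data values are hypothetical and synthetic.

```python
import math

def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on log(x) and log(y)."""
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the log-log line is the exponent b.
    b = (sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
         / sum((u - mean_x) ** 2 for u in log_x))
    # Intercept of the log-log line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic measurements generated from a known power law (not real data).
leaf_length = [10.0, 12.0, 15.0, 18.0]
leaf_width = [4.0, 5.0, 6.0, 7.0]
size = [l * l * w for l, w in zip(leaf_length, leaf_width)]
root_weight = [0.05 * s ** 0.9 for s in size]

a, b = fit_power_law(size, root_weight)
```

Because the synthetic data follow the assumed form exactly, the fit recovers the generating parameters. With field measurements, the log-log fit and a direct non-linear fit generally give somewhat different estimates because they weight the errors differently.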