• Title/Summary/Keyword: Multiple Linear Regression

A Multivariate Analysis of Korean Professional Players Salary (한국 프로스포츠 선수들의 연봉에 대한 다변량적 분석)

  • Song, Jong-Woo
    • The Korean Journal of Applied Statistics / v.21 no.3 / pp.441-453 / 2008
  • We analyzed the salaries of Korean professional basketball and baseball players under the assumption that salary depends on a player's personal records and contribution to the team in the previous year. We made extensive use of data visualization tools to examine the relationships among the variables, to find outliers, and to perform model diagnostics. We fitted multiple linear regression and regression tree models and used cross-validation to select an optimal model. We checked the relationships between variables carefully and chose a subset of variables for stepwise regression instead of using all variables. For basketball players, the important variables were points per game, number of assists, number of successful free throws, and career. For baseball pitchers, they were career, strikeouts per 9 innings, ERA, and number of home runs. For baseball hitters, they were career, number of hits, and FA.
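
Model selection by cross-validation, as described in this abstract, can be illustrated with a minimal sketch. It is not the authors' procedure: the data are synthetic, the helper name `kfold_mse` is hypothetical, and plain NumPy least squares stands in for whatever software the paper used. The sketch compares the k-fold prediction error of two candidate variable sets.

```python
import numpy as np

def kfold_mse(X, y, k=5):
    """Mean squared prediction error of OLS (with intercept), averaged over k folds."""
    n = len(y)
    folds = np.array_split(np.arange(n), k)  # contiguous folds; rows are assumed shuffled
    errors = []
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Fit on the training rows only.
        A_train = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        # Score on the held-out rows.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        errors.append(np.mean((A_test @ coef - y[test_idx]) ** 2))
    return float(np.mean(errors))

# Synthetic data (an assumption for illustration): y depends on x1 but not on x2.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 * x1 + rng.normal(scale=0.1, size=n)

cv_relevant = kfold_mse(x1.reshape(-1, 1), y)    # candidate model using x1
cv_irrelevant = kfold_mse(x2.reshape(-1, 1), y)  # candidate model using x2
print(cv_relevant, cv_irrelevant)
```

The candidate with the smaller cross-validated error would be preferred; a stepwise search repeats this comparison while adding or dropping one variable at a time.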

Prediction Models of Residual Chlorine in Sediment Basin to Control Pre-chlorination in Water Treatment Plant (정수장 전염소 공정 제어를 위한 침전지 잔류 염소 농도 예측모델 개발)

  • Lee, Kyung-Hyuk;Kim, Ju-Hwan;Lim, Jae-Lim;Chae, Seon Ha
    • Journal of Korean Society of Water and Wastewater / v.21 no.5 / pp.601-607 / 2007
  • To maintain a constant residual chlorine concentration in a sedimentation basin, it is necessary to develop a real-time prediction model of residual chlorine that considers water treatment plant data such as water quality, weather, and plant operating conditions. Prediction models of residual chlorine in the sedimentation basin were developed from operation data acquired at the K water treatment plant. The input parameters were water temperature, turbidity, pH, conductivity, flow rate, alkalinity, and pre-chlorination dosage. Linear and nonlinear multiple regression models were fitted to 5,448 data sets. The correlation coefficients (R) of the linear and nonlinear models were 0.39 and 0.374, respectively. These low correlation coefficients indicate that the multiple regression models cannot represent residual chlorine from input parameters that vary independently over time with weather conditions. Artificial neural network models were then applied under three different conditions. Their input parameters consisted of water quality data observed in the water treatment process, arranged in an autoregressive structure that accounts for a time lag. The artificial neural network models predicted residual chlorine in the sedimentation basin better than the conventional linear and nonlinear multiple regression models. The coefficients of determination of the three models in the verification process were 0.742, 0.754, and 0.869, respectively. Consequently, neural networks can simulate residual chlorine in the sedimentation basin better than the regression models in terms of prediction performance. These results are expected to contribute to the automated control of water treatment processes.
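
The "autoregressive structure that accounts for a time lag" can be pictured as a matrix of lagged observations. The sketch below is an assumption about what such inputs might look like, not the paper's actual preprocessing; the helper name `make_lagged` and the toy series are hypothetical.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Build (X, y) where row t of X holds the n_lags values preceding y[t].

    Column 0 is lag 1 (the most recent value), column 1 is lag 2, and so on.
    """
    series = np.asarray(series, dtype=float)
    columns = [series[n_lags - k - 1 : len(series) - k - 1] for k in range(n_lags)]
    X = np.column_stack(columns)
    y = series[n_lags:]
    return X, y

# Toy measurements (illustrative values only).
readings = [1, 2, 3, 4, 5]
X, y = make_lagged(readings, n_lags=2)
print(X)  # each row: [value at t-1, value at t-2]
print(y)  # value at t
```

The resulting `X` and `y` could then be passed to a regression model or a neural network; in practice each input variable (temperature, turbidity, and so on) would contribute its own lagged columns.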

Multivariate Analysis for Clinicians (임상의를 위한 다변량 분석의 실제)

  • Oh, Joo Han;Chung, Seok Won
    • Clinics in Shoulder and Elbow / v.16 no.1 / pp.63-72 / 2013
  • In medical research, multivariate analysis, especially multiple regression analysis, is used to analyze the influence of multiple variables on an outcome. In multiple regression analysis, one must consider which variables to include in the model and, because many variables are involved, the problem of multicollinearity, in addition to the basic assumptions of regression analysis. The fit of a multiple regression model is expressed by the coefficient of determination, $R^2$, and the influence of each independent variable on the outcome by a regression coefficient, ${\beta}$. Multiple regression analysis can be divided into multiple linear regression analysis, multiple logistic regression analysis, and Cox regression analysis according to the type of dependent variable (continuous variable, categorical variable (binary logit), and state variable, respectively), and the influence of each variable on the outcome is evaluated by the regression coefficient ${\beta}$, the odds ratio, and the hazard ratio, respectively. Knowledge of multivariate analysis enables clinicians to interpret results accurately and to design further research efficiently.
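
For reference, the three model families named in this abstract are usually written as follows. The notation is the standard textbook form and an assumption here, since the abstract gives no formulas: $x_1, \dots, x_p$ are the independent variables, $\varepsilon$ is the error term, and $h_0(t)$ is the baseline hazard.

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon \qquad \text{(multiple linear regression)}$$

$$\log \frac{P(y = 1)}{1 - P(y = 1)} = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p, \qquad \mathrm{OR}_j = e^{\beta_j} \qquad \text{(multiple logistic regression)}$$

$$h(t \mid x) = h_0(t) \exp(\beta_1 x_1 + \cdots + \beta_p x_p), \qquad \mathrm{HR}_j = e^{\beta_j} \qquad \text{(Cox regression)}$$

In the logistic and Cox models, the odds ratio and hazard ratio for a one-unit increase in $x_j$ are obtained by exponentiating the corresponding coefficient $\beta_j$.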

MapReduce-based Localized Linear Regression for Electricity Price Forecasting (전기 가격 예측을 위한 맵리듀스 기반의 로컬 단위 선형회귀 모델)

  • Han, Jinju;Lee, Ingyu;On, Byung-Won
    • The Transactions of the Korean Institute of Electrical Engineers P / v.67 no.4 / pp.183-190 / 2018
  • Accurately predicting electricity prices is an important task in the electricity trading market. Various approaches to the electricity price forecasting problem have been proposed, and linear regression-based approaches are known to be the best among them. However, the use of such linear regression-based methods is limited by low accuracy and performance. With traditional linear regression methods, it is not practical to find a nonlinear regression model that explains the training data well. If the training data is complex (i.e., small-sized individual data and large-sized features), it is difficult to find a polynomial function with n terms that fits the training data. On the other hand, when a linear regression model is used to approximate a nonlinear regression model, accuracy drops considerably because the model does not accurately reflect the characteristics of the training data. To cope with this problem, we propose a new electricity price forecasting method that divides the entire dataset into multiple split datasets and finds the best linear regression model for each, so that each model is optimal for its own split. To improve the performance of the proposed method, we also adapt the localized linear regression method to MapReduce, a framework for parallel processing of data stored in a Hadoop distributed file system. Our experimental results show that the proposed model outperforms the existing linear regression model. Specifically, the accuracy of the proposed method improves by 45% and it runs 5 times faster than the existing linear regression-based model.
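
The idea of fitting one linear model per split can be sketched in a few lines. This is only an illustration under stated assumptions: the data are a synthetic piecewise-linear curve, the splits are contiguous row ranges, the function names are hypothetical, and a plain Python loop replaces the MapReduce/Hadoop machinery described in the abstract.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ [1, X], intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients to the rows of X."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def fit_localized(X, y, n_splits):
    """Split the rows into contiguous chunks and fit one linear model per chunk."""
    chunks = np.array_split(np.arange(len(X)), n_splits)
    return [(idx, fit_ols(X[idx], y[idx])) for idx in chunks]

# Synthetic nonlinear target: rising for x < 5, falling afterwards.
x = np.linspace(0.0, 10.0, 200)
X = x.reshape(-1, 1)
y = np.where(x < 5.0, 2.0 * x, 10.0 - 3.0 * (x - 5.0))

# One global line over all rows.
global_mse = float(np.mean((predict(fit_ols(X, y), X) - y) ** 2))

# One line per split.
local_pred = np.empty_like(y)
for idx, coef in fit_localized(X, y, n_splits=2):
    local_pred[idx] = predict(coef, X[idx])
local_mse = float(np.mean((local_pred - y) ** 2))

print(global_mse, local_mse)
```

Because each chunk is fitted independently, the per-chunk fits are natural candidates for a map step, with a reduce step collecting the fitted models; how the paper actually partitions the data is not stated in the abstract.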

On study for change point regression problems using a difference-based regression model

  • Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
    • Communications for Statistical Applications and Methods / v.26 no.6 / pp.539-556 / 2019
  • This paper derives a method for solving change point regression problems through a process that obtains consequential results from the properties of a difference-based intercept estimator, first introduced by Park and Kim (Communications in Statistics - Theory Methods, 2019) for outlier detection in multiple linear regression models. We describe the statistical properties of the difference-based regression model in a piecewise simple linear regression model and then propose an efficient algorithm for change point detection. We illustrate the merits of the proposed method by comparing it with several existing methods in simulation studies and real data analysis. The methodology is valuable regardless of the form of the regression lines and the number of change points.

Estimation of Tool life by Simple & Multiple Linear Regression Analysis of $Si_3N_4$ Ceramic Cutting Tools (회귀분석에 의한 $Si_3N_4$세라믹 절삭공구의 공구수명 추정)

  • 안영진;권원태;김영욱
    • Transactions of the Korean Society of Machine Tool Engineers / v.13 no.4 / pp.23-29 / 2004
  • In this study, four kinds of $Si_3N_4$-based ceramic cutting tools with different sintering times were fabricated to investigate the relations among mechanical properties, grain size, and tool life. They were used to turn gray cast iron at a cutting speed of 330 m/min and depths of cut of 0.5 mm and 1 mm under dry, continuous cutting conditions. A multiple linear regression model was used to determine the relations among mechanical properties, grain size, and density. The combination of hardness and fracture toughness showed a good relation with tool life, and hardness was the most important single factor for tool life.

Prediction Techniques for Difficulty Level of Hanja Using Multiple Linear Regression (다중 회귀 분석을 이용한 한자 난이도 예측 기법 연구)

  • Choi, Jeongwhan;Noh, Jiwoo;Kim, Suntae
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.6 / pp.219-225 / 2019
  • The existing method of assigning difficulty levels to Hanja characters has a problem. Some Hanja characters selected by the existing methods differ from the Sino-Korean words used in real life, and it is impossible to know how often the Hanja characters are used. To solve this problem, we measure the difficulty of Hanja characters using multiple regression analysis with frequencies as features. FWS and FHU are counted from elementary school textbooks. A questionnaire built from the two frequencies together with the stroke count asks for the appropriate time to learn each Hanja character, and the answers are used as the target variable for regression. Stepwise regression is used to select the appropriate features, and multiple linear regression is then performed. The $R^2$ score of the model was 0.1105 and the RMSE was 0.1105.
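
The two reported metrics measure different things: $R^2$ is the share of variance explained, while RMSE is the typical prediction error in the units of the target. A minimal sketch of the usual definitions follows; the abstract does not say which definitions or software the authors used, and the function names and toy values here are illustrative assumptions.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy targets and predictions (illustrative values only).
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
print(r2_score(y_true, y_pred), rmse(y_true, y_pred))
```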

Autocovariance based estimation in the linear regression model (선형회귀 모형에서 자기공분산 기반 추정)

  • Park, Cheol-Yong
    • Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.839-847 / 2011
  • In this study, we derive an autocovariance-based estimator for the vector of regression coefficients in the multiple linear regression model. The method was suggested by Park (2009), and although it does not seem intuitively attractive, the estimator is unbiased for the vector of regression coefficients. When the vectors of explanatory variables satisfy some regularity conditions, and under mild conditions that hold when the errors come from autoregressive and moving average models, the estimator has asymptotically the same distribution as the least squares estimator and also converges in probability to the vector of regression coefficients. Finally, we provide a simulation study showing that the aforementioned theoretical results hold in small-sample cases.

APPAREL PRODUCTS RETRIEVAL SYSTEM BASED ON PSYCOLOGICAL FEATURE SPACE

  • Ohtake, Atsushi;Takatera, Masayuki;Furukawa, Takao;Shimizu, Yoshio
    • Proceedings of the Korean Society for Emotion and Sensibility Conference / 2000.04a / pp.240-243 / 2000
  • An apparel products retrieval system is proposed in which users can look up products using Kansei evaluation values. The system adopts relevance feedback, using the retrieval history to learn the tendency of each user's evaluations. It is based on a vector space retrieval model that uses expressions of product images as semantic scales. The system builds a query from the information the user inputs and retrieves the closest products from the database. Revising algorithms based on the difference method and on linear multiple regression were applied to investigate the effectiveness and criteria of the search. An evaluation of accuracy showed that the linear multiple regression and neural network models are effective for retrieval that takes individual Kansei into account.

Clustering Observations for Detecting Multiple Outliers in Regression Models

  • Seo, Han-Son;Yoon, Min
    • The Korean Journal of Applied Statistics / v.25 no.3 / pp.503-512 / 2012
  • Outlier detection in a linear regression model eventually fails when similar observations are classified differently in a sequential process. In such circumstances, identifying clusters and applying a detection method to the clustered data can prevent this failure and is computationally efficient because the data are reduced. In this paper, we suggest implementing a clustering procedure for this purpose and provide examples that illustrate the suggested procedure applied to the Hadi-Simonoff (1993) method, the reverse Hadi-Simonoff method, and the Gentleman-Wilk (1975) method.