• Title/Summary/Keyword: multiple linear regression models

Prediction of unconfined compressive and Brazilian tensile strength of fiber reinforced cement stabilized fly ash mixes using multiple linear regression and artificial neural network

  • Chore, H.S.;Magar, R.B.
    • Advances in Computational Design / v.2 no.3 / pp.225-240 / 2017
  • This paper presents the application of multiple linear regression (MLR) and artificial neural network (ANN) techniques to develop models that predict the unconfined compressive strength (UCS) and Brazilian tensile strength (BTS) of fiber reinforced cement stabilized fly ash mixes. UCS and BTS are highly nonlinear functions of the mix constituents, which makes their modeling and prediction difficult. To establish the relationship between the independent and dependent variables, a computational technique such as ANN is employed, which provides an efficient and easy approach to modeling complex, nonlinear relationships. The data generated in the laboratory through a systematic experimental programme for evaluating the UCS and BTS of fiber reinforced cement fly ash mixes at 7, 14 and 28 days of curing are used to develop the MLR and ANN models. The data are arranged as four input parameters, namely the cement content, the fiber content, the maximum dry density (MDD) and the optimum moisture content (OMC), and one dependent variable, either the unconfined compressive strength or the Brazilian tensile strength. ANN models are trained and tested for various combinations of input and output data sets. The performance of the networks is checked with the statistical error criteria of correlation coefficient (R), mean square error (MSE) and mean absolute error (MAE). It is observed that the ANN model predicts both the unconfined compressive strength and the Brazilian tensile strength quite well in terms of R, RMSE and MAE. This study shows that, as an alternative to classical modeling techniques, the ANN approach can be used to accurately predict the unconfined compressive strength and Brazilian tensile strength of fiber reinforced cement stabilized fly ash mixes.
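
The error criteria named in this abstract (R, MSE, RMSE and MAE) can be computed as in the following minimal sketch. The function name `error_metrics` and the strength values are hypothetical illustrations, not data or code from the paper.

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """Return R, MSE, RMSE and MAE for observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    mse = float(np.mean(residual ** 2))
    return {
        "R": float(np.corrcoef(y_true, y_pred)[0, 1]),  # Pearson correlation
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(residual))),
    }

# Hypothetical observed and predicted strengths (not from the paper).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
metrics = error_metrics(observed, predicted)
print(metrics)
```

RMSE is the square root of MSE, so the two criteria rank models identically; RMSE is simply expressed in the units of the predicted quantity.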

Testing for A Change Point by Model Selection Tools in Linear Regression Models

  • Yoon, Yong-Hwa;Kim, Jong-Tae;Cho, Kil-Ho;Shin, Kyung-A
    • Communications for Statistical Applications and Methods / v.7 no.3 / pp.655-665 / 2000
  • Several information criteria, namely the Schwarz information criterion (SIC), the Akaike information criterion (AIC), and the modified Akaike information criterion ($AIC_c$), are proposed to locate a change point in the multiple linear regression model. These methods are applied to a stock exchange data set and their results are compared.
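
One common way to locate a single change point with an information criterion is to fit separate regressions before and after each candidate position and keep the position with the smallest criterion value. The sketch below does this with SIC on synthetic data. It is an illustration under stated assumptions, not the authors' procedure: the Gaussian form of SIC, the penalty that counts only regression coefficients, and the names `locate_change_point` and `min_size` are all assumptions.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def sic(rss_value, n, n_params):
    """Schwarz information criterion of a Gaussian linear model, up to a constant."""
    return n * np.log(rss_value / n) + n_params * np.log(n)

def locate_change_point(X, y, min_size):
    """Return (k, SIC), where k is the first index of the second regime.

    k is None when the single-regime model has the smallest SIC.
    """
    n, p = X.shape
    best_k, best_sic = None, sic(rss(X, y), n, p)  # no-change baseline
    for k in range(min_size, n - min_size + 1):
        total = rss(X[:k], y[:k]) + rss(X[k:], y[k:])
        value = sic(total, n, 2 * p)  # two sets of coefficients
        if value < best_sic:
            best_k, best_sic = k, value
    return best_k, best_sic

# Synthetic data with a regime switch at index 30 (hypothetical, for illustration).
rng = np.random.default_rng(0)
n = 60
x = np.linspace(0.0, 1.0, n)
y = np.where(np.arange(n) < 30, 1.0 + 2.0 * x, 5.0 - 1.0 * x) + rng.normal(0, 0.1, n)
X = np.column_stack([np.ones(n), x])
k, value = locate_change_point(X, y, min_size=5)
print(k, value)
```

Replacing the `n_params * np.log(n)` penalty with `2 * n_params` would give the AIC version of the same search.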

Statistical Models of Air Temperatures in Seoul (서울시 도시기온 변화에 관한 모델 연구)

  • 김학열;김운수
    • Journal of the Korean Institute of Landscape Architecture / v.31 no.3 / pp.74-82 / 2003
  • Under the assumption that the temperature of a location is closely related to the land use characteristics around that location, this study assesses the impact of urban land use patterns on air temperature. To investigate the relationship, GIS techniques and statistical analyses are applied after spatially connecting urban land use data for the Seoul Metropolitan Area with atmospheric data observed at Automatic Weather Stations (AWS). The research method is as follows: (1) to identify the land use factors that are important for temperature, simple linear regressions on urban land use characteristics are conducted for a specific time period (pilot study); (2) to build a final model, multiple regressions are carried out with those factors; and (3) to verify that the final model can be applied to explain temperature variations beyond that period, the model is applied to 5 different time periods: 1999 as a whole; summer in 1999; 1998 as a whole; summer in 1998; August in 1998. The results of the simple linear regression models in the pilot study show that transportation facilities and open space area are very influential on urban air temperature variations, explaining 66 and 61 percent of the variations, respectively. However, the other land use variables (residential, commercial, and mixed land use) are found to have a weak or insignificant relationship to the air temperatures. A multiple linear regression with the two important variables from the pilot study is then estimated; the model explains 75 percent of the variability in air temperatures, with the expected signs of the regression coefficients. Thus, it is empirically shown that an increase in open space and a decrease in transportation facilities area can lead to a decrease in air temperature. After the final model is applied to the 5 different time periods, the estimated models explain 68 to 75 percent of the variations in the temperatures, with significant regression coefficients for all explanatory variables. This result suggests that an air temperature model for a specific time period could be a good model for other nearby time periods. The important implications of this result for lessening high air temperatures are: (1) to expand and conserve open space and (2) to control transportation-related factors such as transportation facilities area, road pavement and traffic congestion.
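
A two-predictor regression of this kind, together with the share of variance it explains ($R^2$), can be sketched as follows. The station data are synthetic and the coefficient values are made up for illustration; only the direction of the effects (warmer with more transportation facilities area, cooler with more open space) follows the abstract.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + X @ b by least squares; return coefficients and R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ beta
    ss_res = float(residual @ residual)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return beta, 1.0 - ss_res / ss_tot

# Synthetic weather-station data (hypothetical values, not from the study).
rng = np.random.default_rng(0)
n_stations = 30
transport_pct = rng.uniform(5, 40, n_stations)   # transportation facilities area, %
open_space_pct = rng.uniform(5, 60, n_stations)  # open space area, %
temperature = (24 + 0.08 * transport_pct - 0.05 * open_space_pct
               + rng.normal(0, 0.3, n_stations))

X = np.column_stack([transport_pct, open_space_pct])
beta, r_squared = fit_ols(X, temperature)
print(beta, r_squared)
```

Checking that `beta[1]` is positive and `beta[2]` is negative corresponds to the "expected signs" check mentioned in the abstract.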

Subset selection in multiple linear regression: An improved Tabu search

  • Bae, Jaegug;Kim, Jung-Tae;Kim, Jae-Hwan
    • Journal of Advanced Marine Engineering and Technology / v.40 no.2 / pp.138-145 / 2016
  • This paper proposes an improved tabu search method for subset selection in multiple linear regression models. Variable selection is a vital combinatorial optimization problem in multivariate statistics. Selecting the optimal subset of variables is necessary to construct a reliable multiple linear regression model. Its applications range widely from machine learning, time-series prediction, and multi-class classification to noise detection. Because this problem is NP-complete, finding the optimal solution becomes more difficult as the number of variables increases. Two typical metaheuristic methods have been developed to tackle the problem: the tabu search algorithm and a hybrid genetic and simulated annealing algorithm. However, both methods have shortcomings. The tabu search method requires a large amount of computing time, and the hybrid algorithm produces a less accurate solution. To overcome these shortcomings, we propose an improved tabu search algorithm that reduces the number of neighborhood moves and adopts an effective move search strategy. To evaluate the performance of the proposed method, comparative studies are performed on small data sets from the literature and on large simulated data sets. Computational results show that the proposed method outperforms the two metaheuristic methods in terms of computing time and solution quality.
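
For orientation, a basic tabu search over variable subsets might look like the sketch below. This is a plain textbook-style variant, not the improved algorithm proposed in the paper: the BIC objective, the single-variable flip neighborhood, the tabu tenure, and all function names are assumptions made for illustration.

```python
import numpy as np

def bic(X, y, subset):
    """BIC of an OLS fit with an intercept and the columns listed in `subset`."""
    n = len(y)
    cols = [np.ones(n)] + [X[:, j] for j in sorted(subset)]
    A = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return n * np.log(float(r @ r) / n) + A.shape[1] * np.log(n)

def tabu_search(X, y, n_iter=50, tenure=3):
    """Search variable subsets by flipping one variable per iteration."""
    p = X.shape[1]
    current = frozenset()
    best, best_score = current, bic(X, y, current)
    tabu_until = {}  # variable index -> last iteration at which it is tabu
    for it in range(n_iter):
        candidates = []
        for j in range(p):
            neighbor = current ^ {j}  # add j if absent, remove it if present
            score = bic(X, y, neighbor)
            is_tabu = tabu_until.get(j, -1) >= it
            # Aspiration rule: a tabu move is allowed if it beats the best so far.
            if not is_tabu or score < best_score:
                candidates.append((score, j, neighbor))
        if not candidates:
            continue
        # Take the best admissible move even if it worsens the current score.
        score, j, current = min(candidates, key=lambda c: (c[0], c[1]))
        tabu_until[j] = it + tenure
        if score < best_score:
            best, best_score = current, score
    return sorted(best), best_score

# Synthetic data: only columns 0, 2 and 4 influence y (hypothetical example).
rng = np.random.default_rng(0)
n, p = 100, 6
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + 1.5 * X[:, 4] + rng.normal(0, 0.5, n)
selected, score = tabu_search(X, y)
print(selected, score)
```

Every iteration evaluates all `p` single-flip neighbors, which illustrates why the number of neighborhood moves drives the computing time of tabu search as the number of variables grows.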

Prediction of compressive strength of concrete using multiple regression model

  • Chore, H.S.;Shelke, N.L.
    • Structural Engineering and Mechanics / v.45 no.6 / pp.837-851 / 2013
  • In the construction industry, strength is a primary criterion in selecting a concrete for a particular application. Concrete used for construction gains strength over a long period of time after it is poured. The characteristic strength of concrete is defined as the compressive strength of a sample that has been aged for 28 days. Waiting 28 days for such a test does not serve the speed of construction, while neglecting it does not serve the quality control process for concrete on large construction sites. Therefore, rapid and reliable prediction of the strength of concrete would be of great significance. Against this backdrop, a method is proposed to establish a predictive relationship between the properties and proportions of the ingredients of concrete, the compaction factor, the weight of concrete cubes and the strength of concrete, whereby the strength of concrete can be predicted at an early age. Multiple regression analysis was carried out to predict the compressive strength of concrete containing Portland Pozzolana cement, using the concrete data obtained from the experimental work done in this study. The multiple linear regression models yielded fairly good correlation coefficients for the prediction of compressive strength at 7, 28 and 40 days of curing. The results indicate that the proposed regression models can effectively evaluate the compressive strength of concrete containing Portland Pozzolana cement. The derived formulas are very simple and straightforward, and they provide an effective analysis tool accessible to practicing engineers.

Comparison of National Occupational Accident Fatality Rates using Statistical Analysis on Economic and Social Indicators (경제⋅사회지표의 다변량 통계 분석을 활용한 국가 간 산업재해 사고사망 상대수준 비교)

  • Kyunghun, Kim;Sudong, Lee
    • Journal of the Korean Society of Safety / v.37 no.6 / pp.128-135 / 2022
  • The comparative evaluation of occupational accident fatality rates (OAFRs) of different countries is complicated owing to the differences in their level of socio-economic development. However, such evaluation is necessary to assess the national occupational safety and health system of a country. This study proposes a statistical method to compare the OAFRs of countries taking into consideration the difference in their level of socio-economic development. We first collected data on the socio-economic indicators and OAFRs of 11 countries over a 30-year period. Next, based on literature survey and statistical correlation analysis, we selected the significant independent variables and built multiple linear regression models to predict OAFR. We also determined the groups of countries having heterogeneous relationships between the independent variables and OAFRs, which are represented by the regression models. The proposed method is demonstrated by comparing the OAFR of Korea with the OAFRs of 10 other developed countries.

MapReduce-based Localized Linear Regression for Electricity Price Forecasting (전기 가격 예측을 위한 맵리듀스 기반의 로컬 단위 선형회귀 모델)

  • Han, Jinju;Lee, Ingyu;On, Byung-Won
    • The Transactions of the Korean Institute of Electrical Engineers P / v.67 no.4 / pp.183-190 / 2018
  • Accurately predicting electricity prices is an important task in the electricity trading market. To address the electricity price forecasting problem, various approaches have been proposed so far, and it is known that linear regression-based approaches are the best. However, the use of such linear regression-based methods is limited due to low accuracy and performance. In traditional linear regression methods, it is not practical to find a nonlinear regression model that explains the training data well. If the training data is complex (i.e., small-sized individual data and large-sized features), it is difficult to find a polynomial function with n terms as the model that fits the training data. On the other hand, when a linear regression model is used to approximate a nonlinear regression model, the accuracy of the model drops considerably because it does not accurately reflect the characteristics of the training data. To cope with this problem, we propose a new electricity price forecasting method that divides the entire dataset into multiple split datasets and finds the best linear regression models, each of which is the optimal model for its dataset. Meanwhile, to improve the performance of the proposed method, we adapt the proposed localized linear regression method to MapReduce, a framework for parallel processing of data stored in a Hadoop distributed file system. Our experimental results show that the proposed model outperforms the existing linear regression model. Specifically, the accuracy of the proposed method is improved by 45% and it runs 5 times faster than the existing linear regression-based model.
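
The idea of fitting one linear model per split dataset can be sketched in plain Python as a map step that assigns each record to a split and a reduce step that fits a regression per split. This is an in-memory imitation, not Hadoop code and not the authors' implementation. The abstract does not say how the dataset is split, so binning on the value of a single feature is an assumption, as are all function names.

```python
from collections import defaultdict
import numpy as np

def split_of(x, n_splits, lo, hi):
    """Index of the equal-width bin of [lo, hi] that contains x."""
    width = (hi - lo) / n_splits
    return min(int((x - lo) / width), n_splits - 1)

def map_phase(records, n_splits, lo, hi):
    """Map step: emit (split_id, (x, y)) for every record."""
    for x, y in records:
        yield split_of(x, n_splits, lo, hi), (x, y)

def reduce_phase(pairs):
    """Reduce step: group records by split_id and fit one line per group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    models = {}
    for key, values in groups.items():
        data = np.array(values)
        A = np.column_stack([np.ones(len(data)), data[:, 0]])
        beta, *_ = np.linalg.lstsq(A, data[:, 1], rcond=None)
        models[key] = beta  # (intercept, slope)
    return models

def predict(models, x, n_splits, lo, hi):
    """Predict with the local model of the split that contains x."""
    intercept, slope = models[split_of(x, n_splits, lo, hi)]
    return intercept + slope * x

# Synthetic nonlinear data (hypothetical, for illustration only).
x = np.linspace(0.0, 4.0, 200)
y = x ** 2
models = reduce_phase(map_phase(zip(x, y), n_splits=4, lo=0.0, hi=4.0))
local_pred = np.array([predict(models, xi, 4, 0.0, 4.0) for xi in x])

# Single global linear regression for comparison.
A = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
global_mse = float(np.mean((y - A @ beta) ** 2))
local_mse = float(np.mean((y - local_pred) ** 2))
print(global_mse, local_mse)
```

Because each split is fitted independently, the reduce step can run in parallel across splits, which is the property a MapReduce framework exploits.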

Relationship between Stream Geomorphological Factors and the Vegetation Abundance - With a Special Reference to the Han River System - (하천의 지형학적 인자와 식생종수의 관계 -한강수계를 중심으로-)

  • 이광우;김태균;심우경
    • Journal of the Korean Institute of Landscape Architecture / v.30 no.3 / pp.73-85 / 2002
  • The purpose of this study was to develop prediction models for plant species abundance after stream restoration. Stream plants are generally affected by stream geomorphology, so in this study the relationship between vegetation abundance and stream geomorphology was developed by multiple regression analysis. The stream characteristics used in this study were longitudinal slope, transectional slope, micro-landforms in the longitudinal direction, riparian width, geometric mean diameter and largest diameter of bed material, and cumulative coarse and fine sand weight portion. The Pyungchang River, with a mountainous watershed, and the Kyungan stream and the Bokha stream, in an agricultural region, were selected, and vegetation species abundance and stream characteristics were documented at sites spaced 2 to 3 km apart from the upper stream to the lower. The models for predicting vegetation abundance were developed by multiple regression analysis using the SPSS statistics package. The linear relationship between the dependent variable (species abundance) and the independent variables (stream characteristics) was tested by a graphical method. Longitudinal and transectional slope had a nonlinear relationship with species abundance. In the next step, the independence of the independent variables was tested, and the correlation between the independent and dependent variables was tested by the Pearson bivariate correlation test. The selected independent variables were transectional slope, riparian width, and cumulative fine sand weight portion. From the multiple regression analysis, the $R^2$ values for the Pyungchang River, Kyungan stream and Bokha stream were 0.651, 0.512 and 0.240, respectively. The natural stream configuration of the Pyungchang River gave the best result, and the lower $R^2$ values for the Kyungan and Bokha streams were due to human impact, which disturbed the natural ecosystem. The lowest $R^2$, for the Bokha stream, was due to its shifting sandy bed. If the stream bed is unstable, the prediction model may not be valid. Using the multiple regression models, the vegetation abundance after stream restoration could be predicted from stream characteristics such as transectional slope, riparian width, and cumulative fine sand weight portion.

Heat Demand Forecasting for Local District Heating (지역 난방을 위한 열 수요예측)

  • Song, Ki-Burm;Park, Jin-Soo;Kim, Yun-Bae;Jung, Chul-Woo;Park, Chan-Min
    • IE interfaces / v.24 no.4 / pp.373-378 / 2011
  • A high level of accuracy in forecasting the heat demand of each district is required to operate and manage district heating efficiently. Because heat demand is closely connected with the demands of the previous days and with the temperature, general demand forecasting methods may be used to forecast it. However, there are exceptional situations in which general methods are difficult to apply, such as the exceptionally low demand on weekends or during vacation periods. To overcome these situations, we introduce a new method to forecast heat demand that uses the linear relationships between the demand and some other factors. Our method uses the temperature and the past 7 days' demands as the factors that determine the future demand. The model consists of daily and hourly models, both of which are multiple linear regression models. Applying these two models to historical data, we confirmed that our method can forecast the heat demand with reasonable errors.
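
A daily model of this form, with the temperature and the past 7 days' demands as regressors, can be sketched as follows. The demand series is synthetic, the hourly model is not covered, and the function name `build_design` and the assumed next-day temperature are hypothetical.

```python
import numpy as np

def build_design(demand, temperature, n_lags=7):
    """Rows of [1, temperature_t, demand_{t-1}, ..., demand_{t-n_lags}]; targets demand_t."""
    rows, targets = [], []
    for t in range(n_lags, len(demand)):
        lags = demand[t - n_lags:t][::-1]  # most recent day first
        rows.append(np.concatenate(([1.0, temperature[t]], lags)))
        targets.append(demand[t])
    return np.array(rows), np.array(targets)

# Synthetic daily series with a temperature effect and low weekend demand
# (hypothetical values, not data from the paper).
rng = np.random.default_rng(0)
n_days = 120
day = np.arange(n_days)
temperature = 5 + 8 * np.sin(2 * np.pi * day / 365)
weekend = (day % 7 >= 5).astype(float)
demand = 200 - 6 * temperature - 30 * weekend + rng.normal(0, 5, n_days)

X, y = build_design(demand, temperature)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

rss_model = float(np.sum((y - X @ beta) ** 2))
rss_mean = float(np.sum((y - y.mean()) ** 2))  # baseline: predict the mean

# One-step-ahead forecast, given an assumed temperature for the next day.
next_temperature = 6.0
x_next = np.concatenate(([1.0, next_temperature], demand[-7:][::-1]))
forecast = float(x_next @ beta)
print(rss_model, rss_mean, forecast)
```

Including 7 lags lets the regression pick up the weekly pattern, since the demand of the same weekday one week earlier appears as the last regressor.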

Development of Accident Density Model in Korea (국내 교통사고 밀도 모형 개발)

  • Park, Na Young;Kim, Tae Yang;Park, Byung Ho
    • Journal of the Korean Society of Safety / v.32 no.3 / pp.130-135 / 2017
  • This study deals with traffic accidents. The purpose of this study is to develop accident density models that reflect transportation and socioeconomic characteristics, based on 230 zones of Korea. Models that are tested to be statistically significant are developed through multiple linear regression analysis. The main research results are as follows. First, in the transportation-based model, road length, avenue ratio, and the numbers of intersections and tunnels are found to have positive effects, whereas school zone is found to have a negative effect. Second, in the socioeconomic-based model, population density, transportation-vulnerable ratio, children ratio and truck ratio are found to have positive effects. Finally, in the integrated models, road ratio, population density, transportation-vulnerable ratio, children ratio, truck ratio and number of companies are found to have positive effects, whereas school zone is found to have a negative effect. These results are expected to provide useful implications for accident-reduction policy-making.