• Title/Summary/Keyword: Multiple regression models

Ensemble approach for improving prediction in kernel regression and classification

  • Han, Sunwoo;Hwang, Seongyun;Lee, Seokho
    • Communications for Statistical Applications and Methods
    • /
    • v.23 no.4
    • /
    • pp.355-362
    • /
    • 2016
  • Ensemble methods often improve prediction in various predictive models by combining multiple weak learners and reducing the variability of the final predictive model. In this work, we demonstrate that ensemble methods also enhance prediction accuracy in kernel ridge regression and kernel logistic regression classification. We apply bagging and random forests to these two kernel-based predictive models and present the procedure for embedding bagging and random forests in them. Our proposals are tested on numerous synthetic and real datasets and then compared with plain kernel-based predictive models and their subsampling approach. Numerical studies demonstrate that the ensemble approach outperforms plain kernel-based predictive models.
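As an illustration of the bagging idea described in this abstract, the following is a minimal sketch, not the authors' procedure. It assumes a one-dimensional input, an RBF kernel, and made-up values for `gamma`, `lam`, and the number of models. Every function name is hypothetical. The random-forest variant mentioned in the abstract is not shown.

```python
import math
import random

def rbf(a, b, gamma):
    """RBF kernel between two scalars (kernel choice is an assumption)."""
    return math.exp(-gamma * (a - b) ** 2)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_krr(xs, ys, gamma, lam):
    """Kernel ridge regression: solve (K + lam * I) alpha = y."""
    n = len(xs)
    k = [[rbf(xs[i], xs[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(k, ys)
    return lambda x: sum(a * rbf(x, xi, gamma) for a, xi in zip(alpha, xs))

def bagged_krr(xs, ys, n_models, gamma, lam, rng):
    """Fit one model per bootstrap resample and average their predictions."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_krr([xs[i] for i in idx], [ys[i] for i in idx],
                              gamma, lam))
    return lambda x: sum(m(x) for m in models) / len(models)

# Synthetic data (made up for illustration): noisy sine curve on [0, 1].
rng = random.Random(0)
xs = [i / 29 for i in range(30)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.1) for x in xs]

ensemble = bagged_krr(xs, ys, n_models=20, gamma=10.0, lam=0.1, rng=rng)
pred = ensemble(0.25)  # the noise-free target at x = 0.25 is 1.0
```

Averaging over bootstrap fits is what reduces the variance of the final predictor. The ridge term `lam` also keeps each linear system solvable when a resample contains duplicate points.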

Comparison of Genetic Parameter Estimates of Total Sperm Cells of Boars between Random Regression and Multiple Trait Animal Models

  • Oh, S.-H.;See, M.T.
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.21 no.7
    • /
    • pp.923-927
    • /
    • 2008
  • The objective of this study was to compare random regression model and multiple trait animal model estimates of the (co)variance of total sperm cells over the active lifetime of AI boars. Data were provided by Smithfield Premium Genetics (Rose Hill, NC). The total numbers of records and animals for the random regression model were 19,629 and 1,736, respectively. Data for the multiple trait animal model analyses were edited to include only records produced at 9, 12, 15, 18, 21, 24, and 27 months of age. For the multiple trait method, estimates of genetic and residual variance for total sperm cells were heterogeneous among age classifications. When comparing the multiple trait method to random regression, heritability estimates were similar except for total sperm cells at 24 months of age. The multiple trait method also resulted in higher estimates of heritability of total sperm cells at every age when compared to random regression results. Random regression analysis provided more detail with regard to changes of variance components with age. Random regression methods are the most appropriate for analyzing semen traits, as these are longitudinal data measured over the lifetime of boars.

Relationship between Stream Geomophological Factors and the Vegetation Abundance - With a Special Reference to the Han River System - (하천의 지형학적 인자와 식생종수의 관계 -한강수계를 중심으로-)

  • 이광우;김태균;심우경
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.30 no.3
    • /
    • pp.73-85
    • /
    • 2002
  • The purpose of this study was to develop models for predicting plant species abundance after stream restoration. Stream vegetation is generally affected by stream geomorphology, so the relationship between vegetation abundance and stream geomorphology was modeled by multiple regression analysis. The stream characteristics used were longitudinal slope, transectional slope, micro-landforms along the longitudinal direction, riparian width, geometric mean diameter and largest diameter of the bed material, and cumulative weight fractions of coarse and fine sand. The Pyungchang River, which has a mountainous watershed, and the Kyungan and Bokha streams, which lie in an agricultural region, were selected, and species abundance and stream characteristics were recorded at sites 2 to 3 km apart from upstream to downstream. The models for predicting vegetation abundance were developed by multiple regression analysis using the SPSS statistics package. The linearity of the relationship between the dependent variable (species abundance) and the independent variables (stream characteristics) was checked graphically; longitudinal and transectional slope had a nonlinear relationship with species abundance. Next, independence among the independent variables was tested, and the correlation between the independent and dependent variables was tested with the Pearson bivariate correlation test. The selected independent variables were transectional slope, riparian width, and cumulative fine sand weight fraction. In the multiple regression analysis, $R^2$ for the Pyungchang River, Kyungan stream, and Bokha stream was 0.651, 0.512, and 0.240, respectively. The Pyungchang River, with its natural stream configuration, gave the best result; the lower $R^2$ values for the Kyungan and Bokha streams were attributed to human impact that disturbed the natural ecosystem, and the lowest $R^2$, for the Bokha stream, to its shifting sandy bed. If the stream bed is unstable, the prediction model may not be valid. With the multiple regression models, vegetation abundance after stream restoration can be predicted from stream characteristics such as transectional slope, riparian width, and cumulative fine sand weight fraction.
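For readers unfamiliar with how a multiple regression fit and its $R^2$ are obtained, here is a minimal sketch using ordinary least squares via the normal equations. The data are invented for illustration and have nothing to do with the stream survey. The two predictors and all function names are hypothetical. The study itself used SPSS.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_mlr(X, y):
    """Least squares with an intercept: solve (Z'Z) b = Z'y, Z = [1, X]."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(ztz, zty)

def predict(coef, X):
    """Apply the fitted intercept and slopes to each row of X."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Invented data that follow y = 1 + 2*x1 - 3*x2 exactly.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0], [4.0, 3.0]]
y = [1 + 2 * a - 3 * b for a, b in X]

coef = fit_mlr(X, y)                 # close to [1, 2, -3]
r2 = r_squared(y, predict(coef, X))  # close to 1 for noise-free data
```

With real, noisy data $R^2$ falls below 1, and values such as 0.240 indicate that the chosen predictors explain only a small share of the variance.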

A Study on Predictive Models based on the Machine Learning for Evaluating the Extent of Hazardous Zone of Explosive Gases (기계학습 기반의 가스폭발위험범위 예측모델에 관한 연구)

  • Jung, Yong Jae;Lee, Chang Jun
    • Korean Chemical Engineering Research
    • /
    • v.58 no.2
    • /
    • pp.248-256
    • /
    • 2020
  • In this study, machine learning-based predictive models for evaluating the extent of the hazardous zone of explosive gases are developed. They can provide important guidelines for installing explosion-proof apparatus. A total of 1,200 data sets covering 12 combustible gases and their hazardous zone extents were generated to train the predictive models. The extent of the hazardous zone is the output variable, and 12 variables affecting it are the input variables. Multiple linear regression, principal component regression, and an artificial neural network are used to train the models. Their mean absolute percentage errors are 44.2%, 49.3%, and 5.7%, and their root mean square errors are 1.389 m, 1.602 m, and 0.203 m, respectively. It can therefore be concluded that the artificial neural network performs best. This model can be easily used to evaluate the extent of the hazardous zone for explosive gases.
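The two error measures quoted in this abstract can be computed as in the sketch below. The sample values are invented and are not taken from the paper. The function names are hypothetical, and the MAPE definition assumes that no actual value is zero.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values nonzero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

def rmse(actual, predicted):
    """Root mean square error, in the same unit as the data."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))

# Invented hazardous-zone extents in metres, for illustration only.
actual = [2.0, 4.0, 5.0, 10.0]
predicted = [1.0, 5.0, 5.0, 8.0]

mape_value = mape(actual, predicted)  # 23.75 (percent)
rmse_value = rmse(actual, predicted)  # sqrt(1.5), about 1.22 m
```

MAPE is relative to each actual value and RMSE is in absolute units, so the two can rank models differently when the targets span a wide range.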

A Cost Estimation Model for Highway Projects in Korea

  • Kim, Soo-Yong;Kim, Young-Mok;Luu, Truong-Van
    • Proceedings of the Korean Institute Of Construction Engineering and Management
    • /
    • 2008.11a
    • /
    • pp.922-925
    • /
    • 2008
  • Many highway projects are under way in Korea. However, owners frequently find that the project cost exceeds the budget, and they are unable to identify the underlying reasons. The main purpose of this research is to develop cost models for transportation projects in Korea using multiple linear regression (MLR). The data consist of 27 completed transportation projects built from 1991 to 2001. Multiple regression analysis is used to develop the parametric cost estimating model for total budget cost per highway square meter (TBC/$m^2$). The findings indicate that MLR can be applied to highway projects in Korea. There are two major contributions of this research: (1) the identification of transportation parameters as a significant cost driver for transportation costs and (2) the successful development of parametric cost estimating models for transportation projects in Korea.

Traffic Accident Density Models Reflecting the Characteristics of the Traffic Analysis Zone in Cheongju (존별 특성을 반영한 교통사고밀도 모형 - 청주시 사례를 중심으로 -)

  • Kim, Kyeong Yong;Beck, Tea Hun;Lim, Jin Kang;Park, Byung Ho
    • International Journal of Highway Engineering
    • /
    • v.17 no.6
    • /
    • pp.75-83
    • /
    • 2015
  • PURPOSES: This study deals with traffic accidents classified by traffic analysis zone. The purpose is to develop accident density models using zonal traffic and socioeconomic data. METHODS: The traffic accident density models were developed through multiple linear regression analysis, and three multiple linear models were built. The dependent variable was traffic accident density, a measure of the relative distribution of traffic accidents. The independent variables were various traffic and socioeconomic variables. CONCLUSIONS: Three traffic accident density models were developed, and all were statistically significant. In the transportation-based model, road length, trip production volume, intersections, van ratio, and number of vehicles per person were positively related to accident density. In the socioeconomic-based model, the residential and commercial area ratio and the transportation vulnerability ratio were found to affect accident density. In the integrated model, the major arterial road ratio, trip production volume, intersections, van ratio, commercial ratio, and number of companies were also found to be related to accident density.

Water consumption prediction based on machine learning methods and public data

  • Kesornsit, Witwisit;Sirisathitkul, Yaowarat
    • Advances in Computational Design
    • /
    • v.7 no.2
    • /
    • pp.113-128
    • /
    • 2022
  • Water consumption is strongly affected by numerous factors, such as population, climatic, geographic, and socio-economic factors. Therefore, implementing a reliable predictive model of water consumption patterns is a challenging task. This study investigates the performance of predictive models based on the multi-layer perceptron (MLP), multiple linear regression (MLR), and support vector regression (SVR). To identify the significant factors affecting water consumption, the stepwise regression (SW) procedure is used in MLR to select suitable variables. The study then implements three predictive models based on these significant variables (SWMLR, SWMLP, and SWSVR). Annual water consumption data for Thailand during 2006-2015 were compiled and categorized by province and distributor. Among the models with all variables, the MLP models outperformed the MLR and SVR models. Among the models with selected variables, the predictive capability of SWMLP was superior to that of SWMLR and SWSVR. The SWMLP thus still provided satisfactory results with the minimum number of explanatory variables, which in turn reduced the computation time and other resources required for the predictive task. It can be concluded that the MLP gave the best results and can serve as a reliable water demand predictive model in both the all-variable and selected-variable cases. These findings provide a feasible water consumption predictive model that can be used in water resources management to produce sufficient tap water to meet the demand in each province of Thailand.

Prediction of compressive strength of concrete using multiple regression model

  • Chore, H.S.;Shelke, N.L.
    • Structural Engineering and Mechanics
    • /
    • v.45 no.6
    • /
    • pp.837-851
    • /
    • 2013
  • In the construction industry, strength is a primary criterion in selecting a concrete for a particular application. Concrete used for construction gains strength over a long period of time after pouring. The characteristic strength of concrete is defined as the compressive strength of a sample that has been aged for 28 days. Waiting 28 days for such a test does not serve the rapidity of construction, yet neglecting it does not serve quality control of concrete on large construction sites. Therefore, rapid and reliable prediction of the strength of concrete would be of great significance. Against this backdrop, a method is proposed to establish a predictive relationship between the properties and proportions of the ingredients of concrete, compaction factor, weight of concrete cubes, and strength of concrete, whereby the strength of concrete can be predicted at an early age. Multiple regression analysis was carried out to predict the compressive strength of concrete containing Portland pozzolana cement, using the concrete data obtained from the experimental work done in this study. The multiple linear regression models yielded fairly good correlation coefficients for the prediction of compressive strength at 7, 28, and 40 days of curing. The results indicate that the proposed regression models are capable of evaluating the compressive strength of concrete containing Portland pozzolana cement. The derived formulas are simple and straightforward and provide an effective analysis tool accessible to practicing engineers.

Safety Performance Models of Improvement Projects of Frequent Traffic Accident Locations (사고잦은곳 개선사업의 안전성과 모형)

  • Park, Byung-Ho;Park, Gil-Su;Kim, Tae-Young
    • Journal of the Korean Society of Safety
    • /
    • v.25 no.2
    • /
    • pp.89-94
    • /
    • 2010
  • This study deals with traffic accidents at frequent-accident locations that underwent improvement projects. The objective is to analyze the impact of the improvements on accident reduction. To this end, the study focuses on developing models based on data from 70 improved intersections. The main results are as follows. First, four statistically significant multiple linear regression accident models (total, side right-angle, rear-end, and sideswipe accidents) were developed. Second, reductions in total accidents were associated with sight-distance and turning traffic flow improvements; in side right-angle accidents with sight-distance, over-speed, and lane operation improvements; in rear-end accidents with turning traffic flow, signal, and lane operation improvements; and in sideswipe accidents with traffic impedance improvements. Finally, the four models were evaluated to be statistically significant through correlation analysis and the paired-sample t-test.
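The paired-sample t-test mentioned in this abstract compares two measurements on the same units. A minimal sketch of the test statistic follows. The before and after accident counts are invented for illustration, the function name is hypothetical, and how the authors applied the test to their models is not specified in the abstract.

```python
import math

def paired_t(before, after):
    """Return the paired-sample t statistic and its degrees of freedom."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t_stat = mean_d / math.sqrt(var_d / n)
    return t_stat, n - 1

# Invented yearly accident counts at five intersections.
before = [12, 9, 15, 7, 10]
after = [8, 7, 11, 6, 9]

t_stat, df = paired_t(before, after)  # t is about 3.54 with 4 degrees of freedom
```

The statistic is then compared with the t distribution with `df` degrees of freedom to obtain a p-value. That lookup is omitted here to keep the sketch within the standard library.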

Predicting the Soluble Solids of Apples by Near Infrared Spectroscopy (I) - Multiple Linear Regression Models - (근적외선을 이용한 사과의 당도예측 (I) - 다중회귀모델 -)

  • ;W. R. Hruschka;J. A. Abbott;;B. S. Park
    • Journal of Biosystems Engineering
    • /
    • v.23 no.6
    • /
    • pp.561-570
    • /
    • 1998
  • Multiple linear regression (MLR) models for estimating soluble solids content non-destructively were presented in order to select the optimal photosensor for measuring the soluble solids content of apples. Visible and NIR absorbance in the 400 to 2498 nm wavelength region, soluble solids content (sugar content), hardness, and weight were measured for 400 Gala apples. A spectrophotometer with a fiber optic probe was used for spectrum measurement, and a digital refractometer was used for soluble solids content. The correlation between the absorbance spectrum and soluble solids content was analyzed to pick out the optimal wavelengths and to develop the corresponding prediction model by MLR. For the coefficient of determination ($R^2$) to exceed 0.92, the MLR models on the original absorbance were built on 7 wavelengths (992, 904, 1096, 1032, 880, 824, and 1048 nm), and those on the second-derivative absorbance on 5 wavelengths (784, 1056, 992, 808, and 872 nm). The best model on the second-derivative absorbance spectrum had $R^2$ = 0.91, bias = -0.02 °Brix, and SEP = 0.28 °Brix for unknown samples.
