• Title/Summary/Keyword: Multiple regression model

Subset selection in multiple linear regression: An improved Tabu search

  • Bae, Jaegug;Kim, Jung-Tae;Kim, Jae-Hwan
    • Journal of Advanced Marine Engineering and Technology / v.40 no.2 / pp.138-145 / 2016
  • This paper proposes an improved tabu search method for subset selection in multiple linear regression models. Variable selection is a vital combinatorial optimization problem in multivariate statistics. Selecting the optimal subset of variables is necessary to construct a reliable multiple linear regression model. Its applications range widely, from machine learning, time-series prediction, and multi-class classification to noise detection. Because the problem is NP-complete, finding the optimal solution becomes more difficult as the number of variables increases. Two typical metaheuristic methods have been developed to tackle the problem: the tabu search algorithm and a hybrid genetic and simulated annealing algorithm. However, both methods have shortcomings: the tabu search method requires a large amount of computing time, and the hybrid algorithm produces less accurate solutions. To overcome these shortcomings, we propose an improved tabu search algorithm that reduces the number of neighborhood moves and adopts an effective move search strategy. To evaluate the performance of the proposed method, comparative studies are performed on small data sets from the literature and on large simulated data sets. Computational results show that the proposed method outperforms the two existing metaheuristic methods in terms of computing time and solution quality.
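The abstract above does not spell out the improved algorithm, so the following is only a minimal sketch of a generic tabu search over variable subsets, not the authors' method. It assumes single-variable add/drop moves, a short tabu list of recently flipped variables, and a simple aspiration rule. The names `tabu_search` and `toy_score` are hypothetical, and `toy_score` is a stand-in for a real model-quality criterion such as adjusted R-squared.

```python
from collections import deque

def tabu_search(n_vars, score, n_iter=50, tenure=3):
    """Maximize score(subset) over subsets of range(n_vars) using single-flip moves."""
    current = frozenset()
    best, best_score = current, score(current)
    tabu = deque(maxlen=tenure)  # indices of the most recently flipped variables
    for _ in range(n_iter):
        candidates = []
        for j in range(n_vars):
            neighbor = current ^ {j}  # add j if absent, drop it if present
            s = score(neighbor)
            # Aspiration: a tabu move is still allowed if it beats the best so far.
            if j not in tabu or s > best_score:
                candidates.append((s, j, neighbor))
        if not candidates:
            break
        # Take the best admissible move even if it worsens the current score;
        # ties are broken in favor of the lowest variable index.
        s, j, current = max(candidates, key=lambda c: (c[0], -c[1]))
        tabu.append(j)
        if s > best_score:
            best, best_score = current, s
    return best, best_score

# Hypothetical objective: penalize every variable that differs from a "true" subset.
TRUE_SUBSET = frozenset({1, 3})

def toy_score(subset):
    return -len(subset ^ TRUE_SUBSET)

best, best_score = tabu_search(5, toy_score, n_iter=20)
print(sorted(best), best_score)
```

In practice `score` would fit a regression on the chosen columns and return a fit criterion. The tabu list is what lets the search leave a local optimum instead of immediately undoing its last move.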

Development of prediction methodology from CO2 emissions of construction equipment based multiple linear regression (다중선형회귀분석 기반 건설장비 이산화탄소 배출량 예측모델 개발)

  • Gwon, Jae-Min;Lee, Jae-Hak;Jo, Min-Do;Choi, Young-Jun;Han, Seung-Woo
    • Proceedings of the Korean Institute of Building Construction Conference / 2019.11a / pp.38-39 / 2019
  • Environmental problems caused by greenhouse gases (GHG) emitted by various industries are emerging around the world, and countries are applying regulations in response. Korea operates a carbon credit system under which industrial GHG emissions are traded for money, and the system is expected to be applied to the construction industry. In addition, construction equipment using fossil fuels accounts for the largest portion of $CO_2$ emissions in the construction industry, so the importance of $CO_2$ reduction and prediction is increasing. However, directly measured data on the $CO_2$ emissions of construction equipment are lacking, and there is no accurate measurement methodology. Therefore, in this study, independent variables were derived from $CO_2$ emission data, and multiple linear regression was performed on these variables to derive a predictive model of carbon dioxide emissions by work type of construction equipment. The results are expected to support construction process planning based on environmental factors in the future.

A Study on Predictive Models based on the Machine Learning for Evaluating the Extent of Hazardous Zone of Explosive Gases (기계학습 기반의 가스폭발위험범위 예측모델에 관한 연구)

  • Jung, Yong Jae;Lee, Chang Jun
    • Korean Chemical Engineering Research / v.58 no.2 / pp.248-256 / 2020
  • In this study, predictive models based on machine learning are developed for evaluating the extent of the hazardous zone of explosive gases. They can provide important guidelines for installing explosion-proof apparatus. A total of 1,200 data sets, covering 12 combustible gases and their hazardous-zone extents, are generated to train the predictive models. The extent of the hazardous zone is the output variable, and 12 variables affecting the output are the input variables. Multiple linear regression, principal component regression, and an artificial neural network are employed to train the predictive models. The mean absolute percentage errors of multiple linear regression, principal component regression, and the artificial neural network are 44.2%, 49.3%, and 5.7%, and the root mean square errors are 1.389 m, 1.602 m, and 0.203 m, respectively. Therefore, it can be concluded that the artificial neural network shows the best performance. This model can easily be used to evaluate the extent of the hazardous zone for explosive gases.
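The two error measures reported above can be computed as in the sketch below. It uses the standard definitions of mean absolute percentage error (MAPE) and root mean square error (RMSE); the function names and the sample numbers are illustrative assumptions, not data from the study.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error, in the same unit as the data (e.g. metres)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Made-up hazardous-zone extents in metres, for illustration only.
actual = [2.0, 4.0, 5.0]
predicted = [1.0, 5.0, 5.0]

print(mape(actual, predicted))  # 25.0
print(rmse(actual, predicted))  # sqrt(2/3), about 0.816
```

MAPE is scale-free while RMSE keeps the unit of the output, which is why the abstract reports one in percent and the other in metres.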

Source Identification of Ambient PM-10 Using the PMF Model (PMF 모델을 이용한 대기 중 PM-10 오염원의 확인)

  • 황인조;김동술
    • Journal of Korean Society for Atmospheric Environment / v.19 no.6 / pp.701-717 / 2003
  • The objective of this study was to estimate the air quality trends of the study area by surveying monthly and seasonal concentration trends, after analyzing the mass concentration of PM-10 samples and the inorganic elements, ions, and total carbon in PM-10. The study also applied the PMF (Positive Matrix Factorization) model, which is useful when source profiles are unavailable. The model was therefore considered suitable for Korea, where little information about pollution sources is often available. After obtaining results from the PMF modeling, the existing sources in the study area were qualitatively identified. PM-10 particles were collected on quartz fiber filters with a PM-10 high-volume air sampler for 3 years (Mar. 1999 to Dec. 2001) at Kyung Hee University. The 25 chemical species (Al, Mn, Ti, V, Cr, Fe, Ni, Cu, Zn, As, Se, Cd, Ba, Ce, Pb, Si, $Na^{+}$, $NH_4^{+}$, $K^{+}$, $Mg^{2+}$, $Ca^{2+}$, $Cl^{-}$, $NO_3^{-}$, $SO_4^{2-}$, TC) were analyzed by ICP-AES, IC, and EA after proper pretreatment of each sample filter. The PMF model was applied to estimate the quantitative contribution of air pollution sources based on the chemical information (128 samples and 25 chemical species). Through a case study of PMF modeling for the PM-10 aerosols, a total of 11 factors were determined. Multiple linear regression analysis between the observed PM-10 mass concentration and the estimated G matrix was performed following the FPEAK test. Finally, the regression analysis provided source profiles (scaled F matrix). As a result, 11 sources were qualitatively identified, such as a secondary aerosol related source, soil related source, waste incineration source, field burning source, fossil fuel combustion source, industry related source, motor vehicle source, oil/coal combustion source, non-ferrous metal source, and aged sea-salt source.

The Structural Path Model of Adolescents′ Internet Addiction and Expected Self-Control (청소년의 인터넷 중독현상과 자기통제기대의 구조적 경로모형에 관한 연구)

  • 박재성
    • Korean Journal of Health Education and Promotion / v.21 no.3 / pp.1-17 / 2004
  • The purpose of this study is to evaluate the roles of expected self-control and expected self-control results in explaining adolescents' Internet addiction. In the study model, expectations of self-control and of self-control results directly determine Internet addiction, and Internet use time mediates their impact on Internet addiction. The study subjects are 1,080 middle and high school students in Busan. Stratified cluster sampling is applied by school type and school year. The response rate is 96% (1,037 cases). This study develops scales of expected self-control and expected self-control results. The scales of Internet addiction are devised using the concept of functional dependency, covering salience, withdrawal symptoms, mood modification, tolerance, relapse, and conflict. To verify the study model, path analysis is applied to identify path significance, and multiple regression models are applied to evaluate the confounding effects of control variables. Moreover, a multiple partial F-test is performed to select the best regression model. Expected self-control is a significant determinant of both Internet addiction and Internet use time, and Internet use time also significantly explains Internet addiction. The total effect of expected self-control on Internet addiction is -.95, composed of the direct effect (-.71) and the indirect effect (-.24). The direct effect represents a curative effect, since expected self-control directly reduces the level of Internet addiction, and the indirect effect represents a preventive effect, because self-control can reduce Internet use time, which is a direct determinant of Internet addiction. In the test of the confounding effects of control variables, no confounding effects are found in the multiple regression models. This implies that the study model is robust with respect to the control variables.
In conclusion, improving adolescents' expected self-control can reduce their level of Internet addiction. This finding implies that a health promotion program for improving expected self-control can be a cost-effective method compared to other approaches.

Prediction of Short and Long-term PV Power Generation in Specific Regions using Actual Converter Output Data (실제 컨버터 출력 데이터를 이용한 특정 지역 태양광 장단기 발전 예측)

  • Ha, Eun-gyu;Kim, Tae-oh;Kim, Chang-bok
    • Journal of Advanced Navigation Technology / v.23 no.6 / pp.561-569 / 2019
  • Solar photovoltaics can provide electrical energy from solar radiation alone, and their use as a new energy source is expanding rapidly. This study predicts short- and long-term PV power generation using actual converter output data from a photovoltaic system. The prediction algorithms are multiple linear regression, support vector machine (SVM), and deep learning models, namely a deep neural network (DNN) and long short-term memory (LSTM). In addition, three models are used according to the input and output structure of the weather elements. Long-term forecasts are made monthly, seasonally, and annually, and short-term forecasts are made for 7 days. The deep learning networks achieve better prediction accuracy than multiple linear regression and SVM. In addition, LSTM, which is better suited to time series prediction than DNN, is somewhat superior in prediction accuracy. The experimental results by input and output structure show that Model 2 has less error than Model 1, and Model 3 has less error than Model 2.

Using the Theory of Planned Behavior to Explain Dairy Food Consumption among University Female Students (계획적 행동이론을 이용한 여대생의 유제품 섭취 행동 분석)

  • 김경원;신은미
    • Korean Journal of Community Nutrition / v.8 no.1 / pp.53-61 / 2003
  • This study was designed to explain the intentions and consumption of dairy foods among female university students. The factors related to intentions to consume dairy foods and to actual consumption were identified within the theory of planned behavior. The survey questionnaire, developed using open-ended questions (n = 35), was administered to female university students (n = 184). Subjects provided information regarding attitudes, subjective norms, perceived control, intentions, and consumption of dairy foods. Correlation analysis and multiple regression were used to study the association of these factors with intentions and consumption of dairy foods. Subjects showed relatively low intention to consume dairy foods (-0.4 $\pm$ 1.6 on a scale of -4-14). They ate 1.2 $\pm$ 0.9 servings of dairy foods a day, and 52.2% of subjects had less than one serving a day, showing inadequate consumption of dairy foods. All three factors (attitudes, subjective norms, and perceived control) were significantly correlated with the intention to consume dairy foods regularly (r = 0.26-0.27). Multiple regression results, however, revealed that subjective norms (p < 0.01) and perceived control (p < 0.05) contributed to the model explaining intentions, while attitudes did not (model $R^2$ = 0.154). To predict and explain actual consumption of dairy foods, two regression models were examined. In the first model, perceived control was significant in predicting dairy food consumption, while attitudes and subjective norms were not. In the second model, intentions and perceived control were significantly related to actual consumption of dairy foods, providing empirical evidence for the theory (model $R^2$ = 0.121). These results suggest that perceived control was significant in explaining actual behavior as well as intentions.
This study suggests that nutrition education to increase dairy food consumption among young adults should focus on increasing perception of control and eliciting social support from respected others.

Graphical Method for Multiple Regression Model (다중회귀모형의 그래픽적 방법)

  • Lee, W.R.;Lee, U.K.;Hong, C.S.
    • The Korean Journal of Applied Statistics / v.20 no.1 / pp.195-204 / 2007
  • To represent multiple regression data, an alternative graphical method, called the SSR plot, is proposed using geometrical description methods. The plot uses the relation that the sum of squares for regression (SSR) of two explanatory variables equals the SSR of one variable plus the increase in SSR due to adding the other variable to a model that already contains the first. This half-circle-shaped SSR plot contains vectors corresponding to the explanatory variables. We might conclude that explanatory variables whose vectors lie near the horizontal axis do affect the response variable. Also, for a regression model with two explanatory variables, the magnitude of the angle between the two vectors can be used to identify suppression.
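The SSR relation that the plot relies on can be written out as follows. The notation is an assumption introduced here for illustration: $x_1$ and $x_2$ are the two explanatory variables, and $SSR(x_2 \mid x_1)$ denotes the extra sum of squares gained by adding $x_2$ to a model that already contains $x_1$.

$$
SSR(x_1, x_2) = SSR(x_1) + SSR(x_2 \mid x_1), \qquad SSR(x_2 \mid x_1) = SSR(x_1, x_2) - SSR(x_1) \ge 0
$$

The same decomposition holds with the roles of $x_1$ and $x_2$ exchanged, so the order in which variables enter changes the individual terms but not the total.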

Incremental Ensemble Learning for The Combination of Multiple Models of Locally Weighted Regression Using Genetic Algorithm (유전 알고리즘을 이용한 국소가중회귀의 다중모델 결합을 위한 점진적 앙상블 학습)

  • Kim, Sang Hun;Chung, Byung Hee;Lee, Gun Ho
    • KIPS Transactions on Software and Data Engineering / v.7 no.9 / pp.351-360 / 2018
  • The LWR (Locally Weighted Regression) model, traditionally a lazy learning model, produces a prediction for a given input, the query point. It is a regression equation over a short interval, obtained by learning that gives higher weight to data closer to the query point. We study an incremental ensemble learning approach for LWR, a form of lazy and memory-based learning. The proposed method sequentially generates and integrates LWR models over time using a genetic algorithm to obtain a solution for a specific query point. A weakness of existing LWR models is that multiple LWR models can be generated depending on the indicator function and the data sample selection, and the quality of the predictions varies with the model. However, no research has addressed the selection or combination of multiple LWR models. In this study, after generating the initial LWR model according to the indicator function and the sample data set, we iterate an evolutionary learning process to obtain a proper indicator function and assess the LWR models on other sample data sets to overcome data set bias. We adopt an eager learning method to generate and store LWR models gradually as data are generated for all sections. To obtain a prediction at a specific point in time, an LWR model is generated from newly generated data within a predetermined interval and then combined with the existing LWR models in that section using a genetic algorithm. The proposed method shows better results than selecting among multiple LWR models using the simple average method.
The results of this study are also compared with predictions from multiple regression analysis on real data, such as hourly traffic volume in a specific area and hourly sales at a highway rest area.
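As a rough illustration of the basic LWR idea described above (higher weight for data closer to the query point), the sketch below fits a weighted straight line around one query point. It is not the paper's ensemble method: it assumes a single predictor and a Gaussian kernel with a hypothetical `bandwidth` parameter in place of the indicator function mentioned in the abstract.

```python
import math

def lwr_predict(xs, ys, query, bandwidth=1.0):
    """Predict y at `query` from a locally weighted straight-line fit."""
    # Assumed Gaussian kernel: points near the query get weights close to 1.
    w = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    total = sum(w)
    # Weighted means of x and y.
    mx = sum(wi * x for wi, x in zip(w, xs)) / total
    my = sum(wi * y for wi, y in zip(w, ys)) / total
    # Weighted least-squares slope for the local line (needs distinct x values).
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return my + slope * (query - mx)

# Illustrative data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
print(lwr_predict(xs, ys, 2.5))
```

Because the fit is recomputed for every query point, nothing is learned ahead of time, which is what makes plain LWR a lazy, memory-based learner.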

Forecasting for a Credit Loan from Households in South Korea

  • Jeong, Dong-Bin
    • The Journal of Industrial Distribution & Business / v.8 no.4 / pp.15-21 / 2017
  • Purpose - In this work, we examine the causal relationship among credit loans from households (CLH), loans collateralized with housing (LCH), and the interest rate on certificates of deposit (ICD), among others, in South Korea. Furthermore, optimal forecasts from the underlying model are obtained, with potential applications in the economic field. Research design, data, and methodology - A total of 31 realizations sampled from the 4th quarter of 2008 to the 4th quarter of 2016 were chosen for this research. To achieve the purpose of this study, a regression model with correlated errors was used. Furthermore, goodness-of-fit measures were used as tools for constructing the optimal model. Results - We found that, by applying the regression model with an ARMA(1,5) error component to CLH, a steep and lasting rise can be expected over the next year, with moderate increases in LCH and ICD. Conclusions - Based on the 2017-2018 forecasts for CLH, a precipitous and lasting increase can be expected over the next two years, with a gradual rise in the two major explanatory variables. By allowing for feedback among the variables, future work can consider more general models, such as vector autoregressive models and structural equation models.