• Title/Summary/Keyword: Multiple linear regression (다중선형회귀)


Improving Polynomial Regression Using Principal Components Regression With the Example of the Numerical Inversion of Probability Generating Function (주성분회귀분석을 활용한 다항회귀분석 성능개선: PGF 수치역변환 사례를 중심으로)

  • Yang, Won Seok;Park, Hyun-Min
    • The Journal of the Korea Contents Association / v.15 no.1 / pp.475-481 / 2015
  • We use polynomial regression instead of linear regression if there is a nonlinear relation between a dependent variable and independent variables in a regression analysis. The performance of polynomial regression, however, may deteriorate because of the correlation caused by the power terms of independent variables. We present a polynomial regression model for the numerical inversion of PGF and show that polynomial regression results in the deterioration of the estimation of the coefficients. We apply principal components regression to the polynomial regression model and show that principal components regression dramatically improves the performance of the parameter estimation.
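The collinearity among power terms that this abstract describes can be illustrated with a minimal sketch. It is not the authors' PGF inversion model: the data (x = 1..10 and a made-up quadratic response) and all names are assumptions, and it uses the closed-form principal components of two standardized predictors, whose correlation matrix has eigenvalues 1 + r and 1 - r.

```python
import math

def standardize(values):
    """Return z-scores using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    zu, zv = standardize(u), standardize(v)
    return sum(a * b for a, b in zip(zu, zv)) / (len(u) - 1)

# Hypothetical data: x = 1..10 and a made-up quadratic response.
x = [float(i) for i in range(1, 11)]
x_sq = [v * v for v in x]
y = [1.0 + 0.5 * a + 0.1 * b for a, b in zip(x, x_sq)]

# Power terms of the same variable are strongly correlated.
r = pearson(x, x_sq)

# For two standardized predictors the correlation matrix [[1, r], [r, 1]]
# has eigenvalues 1 + r and 1 - r; their ratio measures ill-conditioning.
eig_large, eig_small = 1.0 + r, 1.0 - r
condition_number = eig_large / eig_small

# First principal component: eigenvector (1, 1) / sqrt(2) when r > 0.
z1, z2 = standardize(x), standardize(x_sq)
pc1 = [(a + b) / math.sqrt(2.0) for a, b in zip(z1, z2)]

# Principal components regression keeping only the first component.
# pc1 has zero mean, so the intercept is simply the mean of y.
y_mean = sum(y) / len(y)
gamma = sum(p * (v - y_mean) for p, v in zip(pc1, y)) / sum(p * p for p in pc1)
beta_std = gamma / math.sqrt(2.0)  # coefficient on each standardized predictor

fitted = [y_mean + gamma * p for p in pc1]
ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
ss_tot = sum((v - y_mean) ** 2 for v in y)
r_squared = 1.0 - ss_res / ss_tot

print(f"corr(x, x^2) = {r:.4f}, condition number = {condition_number:.1f}")
print(f"R^2 with one principal component = {r_squared:.4f}")
```

Dropping the small-eigenvalue component trades a little bias for much more stable coefficients, which is the general idea behind principal components regression.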

An Analysis Study for Optimal Uptake of Nutrient Solution Based on Multiple Linear Regression Model in Strawberry Hydroponic Environments (딸기 수경 재배 환경에서의 다중 선형 회귀 모델 기반의 양액 적정 흡수량 분석 연구)

  • Lim, Jong-Hyun;Lee, Myeong-Bae;Cho, Hyun-Wook;Shin, Chang-Sun;Park, Chang-Woo;Cho, Yong-Yun
    • Proceedings of the Korea Information Processing Society Conference / 2019.10a / pp.578-580 / 2019
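  • The area of strawberry hydroponic cultivation in Korea started at 5 ha in 2002 and reached 84 ha in 2007, 317 ha in 2012, and 1,575 ha in 2017, growing rapidly at more than 30% per year. This trend arises because hydroponic cultivation is easier to work with than soil cultivation, which saves labor time, and because it produces higher yields. However, the spread of non-circulating hydroponic systems, which discard the supplied nutrient solution as drainage, not only causes environmental pollution but also raises the operating cost of hydroponic cultivation. To supply nutrient solution optimized for crop growth, this paper analyzes and estimates the optimal nutrient solution uptake in a strawberry hydroponic environment using correlation analysis and a multiple linear regression model. Among the hydroponic environmental variables considered (solar radiation, temperature, humidity, CO2, etc.), solar radiation and temperature were found to influence nutrient solution uptake more strongly than humidity and CO2, and the R-square value of the regression equation from the multiple linear regression model was 0.358.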
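As a rough illustration of fitting a multiple linear regression and reading off its R-square, the sketch below solves the normal equations in plain Python. The readings (solar radiation, temperature, uptake) are invented for the example and are not the paper's data, and the helper names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via X'X b = X'y."""
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * v for d, v in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def predict(coef, rows):
    """Fitted values for each row of predictors."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in rows]

def r_squared(y, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    ss_tot = sum((v - mean) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical readings: (solar radiation, temperature) -> nutrient uptake.
rows = [(12.0, 18.5), (15.5, 20.1), (9.8, 17.2), (18.2, 22.4),
        (14.1, 19.0), (20.3, 23.8), (11.0, 18.0), (16.7, 21.5)]
uptake = [1.9, 2.4, 1.5, 3.1, 2.0, 3.3, 1.8, 2.6]

coef = fit_ols(rows, uptake)
r2 = r_squared(uptake, predict(coef, rows))
print("coefficients:", [round(c, 4) for c in coef])
print("R-square:", round(r2, 4))
```

An R-square such as the 0.358 reported above means the fitted model accounts for roughly a third of the variance in the response.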

Development of Accident Forecasting Models in Freeway Tunnels using Multiple Linear Regression Analysis (다중선형 회귀분석을 이용한 고속도로 터널구간의 교통사고 예측모형 개발)

  • Park, Ju-Hwan;Kim, Sang-Gu
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.11 no.6 / pp.145-154 / 2012
  • This paper analyzed the characteristics of traffic accidents in all tunnels on freeways nationwide and selected various independent variables related to accident occurrence in tunnels. The study aims to develop reliable accident forecasting models using dependent variables such as the number of accidents (no.), no./km, and no./MVK. Finally, reliable multiple linear regression models were proposed. The validity of the developed models was checked with statistics such as $R^2$, F values, multicollinearity diagnostics, and residual analysis. The accident forecasting models were selected considering the characteristics of tunnel accidents, and two models were finally proposed, one for each of two tunnel-length groups. In the selected models, the natural logarithm ln(no./MVK) is the dependent variable, and AADT, vertical slope, and tunnel height are the independent variables. The reliability of the two models was confirmed by comparing field data with estimated data using RMSE and MAE. These models may be effective for evaluating tunnel safety in the design and planning phases, and useful for reducing traffic accidents and managing traffic flow in tunnels.
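A model with ln(no./MVK) as the response has to be back-transformed before its estimates can be compared with field data using RMSE and MAE. The sketch below shows that step only. The coefficients and tunnel records are invented for illustration and are not the paper's fitted values.

```python
import math

def rmse(observed, estimated):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)

def mae(observed, estimated):
    """Mean absolute error between two equal-length sequences."""
    n = len(observed)
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / n

def predict_rate(aadt, slope, height, coef):
    """Back-transform a log-linear model: rate = exp(b0 + b1*AADT + ...)."""
    log_rate = (coef["intercept"] + coef["aadt"] * aadt
                + coef["slope"] * slope + coef["height"] * height)
    return math.exp(log_rate)

# Hypothetical coefficients (assumed, not taken from the paper).
coef = {"intercept": -1.2, "aadt": 0.00002, "slope": 0.15, "height": -0.05}

# Hypothetical tunnels: (AADT, vertical slope in %, tunnel height in m).
tunnels = [(30000, 1.0, 7.0), (45000, 2.0, 6.5), (20000, 0.5, 7.5)]
observed_rate = [0.45, 0.70, 0.30]  # made-up field values of no./MVK

estimated_rate = [predict_rate(a, s, h, coef) for a, s, h in tunnels]
print("RMSE:", round(rmse(observed_rate, estimated_rate), 4))
print("MAE:", round(mae(observed_rate, estimated_rate), 4))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of how the estimates deviate from field data.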

Estimation of river water depth using UAV-assisted RGB imagery and multiple linear regression analysis (무인기 지원 RGB 영상과 다중선형회귀분석을 이용한 하천 수심 추정)

  • Moon, Hyeon-Tae;Lee, Jung-Hwan;Yuk, Ji-Moon;Moon, Young-Il
    • Journal of Korea Water Resources Association / v.53 no.12 / pp.1059-1070 / 2020
  • River cross-section measurements are among the most important inputs for research on hydraulic and hydrological modeling, such as flow calculation and flood forecasting and warning for river management. However, acquiring accurate and continuous cross-section data for rivers with irregular geometry is severely limited by time and cost. The primary objective of this study is therefore to develop a methodology that can measure the spatial distribution of continuous river characteristics while minimizing the input of time, cost, and manpower. To this end, we examined the feasibility and accuracy of continuous cross-section estimation by estimating the water depth at each cross-section through multiple linear regression analysis using RGB-based aerial images and measured data. Comparison with the measured data confirmed that the depth can be accurately estimated within about 2 m of water depth while capturing spatially heterogeneous relationships, and the approach is expected to contribute to accurate and continuous acquisition of river cross-sections.

Characteristics and Models of the Side-swipe Accident in the Case of Cheongju 4-legged Signalized Intersections (4지 신호교차로의 측면접촉사고 특성 및 사고모형 - 청주시를 사례로 -)

  • Park, Sang-Hyuk;Kim, Tae-Young;Park, Byung-Ho
    • International Journal of Highway Engineering / v.11 no.4 / pp.41-47 / 2009
  • This study deals with side-swipe accidents at 4-legged signalized intersections in Cheongju. The objectives are to analyze the characteristics of these accidents and to develop related models, with particular emphasis on finding an appropriate modelling methodology. The main results are as follows. First, injury accidents were found to be twice as frequent as property-damage-only accidents among side-swipe accidents. The accidents occurred more often inside the intersection, were almost all vehicle-related, and were caused by unsafe driving. Second, multiple linear regression models were found to be more statistically significant than multiple non-linear models. The best-fitting models were those with the number of accidents as the dependent variable. The factors of side-swipe accidents identified in this study were ADT, intersection area, right-turn-only lane, number of pedestrian crossings, speed limit of the main road, maximum grade, and number of signal phases.


Multivariate Statistical Analysis and Prediction for the Flash Points of Binary Systems Using Physical Properties of Pure Substances (순수 성분의 물성 자료를 이용한 2성분계 혼합물의 인화점에 대한 다변량 통계 분석 및 예측)

  • Lee, Bom-Sock;Kim, Sung-Young
    • Journal of the Korean Institute of Gas / v.11 no.3 / pp.13-18 / 2007
  • Multivariate statistical analysis using multiple linear regression (MLR) has been applied to analyze and predict the flash points of binary systems. Predicting the flash points of flammable substances is important for examining fire and explosion hazards in chemical process design. In this paper, the flash points are predicted by MLR based on the physical properties of pure substances and experimental flash point data. The regression and prediction results from MLR are compared with the values calculated by Raoult's law and the van Laar equation.


Multi-objective Genetic Algorithm for Variable Selection in Linear Regression Model and Application (선형회귀모델의 변수선택을 위한 다중목적 유전 알고리즘과 응용)

  • Kim, Dong-Il;Park, Cheong-Sool;Baek, Jun-Geol;Kim, Sung-Shick
    • Journal of the Korea Society for Simulation / v.18 no.4 / pp.137-148 / 2009
  • The purpose of this study is to implement a variable selection algorithm that helps construct a reliable linear regression model. If all candidate variables are used to construct a linear regression model, the significance of the model decreases and the 'Curse of Dimensionality' arises. Moreover, if the number of observations is smaller than the number of variables (the dimension), the regression model cannot be constructed. Because of these problems, we treat variable selection as a combinatorial optimization problem and apply a GA (Genetic Algorithm) to it. Typical measures of statistical significance are $R^2$, the F-value of the regression model, the t-values of the regression coefficients, and the standard error of the estimates. We design the GA to handle multiple objective functions, because the statistical significance of a model cannot be assessed by a single measure. We perform experiments using simulation data designed to cover various situations. The results show better performance than LARS (Least Angle Regression), an algorithm for solving variable selection problems. We also modify the algorithm to solve the portfolio selection problem, which constructs a portfolio by selecting stocks. We conclude that the algorithm is able to solve real problems.

Procedure for the Selection of Principal Components in Principal Components Regression (주성분회귀분석에서 주성분선정을 위한 새로운 방법)

  • Kim, Bu-Yong;Shin, Myung-Hee
    • The Korean Journal of Applied Statistics / v.23 no.5 / pp.967-975 / 2010
  • Since the least squares estimation is not appropriate when multicollinearity exists among the regressors of the linear regression model, the principal components regression is used to deal with the multicollinearity problem. This article suggests a new procedure for the selection of suitable principal components. The procedure is based on the condition index instead of the eigenvalue. The principal components corresponding to the indices are removed from the model if any condition indices are larger than the upper limit of the cutoff value. On the other hand, the corresponding principal components are included if any condition indices are smaller than the lower limit. The forward inclusion method is employed to select proper principal components if any condition indices are between the upper limit and the lower limit. The limits are obtained from the linear model which is constructed on the basis of the conjoint analysis. The procedure is evaluated by Monte Carlo simulation in terms of the mean square error of estimator. The simulation results indicate that the proposed procedure is superior to the existing methods.
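The three-way rule in this abstract (remove above the upper limit, include below the lower limit, forward selection in between) can be sketched as follows. The condition index of a component is taken as the square root of the largest eigenvalue divided by that component's eigenvalue. The cutoffs 10 and 30 and the correlation matrix are assumptions for illustration; the paper derives its limits from a conjoint-analysis-based linear model that is not reproduced here, and the Jacobi eigenvalue routine is a generic stand-in.

```python
import math

def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q], with |theta| <= pi/4.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # apply rotation to columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # apply rotation to rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def classify_components(eigenvalues, lower=10.0, upper=30.0):
    """Label each principal component by its condition index.

    The cutoffs are hypothetical defaults, not the paper's derived limits.
    """
    largest = max(eigenvalues)
    decisions = []
    for lam in eigenvalues:
        index = math.sqrt(largest / lam)
        if index > upper:
            label = "remove"
        elif index < lower:
            label = "include"
        else:
            label = "forward-selection candidate"
        decisions.append((index, label))
    return decisions

# Hypothetical correlation matrix with two nearly collinear regressors.
corr = [[1.0, 0.999, 0.5],
        [0.999, 1.0, 0.5],
        [0.5, 0.5, 1.0]]

eigenvalues = jacobi_eigenvalues(corr)
decisions = classify_components(eigenvalues)
for lam, (index, label) in zip(eigenvalues, decisions):
    print(f"eigenvalue={lam:.5f}  condition index={index:.2f}  -> {label}")
```

Components in the middle band would then be passed to a forward inclusion step, which this sketch leaves out.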

Prediction of the Water Level of the Tidal River using Artificial Neural Networks and Stationary Wavelets Transform (인공신경망과 정상 웨이블렛 변환을 활용한 감조하천 수위 예측)

  • Lee, Jeongha;Hwang, SeokHwan
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.357-357 / 2021
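  • To minimize inundation damage caused by floods, accurate prediction of river water levels and sufficient lead time are very important. For tidal rivers in particular, which are affected by tides, existing physical hydrological models have limited applicability, and the accuracy of water level prediction can suffer. To improve the accuracy of water level prediction for tidal rivers, this study therefore proposed a hybrid model that separates the tidal component and uses an artificial neural network, and compared it with multiple linear regression analysis. The tidal component was separated from the water level data of a bridge located on a tidal river using the Stationary Wavelet Transform, and the water levels 1, 2, and 3 hours ahead were predicted using an artificial neural network (ANN) together with other time series data that affect the water level. The hybrid model showed an accuracy of more than 96% and was also more accurate than multiple linear regression analysis.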


Calculation of Surface Broadband Emissivity by Multiple Linear Regression Model (다중선형회귀모형에 의한 지표면 광대역 방출율 산출)

  • Jo, Eun-Su;Lee, Kyu-Tae;Jung, Hyun-Seok;Kim, Bu-Yo;Zo, Il-Sung
    • Journal of the Korean earth science society / v.38 no.4 / pp.269-282 / 2017
  • In this study, the surface broadband emissivity ($3.0$-$14.0\,\mu\mathrm{m}$) was calculated using a multiple linear regression model with narrow-band (channels 29, 30, and 31) emissivity data of the Moderate Resolution Imaging Spectroradiometer (MODIS) on the Earth Observing System Terra satellite. A total of 307 types of spectral emissivity data (123 soil types, 32 vegetation types, 19 types of water bodies, 43 manmade materials, and 90 rock types) from the MODIS University of California Santa Barbara emissivity library and the Advanced Spaceborne Thermal Emission & Reflection Radiometer spectral library were used to derive and verify the multiple linear regression model. The coefficient of determination ($R^2$) of the multiple linear regression model was high at 0.95 (p<0.001), and the root mean square error between the model-calculated and theoretical broadband emissivities was 0.0070. The surface broadband emissivity from our multiple linear regression model was comparable with that of Wang et al. (2005). The root mean square error between the surface broadband emissivities calculated by the model in this study and by Wang et al. (2005) for January was 0.0054 in the Asia, Africa, and Oceania regions. The minimum and maximum differences in surface broadband emissivity between the two models were 0.0027 and 0.0067, respectively. Similar statistical results were obtained for August. The surface broadband emissivities from our multiple linear regression model can thus be considered acceptable. However, separate regression models for different land covers need to be applied for more accurate calculation of the surface broadband emissivities.