• Title/Summary/Keyword: 다중선형 회귀모형 (multiple linear regression model)

Application of multiple linear regression and artificial neural network models to forecast long-term precipitation in the Geum River basin (다중회귀모형과 인공신경망모형을 이용한 금강권역 강수량 장기예측)

  • Kim, Chul-Gyum;Lee, Jeongwoo;Lee, Jeong Eun;Kim, Hyeonjun
    • Journal of Korea Water Resources Association / v.55 no.10 / pp.723-736 / 2022
  • In this study, monthly precipitation forecasting models with lead times of up to 12 months were constructed for the Geum River basin using two statistical techniques: multiple linear regression (MLR) and artificial neural networks (ANN). A total of 47 climate indices served as predictor candidates: 39 global climate patterns provided by the National Oceanic and Atmospheric Administration (NOAA) and 8 meteorological factors for the basin. For each forecast month, the teleconnection between monthly precipitation and each climate index over the past 40 years was analyzed, and the most highly correlated indices were used to build the forecast models. In goodness-of-fit tests on the mean forecast for each month over 1991 to 2021, the MLR models showed a percent bias (PBIAS) of -3.3 to -0.1%, a Nash-Sutcliffe efficiency (NSE) of 0.45 to 0.50, and a Pearson correlation coefficient (r) of 0.69 to 0.70, whereas the ANN models showed a PBIAS of -5.0 to +0.5%, an NSE of 0.35 to 0.47, and an r of 0.64 to 0.70. The mean values predicted by the MLR models were therefore closer to the observations than those of the ANN models. The probability that the observation fell within the forecast range for each month was 57.5 to 83.6% (average 72.9%) for the MLR models and 71.5 to 88.7% (average 81.1%) for the ANN models, so the ANN models performed better on this measure. The monthly tercile probability was 25.9 to 41.9% (average 34.6%) for the MLR models and 30.3 to 39.1% (average 34.7%) for the ANN models. Both models therefore showed long-term predictability of monthly precipitation, with average tercile probabilities above the 33.3% expected by chance. In conclusion, the difference in predictability between the two models was relatively small; however, judging from the hit rate for the forecast range and the tercile probability, the month-to-month variation in predictability was smaller for the ANN models.
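The goodness-of-fit and hit-rate measures named in this abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the PBIAS sign convention (negative means underestimation; some sources use the opposite sign), the use of climatological terciles from a reference sample, and all function names are assumptions.

```python
import numpy as np

def pbias(obs, sim):
    """Percent bias in %; assumed convention: negative = forecast underestimates."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * np.sum(sim - obs) / np.sum(obs)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 is no better than the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pearson_r(obs, sim):
    """Pearson correlation coefficient between observations and forecasts."""
    return float(np.corrcoef(obs, sim)[0, 1])

def range_hit_rate(obs, lo, hi):
    """Fraction of observations lying inside the forecast range [lo, hi]."""
    obs, lo, hi = (np.asarray(a, float) for a in (obs, lo, hi))
    return float(np.mean((obs >= lo) & (obs <= hi)))

def tercile_hit_rate(obs, fcst, climatology):
    """Fraction of cases whose forecast tercile (below/near/above normal) matches the observed one."""
    bounds = np.quantile(climatology, [1 / 3, 2 / 3])   # climatological tercile limits
    return float(np.mean(np.digitize(obs, bounds) == np.digitize(fcst, bounds)))
```

A tercile hit rate of 1/3 is what random guessing among three equally likely categories would achieve, which is why the abstract compares its averages with 33.3%.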

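Improvement of Hadi and Simonoff's multiple-outlier identification method and comparison of the efficiency of several multiple-outlier identification methods (Hadi와 Simonoff의 다중이상점 식별방법의 개선과 여러 다중이상점 식별방법의 효율성 비교)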

  • 유종영;김현철
    • Communications for Statistical Applications and Methods / v.3 no.3 / pp.11-23 / 1996
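  • In this study, we propose a new algorithm for linear regression analysis by modifying Hadi and Simonoff's multiple-outlier identification method. The first step of Hadi and Simonoff's algorithm, which extracts a set of points unlikely to be outliers, can be affected by masking and swamping effects, so we modified this first step. Assuming that the residuals follow a normal distribution with constant variance, we construct a confidence interval for the residuals, search within this interval for a new model that minimizes the MAD of the residuals, and propose a new algorithm that uses this model to extract the set of points unlikely to be outliers. On real data, the proposed method identified outliers more efficiently than the other methods.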

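As a rough illustration of the clean-subset idea, the sketch below repeatedly refits OLS on the points currently judged clean and flags points whose residuals lie far from the median in units of a MAD-based scale. It is a simplified, hypothetical procedure in the spirit of Hadi and Simonoff, not the modified algorithm proposed in the paper; the cutoff `alpha_z`, the starting subset, and the function name are assumptions. Starting from all points, as done here, is the simplest choice and is exactly the kind of first step that masking and swamping can disturb.

```python
import numpy as np

def clean_subset_outliers(X, y, alpha_z=2.5, max_iter=50):
    """Flag outliers by repeatedly refitting OLS on the current clean subset (illustrative)."""
    y = np.asarray(y, float)
    A = np.column_stack([np.ones(len(y)), X])    # design matrix with intercept
    subset = np.ones(len(y), dtype=bool)         # assumed start: every point treated as clean
    for _ in range(max_iter):
        beta, *_ = np.linalg.lstsq(A[subset], y[subset], rcond=None)
        resid = y - A @ beta                     # residuals for all points, clean or not
        med = np.median(resid[subset])
        scale = 1.4826 * np.median(np.abs(resid[subset] - med))  # MAD scaled for normal errors
        new_subset = np.abs(resid - med) <= alpha_z * scale
        if np.array_equal(new_subset, subset):   # subset stable: stop
            break
        subset = new_subset
    return np.flatnonzero(~subset)               # indices of flagged points
```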

Bayesian analysis of latent factor regression model (내재된 인자회귀모형의 베이지안 분석법)

  • Kyung, Minjung
    • The Korean Journal of Applied Statistics / v.33 no.4 / pp.365-377 / 2020
  • We discuss latent factor regression, which constructs a common structure inherent among the explanatory variables to resolve multicollinearity and uses the resulting factors as regressors in a linear model for the response variable. We use Bayesian estimation with a LASSO prior and a large penalty parameter to obtain a meaningful factor loading matrix of intrinsic interest from among infinitely many latent structures. The estimated factor loading matrix, together with the other estimated parameters, can be inversely transformed into linear coefficients for each explanatory variable and used as a prediction model for new observations. Applying the proposed method to the HBAT Product Service Management data, we observe that it recovers the same factors as conventional common factor analysis when the number of factors is fixed. The MSE of the predicted values from the Bayesian latent factor regression model is also smaller than that of the common factor regression model.
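The back-transformation from factor-level coefficients to one coefficient per explanatory variable can be illustrated with a non-Bayesian stand-in. The sketch below substitutes principal components for the LASSO-regularized Bayesian loadings described in the abstract (an explicit swap, chosen only to keep the example short; the function name is hypothetical): if the factor scores are F = XW and y is regressed on F with coefficients γ, the implied per-variable coefficients are β = Wγ.

```python
import numpy as np

def factor_regression(X, y, k):
    """Regress y on k principal-component scores, then map back to per-variable coefficients."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean                                  # centre the explanatory variables
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    W = vt[:k].T                                     # p x k loading matrix (stand-in for Bayesian loadings)
    F = Xc @ W                                       # n x k factor scores
    gamma, *_ = np.linalg.lstsq(F, y - y_mean, rcond=None)
    beta = W @ gamma                                 # back-transform: one coefficient per variable
    intercept = y_mean - x_mean @ beta
    return intercept, beta
```

Even when the columns of X are exactly collinear, the factor step keeps the regression well posed, and `intercept + X_new @ beta` predicts new observations directly.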

Prediction of the Water Level of the Tidal River using Artificial Neural Networks and Stationary Wavelets Transform (인공신경망과 정상 웨이블렛 변환을 활용한 감조하천 수위 예측)

  • Lee, Jeongha;Hwang, SeokHwan
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.357-357 / 2021
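  • Accurate river water-level prediction and a sufficient lead time are essential for minimizing inundation damage from floods. In tidal rivers in particular, which are affected by tides, conventional physically based hydrological models have limited applicability, and water-level forecasts can be less accurate. To improve water-level prediction in tidal rivers, this study proposes a hybrid model that separates out the tidal component and then applies an artificial neural network, and compares it with multiple linear regression. The tidal component was separated from water-level data at a bridge on a tidal river using the Stationary Wavelet Transform, and water levels 1, 2, and 3 hours ahead were predicted with an artificial neural network (ANN) using other time-series data that affect the water level. The hybrid model achieved an accuracy of over 96% and was also more accurate than multiple linear regression.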

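The abstract does not state which wavelet or decomposition depth was used, so the sketch below assumes an undecimated ("a trous") Haar decomposition with periodic boundaries as a stand-in. It shows the two properties that make a stationary transform attractive for separating a tidal signal: every component keeps the input length, and the components add back up to the original series. The function name is hypothetical, and which levels hold the tide depends on the sampling interval.

```python
import numpy as np

def haar_swt_additive(x, levels):
    """Undecimated Haar decomposition: x == approx + details.sum(axis=0)."""
    c = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        shift = 2 ** j                            # taps spread apart instead of downsampling
        smooth = 0.5 * (c + np.roll(c, shift))    # periodic (circular) boundary handling
        details.append(c - smooth)                # detail at scale 2**(j + 1)
        c = smooth
    return c, np.array(details)
```

Because nothing is downsampled, shifting the input shifts every component by the same amount; a component judged to carry the tide could be subtracted before the ANN or MLR step.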

Optimization for Concurrent Spare Part with Simulation and Multiple Regression (시뮬레이션과 다중 회귀모형을 이용한 동시조달수리부속 최적화)

  • Kim, Kyung-Rok;Yong, Hwa-Young;Kwon, Ki-Sang
    • Journal of the Korea Society for Simulation / v.21 no.3 / pp.79-88 / 2012
  • Research on efficient operation, maintenance, and equipment design has recently grown rapidly in the military industry to meet mission requirements, and these studies have emphasized the importance of Concurrent Spare Parts (CSP). CSP, which are critical to operation and maintenance because they enhance availability, are supplied together with the equipment when it is delivered. Despite their significance, the range and depth of CSP are determined by administrative decision rather than engineering analysis. The purpose of this paper is to optimize the number of CSP per item using simulation and multiple regression. First, changes in operational availability were obtained by varying the number of CSP in a simulation model. Second, a regression equation was estimated from the simulation input and output data, and the number of CSP was optimized by combining the multiple regression model with linear programming, with cost as the constraint. The advantage of this approach is that it can respond quickly to changes in the constraint. The cost per item changes continually during equipment development, and rerunning the simulation every time the constraint changes would slow the analysis down. This approach is therefore suitable for real development environments. In the future, studies based on this concept could improve the accuracy of the optimization as multiple regression techniques advance.
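The regression-then-optimize workflow can be sketched as follows, assuming a linear metamodel of availability in the per-item CSP counts and a total-cost budget. For a handful of items, exhaustive enumeration of integer allocations solves the same integer linear program exactly; a study with many items would use an LP/ILP solver instead. The function names and the linear form are assumptions, not the paper's model.

```python
import itertools
import numpy as np

def fit_linear_metamodel(N, avail):
    """Multiple regression of simulated availability on per-item CSP counts."""
    A = np.column_stack([np.ones(len(N)), N])       # intercept + one column per item
    coef, *_ = np.linalg.lstsq(A, np.asarray(avail, float), rcond=None)
    return coef[0], coef[1:]

def best_allocation(coef, unit_cost, budget, max_per_item):
    """Maximise predicted availability subject to total cost <= budget (brute force)."""
    intercept, slopes = coef
    best, best_val = None, -np.inf
    for n in itertools.product(*(range(m + 1) for m in max_per_item)):
        n = np.array(n)
        if n @ unit_cost <= budget:                 # cost constraint
            val = intercept + slopes @ n            # metamodel prediction, no new simulation
            if val > best_val:
                best, best_val = n, val
    return best, best_val
```

Only `unit_cost` and `budget` change when item costs are revised, so the cheap optimization can be rerun without touching the simulation, which is the advantage the abstract describes.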

Estimating soil moisture using machine learning approach: A Case Study to Yongdam watershed (기계학습 기반의 토양함수 예측 기법 개발 (용담댐 시험유역을 중심으로))

  • Huy, Nguyen Dinh;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference / 2018.05a / pp.167-167 / 2018
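  • Soil moisture represents the average amount of water contained in the soil and is one of the most important hydrological variables in the hydrological cycle. This study aims to develop a soil moisture prediction method using the Support Vector Machine (SVM), a representative machine learning method, with remotely sensed soil moisture data, precipitation, and temperature as predictors. SVM uses kernel functions to handle complex nonlinear relationships as linear ones in a kernel-induced feature space, and as a global model it has been applied in a wide range of hydrometeorological fields. Because SVM tolerates a certain amount of error, it generalizes better than conventional artificial neural networks (ANN) and is highly applicable as a prediction model. The purpose of this study is to fit the model using historical soil moisture data together with precipitation, temperature, and satellite-based information, and to extend it to ungauged basins. The proposed model will be applied to the Yongdam Dam experimental watershed, and its suitability will be evaluated by comparison with an existing ANN model and multiple regression analysis.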

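The two SVM ingredients mentioned in the abstract, the kernel function and the tolerance for small errors, can be shown separately. The sketch below is not a full support vector regression solver (fitting one needs a quadratic-programming routine or a library such as scikit-learn's `SVR`); it only computes a Gaussian (RBF) kernel matrix, assumed here as the kernel choice, and the epsilon-insensitive loss that ignores errors smaller than `eps`. Function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Kernel matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2); rows of A and B are samples."""
    A, B = np.atleast_2d(A).astype(float), np.atleast_2d(B).astype(float)
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))     # clip tiny negative rounding errors

def eps_insensitive_loss(y, y_hat, eps=0.1):
    """SVR loss: errors within eps cost nothing; larger errors cost |error| - eps."""
    err = np.abs(np.asarray(y, float) - np.asarray(y_hat, float))
    return np.maximum(0.0, err - eps)
```

A model that is linear in the kernel's feature space, f(x) = Σ αᵢ K(xᵢ, x) + b, is nonlinear in the original predictors, which is the sense in which SVM treats nonlinear relationships linearly.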

Prediction of movie audience numbers using hybrid model combining GLS and Bass models (GLS와 Bass 모형을 결합한 하이브리드 모형을 이용한 영화 관객 수 예측)

  • Kim, Bokyung;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.447-461 / 2018
  • Domestic film industry sales are increasing every year. Theaters are the primary sales channel for movies, and theater attendance affects the value of additional sales rights. Theater attendance is therefore an important factor directly linked to film industry sales. In this paper, we consider a hybrid model that combines a multiple linear regression model with the Bass model to predict the audience size on a specific day. In the combined model, the regression prediction is corrected using the Bass model prediction. Three films with different release dates were analyzed. All-subsets regression was used to generate every possible combination of predictors, and each candidate model was estimated five times using 5-fold cross-validation. The prediction from the model with the smallest root mean square error was then combined with the Bass model prediction to obtain the final predicted value. When past data were available, the weight of the Bass model increased and its correction was added to the predicted value.
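The standard Bass diffusion curve gives the cumulative number of adopters N(t) = m(1 - e^{-(p+q)t}) / (1 + (q/p)e^{-(p+q)t}), with market potential m, innovation coefficient p, and imitation coefficient q; daily audiences are differences of N(t). The abstract does not give the rule for combining the two predictions, so the weighted average in `blend` below is a hypothetical choice, and all function names are assumptions.

```python
import numpy as np

def bass_cumulative(t, m, p, q):
    """Cumulative adopters N(t) = m (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t})."""
    e = np.exp(-(p + q) * np.asarray(t, float))
    return m * (1.0 - e) / (1.0 + (q / p) * e)

def bass_daily(days, m, p, q):
    """New adopters (e.g. audience) on each day 1..days, as differences of N(t)."""
    t = np.arange(days + 1)
    return np.diff(bass_cumulative(t, m, p, q))

def blend(reg_pred, bass_pred, w):
    """Hypothetical combination: weight w on the Bass forecast, 1 - w on the regression."""
    return w * np.asarray(bass_pred, float) + (1.0 - w) * np.asarray(reg_pred, float)
```

In this form, the finding that the Bass weight grows once past data exist would correspond to a larger fitted `w`.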

A Causation Study for car crashes at Rural 4-legged Signalized Intersections Using Nonlinear Regression and Structural Equation Methods (비선형 회귀분석과 구조방정식을 이용한 지방부 4지 신호교차로의 사고요인분석)

  • Oh, Ju Taek;Kweon, Ihl;Hwang, Jeong Won
    • Journal of Korean Society of Transportation / v.31 no.1 / pp.65-76 / 2013
  • Traffic accidents at signalized intersections have increased annually, so their causes need to be examined in order to reduce them. However, existing accident models have mainly been developed with nonlinear regression models such as Poisson regression. Although these nonlinear regression methods are the right choice for modeling the randomness and nonlinearity of accidents, they cannot reveal the complex causation of traffic accidents. Another statistical method is therefore required to make up for this limitation. This study developed accident prediction models for rural four-legged signalized intersections using Poisson regression and compared them with structural equation models. Structural equation methods were used to reveal the complex causation of traffic accidents because they can explain more causal factors than the other methods.
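Poisson accident prediction models typically take the form E[crashes] = exp(b0 + b1·x1 + ...). The sketch below fits such a model by iteratively reweighted least squares, the standard fitting method for generalized linear models; it is a minimal illustration, and the function name and any intersection covariates you would pass in are assumptions, not the paper's variables.

```python
import numpy as np

def poisson_irls(X, y, n_iter=100, tol=1e-10):
    """Poisson regression with log link, fitted by iteratively reweighted least squares."""
    y = np.asarray(y, float)
    A = np.column_stack([np.ones(len(y)), X])   # add intercept column
    beta = np.zeros(A.shape[1])
    beta[0] = np.log(y.mean())                  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = A @ beta                          # linear predictor
        mu = np.exp(eta)                        # expected crash count
        z = eta + (y - mu) / mu                 # working response
        AtW = A.T * mu                          # IRLS weights equal mu for the log link
        new_beta = np.linalg.solve(AtW @ A, AtW @ z)
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta
```

The fitted coefficients describe how expected counts scale with each covariate, but, as the abstract notes, they do not by themselves separate direct and indirect causal paths.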

Analyzing Spatial and Temporal Variation of Ground Surface Temperature in Korea (국내 지면온도의 시공간적 변화 분석)

  • Koo Min-Ho;Song Yoon-Ho;Lee Jun-Hak
    • Economic and Environmental Geology / v.39 no.3 s.178 / pp.255-268 / 2006
  • Meteorological data for the recent 22-year period (1981-2002) from 58 Korea Meteorological Administration (KMA) stations were analyzed to investigate the spatial and temporal variation of surface air temperature (SAT) and ground surface temperature (GST) in Korea. Based on the KMA data, multiple linear regression (MLR) models with two regression variables, latitude and altitude, were presented to predict mean surface air temperature (MSAT) and mean ground surface temperature (MGST). Both models predicted with high accuracy, with $R^2$ values of 0.92 and 0.94, respectively. Predicting MGST is particularly important for geothermal energy utilization, since it is a critical input parameter for designing ground source heat pump systems. Given the good performance of the MGST regression model, it is expected to be a useful tool for preliminary evaluation of MGST in areas of interest that lack reliable data. Temporal variation of SAT was analyzed by simple linear regression to examine the long-term increase in SAT due to global warming and urbanization. All KMA stations except one showed an increasing SAT trend, ranging from $0.005$ to $0.088\,^{\circ}\mathrm{C/yr}$ with a mean of $0.043\,^{\circ}\mathrm{C/yr}$. The effects of solar radiation, terrestrial radiation, precipitation, and snow cover on GST variation were also discussed based on quantitative and qualitative analysis of the meteorological data.
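The two regressions described in this abstract, a two-variable MLR on latitude and altitude and a simple linear trend over time, can be sketched with NumPy as follows. Function names are hypothetical and no coefficients from the paper are reproduced; the test values are synthetic.

```python
import numpy as np

def fit_mlr(lat, alt, temp):
    """OLS fit of temp = b0 + b1*lat + b2*alt; returns coefficients and R^2."""
    temp = np.asarray(temp, float)
    A = np.column_stack([np.ones(len(temp)), lat, alt])
    coef, *_ = np.linalg.lstsq(A, temp, rcond=None)
    resid = temp - A @ coef
    r2 = 1.0 - resid @ resid / np.sum((temp - temp.mean()) ** 2)
    return coef, r2

def linear_trend(years, temps):
    """Slope of a simple linear regression of temperature on year (deg C per year)."""
    return np.polyfit(years, temps, 1)[0]
```

For an unmonitored site, the fitted `coef` gives a preliminary MGST estimate as `coef[0] + coef[1] * lat + coef[2] * alt`.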

Development of Empirical Formulas for Storage Function Method (저류함수법의 매개변수 산정식 개발)

  • Choi, Jong-Nam;Ahn, Won-Shik;Kim, Tae-Gyun;Chung, Gun-Hui
    • Journal of the Korean Society of Hazard Mitigation / v.9 no.5 / pp.125-130 / 2009
  • The storage function method, which accounts for the nonlinearity of the rainfall-runoff relationship, has frequently been used to predict basin runoff and flood patterns. However, estimating appropriate parameters for every basin and rainfall event is time-consuming, so empirical parameter equations applicable in Korea are needed. In this study, multiple regression analysis is used to develop empirical equations that estimate the parameters of the storage function method from basin characteristics. Based on the regression analysis, basin area, maximum stream length, and stream slope were selected as the basin characteristics. After collinearity was removed, a trial-and-error method was used to choose the variables that best describe each dependent variable for the Han River basin, which is divided into 30 subbasins. The developed equations, named the 'Han River equation', were validated using rainfall events at the MunMak gauging station. The equation can provide useful information on storage function parameters for calculating basin runoff and predicting river stage.
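Empirical parameter equations of this kind are often written as power laws in the basin characteristics and fitted as a multiple regression on logarithms; the abstract does not give the functional form, so the power-law form below is an assumption. The sketch also computes variance inflation factors (VIF), one common way to screen for the collinearity the abstract says was removed. Function names are hypothetical, and the target would be a storage function parameter such as K in the usual storage relation s = K q^p.

```python
import numpy as np

def fit_power_law(features, target):
    """Fit target = a * prod(f_j ** b_j) by OLS on logarithms; returns a and the exponents."""
    logs = np.log(np.column_stack(features))
    A = np.column_stack([np.ones(len(target)), logs])
    coef, *_ = np.linalg.lstsq(A, np.log(np.asarray(target, float)), rcond=None)
    return np.exp(coef[0]), coef[1:]

def vif(features):
    """Variance inflation factor of each column: 1 / (1 - R^2 on the other columns)."""
    Z = np.column_stack(features).astype(float)
    out = []
    for j in range(Z.shape[1]):
        others = np.column_stack([np.ones(len(Z)), np.delete(Z, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, Z[:, j], rcond=None)
        resid = Z[:, j] - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((Z[:, j] - Z[:, j].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))            # large values signal collinearity
    return np.array(out)
```

Candidate characteristics with very large VIFs would be dropped before the trial-and-error selection step.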