• Title/Abstract/Keywords: Stepwise regression model

Search results: 381 items (processing time 0.064 s)

A Prediction Method Combining Clustering Method and Stepwise Regression

  • 정일교;전치혁
    • Korean Operations Research and Management Science Society: Conference Proceedings (한국경영과학회:학술대회논문집)
    • /
    • Korean Institute of Industrial Engineers / Korean Operations Research and Management Science Society 2002 Spring Joint Conference
    • /
    • pp.949-952
    • /
    • 2002
  • A regression model is used to predict a response variable from given predictor variables. However, when the number of predictor variables is large, a regression model runs into problems such as multicollinearity, difficulty in interpreting the functional relationship between the response and the predictors, and reduced prediction accuracy. A clustering method and stepwise regression can be used to reduce the amount of data by grouping predictors with similar properties and by selecting a subset of predictors, respectively. This paper proposes a prediction method that combines a clustering method with stepwise regression. The proposed method fits a global model and local models, and predicts responses for new observations by using both. The paper also compares the performance of the proposed method with that of stepwise regression using real data obtained from a steel process.

Alternative Derivation of Stepwise Multivariate Linear Regression

  • 申敏雄;金周成
    • Journal of the Korean Statistical Society
    • /
    • Vol. 7, No. 2
    • /
    • pp.105-108
    • /
    • 1978
  • Freund, Vail, and Ross; Goldberger and Jochems; and Goldberger have given some results for the stepwise estimation of the parameters of a univariate regression model. D. G. Kabe gave similar results for a multivariate linear regression model. We give here an alternative derivation of some of the results derived by Kabe.

Analysis of Client Propensity in Cyber Counseling Using Bayesian Variable Selection

  • Pi, Su-Young
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • Vol. 6, No. 4
    • /
    • pp.277-281
    • /
    • 2006
  • Cyber counseling, one of the most suitable types of consultation for the information society, lets people reveal their mental distress and private problems anonymously, since it does not require a face-to-face interview between a counselor and a client. However, although the number of cyber counseling centers has increased sharply, few of them provide high-quality, trustworthy service. This paper therefore aims to enable appropriate consultation for each client by analyzing client propensity using Bayesian variable selection. Bayesian variable selection is superior to stepwise regression analysis in finding a regression model. Stepwise regression analysis, which has generally been used to analyze individual propensity in linear regression models, is not efficient because its inherent defects make it hard to select a proper model. In this paper, based on the case database of current cyber counseling centers on the web, we analyze clients' propensities using Bayesian variable selection to enable individually targeted counseling and to promote cyber counseling programs.

A Comparative Analysis of the Forecasting Performance of Coal and Iron Ore in Gwangyang Port Using Stepwise Regression and Artificial Neural Network Model

  • 조상호;남형식;류기진;류동근
    • Journal of Navigation and Port Research (한국항해항만학회지)
    • /
    • Vol. 44, No. 3
    • /
    • pp.187-194
    • /
    • 2020
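  • Accurate cargo volume forecasting is very important when establishing major port policies and future operating plans, and research on the topic is actively conducted for this reason. This paper compares the predictive performance of stepwise regression and an artificial neural network model for Gwangyang Port, the largest coal and iron ore handling port in Korea. Monthly data covering 121 months, from January 2009 to January 2019, were used, and the factors affecting coal and iron ore cargo volume were selected and classified into supply-related factors and market- and economy-related factors. In the stepwise regression, the tonnage of arriving vessels, the coal price, and the won-dollar exchange rate were selected as the final variables for the coal cargo volume model, and the tonnage of arriving vessels and the iron ore price were selected as the final variables for the iron ore cargo volume model. For the artificial neural network model, a trial-and-error approach was used to select the best model while tuning the various hyperparameters that affect model performance. The artificial neural network model showed better predictive performance than stepwise regression, and in a graphical comparison of predicted and observed values for each model it also tracked the peaks and troughs more closely than stepwise regression.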

A Climate Prediction Method Based on EMD and Ensemble Prediction Technique

  • Bi, Shuoben;Bi, Shengjie;Chen, Xuan;Ji, Han;Lu, Ying
    • Asia-Pacific Journal of Atmospheric Sciences
    • /
    • Vol. 54, No. 4
    • /
    • pp.611-622
    • /
    • 2018
  • Observed climate data are processed under the assumption that their time series are stationary, as in multi-step temperature and precipitation prediction, which usually leads to low prediction accuracy. If a climate system model is based on a single prediction model, the prediction results contain significant uncertainty. In order to overcome this drawback, this study uses a method that integrates ensemble prediction and a stepwise regression model based on a mean-valued generation function. In addition, it utilizes empirical mode decomposition (EMD), which is a new method of handling time series. First, a non-stationary time series is decomposed into a series of intrinsic mode functions (IMFs), which are stationary and multi-scale. Then, a different prediction model is constructed for each component of the IMF using numerical ensemble prediction combined with stepwise regression analysis. Finally, the results are fit to a linear regression model, and a short-term climate prediction system is established using the Visual Studio development platform. The model is validated using temperature data from February 1957 to 2005 from 88 weather stations in Guangxi, China. The results show that compared to single-model prediction methods, the EMD and ensemble prediction model is more effective for forecasting climate change and abrupt climate shifts when using historical data for multi-step prediction.

Evaluating Variable Selection Techniques for Multivariate Linear Regression

  • 류나현;김형석;강필성
    • Journal of the Korean Institute of Industrial Engineers (대한산업공학회지)
    • /
    • Vol. 42, No. 5
    • /
    • pp.314-326
    • /
    • 2016
  • The purpose of variable selection techniques is to select a subset of relevant variables for a particular learning algorithm in order to improve both the accuracy and the efficiency of the prediction model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for the multiple linear regression model, which is one of the most commonly used regression models in practice. The variable selection techniques we apply are forward selection, backward elimination, stepwise selection, genetic algorithm (GA), ridge regression, lasso (Least Absolute Shrinkage and Selection Operator), and elastic net. Based on experiments with 49 regression data sets, we find that GA resulted in the lowest error rates, while lasso reduced the number of variables most significantly. In terms of computational efficiency, forward selection, backward elimination, and lasso require less time than the other techniques.
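
As an illustration of the simplest technique named in this abstract, the following is a minimal sketch of greedy forward selection for a linear model. It is not the procedure used in the paper: the function names `rss` and `forward_selection`, the relative-gain stopping rule `min_gain`, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an ordinary least squares fit
    (with intercept) on the predictor columns listed in `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_selection(X, y, min_gain=0.1):
    """Greedy forward selection. At each step, add the predictor that
    lowers the RSS the most; stop when the relative RSS reduction
    drops below `min_gain` (a hypothetical stopping rule)."""
    selected = []
    remaining = list(range(X.shape[1]))
    current = rss(X, y, selected)
    while remaining:
        best_rss, best_j = min((rss(X, y, selected + [j]), j) for j in remaining)
        if current - best_rss < min_gain * current:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current = best_rss
    return selected

# Synthetic data (assumed): only columns 1 and 4 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + rng.normal(scale=0.1, size=200)
chosen = forward_selection(X, y)
print(sorted(chosen))
```

Backward elimination runs the same loop in reverse, starting from all predictors and dropping the least useful one, and stepwise selection alternates the two moves. Practical implementations usually stop on an F-test, AIC, or cross-validated error rather than the raw RSS ratio used here.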

A Study on Estimation of NO2 Concentration by Statistical Model

  • 장난심
    • Journal of the Korean Environmental Sciences Society (한국환경과학회지)
    • /
    • Vol. 14, No. 11
    • /
    • pp.1049-1056
    • /
    • 2005
  • The characteristics of $NO_2$ concentration in Busan metropolitan city were analyzed statistically using hourly $NO_2$ concentration data (1998-2000) collected from the city's air quality monitoring sites. Four representative regions were selected from among the air quality monitoring sites of the Ministry of Environment. Concentration data for $NO_2$ and five air pollutants, together with data collected at AWS, were used. Both a stepwise multiple regression model and an ARIMA model were adopted to predict $NO_2$ concentrations, and their results were compared with the observed concentrations. While the ARIMA model was useful for predicting daily variation of the concentration, it was not satisfactory for predicting either rapid variation or seasonal variation. The multiple regression model predicted $NO_2$ concentration better than the ARIMA model.

A Multivariate Analysis of Korean Professional Players Salary

  • 송종우
    • The Korean Journal of Applied Statistics (응용통계연구)
    • /
    • Vol. 21, No. 3
    • /
    • pp.441-453
    • /
    • 2008
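  • Assuming that the salaries of professional athletes are determined by their individual performance and their contribution to the team, we predicted the following year's salaries of professional basketball and baseball players from their previous year's performance. Data visualization techniques were used to examine relationships among variables, detect outliers, and diagnose models. The data were analyzed with multiple linear regression and regression tree models, the models were compared, and cross-validation was used to select the best model. In particular, rather than simply applying stepwise regression, which selects variables automatically, we first examined the relationships among the explanatory variables and between the explanatory variables and the response variable, and then applied stepwise regression and the regression tree method to the variables chosen in this way to select appropriate variables and the final model. For professional basketball, points per game, assists, free throws made, and years of experience were important variables. For professional baseball pitchers, years of experience, strikeouts per nine innings, earned run average, and home runs allowed were important. For professional baseball batters, years of experience, number of hits, and free agent (FA) status were important.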

Parameter Calibration of Storage Function Model and Flood Forecasting (2) Comparative Study on the Flood Forecasting Methods

  • 김범준;송재현;김형수;홍일표
    • Journal of the Korean Society of Civil Engineers (대한토목학회논문집)
    • /
    • Vol. 26, No. 1B
    • /
    • pp.39-50
    • /
    • 2006
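  • To forecast floods, the flood control offices of Korea's five major river basins use the storage function model, and much research on flood forecasting has been carried out to date. This paper forecasts floods using the storage function model currently used by the flood control offices, regression analysis based on past rainfall-stage relationships, and an artificial neural network, and then compares and analyzes the results. For the storage function model, both the representative parameters of the flood control office and the calibrated optimal (mean) parameters were applied. For regression analysis and the artificial neural network, four flood events were selected from those between 1995 and 2001; regression coefficients were estimated from them, and the network was trained with the backpropagation algorithm. For the storage function model, forecasts using the optimal parameters improved on those using the representative parameters currently used by the flood control office, and flood forecasting with the regression methods (multiple regression, robust regression, and stepwise regression) gave relatively accurate results. The artificial neural network trained with backpropagation was somewhat less accurate than the regression-based forecasts but still produced accurate results.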

Identifying Factors for Corn Yield Prediction Models and Evaluating Model Selection Methods

  • Chang Jiyul;Clay David E.
    • Korean Journal of Crop Science (한국작물학회지)
    • /
    • Vol. 50, No. 4
    • /
    • pp.268-275
    • /
    • 2005
  • Early predictions of crop yields can provide information that helps producers take advantage of market opportunities, supports assessment of national food security, and gives early warning of food shortages. The objectives of this study were to identify the most useful parameters for estimating yields and to compare two model selection methods for finding the 'best' model developed by multiple linear regression. This research was conducted in two 65 ha corn/soybean rotation fields located in east central South Dakota. Data used to develop models were small temporal variability information (STVI: elevation, apparent electrical conductivity $(EC_a)$, slope), large temporal variability information (LTVI: inorganic N, Olsen P, soil moisture), and remote sensing information (green, red, and NIR bands, normalized difference vegetation index (NDVI), and green normalized difference vegetation index (GDVI)). Second-order Akaike's Information Criterion (AICc) and stepwise multiple regression were used to develop the best-fitting equations in each system (information group). The models with $\Delta_i\leq2$ were selected, giving 22 models at Moody and 37 at Brookings. The most useful variables for estimating corn yield differed between the fields, although elevation and $EC_a$ were consistently the most useful variables in both fields and in most of the systems. Model selection also differed between the fields, with different numbers of variables selected in each. These results might be attributed to the different landscapes and management histories of the study fields. The most common variables selected by AICc and by stepwise regression were different. In validation, stepwise regression was slightly better than AICc at Moody, and AICc was slightly better than stepwise regression at Brookings. The results suggest that the AICc approach can be used to identify the most useful information and select the 'best' yield models for production fields.
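
The $\Delta_i\leq2$ rule mentioned in this abstract can be sketched as follows, assuming the standard least-squares form of the criterion: AICc = n ln(RSS/n) + 2k + 2k(k+1)/(n-k-1), with $\Delta_i$ defined as each model's AICc minus the smallest AICc in the set. The function names, the sample size, and every RSS and parameter count below are invented for illustration and are not taken from the study.

```python
import math

def aicc_least_squares(rss, n, k):
    """Second-order AIC for a least-squares fit.
    rss: residual sum of squares, n: sample size,
    k: number of estimated parameters (including intercept and error variance)."""
    aic = n * math.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)

def candidate_set(models, n, max_delta=2.0):
    """Return {model name: delta} for every model whose AICc lies
    within `max_delta` of the best (lowest) AICc."""
    scores = {name: aicc_least_squares(rss, n, k) for name, (rss, k) in models.items()}
    best = min(scores.values())
    return {name: s - best for name, s in scores.items() if s - best <= max_delta}

# Hypothetical candidate models: name -> (RSS, parameter count). Values are made up.
models = {
    "elevation": (120.0, 3),
    "elevation+ECa": (100.0, 4),
    "slope+ECa": (101.0, 4),
    "all STVI+LTVI": (99.0, 8),
}
kept = candidate_set(models, n=50)
for name, delta in sorted(kept.items(), key=lambda item: item[1]):
    print(f"{name}: delta = {delta:.2f}")
```

The small-sample correction term penalizes the eight-parameter model heavily at n = 50, which is why a slightly lower RSS does not keep it in the candidate set.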