• Title/Abstract/Keywords: stepwise regression (단계별 회귀)

Fast robust variable selection using VIF regression in large datasets (대형 데이터에서 VIF회귀를 이용한 신속 강건 변수선택법)

  • Seo, Han Son
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.463-473 / 2018
  • Variable selection algorithms for linear regression models of large data are considered. Many algorithms have been proposed that focus on speed and robustness. Among them, variance inflation factor (VIF) regression is fast and accurate because it uses a streamwise regression approach. However, VIF regression is susceptible to outliers because it estimates the model by least squares. A robust criterion using a weighted estimator has been proposed to make the algorithm robust; a robust VIF regression has also been proposed for the same purpose. In this article, a fast and robust variable selection method is suggested that applies VIF regression after detecting and removing potential outliers. A simulation study and an analysis of a dataset are conducted to compare the suggested method with other methods.

Comparison of Different Multiple Linear Regression Models for Real-time Flood Stage Forecasting (실시간 수위 예측을 위한 다중선형회귀 모형의 비교)

  • Choi, Seung Yong;Han, Kun Yeun;Kim, Byung Hyun
    • KSCE Journal of Civil and Environmental Engineering Research / v.32 no.1B / pp.9-20 / 2012
  • Recently, to overcome the limitations of conceptual, hydrological, and physics-based models for flood stage forecasting, multiple linear regression models, one class of data-driven models, have been widely adopted for forecasting flood streamflow (stage). The objectives of this study are to compare the performance of different multiple linear regression models according to the regression coefficient estimation method and to determine the most effective multiple linear regression model for flood stage forecasting. To do this, the time scale was determined through autocorrelation analysis of the input data, and flood stage forecasting models developed using the regression coefficient estimation methods LS (least squares), WLS (weighted least squares), and SPW (stepwise) were applied to flood events in Jungrang stream. To evaluate the performance of the established models, four statistical indices were used: root mean square error (RMSE), Nash-Sutcliffe efficiency coefficient (NSEC), mean absolute error (MAE), and adjusted coefficient of determination ($R^{*2}$). The results show that the model using SPW (stepwise) parameter estimation predicts river flood stage better than the others, and that the model using LS parameter estimation is slightly better than the model using WLS parameter estimation.
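
The four evaluation indices named in this abstract have standard textbook definitions, and a minimal sketch of them may help when comparing forecasts. The function names and the sample numbers are illustrative assumptions, not taken from the paper, and the adjusted $R^2$ here is computed from $1 - SSE/SST$, which may differ from the exact form the authors used.

```python
import math

def rmse(obs, sim):
    """Root mean square error between observed and simulated stages."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / SST (1.0 is a perfect fit)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def adjusted_r2(obs, sim, n_predictors):
    """Adjusted R^2, assuming R^2 = 1 - SSE/SST (an assumption of this sketch)."""
    n = len(obs)
    r2 = nse(obs, sim)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

# Hypothetical observed and forecast stages, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.0, 2.0, 3.0, 5.0]
print(rmse(observed, forecast), mae(observed, forecast),
      nse(observed, forecast), adjusted_r2(observed, forecast, 1))
```

Lower RMSE and MAE and higher NSEC and adjusted $R^2$ indicate a better forecast, so the four indices can be read side by side for each candidate model.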

Trip Generation Model based on Geographically Weighted Regression (공간가중회귀분석을 이용한 통행발생모형)

  • Kim, Jin-Hui;Park, Il-Seop;Jeong, Jin-Hyeok
    • Journal of Korean Society of Transportation / v.29 no.2 / pp.101-109 / 2011
  • In most urbanized cities, socio-economic attributes tend to cluster in space in patterns of similarity, namely spatial autocorrelation, because of agglomeration forces. The classical linear regression model, the one most frequently adopted in the trip generation step, cannot sufficiently represent this effect. To take the effect into account properly, we need a model that adequately deals with spatial dependence patterns. In this study, the Geographically Weighted Regression (GWR) model is adopted as an alternative method for the local analysis of relationships in multivariate data sets; that is, GWR extends the traditional regression framework by estimating local rather than global parameters. Using the GWR model and travel data collected in the Daegu metropolitan area, this study shows the existence of spatial effects in the production and attraction of home-based and non-home-based trips. Furthermore, LISA is employed to verify that local spatial autocorrelation exists.
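
The idea of estimating local rather than global parameters can be sketched as a weighted least-squares fit repeated at each location. The block below is a simplified illustration with one explanatory variable; the Gaussian kernel, the bandwidth value, and all names are assumptions of this sketch and are not stated in the abstract.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian distance-decay kernel (one common choice; assumed here)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_fit(coords, x, y, target, bandwidth):
    """Weighted simple regression of y on x, centred on `target`.

    Observations near `target` get larger weights, so the returned
    (intercept, slope) pair is a local estimate for that location.
    """
    w = [gaussian_weight(math.dist(c, target), bandwidth) for c in coords]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical zone centroids, an explanatory variable, and trip counts.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
print(local_fit(coords, x, y, target=(0.0, 0.0), bandwidth=1.0))
```

Calling `local_fit` once per zone yields a coefficient surface; when the relationship is the same everywhere, as in this toy data, every location returns the same coefficients.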

The Estimation of Software Development Effort Using Multiple Regression Method (다중회귀 분석을 이용한 소프트웨어 개발노력추정)

  • Jung Hye-Jung;Yang Hae-Sool;Shin Seok-Kyoo;Lee Sang-Un
    • The KIPS Transactions:PartD / v.11D no.7 s.96 / pp.1483-1490 / 2004
  • To complete a project successfully, development effort must be estimated accurately. However, development effort varies with software size and operating environment. Function points are usually used to estimate development effort. In this paper, we use data from 789 development projects carried out in the 1990s. We investigate the variables that affect development effort and perform multiple linear regression analysis to examine the linear relationships among the variables. We derive multistage regression equations by dividing PDR, which influences development effort, step by step.

An educational tool for regression models with dummy variables using Excel VBA (엑셀 VBA을 이용한 가변수 회귀모형 교육도구 개발)

  • Choi, Hyun Seok;Park, Cheolyong
    • Journal of the Korean Data and Information Science Society / v.24 no.3 / pp.593-601 / 2013
  • We often need to include categorical variables as explanatory variables in regression models. Categorical variables in regression models can be quantified through dummy variables. In this study, we provide an educational tool using Excel VBA that displays regression lines along with test results for regression models with a continuous explanatory variable and one or two categorical explanatory variables. The regression lines with test results are provided step by step for the model(s) with interaction(s), the model(s) without interaction(s) but with dummy variables, and the model without dummy variable(s). With this tool, we can easily understand the meaning of dummy variables and interaction effects through graphics and further decide which model is better suited to the data at hand.
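
How a dummy variable and its interaction term change the regression line can be shown with a small sketch. It assumes the usual model $y = \beta_0 + \beta_1 x + \beta_2 d + \beta_3 x d$ with a single 0/1 dummy $d$; the coefficient values are hypothetical, and the code is written in Python rather than the Excel VBA used by the tool.

```python
def design_row(x, d):
    """Design-matrix row for y = b0 + b1*x + b2*d + b3*x*d, with d in {0, 1}."""
    return [1.0, x, float(d), x * d]

def group_line(beta, d):
    """Intercept and slope of the fitted line for the group coded by d.

    The dummy coefficient b2 shifts the intercept, and the interaction
    coefficient b3 changes the slope.
    """
    b0, b1, b2, b3 = beta
    return (b0 + b2 * d, b1 + b3 * d)

# Hypothetical fitted coefficients, for illustration only.
beta = (1.0, 2.0, 0.5, -1.0)
print(group_line(beta, 0))  # baseline group
print(group_line(beta, 1))  # group with d = 1
```

Setting $\beta_3 = 0$ gives parallel lines (the model without interaction), and setting $\beta_2 = \beta_3 = 0$ collapses both groups onto one line (the model without dummy variables), which matches the three model stages listed in the abstract.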

Time series analysis for Korean COVID-19 confirmed cases: HAR-TP-T model approach (한국 COVID-19 확진자 수에 대한 시계열 분석: HAR-TP-T 모형 접근법)

  • Yu, SeongMin;Hwang, Eunju
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.239-254 / 2021
  • This paper studies time series analysis, with estimation and forecasting, of Korean COVID-19 confirmed cases, based on a heterogeneous autoregressive (HAR) model with two-piece t (TP-T) distributed errors. We consider HAR-TP-T time series models and suggest a step-by-step method to estimate the HAR coefficients as well as the TP-T distribution parameters. In the proposed step-by-step estimation, the ordinary least squares method is used to estimate the HAR coefficients, while the maximum likelihood estimation (MLE) method is adopted to estimate the TP-T error parameters. A simulation study of the step-by-step method shows good performance. For the empirical analysis of the Korean COVID-19 confirmed cases, estimates in the HAR-TP-T models of order p = 2, 3, 4 are computed for several selected lags, including the optimal lags chosen by minimizing the mean squared error of the models. The estimation results from the proposed method and from MLE alone are compared using several criteria. The proposed step-by-step method outperforms MLE in two respects: the mean squared error of the HAR model and the mean squared difference between the TP-T residuals and their densities. Moreover, forecasting of the Korean COVID-19 confirmed cases is discussed with the optimally selected HAR-TP-T model. The mean absolute percentage error of one-step-ahead out-of-sample forecasts is 0.0953% for the proposed model. We conclude that the proposed HAR-TP-T time series model with optimally selected lags and its step-by-step estimation provides accurate forecasts of the Korean COVID-19 confirmed cases.
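
As a rough illustration of the inputs to the OLS step, the sketch below builds HAR regressors as averages of the most recent observations over each chosen lag window and computes the mean absolute percentage error. It assumes the common HAR form in which the regressor for lag $h$ is the mean of the last $h$ values; the paper's exact specification, the lag choices, and all names here are assumptions, and the TP-T likelihood step is not shown.

```python
def har_regressors(series, lags):
    """Build HAR regressors: for each lag h, the mean of the previous h values.

    Returns (rows, targets), where rows[i] holds one regressor per lag and
    targets[i] is the value to be predicted from that row.
    """
    max_lag = max(lags)
    rows, targets = [], []
    for t in range(max_lag, len(series)):
        rows.append([sum(series[t - h:t]) / h for h in lags])
        targets.append(series[t])
    return rows, targets

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical series and lag set, for illustration only.
rows, targets = har_regressors([1, 2, 3, 4, 5, 6], lags=(1, 3))
print(rows, targets)
print(mape([100.0, 200.0], [110.0, 190.0]))
```

The rows (with an added constant column) would then be passed to any ordinary least squares routine to obtain the HAR coefficients.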

Modelling the Subway Demand Estimation by Station Using the Multiple Regression Analysis by Category (카테고리별 다중회귀분석 방법을 이용한 지하철역별 수요 추정 모형 개발)

  • Shon, Eui-Young;Kwon, Byoung-Woo;Lee, Man-Ho
    • Journal of Korean Society of Transportation / v.22 no.1 s.72 / pp.33-42 / 2004
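  • Subway demand by station increases along an S-shaped curve with the number of years since opening. That is, latent subway demand grows steadily over time after opening, peaks roughly 10 to 13 years after opening, and then remains nearly flat. However, the four-step model used so far to estimate subway demand cannot reflect this growth trend, and its estimates have therefore differed substantially from actual demand. To address this problem, this study develops a model that estimates demand by station, specifically the net number of boarding passengers, based on actual demand on Seoul subway lines 2 to 8. The model uses a logistic function, whose shape most closely resembles actual station-level demand. The differing characteristics of individual stations are reflected in the model by classifying stations into categories according to land use, the scale of socio-economic activity, and station characteristics. Population, number of workers, number of students, and years since opening were selected as the independent variables representing the characteristics of each category. The station-level demand estimated by category was found to be highly statistically significant. The main purpose of this study is to develop a model that more accurately estimates the net boarding demand at each station. The model is, however, difficult to use for estimating alighting and transfer demand at each station. It therefore needs to be combined with the four-step model, which has been the most widely used approach to subway demand estimation, and this study also presents a way to do so.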
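
The S-shaped growth described in this abstract is the shape of a logistic function. A minimal sketch follows, assuming the standard three-parameter form; the parameter names and values are hypothetical and are not the coefficients estimated in the paper.

```python
import math

def logistic_demand(years_since_opening, capacity, growth_rate, midpoint):
    """Logistic curve: demand rises slowly, accelerates, then saturates.

    capacity    -- saturation level the demand approaches
    growth_rate -- steepness of the rise
    midpoint    -- year at which demand reaches half of capacity
    """
    return capacity / (1.0 + math.exp(-growth_rate * (years_since_opening - midpoint)))

# Hypothetical station: saturates near 1000 boardings, half reached in year 5.
for year in (0, 5, 10, 13):
    print(year, round(logistic_demand(year, 1000.0, 0.8, 5.0), 1))
```

With these illustrative values the curve is nearly flat by years 10 to 13, which mirrors the saturation pattern the abstract reports.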

The correlation and regression analyses based on variable selection for the university evaluation index (대학 평가지표들에 대한 상관분석과 변수선택에 의한 선형모형추정)

  • Song, Pil-Jun;Kim, Jong-Tae
    • Journal of the Korean Data and Information Science Society / v.23 no.3 / pp.457-465 / 2012
  • The purpose of this study is to analyze the association between indicators and to find statistical models based on the important indicators in 'College Notifier' of the Korea Council for University Education. First, Pearson correlation coefficients are used to find statistically significant correlations. Then the important indicators are selected by variable selection and their coefficients are estimated. Backward and stepwise methods are employed for variable selection.

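A Study on the Effect of Stepwise Recursion in the Problem-Solving Process on Problem-Solving Time (문제해결과정의 단계별 회귀가 문제해결시간에 미치는 영향에 관한 연구)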

  • 손달호;최무진
    • Proceedings of the Korean Operations and Management Science Society Conference / 1993.04a / pp.73-82 / 1993
  • Over the last decades, interest in the application of decision support systems (DSS) in organizations has increased rapidly. Despite the growing number of investigations examining decision support systems, relatively few empirical studies have evaluated the effects of DSS on problem-solving processes. Using a computer simulation technique, this study examined the effect of recursion in problem-solving processes on problem-solving time. Results indicate that recursion at the early stage of the problem-solving process scarcely influenced problem-solving time, in contrast to recursion at the final stage.

Evaluation of the major sources of atmospheric pollution in Jilin City by regression diagnostics (대기 오염이 암에 의한 사망률에 미치는 영향에 관한 연구)

  • 한지농;우치수
    • The Korean Journal of Applied Statistics / v.2 no.2 / pp.47-51 / 1989
  • Using stepwise regression, we study the influence of atmospheric pollution in Jilin City on the cancer death rate. Extreme observations are identified, and $SO_2$ and smoking are also found to be important factors.
