• Title/Summary/Keyword: explanatory variable (설명변수)

A Deep Learning Model for Identifying The Time Lag Between Explanatory Variables and Response Variable in Regression Analysis (회귀분석에서 설명변수와 반응변수 간의 시차를 파악하는 딥러닝 모델)

  • Kim, Chaehyeon;Ryoo, Euirim;Lee, Ki Yong
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.868-871 / 2021
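  • In regression analyses across many fields, such as climate, business, and economics, explanatory variables often affect the response variable with a time lag. Most regression analyses to date, however, assume that explanatory variables affect the response variable immediately, and little research has explored the time lag between explanatory and response variables. Identifying that lag is important for more accurate regression analysis. This paper proposes a deep learning model that, given regression data, identifies the time lag between each explanatory variable and the response variable. The proposed model represents, as weights between nodes, which of an explanatory variable's past values has the greatest influence on the current response variable, and it searches for the weights that minimize the error of the regression model. After training, these weights are used to identify the time lag between each explanatory variable and the response variable. Experiments confirmed that, by accounting for time lags, the proposed method finds a more accurate regression model whose error is only about 1/100 that of a conventional regression model that ignores them.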
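The idea of searching for the lag that minimizes regression error can be illustrated with a much simpler stand-in than the deep learning model in the abstract. The sketch below is a brute-force search, not the authors' method: it fits a one-variable least-squares line for each candidate lag and keeps the lag with the smallest error. The function names, the synthetic data, and the true lag of 3 are all assumptions made for illustration.

```python
import random


def fit_simple_ols(x, y):
    """Least-squares fit y ~ a + b*x; returns (a, b, mean squared error)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    mse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / n
    return a, b, mse


def find_best_lag(x, y, max_lag):
    """Fit y[t] on x[t - lag] for lag = 0..max_lag; return (best_lag, errors)."""
    errors = {}
    for lag in range(max_lag + 1):
        # Drop the first max_lag time steps for every lag so that all
        # candidate fits are compared on the same set of responses.
        xs = [x[t - lag] for t in range(max_lag, len(y))]
        ys = y[max_lag:]
        errors[lag] = fit_simple_ols(xs, ys)[2]
    best_lag = min(errors, key=errors.get)
    return best_lag, errors


# Synthetic data (hypothetical): y depends on x from TRUE_LAG steps earlier.
rng = random.Random(0)
TRUE_LAG = 3
x = [rng.gauss(0.0, 1.0) for _ in range(300)]
y = [
    2.0 * x[t - TRUE_LAG] + rng.gauss(0.0, 0.1) if t >= TRUE_LAG else 0.0
    for t in range(300)
]

best_lag, errors = find_best_lag(x, y, max_lag=6)
print("estimated lag:", best_lag)
print("MSE ignoring the lag:", errors[0])
print("MSE at the estimated lag:", errors[best_lag])
```

An exhaustive search like this scales poorly with many explanatory variables, each with its own lag, which is presumably part of the motivation for learning the lags as network weights instead.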

A Study on the Modal Split Model Using Zonal Data (존 데이터 기반 수단분담모형에 관한 연구)

  • Ryu, Si-Kyun;Rho, Jeong-Hyun;Kim, Ji-Eun
    • Journal of Korean Society of Transportation / v.30 no.1 / pp.113-123 / 2012
  • This study introduces a new type of modal split model that uses zonal data instead of cost data as independent variables. Models using cost data have been shown to suffer from multicollinearity between the travel time and cost variables and from the unpredictability of the independent variables. The zonal data employed in this study include (1) socioeconomic data, (2) land use data, and (3) transportation system data. The test results showed that the proposed modal split model using zonal data performs better than the model using cost data.

Subset Selection in the Poisson Models - A Normal Predictors case - (포아송 모형에서의 설명변수 선택문제 - 정규분포 설명변수하에서 -)

  • 박종선
    • The Korean Journal of Applied Statistics / v.11 no.2 / pp.247-255 / 1998
  • In this paper, a new subset selection problem in the Poisson model is considered under normal predictors. It turns out that the subset model has a larger variance than the Poisson model with random predictors, and this fact is used to derive a new subset selection method similar to Mallows' $C_p$.

A study on the comparison of descriptive variables reduction methods in decision tree induction: A case of prediction models of pension insurance in life insurance company (생명보험사의 개인연금 보험예측 사례를 통해서 본 의사결정나무 분석의 설명변수 축소에 관한 비교 연구)

  • Lee, Yong-Goo;Hur, Joon
    • Journal of the Korean Data and Information Science Society / v.20 no.1 / pp.179-190 / 2009
  • In the financial industry, the decision tree algorithm has been widely used for classification analysis. One of the major difficulties in this setting is the large number of explanatory variables to consider when modeling. We therefore need an effective method for reducing the number of explanatory variables without seriously affecting the modeling results. In this research, we compare various variable reduction methods and identify the best one based on the modeling accuracy of the tree algorithm. We applied the methods to pension insurance data from an insurance company to obtain empirical results. We found that selecting variables using the sensitivity analysis of a neural network is the most effective way to reduce the number of variables while maintaining accuracy.

Variable Selection in PLS Regression with Penalty Function (벌점함수를 이용한 부분최소제곱 회귀모형에서의 변수선택)

  • Park, Chong-Sun;Moon, Guy-Jong
    • Communications for Statistical Applications and Methods / v.15 no.4 / pp.633-642 / 2008
  • A variable selection algorithm for partial least squares regression using a penalty function is proposed. We use the fact that the usual partial least squares regression problem can be expressed as a maximization problem with appropriate constraints, and we add a penalty function to this maximization problem. A simulated annealing algorithm can then be used to search for optimal solutions of the penalized maximization problem. The HARD penalty function is suggested as the best in several respects. Illustrations with real and simulated examples are provided.

Efficient variable selection method using conditional mutual information (조건부 상호정보를 이용한 분류분석에서의 변수선택)

  • Ahn, Chi Kyung;Kim, Donguk
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.1079-1094 / 2014
  • In this paper, we study efficient gene selection methods by using conditional mutual information. We suggest gene selection methods using conditional mutual information based on semiparametric methods utilizing multivariate normal distribution and Edgeworth approximation. We compare our suggested methods with other methods such as mutual information filter, SVM-RFE, Cai et al. (2009)'s gene selection (MIGS-original) in SVM classification. By these experiments, we show that gene selection methods using conditional mutual information based on semiparametric methods have better performance than mutual information filter. Furthermore, we show that they take far less computing time than Cai et al. (2009)'s gene selection but have similar performance.

Sample-spacing Approach for the Estimation of Mutual Information (SAMPLE-SPACING 방법에 의한 상호정보의 추정)

  • Huh, Moon-Yul;Cha, Woon-Ock
    • The Korean Journal of Applied Statistics / v.21 no.2 / pp.301-312 / 2008
  • Mutual information measures the association between an explanatory variable and the target variable it is used to predict. It is used for variable ranking and variable subset selection. This study examines the sample-spacing approach, which estimates mutual information from data consisting of continuous explanatory variables and a categorical target variable without estimating a joint probability density function. Results from Monte Carlo simulation and experiments with real-world data show that m = 1 is preferable when using the sample-spacing approach.
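To make the spacing idea concrete, the following is a minimal sketch that assumes the classical m-spacing entropy estimator and the decomposition I(X; C) = H(X) - sum over c of p(c) H(X | C = c). The paper's exact estimator may differ, and the function names and synthetic data here are hypothetical. No joint density is estimated: each entropy comes only from gaps between sorted observations.

```python
import math
import random


def spacing_entropy(values, m=1):
    """m-spacing estimate of differential entropy from a 1-D sample."""
    xs = sorted(values)
    n = len(xs)
    if n <= m:
        raise ValueError("need more than m observations")
    total = 0.0
    for i in range(n - m):
        gap = max(xs[i + m] - xs[i], 1e-12)  # guard against tied values
        total += math.log((n + 1) / m * gap)
    return total / (n - m)


def spacing_mutual_information(x, labels, m=1):
    """Estimate I(X; C) as H(X) minus the class-weighted H(X | C = c)."""
    groups = {}
    for xi, c in zip(x, labels):
        groups.setdefault(c, []).append(xi)
    n = len(x)
    conditional = sum(
        len(g) / n * spacing_entropy(g, m) for g in groups.values()
    )
    return spacing_entropy(x, m) - conditional


# Synthetic check (hypothetical data): one variable separates the two
# classes, the other has the same distribution in both.
rng = random.Random(0)
labels = [0] * 500 + [1] * 500
x_informative = [rng.gauss(10.0 * c, 1.0) for c in labels]
x_noise = [rng.gauss(0.0, 1.0) for _ in labels]

mi_informative = spacing_mutual_information(x_informative, labels, m=1)
mi_noise = spacing_mutual_information(x_noise, labels, m=1)
print("informative variable:", mi_informative)
print("uninformative variable:", mi_noise)
```

Ranking variables by this estimate would place the class-separating variable first, which is the variable-ranking use the abstract mentions. Because the same m is used for every entropy term, the constant bias of the raw spacing estimator largely cancels in the difference.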

Effects of Multicollinearity in Logit Model (로짓모형에 있어서 다중공선성의 영향에 관한 연구)

  • Ryu, Si-Kyun
    • Journal of Korean Society of Transportation / v.26 no.1 / pp.113-126 / 2008
  • This research explores the effects of multicollinearity on the reliability and goodness of fit of the logit model. To investigate the effects of multicollinearity on the multinomial logit model, numerical experiments are performed. Explanatory variables (attributes of the utility functions) with correlations ranging from $\rho = 0.0$ to $\rho = 0.9$ are generated, and the rho-squares and t-statistics, which are the indices of goodness of fit and reliability of the logit model, are traced. The numerical experiments validate the following findings: 1) When a new explanatory variable is added, some rho-squares increase while others decrease. 2) Higher correlations between generic variables worsen the logit model's goodness of fit. 3) Multicollinearity tends to produce over-evaluated parameters. 4) The reliability of the estimated parameters tends to decrease when the correlations between attributes are high. These results suggest that we should check for multicollinearity and apply proper treatments to diminish it when developing a logit model.
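The first step of such an experiment, generating attributes with a controlled correlation, can be sketched as follows. This is only an illustration under assumed settings (standard normal attributes, 5,000 draws per correlation level, hypothetical function names). It does not reproduce the paper's logit estimation. It also prints the two-variable variance inflation factor 1 / (1 - r^2), a standard indicator of how strongly collinearity inflates the variance of estimated coefficients.

```python
import math
import random


def correlated_pair(n, rho, rng):
    """Draw n pairs of standard normal values with population correlation rho."""
    x1, x2 = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x1.append(z1)
        # Mixing z1 with independent noise gives unit variance and
        # correlation rho with x1.
        x2.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x1, x2


def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)
results = {}
for step in range(10):
    rho = step / 10  # 0.0, 0.1, ..., 0.9
    x1, x2 = correlated_pair(5000, rho, rng)
    r = pearson(x1, x2)
    results[rho] = r
    vif = 1.0 / (1.0 - r ** 2)  # variance inflation factor for two variables
    print(f"target rho={rho:.1f}  sample r={r:.3f}  VIF={vif:.2f}")
```

In a full experiment, each generated pair would feed the utility functions of a logit model so that fit and t-statistics can be traced against the target correlation.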

Three-Dimensional Partial Residual Plots in Regression Analysis (회귀분석에서의 3차원 편잔차그림)

  • 강명욱;이정아
    • The Korean Journal of Applied Statistics / v.13 no.1 / pp.133-143 / 2000
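  • When two explanatory variables with nonlinear effects enter the model linearly, the two-dimensional partial residual plot for each variable reveals the presence and form of the nonlinearity well, provided the two variables are only weakly associated. When the two variables are strongly associated, however, a three-dimensional partial residual plot is needed, and it can detect nonlinearity that two-dimensional partial residual plots cannot.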

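Specifying Prior Distributions in Explanatory Variable Selection Using Gibbs Sampling: Focusing on the Linear Regression Model (깁스표본기법을 이용한 설명변수 선택문제에서 사전분포의 설정-선형회귀모형을 중심으로-)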

  • 박종선;남궁평;한숙영
    • Communications for Statistical Applications and Methods / v.4 no.2 / pp.333-343 / 1997
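  • In linear regression analysis, variable selection plays a very important part in finding the best model. George and McCulloch (1993) considered the problem of selecting variables in the linear regression model using a hierarchical Bayes model and Gibbs sampling. Building on George and McCulloch's model, this paper considers the problem of determining, by an objective criterion, the prior probability that each explanatory variable is included in the model.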