• Title/Summary/Keyword: stepwise variable selection (단계별 변수선택)

Variable selection in partial linear regression using the least angle regression (부분선형모형에서 LARS를 이용한 변수선택)

  • Seo, Han Son;Yoon, Min;Lee, Hakbae
    • The Korean Journal of Applied Statistics / v.34 no.6 / pp.937-944 / 2021
  • This work addresses variable selection in partial linear regression. Model selection for partial linear models is difficult because it involves nonparametric estimation, such as smoothing parameter selection, as well as estimation for the linear explanatory variables. In this work, several variable selection approaches are proposed based on least angle regression (LARS), a fast forward selection algorithm. The proposed procedures apply t-tests, all-possible-regressions comparisons, or a stepwise selection process to the variables selected by LARS. An example based on real data and a simulation study of the performance of the proposed procedures are presented.
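
LARS adds variables one at a time, moving the fit along the direction that keeps equal correlation with every active variable, so its entry order is a cheap ranking for forward selection. The numpy sketch below implements plain LARS (without the lasso modification) and returns that order. It assumes a purely linear model and ignores the nonparametric component of a partial linear model; the function name `lars_entry_order` is hypothetical, and the follow-up screening (t-tests, all-subsets, or stepwise) is not shown.

```python
import numpy as np


def lars_entry_order(X, y, max_vars=None):
    """Return column indices in the order plain LARS adds them to the active set.

    Assumes a purely linear model; the nonparametric part of a partial linear
    model is not handled here.
    """
    n, p = X.shape
    # Center each column and scale it to unit Euclidean norm; center the response.
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    yc = y - y.mean()
    if max_vars is None:
        max_vars = min(n - 1, p)

    mu = np.zeros(n)                                  # current fitted values
    active = [int(np.argmax(np.abs(Xs.T @ yc)))]      # most correlated column enters first
    while len(active) < max_vars:
        c = Xs.T @ (yc - mu)                          # current correlations
        C = np.max(np.abs(c[active]))                 # shared maximal |correlation|
        XA = Xs[:, active] * np.sign(c[active])       # sign-adjusted active columns
        ones = np.ones(len(active))
        g_inv_1 = np.linalg.solve(XA.T @ XA, ones)
        A = 1.0 / np.sqrt(ones @ g_inv_1)
        u = XA @ (A * g_inv_1)                        # unit-length equiangular direction
        a = Xs.T @ u
        # Smallest positive step at which some inactive column ties the active ones.
        best_gamma, best_j = np.inf, None
        for j in range(p):
            if j in active:
                continue
            for num, den in ((C - c[j], A - a[j]), (C + c[j], A + a[j])):
                if abs(den) > 1e-12 and 1e-12 < num / den < best_gamma:
                    best_gamma, best_j = num / den, j
        if best_j is None:
            break
        mu = mu + best_gamma * u
        active.append(best_j)
    return active


# Usage on synthetic data: columns 0 and 3 carry the signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
y = 3.0 * X[:, 0] + 2.0 * X[:, 3] + 0.1 * rng.standard_normal(200)
order = lars_entry_order(X, y)
```

Only the leading entries of `order` would be carried into the slower second-stage comparison, which is where the speed advantage comes from.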

The correlation and regression analyses based on variable selection for the university evaluation index (대학 평가지표들에 대한 상관분석과 변수선택에 의한 선형모형추정)

  • Song, Pil-Jun;Kim, Jong-Tae
    • Journal of the Korean Data and Information Science Society / v.23 no.3 / pp.457-465 / 2012
  • The purpose of this study is to analyze the associations among the indicators published on 'College Notifier' by the Korea Council for University Education and to find statistical models based on the important indicators. First, Pearson correlation coefficients are used to identify statistically significant correlations. The important indicators are then selected by variable selection, using backward elimination and stepwise methods, and their coefficients are estimated.
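
Backward elimination starts from the full model and removes one variable at a time while the fit criterion improves. The abstract does not state the criterion, so the numpy sketch below assumes AIC for a least-squares model with intercept (many packages use p-value thresholds instead); the function names are hypothetical.

```python
import numpy as np


def gaussian_aic(X, y):
    """AIC (up to an additive constant) of an OLS fit with intercept: n*log(RSS/n) + 2k."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = float(np.sum((y - Z @ beta) ** 2))
    return n * np.log(rss / n) + 2 * Z.shape[1]


def backward_eliminate(X, y):
    """Drop one column at a time while doing so lowers AIC; return kept column indices."""
    kept = list(range(X.shape[1]))
    current = gaussian_aic(X[:, kept], y)
    while kept:
        # AIC of every model that leaves out exactly one of the kept columns.
        trials = [(gaussian_aic(X[:, [k for k in kept if k != j]], y), j) for j in kept]
        best_aic, drop = min(trials)
        if best_aic >= current:
            break                                   # no single removal improves AIC
        current = best_aic
        kept.remove(drop)
    return kept


# Usage on synthetic data: only columns 0 and 2 matter.
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.standard_normal(300)
kept = backward_eliminate(X, y)
```

A stepwise variant would additionally try re-adding previously dropped columns at each step.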

Fast robust variable selection using VIF regression in large datasets (대형 데이터에서 VIF회귀를 이용한 신속 강건 변수선택법)

  • Seo, Han Son
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.463-473 / 2018
  • Variable selection algorithms for linear regression models of large data are considered. Many algorithms have been proposed, focusing on their speed and robustness. Among them, variance inflation factor (VIF) regression is fast and accurate because it uses a streamwise regression approach. However, VIF regression is susceptible to outliers because it estimates the model by least squares. For robustness, a robust criterion using a weighted estimator has been proposed, and a robust VIF regression has also been proposed for the same purpose. In this article, a fast and robust variable selection method is suggested that applies VIF regression after detecting and removing potential outliers. A simulation study and a data analysis are conducted to compare the suggested method with other methods.
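
The abstract does not specify its outlier-detection rule. As a simple, explicitly hypothetical stand-in, the numpy sketch below fits least squares once, scales the residuals by the median absolute deviation (MAD), and flags observations beyond a cutoff; the kept rows could then be passed to a fast selection step such as VIF regression, which is not implemented here. A single OLS pass can itself be distorted by many or high-leverage outliers, so this is an illustration rather than a robust estimator.

```python
import numpy as np


def drop_residual_outliers(X, y, cutoff=3.0):
    """Return a boolean mask of observations kept after one residual-based screen.

    Hypothetical rule: fit OLS with intercept, scale the centered residuals by a
    MAD-based standard deviation, and flag |residual| > cutoff * scale.
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    centered = resid - np.median(resid)
    scale = 1.4826 * np.median(np.abs(centered))    # MAD -> sigma under normal errors
    return np.abs(centered) <= cutoff * scale


# Usage: a clean linear relationship with five gross outliers added.
rng = np.random.default_rng(2)
x = rng.standard_normal(200)
y = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(200)
y[:5] += 20.0
mask = drop_residual_outliers(x[:, None], y)
X_clean, y_clean = x[mask][:, None], y[mask]        # input to the selection step
```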

Logistic Regressions with Sensory Evaluation Data about Hanwoo Steer Beef (한우 거세우 고기 관능평가 데이터의 로지스틱 회귀분석)

  • Lee, Hye-Jung;Kim, Jae-Hee
    • The Korean Journal of Applied Statistics / v.23 no.5 / pp.857-870 / 2010
  • This study was conducted to investigate the relationship between socio-demographic factors and Korean consumers' palatability evaluation grades, using Hanwoo sensory evaluation data collected from 2006 to 2008 by the National Institute of Animal Science. Binary and multinomial logistic regression models are fitted with independent variables such as the consumer's place of residence, age, gender, occupation, monthly income, and beef cut, with the palatability grade as the categorical dependent variable and tenderness, flavor, and juiciness as continuous dependent variables. A stepwise variable selection procedure is used to find the final model, and odds ratios are calculated to find the associations between categories.
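
The odds ratios reported from a binary logistic regression are the exponentiated coefficients. The numpy sketch below fits a logistic model by Newton-Raphson and converts coefficients to odds ratios; the toy 2x2 counts are made up for illustration, the function names are hypothetical, and the stepwise search that would wrap this fit is not shown.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Binary logistic regression with intercept, fitted by Newton-Raphson."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ beta)))        # fitted probabilities
        grad = Z.T @ (y - p)                         # score vector
        hess = Z.T @ (Z * (p * (1.0 - p))[:, None])  # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def odds_ratios(beta):
    """Odds ratio per one-unit increase in each predictor, exp(coefficient)."""
    return np.exp(beta[1:])


# Hypothetical 2x2 data: binary predictor x, binary outcome y.
# x=0: 30 with y=0, 10 with y=1;  x=1: 15 with y=0, 25 with y=1.
x = np.repeat([0.0, 0.0, 1.0, 1.0], [30, 10, 15, 25])
y = np.repeat([0.0, 1.0, 0.0, 1.0], [30, 10, 15, 25])
beta = fit_logistic(x[:, None], y)
# With one binary predictor, exp(beta[1]) should equal the cross-product ratio
# (25 * 30) / (10 * 15) = 5.
```

For a categorical predictor with more than two levels, each dummy variable's odds ratio compares that level with the reference level.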

Analysis of the impact on quitting one's first job using the stepwise sequence - based on graduates occupational mobility survey (단계별 순서를 응용한 첫 일자리에서의 조기퇴직에 대한 영향력 분석 -2009년 대졸자 이동경로조사로부터)

  • Chung, Woo-Ho;Lee, Sung-Im
    • Journal of the Korean Data and Information Science Society / v.21 no.6 / pp.1191-1201 / 2010
  • In this paper, we analyze the factors that influence quitting one's first job, based on the "Graduates Occupational Mobility Survey" data provided by the Korea Employment Information Service. The survey contains a large number of questionnaire items on quitting one's first job, so it is not easy to choose among them. We investigate model selection criteria and apply the procedure proposed by Shtatland et al. (2003) to identify the final model.
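
A minimal numpy sketch of the general idea of choosing a final model from a stepwise sequence: build the full forward-selection path of nested models, then pick the path member with the smallest information criterion. Treating this as the Shtatland et al. (2003) procedure is an assumption made only for illustration. If the outcome is binary (quit or not), a logistic model would normally be used; least squares keeps the sketch short. All function names are hypothetical.

```python
import numpy as np


def ols_ic(X, y, penalty):
    """n*log(RSS/n) + penalty*k for an OLS fit with intercept (k = number of coefficients)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = float(np.sum((y - Z @ beta) ** 2))
    return n * np.log(rss / n) + penalty * Z.shape[1]


def forward_sequence(X, y):
    """Full forward-selection path: at each step add the column that most reduces RSS."""
    remaining, chosen, path = list(range(X.shape[1])), [], []
    while remaining:
        # For a fixed model size, the lowest criterion value is the lowest RSS.
        best = min(remaining, key=lambda j: ols_ic(X[:, chosen + [j]], y, 2.0))
        chosen.append(best)
        remaining.remove(best)
        path.append(list(chosen))
    return path


def pick_from_sequence(X, y, criterion="bic"):
    """Choose the model on the forward path (including the empty model) with smallest AIC or BIC."""
    penalty = np.log(len(y)) if criterion == "bic" else 2.0
    path = [[]] + forward_sequence(X, y)
    return min(path, key=lambda cols: ols_ic(X[:, cols], y, penalty))


# Usage on synthetic data: columns 1 and 4 carry the signal.
rng = np.random.default_rng(3)
X = rng.standard_normal((300, 6))
y = 1.5 * X[:, 1] + 1.0 * X[:, 4] + rng.standard_normal(300)
path = forward_sequence(X, y)
selected = pick_from_sequence(X, y, "bic")
```

Separating the two stages means the entry threshold no longer decides the final model; the criterion does.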

Applying regional regression analysis of the hydrologic model parameters for assessing climate change impacts in the ungaged watershed (미계측 유역의 기후변화 영향평가를 위한 수문모형 매개변수의 지역회귀분석 적용)

  • Kim, Youngil;Seo, Seung Beom;Kim, Sung Jin;Kim, Young-Oh
    • Proceedings of the Korea Water Resources Association Conference / 2017.05a / pp.219-219 / 2017
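  • A watershed whose observed data are relatively insufficient or unverified is defined as an ungaged watershed; because the parameters of a hydrologic model cannot be calibrated for such a watershed, another approach must be devised. To this end, previous studies estimated streamflow in ungaged watersheds through regional regression analysis that accounts for regional characteristics, but most of them used regression equations based on the relationship between watershed characteristics and mean annual runoff, which made it difficult to account for real-time changes in streamflow. In this study, regression equations were derived for the parameters of GR4J, a conceptual hydrologic model widely used for rainfall-runoff modeling, using variables that reflect the characteristics of ungaged watersheds, and their applicability was evaluated. This made it possible to generate streamflow time series for ungaged watersheds. In addition, future runoff was estimated by applying the RCP 4.5 scenario of the IPCC Fifth Assessment Report (AR5). First, to apply regional regression analysis, runoff in gaged watersheds was computed with the hydrologic model: optimal GR4J parameters were obtained with the SCE algorithm at 22 points upstream of dams nationwide, and 11 variables were selected for each watershed based on its physical, topographic, and meteorological characteristics. To account for multicollinearity among the variables, a VIF (variance inflation factor) test was applied to select a final set of 7 variables, and stepwise regression was used to build a regression equation for each GR4J parameter.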
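
The VIF screening step described above (11 candidate variables reduced to 7 before stepwise regression) can be sketched with numpy: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on all the others. The rule of repeatedly dropping the variable with the largest VIF and the cutoff of 10 are common conventions assumed here; the abstract does not state the cutoff or the removal order, and the function names are hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R_j^2), where R_j^2 comes
    from regressing column j on all other columns plus an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1.0 - np.sum(resid ** 2) / tss
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def screen_by_vif(X, threshold=10.0):
    """Repeatedly drop the column with the largest VIF until all VIFs <= threshold.
    Returns the indices of the retained columns."""
    kept = list(range(X.shape[1]))
    while len(kept) > 1:
        v = vif(X[:, kept])
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        kept.pop(worst)
    return kept


# Usage: column 3 is almost exactly column 0 + column 1.
rng = np.random.default_rng(4)
base = rng.standard_normal((200, 3))
X = np.column_stack([base, base[:, 0] + base[:, 1] + 0.01 * rng.standard_normal(200)])
kept = screen_by_vif(X)
```

The retained columns would then be the candidates for stepwise regression of each model parameter.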

Categorical data analysis of sensory evaluation data with Hanwoo bull beef (한우 수소 고기 관능평가 데이터에 대한 범주형 자료 분석)

  • Lee, Hye-Jung;Cho, Soo-Hyun;Kim, Jae-Hee
    • Journal of the Korean Data and Information Science Society / v.20 no.5 / pp.819-827 / 2009
  • This study was conducted to investigate the relationship between sociodemographic factors and Korean consumers' palatability evaluation grades using Hanwoo sensory evaluation data. Binary and multinomial logistic regression models are fitted with independent variables such as the consumer's place of residence, age, gender, occupation, monthly income, and beef cut, with the palatability grade as the dependent variable. A stepwise variable selection procedure is used to find the final model, and odds ratios are calculated to find the associations between categories.

Validation Comparison of Credit Rating Models for Categorized Financial Data (범주형 재무자료에 대한 신용평가모형 검증 비교)

  • Hong, Chong-Sun;Lee, Chang-Hyuk;Kim, Ji-Hun
    • Communications for Statistical Applications and Methods / v.15 no.4 / pp.615-631 / 2008
  • Current credit evaluation models, which are based only on financial data and exclude non-financial data, use continuous data and produce credit scores for ranking. In this work, some problems of credit evaluation models based on transformed continuous financial data are discussed, and we propose improved credit evaluation models based on categorized financial data. After analyzing and comparing goodness-of-fit tests of the two models, we explain the usefulness of credit evaluation models for categorized financial data.
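
The abstract does not say how the financial variables were categorized. Assuming equal-frequency (quantile) binning, one common choice, a minimal numpy sketch of turning a continuous ratio into category codes and dummy variables for a scoring model might look like this; `quantile_categories`, `one_hot`, and the default of five bins are hypothetical.

```python
import numpy as np


def quantile_categories(x, n_bins=5):
    """Map a continuous variable to category codes 0..n_bins-1 using its sample
    quantiles (equal-frequency binning)."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])   # interior cut points
    return np.searchsorted(edges, x, side="right")


def one_hot(codes, n_levels):
    """Dummy-code category codes, dropping level 0 as the reference category."""
    return (codes[:, None] == np.arange(1, n_levels)).astype(float)


# Usage on a toy continuous ratio.
ratio = np.arange(100.0)
codes = quantile_categories(ratio, 5)
dummies = one_hot(codes, 5)
```

Cut points estimated on training data would be reused unchanged when scoring new firms.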

Lasso Regression of RNA-Seq Data based on Bootstrapping for Robust Feature Selection (안정적 유전자 특징 선택을 위한 유전자 발현량 데이터의 부트스트랩 기반 Lasso 회귀 분석)

  • Jo, Jeonghee;Yoon, Sungroh
    • KIISE Transactions on Computing Practices / v.23 no.9 / pp.557-563 / 2017
  • When large-scale gene expression data are analyzed using lasso regression, the estimated regression coefficients may be unstable because the expression values of associated genes are highly correlated. This instability, in which coefficients are shrunk by L1 regularization, makes variable selection difficult. To address this problem, we propose a regression model that applies repeated bootstrapping to the gene expression values prior to lasso regression. The genes selected with high frequency were used to build each regression model. Our experimental results show that several genes were consistently selected in all regression models, and we verified that these genes were not false positives. We also found that the sign distribution of the regression coefficients of the genes selected by each model was correlated with the actual dependent variables.
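
Selection frequency under bootstrap resampling can be computed by refitting the lasso on each resample and counting how often each coefficient is nonzero. The numpy sketch below uses cyclic coordinate descent for the lasso; the penalty `lam`, the number of resamples, and the function names are assumptions for illustration, not the settings or code used in the study.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Lasso by cyclic coordinate descent: minimizes (1/(2n))||y - Xb||^2 + lam*||b||_1.

    Expects standardized columns of X and a centered y (no intercept is fitted).
    """
    n, p = X.shape
    b = np.zeros(p)
    r = np.array(y, dtype=float)                    # residual y - X b
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            r += X[:, j] * old                      # partial residual without column j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]   # soft-threshold
            r -= X[:, j] * b[j]
            max_step = max(max_step, abs(b[j] - old))
        if max_step < tol:
            break
    return b


def bootstrap_selection_frequency(X, y, lam, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which the lasso keeps each coefficient nonzero."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)            # resample rows with replacement
        Xb = X[idx]
        Xb = (Xb - Xb.mean(axis=0)) / Xb.std(axis=0)
        yb = y[idx] - y[idx].mean()
        counts += lasso_cd(Xb, yb, lam) != 0
    return counts / n_boot


# Usage on synthetic "expression" data: only features 0 and 1 matter.
rng = np.random.default_rng(5)
X = rng.standard_normal((100, 8))
y = 2.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(100)
freq = bootstrap_selection_frequency(X, y, lam=0.5, n_boot=30)
```

Features whose frequency exceeds a chosen threshold would then form the input to the final regression model.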

An Empirical Study on the Travel Behavior and Destination Choice according to the Family Life Cycle (가족생활주기에 따른 관광지 선택행동의 실증분석)

  • Sim, Sang-Wha;Kim, Wol-Ho
    • Korean Business Review / v.11 / pp.149-171 / 1998
  • The most important task in tourist market segmentation is to find descriptive variables that properly describe changes in tourist demand. There are many descriptive variables; among them, vital statistical variables have proved effective. The strongest such variable, yet one that has been studied much less, is the Family Life Cycle. This study focuses on the relationship between the Family Life Cycle and destination choice as a travel behavior. In this study, I will verify the validity of the Family Life Cycle as a descriptive variable for tourist market segmentation and try to find the meaningful variables at each stage. The purpose of this study is therefore to explain the relationship between the Family Life Cycle and destination choice, to verify the validity of the Family Life Cycle as a descriptive variable, and to find strategies for responding to the growing quantity and qualitative diversity of the tourist market. Studies of the Family Life Cycle should be updated continuously as family structure changes, and the Family Life Cycle should be understood as a standard for tourist market segmentation in both the public and private spheres.
