• Title/Summary/Keyword: Variable selection bias

Strengthening Causal Inference in Studies using Non-experimental Data: An Application of Propensity Score and Instrumental Variable Methods (비실험자료를 이용한 연구에서 인과적 추론의 강화: 성향점수와 도구변수 방법의 적용)

  • Kim, Myoung-Hee;Do, Young-Kyung
    • Journal of Preventive Medicine and Public Health / v.40 no.6 / pp.495-504 / 2007
  • Objectives: This study attempts to show how studies using non-experimental data can strengthen causal inferences by applying propensity score and instrumental variable methods based on the counterfactual framework. For illustrative purposes, we examine the effect of having private health insurance on the probability of experiencing at least one hospital admission in the previous year. Methods: Using data from the 4th wave of the Korea Labor and Income Panel Study, we compared the results obtained using propensity score and instrumental variable methods with those from conventional logistic and linear regression models, respectively. Results: While conventional multiple regression analyses fail to identify the effect, the results estimated using propensity score and instrumental variable methods suggest that having private health insurance has positive and statistically significant effects on hospital admission. Conclusions: This study demonstrates that propensity score and instrumental variable methods provide potentially useful alternatives to conventional regression approaches in making causal inferences using non-experimental data.

A study on equating method based on regression analysis (회귀분석에 기초한 균등화 방법에 관한 연구)

  • Cho, Jang-Sik
    • Journal of the Korean Data and Information Science Society / v.21 no.3 / pp.513-521 / 2010
  • Most universities conduct course evaluations for use in the performance appraisal of professors. However, course evaluation results depend on the characteristics of each class, such as class size, type of lecture, and the evaluators' grade. As a result, these characteristics lead to serious bias, which makes lecturers distrust the course evaluation results. Hence, we propose an equating method for course evaluation based on regression analysis with stepwise variable selection. We compare the proposed method with the method of Cho et al. (2009) with respect to efficiency. We also give an example to which the method is applied.

Fast Training of Structured SVM Using Fixed-Threshold Sequential Minimal Optimization

  • Lee, Chang-Ki;Jang, Myung-Gil
    • ETRI Journal / v.31 no.2 / pp.121-128 / 2009
  • In this paper, we describe a fixed-threshold sequential minimal optimization (FSMO) for structured SVM problems. FSMO is conceptually simple, easy to implement, and faster than the standard support vector machine (SVM) training algorithms for structured SVM problems. Because FSMO uses the fact that the formulation of structured SVM has no bias (that is, the threshold b is fixed at zero), FSMO breaks down the quadratic programming (QP) problems of structured SVM into a series of smallest QP problems, each involving only one variable. By involving only one variable, FSMO is advantageous in that each QP sub-problem does not need subset selection. For the various test sets, FSMO is as accurate as an existing structured SVM implementation (SVM-Struct) but is much faster on large data sets. The training time of FSMO empirically scales between O(n) and O($n^{1.2}$), while SVM-Struct scales between O($n^{1.5}$) and O($n^{1.8}$).
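The one-variable sub-problem idea can be illustrated on a much simpler case. The following is a minimal sketch, assuming a binary linear SVM with no bias term rather than a structured SVM, so it is not the FSMO algorithm of the paper. The function name `train_bias_free_svm` and the toy data are hypothetical. Because the threshold is fixed at zero, the dual has only box constraints, and each coordinate can be optimized in closed form without selecting a pair of variables.

```python
def train_bias_free_svm(X, y, C=1.0, epochs=50):
    """Dual coordinate ascent for a linear SVM with the threshold b fixed at 0.

    Illustrative only: binary labels in {-1, +1}, linear kernel.
    """
    n, d = len(X), len(X[0])
    alpha = [0.0] * n          # dual variables, each constrained to [0, C]
    w = [0.0] * d              # primal weights, kept equal to sum_i alpha_i * y_i * x_i
    for _ in range(epochs):
        for i in range(n):
            k_ii = sum(v * v for v in X[i])   # kernel value K(x_i, x_i)
            if k_ii == 0.0:
                continue                      # a zero vector cannot change w
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Closed-form optimum of the one-variable QP, clipped to the box [0, C].
            new_alpha = min(max(alpha[i] + (1.0 - margin) / k_ii, 0.0), C)
            delta = (new_alpha - alpha[i]) * y[i]
            w = [wj + delta * xj for wj, xj in zip(w, X[i])]
            alpha[i] = new_alpha
    return w, alpha


# Toy, linearly separable data (hypothetical).
X = [[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]]
y = [1.0, 1.0, -1.0, -1.0]
w, alpha = train_bias_free_svm(X, y)
print(w, alpha)
```

With a bias term, the dual gains an equality constraint that forces at least two variables to change together, which is why standard SMO needs a working-pair selection step.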

Penalized quantile regression tree (벌점화 분위수 회귀나무모형에 대한 연구)

  • Kim, Jaeoh;Cho, HyungJun;Bang, Sungwan
    • The Korean Journal of Applied Statistics / v.29 no.7 / pp.1361-1371 / 2016
  • Quantile regression provides a variety of useful statistical information for examining how covariates influence the conditional quantile functions of a response variable. However, traditional quantile regression, which assumes a linear model, is not appropriate when the relationship between the response and the covariates is nonlinear. It is also necessary to conduct variable selection for high-dimensional data or strongly correlated covariates. In this paper, we propose a penalized quantile regression tree model. The split rule of the proposed method is based on residual analysis, which has negligible bias in selecting a split variable and a reasonable computational cost. A simulation study and real data analysis are presented to demonstrate the satisfactory performance and usefulness of the proposed method.

Multivariate quantile regression tree (다변량 분위수 회귀나무 모형에 대한 연구)

  • Kim, Jaeoh;Cho, HyungJun;Bang, Sungwan
    • Journal of the Korean Data and Information Science Society / v.28 no.3 / pp.533-545 / 2017
  • Quantile regression models provide a variety of useful statistical information by estimating the conditional quantile function of the response variable. However, the traditional linear quantile regression model can lead to distorted and incorrect results when analysing real data with a nonlinear relationship between the explanatory variables and the response variables. Furthermore, as the complexity of the data increases, it becomes necessary to analyse multiple response variables simultaneously with more sophisticated interpretations. For these reasons, we propose a multivariate quantile regression tree model. In this paper, a new split variable selection algorithm is suggested for a multivariate regression tree model. This algorithm can select the split variable more accurately than the previous method without significant selection bias. We investigate the performance of our proposed method with both simulation and real data studies.

A Study on Selecting Principle Component Variables Using Adaptive Correlation (적응적 상관도를 이용한 주성분 변수 선정에 관한 연구)

  • Ko, Myung-Sook
    • KIPS Transactions on Software and Data Engineering / v.10 no.3 / pp.79-84 / 2021
  • Processing high-dimensional data requires a feature extraction method that reflects the features well while maintaining the properties of the data. Principal component analysis, which converts high-dimensional data into low-dimensional data and expresses it with fewer variables than the original data, is a representative feature extraction method. In this study, we propose a principal component analysis method based on adaptive correlation for selecting principal component variables when the data is high-dimensional. The proposed method analyzes the principal components of the data by adaptively reflecting the correlation between the input data and excludes highly correlated variables from the candidate list. It is intended to analyze the principal component hierarchy by the eigenvector coefficient values, to prevent the selection of principal components with a low hierarchy, and to minimize, through correlation analysis, the data duplication that induces data bias. In this way, we propose a method of selecting principal component variables that represent the characteristics of the actual data well by reducing the influence of data bias on the selection.

Estimating the Intergenerational Income Mobility in Korea (한국의 세대 간 소득이동성 추정)

  • Yang, Jung-Seung
    • Journal of Labour Economics / v.35 no.2 / pp.79-115 / 2012
  • In this study, we try to obtain reliable estimates of intergenerational income mobility in Korea. First, we show that the low estimates of previous studies are mainly due to a sample selection problem. Direct OLS estimates obtained after correcting this problem are higher than the previous estimates. We also compute the attenuation bias by decomposing the variance of earnings into the variances of the permanent and transitory components of earnings, using the regression results. Additionally, we try to estimate the range of intergenerational mobility by comparing the OLS results with those of two-sample and three-sample instrumental variable estimation. The results of these estimations are slightly higher than or similar to the OLS results.
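The link between the variance decomposition and attenuation bias can be written with the textbook errors-in-variables formula. The notation here is an assumption for illustration and is not taken from the paper: $y_i$ is the child's log income, $x_i^{*}$ is the parent's permanent log income, and $v_i$ is the transitory component, so that observed parental income is $x_i = x_i^{*} + v_i$.

$$
y_i = \alpha + \beta x_i^{*} + \varepsilon_i, \qquad
\operatorname{plim}\hat{\beta}_{\mathrm{OLS}} = \beta \cdot \frac{\sigma^2_{x^{*}}}{\sigma^2_{x^{*}} + \sigma^2_{v}}
$$

If $v_i$ is uncorrelated with both $x_i^{*}$ and $\varepsilon_i$, the ratio lies between 0 and 1, so OLS on observed income understates $\beta$. Dividing the OLS estimate by an estimate of this ratio is one standard correction.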

Performance study of propensity score methods against regression with covariate adjustment

  • Park, Jincheol
    • Journal of the Korean Data and Information Science Society / v.26 no.1 / pp.217-227 / 2015
  • In observational studies, handling confounders is a primary issue in measuring the treatment effect of interest. Historically, regression with covariate adjustment (covariate-adjusted regression) has been the typical approach to estimating a treatment effect while incorporating potential confounders into the model. However, since the introduction of the propensity score, covariate-adjusted regression has gradually been replaced in the medical literature by various balancing methods based on the propensity score. On the other hand, there is a paucity of research assessing propensity score methods against covariate-adjusted regression. This paper examined the performance of propensity score methods in estimating the risk difference and compared it with that of covariate-adjusted regression in a Monte Carlo study. The study demonstrated that, in general, covariate-adjusted regression with a variable selection procedure outperformed propensity-score-based methods in terms of both bias and MSE, suggesting that the classical regression method should be considered, rather than the propensity score methods, if performance is the primary concern.

Analyzing the Impact of Emission Control Area (ECA) Enforcement on Ferry Companies' Financial Performance : Network SBM DEA and BTR model (배출규제해역(ECA) 시행이 페리 선사의 재무성과에 미치는 영향: Network SBM DEA 및 BTR 모형 분석)

  • Lee, Suhyung;Lim, Hyunwoo
    • Journal of Korea Port Economic Association / v.38 no.3 / pp.29-51 / 2022
  • The International Maritime Organization (IMO) designated the Emission Control Area (ECA) in Northern Europe to reduce NOx and SOx emissions from ships in coastal areas. This study used a Network slack-based measure (SBM) Data Envelopment Analysis (DEA) model and a Bootstrap Truncated Regression (BTR) model to analyze the ECA's impact on ferry companies' financial performance, based on financial data from eight ferry carriers in Northern Europe, the Mediterranean and North America from 2004 to 2017. To alleviate the problem of arbitrary variable selection in DEA, the variable selection criteria proposed by Dyson et al. (2001) were applied; the size of the company was considered through the Network SBM DEA model; and the company's profit-generating process was divided into stages to measure financial performance in more detail. In addition, the BTR model was applied to derive results that minimize the bias of the data. The study found that ECA regulations did not always negatively affect the shipping companies' financial performance. Rather, a steady increase in efficiency was observed for Northern European ferry companies, which were subject to the strongest regulations. For North American ferry companies, government subsidies were found to have a significant impact on efficiency, while the ECA and oil prices had a relatively small impact. For the Mediterranean ferry companies, efficiency values have decreased since the implementation of ECA regulation despite the lowest level of regulation in the region.

Estimation of Cut-off Stratum in the Highly Skewed Population (왜도가 심한 모집단의 절사층 추정)

  • 한근식
    • Survey Research / v.5 no.1 / pp.93-101 / 2004
  • In business surveys, cut-off sampling is common. The contribution from the cut-off part of the population is small in comparison with the remaining population. In this case, part of the target population is excluded from the selection, and parameter estimates are based only on the take-all and take-some strata. It may be tempting not to use resources on enterprises that contribute little to the overall results of the survey, and this also reduces the response burden for these small enterprises. However, the size of the cut-off stratum has been increased as a way to manage reduced budgets, which leads to additional bias. In this study, the population is separated into three strata (cut-off, take-some, and take-all), and we estimate the cut-off part using an auxiliary variable.
