• Title/Summary/Keyword: Variable selection bias

An Analysis of Job Selection, Major-Job Match and Wage Level of College Graduates (대학 졸업생의 직업선택과 임금 수준)

  • Park, Jae-Min
    • Journal of Korea Technology Innovation Society / v.14 no.1 / pp.22-39 / 2011
  • This study examines wage levels from the viewpoint of major-job match, as part of an analysis of skill mismatch among four-year college graduates. The empirical analysis explicitly incorporates sample selection bias, an econometric problem that earlier studies noted but did not fully address. The study also models the major-job match variable, usually treated as binary for analytical convenience, as a polychotomous choice variable in the selection equation, as recorded in the survey. The analysis uses a multi-cohort survey of graduates from 1982, 1992, and 2002. The results identify a wage premium for major-job match. This result holds after accounting for sample selection bias and after modeling the major-job match variable as a polychotomous choice. An analysis by major finds a relatively high wage premium among Social Science, Engineering, and Science majors, although the effect of selection differs among these majors. An assessment of cohort effects finds that skill mismatch had progressed rapidly in 1992, while the difference between the 1992 and 2002 cohorts is insignificant. The analysis suggests that wage levels are better understood in the context of both sample selection and major-job match, and that major-job match strongly affects wages regardless of model specification.

A study on bias effect of LASSO regression for model selection criteria (모형 선택 기준들에 대한 LASSO 회귀 모형 편의의 영향 연구)

  • Yu, Donghyeon
    • The Korean Journal of Applied Statistics / v.29 no.4 / pp.643-656 / 2016
  • High-dimensional data, in which the number of variables exceeds the number of samples, are frequently encountered in various fields. With such data it is usually necessary to select variables in order to estimate regression coefficients and avoid overfitting. A penalized regression model performs variable selection and coefficient estimation simultaneously, which makes it a frequent choice for high-dimensional data. However, a penalized regression model still requires selecting the optimal model by choosing a tuning parameter according to a model selection criterion. This study deals with the effect of the bias of LASSO regression on model selection criteria. We describe this effect numerically and apply the proposed correction to the identification of biomarkers for lung cancer based on gene expression data.
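
The bias that this abstract refers to can be illustrated in the simplest setting. The sketch below is not the paper's method or its proposed correction; it assumes an orthonormal design and the objective (1/2)||y - Xb||^2 + lam * ||b||_1, under which each LASSO coefficient is the soft-thresholded OLS estimate. The function name, the OLS values, and the value of `lam` are hypothetical.

```python
def soft_threshold(z, lam):
    """LASSO estimate of one coefficient under an orthonormal design.

    z is the OLS estimate and lam the (assumed) tuning parameter.
    """
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical OLS estimates for four coefficients.
ols = [3.0, -1.5, 0.4, -0.2]
lam = 0.5

# Small coefficients are set to zero (selection); the survivors are
# pulled toward zero by lam, which is the source of the LASSO bias.
lasso = [soft_threshold(z, lam) for z in ols]
shrinkage = [z - b for z, b in zip(ols, lasso)]
```

Every retained coefficient is shrunk by exactly `lam` in this setting, so residual-based quantities computed from the LASSO fit differ from those of an unpenalized refit on the selected variables. That difference is one way to see why criteria that depend on the fitted residuals can be affected.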

A two-step approach for variable selection in linear regression with measurement error

  • Song, Jiyeon;Shin, Seung Jun
    • Communications for Statistical Applications and Methods / v.26 no.1 / pp.47-55 / 2019
  • It is important to identify informative variables in high-dimensional data analysis; however, this becomes challenging when covariates are contaminated by measurement error, because the error induces bias. In this article, we present a two-step approach for variable selection in the presence of measurement error. In the first step, we select important variables directly from the contaminated covariates as if there were no measurement error. In the second step, we apply orthogonal regression to obtain unbiased estimates of the regression coefficients of the variables identified in the first step. In addition, we propose a modification of the two-step approach to further enhance variable selection performance. Various simulation studies demonstrate the promising performance of the proposed method.
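
As a rough illustration of the second step, the following sketch fits an orthogonal regression line for a single covariate. It is a simplification rather than the authors' estimator: it assumes one covariate and equal error variances in x and y, in which case the slope has a closed form. The function name and data are hypothetical.

```python
import math


def orthogonal_regression(x, y):
    """Slope and intercept minimizing perpendicular distances to the line.

    Assumes equal error variances in x and y and a nonzero sample
    covariance; both are assumptions of this sketch.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical error-free data on the line y = 2x + 1.
slope, intercept = orthogonal_regression([0, 1, 2, 3], [1, 3, 5, 7])
```

Unlike ordinary least squares, which attenuates the slope toward zero when x is measured with error, this estimator treats errors in both coordinates symmetrically.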

Impact of Diverse Configuration in Multivariate Bias Correction Methods on Large-Scale Climate Variable Simulations under Climate Change

  • de Padua, Victor Mikael N.;Ahn Kuk-Hyun
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.161-161 / 2023
  • Bias correction is a necessary step in downscaling coarse and systematically biased global climate models for use in local climate change impact studies. In addition to univariate bias correction methods, many multivariate methods that correct multiple variables jointly, each with its own mathematical design, have been developed recently. While some studies have compared these multivariate bias correction methods, none has focused extensively on how diverse configurations of climate variables (i.e., different combinations of input variables to be corrected), particularly high-dimensional ones, affect the ability of the different methods to remove biases in univariate and multivariate statistics. This study evaluates the impact of three configurations (inter-variable, inter-spatial, and full dimensional dependence configurations) on four state-of-the-art multivariate bias correction methods in a national-scale domain over South Korea using a gridded approach. An inter-comparison framework was created to evaluate how well the different combinations of configurations and bias correction methods adjust various climate variable statistics. Precipitation and maximum and minimum temperatures were corrected across 306 high-resolution (0.2°) grid cells and evaluated. Results show that most methods correct various statistics better when high-dimensional configurations are implemented. However, some instabilities were observed, likely tied to the mathematical designs of the methods, indicating that some multivariate bias correction methods are incompatible with high-dimensional configurations. This highlights the potential for further improvements in the field, as well as the importance of selecting a correction method suited to the needs of the user.

Time-Balanced Quota Sampling for Telephone Survey (전화조사를 위한 시간균형할당표본추출)

  • Huh, Myung-Hoe;Hwang, Jin-Mo
    • Survey Research / v.7 no.2 / pp.39-52 / 2006
  • Most Korean survey institutions adopt quota sampling for telephone surveys based on region, gender, and age band. It is well known that on weekdays the daytime at-home rate differs substantially by individuals' socio-demographic attributes, so quota sampling may induce systematic respondent selection bias. To solve this problem, we propose "time-balanced quota sampling", in which the interviewer's call time band is added as a quota variable. We further propose "time-balanced quasi-quota sampling", which is derived by partially relaxing the evening quotas in time-balanced quota sampling. We compare the conventional and the newly proposed quota sampling schemes by drawing Monte Carlo samples from a hypothetical population based on the Korea 2004 time use survey data.
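
The idea of adding the call time band as a quota variable can be sketched as follows. This is a minimal illustration, not the authors' procedure: the cell definition, field names, quota counts, and respondents are all hypothetical, and the quasi-quota relaxation is not modeled.

```python
from collections import Counter


def fill_quotas(respondents, quotas):
    """Accept respondents in call order until each quota cell is full.

    A cell is (region, gender, age_band, time_band); including
    time_band is what makes the quota scheme time-balanced.
    """
    counts = Counter()
    accepted = []
    for r in respondents:
        cell = (r["region"], r["gender"], r["age_band"], r["time_band"])
        if counts[cell] < quotas.get(cell, 0):
            counts[cell] += 1
            accepted.append(r)
    return accepted


# Hypothetical quotas: one daytime and one evening interview per cell.
quotas = {
    ("Seoul", "F", "30s", "daytime"): 1,
    ("Seoul", "F", "30s", "evening"): 1,
}
calls = [
    {"id": 1, "region": "Seoul", "gender": "F", "age_band": "30s", "time_band": "daytime"},
    {"id": 2, "region": "Seoul", "gender": "F", "age_band": "30s", "time_band": "daytime"},
    {"id": 3, "region": "Seoul", "gender": "F", "age_band": "30s", "time_band": "evening"},
]
accepted_ids = [r["id"] for r in fill_quotas(calls, quotas)]
```

Under a conventional quota on region, gender, and age band alone, a quota of two would accept the first two daytime respondents in this example; with the time band in the cell, the second daytime call is rejected and the evening respondent is kept.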

Covariate selection criteria for controlling confounding bias in a causal study (인과연구에서 중첩편향을 제거하기 위한 공변량선택기준)

  • Thepepomma, Seethad;Kim, Ji-Hyun
    • The Korean Journal of Applied Statistics / v.29 no.5 / pp.849-858 / 2016
  • It is important to control confounding bias when estimating the causal effect of a treatment in an observational study. We illustrate that covariate selection in causal inference differs from variable selection in the ANCOVA model. We then investigate three criteria of covariate selection for controlling confounding bias, which can be used when there is inadequate information to draw a complete causal graph. VanderWeele and Shpitser (2011) proposed one of them and claimed it was better than the other two. We show by example that their criterion also has limitations and some disadvantages. There is no clear winner; however, their criterion is better than the other two (if some correction is made to its condition) because it can remove the confounding bias.

A Study on Determinants Affecting At-home Laver Consumption Expenditures : Type II Tobit Model Treating Sample Selection Bias (김 가정 소비 지출의 결정 요인 분석 : 선택 편의를 고려한 Type II 토빗 모형을 이용하여)

  • Lee, Min-Kyu;Park, Eun-Young
    • The Journal of Fisheries Business Administration / v.40 no.3 / pp.147-167 / 2009
  • The objective of this study is to analyze the determinants of at-home laver consumption expenditures using data from a household survey conducted in 2009. The non-response ratios for monthly expenditures on dry laver and flavored laver among the sampled households were 18.8% and 25.6%, respectively. Accordingly, this study analyzes the determinants of at-home laver consumption expenditures using a type II tobit model, one of the sample selection models, to deal with the sample selection bias caused by non-response. The results show that age positively affects expenditures on dry laver but negatively affects expenditures on flavored laver. In addition, household size, household income, and the degree of preference for laver have positive relationships with both expenditures. The household size elasticity and income elasticity of expenditure on dry laver are estimated as 0.220 and 0.251, respectively. For flavored laver, these elasticities are estimated as 0.484 and 0.261. These results can inform the division of the at-home laver consumption market into groups with a high willingness to spend and the implementation of detailed marketing strategies to increase at-home laver consumption. The methodology of this study can be applied to consumer preference analysis for other marine products and to other analyses of samples with non-response data in fishery research.
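
For readers unfamiliar with the type II tobit model, a standard textbook formulation is sketched below. The notation is assumed here and is not taken from the paper: $d_i$ indicates whether household $i$ reports its expenditure, $y_i$ is the expenditure outcome, and $z_i$, $x_i$ are covariate vectors.

```latex
\begin{align}
d_i^{*} &= z_i^{\top}\gamma + u_i, \qquad d_i = \mathbf{1}\{d_i^{*} > 0\}, \\
y_i &= x_i^{\top}\beta + \varepsilon_i \quad \text{(observed only if } d_i = 1\text{)}, \\
(u_i, \varepsilon_i) &\sim \mathcal{N}\!\left(0,
  \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^{2} \end{pmatrix}\right), \\
\mathbb{E}[\,y_i \mid d_i = 1\,] &= x_i^{\top}\beta + \rho\sigma\,\lambda(z_i^{\top}\gamma),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}.
\end{align}
```

The last line shows the source of the selection bias: regressing $y_i$ on $x_i$ using respondents only omits the term $\rho\sigma\,\lambda(z_i^{\top}\gamma)$, which vanishes only when $\rho = 0$.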

Nearest-neighbor Rule based Prototype Selection Method and Performance Evaluation using Bias-Variance Analysis (최근접 이웃 규칙 기반 프로토타입 선택과 편의-분산을 이용한 성능 평가)

  • Shim, Se-Yong;Hwang, Doo-Sung
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.10 / pp.73-81 / 2015
  • The paper proposes a prototype selection method and evaluates the generalization performance of standard algorithms and prototype-based classification learning. The proposed prototype classifier defines multidimensional spheres with variable radii within class areas and generates a small set of training data. The nearest-neighbor classifier uses the new training set to predict the class of test data. By decomposing the mean expected error into bias and variance, we compare the generalization errors of the k-nearest neighbor classifier, the Bayesian classifier, prototype selection using a fixed radius, and the proposed prototype selection method. In experiments, the bias-variance trends of the proposed prototype classifier are similar to those of nearest-neighbor classifiers trained on all training data, and the prototype selection rates are under 27.0% on average.

Tree-structured Clustering for Mixed Data (혼합형 데이터에 대한 나무형 군집화)

  • Yang Kyung-Sook;Huh Myung-Hoe
    • The Korean Journal of Applied Statistics / v.19 no.2 / pp.271-282 / 2006
  • The aim of this study is to propose a tree-structured clustering for mixed data. We suggest a scaling method to reduce the variable selection bias among categorical variables. In numerical examples such as credit data, German credit data, we note several differences between tree-structured clustering and K-means clustering.

Variable Selection Theorem for the Analysis of Covariance Model (공분산분석 모형에서의 변수선택 정리)

  • Yoon, Sang-Hoo;Park, Jeong-Soo
    • Communications for Statistical Applications and Methods / v.15 no.3 / pp.333-342 / 2008
  • The variable selection theorem for the linear regression model is extended to the analysis of covariance model. Omitting some regression variables from the model reduces the variance of the estimators but introduces bias. Thus an appropriate balance between a biased model and one with large variances is recommended.
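
The trade-off stated in this abstract can be written out for the classical linear regression case. The derivation below is the standard textbook result, not the paper's extension to the analysis of covariance model, and the partitioned notation ($X_1$ retained, $X_2$ omitted) is assumed here.

```latex
\begin{align}
y &= X_1\beta_1 + X_2\beta_2 + \varepsilon, \qquad
\mathbb{E}[\varepsilon] = 0,\ \operatorname{Var}(\varepsilon) = \sigma^{2} I, \\
\tilde\beta_1 &= (X_1^{\top}X_1)^{-1}X_1^{\top}y
  \quad \text{(estimator with } X_2 \text{ omitted)}, \\
\mathbb{E}[\tilde\beta_1] &= \beta_1 + (X_1^{\top}X_1)^{-1}X_1^{\top}X_2\,\beta_2, \\
\operatorname{Var}(\tilde\beta_1) &= \sigma^{2}(X_1^{\top}X_1)^{-1}
  \;\preceq\; \sigma^{2}(X_1^{\top}M_2X_1)^{-1} = \operatorname{Var}(\hat\beta_1), \\
M_2 &= I - X_2(X_2^{\top}X_2)^{-1}X_2^{\top}.
\end{align}
```

Here $\hat\beta_1$ is the full-model estimator. The variance inequality holds because $X_1^{\top}X_1 - X_1^{\top}M_2X_1 = X_1^{\top}(I - M_2)X_1$ is positive semidefinite, while the bias term is zero only if $\beta_2 = 0$ or $X_1^{\top}X_2 = 0$.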