• 제목/요약/키워드: Selection Bias

검색결과 331건 처리시간 0.022초

On the Bias of Bootstrap Model Selection Criteria

  • Kee-Won Lee;Songyong Sim
    • Journal of the Korean Statistical Society
    • /
    • 제25권2호
    • /
    • pp.195-203
    • /
    • 1996
  • A bootstrap method is used to correct the apparent downward bias of a naive plug-in bootstrap model selection criterion, which is shown to enjoy a high degree of accuracy. Comparison of bootstrap method with the asymptotic method is made through an illustrative example.

  • PDF

변수선택 편향이 없는 회귀나무를 만들기 위한 알고리즘 (Regression Trees with. Unbiased Variable Selection)

  • 김진흠;김민호
    • 응용통계연구
    • /
    • 제17권3호
    • /
    • pp.459-473
    • /
    • 2004
  • 본 논문에서는 Breiman 등(1984)의 전체탐색법이 갖고 있는 변수선택 편향을 극복할 수 있는 알고리즘을 제안하였다. 제안한 알고리즘은 노드의 분리 변수를 선택하는 단계와 그 선택된 변수에 대해서만 이진분리를 위한 분리점을 찾는 단계로 나뉘어져 있다. 예측변수가 연속형 일 때는 스피어만의 순위상관계수에 의한 검정을 수행하고, 범주형일 때는 크루스칼-왈리스의 통계량에 의한 검정을 수행하여 통계적으로 가장 유의한 변수를 분리변수로 선택하였고 Breiman 등(1984)의 전체탐색법을 그 변수에만 적용하여 노드의 분리기준을 정하였다 모의실험 연구를 통해 Breiman등(19히)의 CART와 제안한 알고리즘을 변수선택 편의, 변수선택력파 평균제곱오차 측면에서 서로 비교하였다. 아울러 두 알고리즘을 실제 자료에 적용하여 효율을 서로 비교하였다.

모형 선택 기준들에 대한 LASSO 회귀 모형 편의의 영향 연구 (A study on bias effect of LASSO regression for model selection criteria)

  • 유동현
    • 응용통계연구
    • /
    • 제29권4호
    • /
    • pp.643-656
    • /
    • 2016
  • 고차원 자료(high dimensional data)는 변수의 수가 표본의 수보다 많은 자료로 다양한 분야에서 관측 또는 생성되고 있다. 일반적으로, 고차원 자료에 대한 회귀 모형에서는 모수의 추정과 과적합을 피하기 위하여 변수 선택이 이루어진다. 벌점화 회귀 모형(penalized regression model)은 변수 선택과 회귀 계수의 추정을 동시에 수행하는 장점으로 인하여 고차원 자료에 빈번하게 적용되고 있다. 하지만, 벌점화 회귀 모형에서도 여전히 조율 모수 선택(tuning parameter selection)을 통한 최적의 모형 선택이 요구된다. 본 논문에서는 벌점화 회귀 모형 중에서 대표적인 LASSO 회귀 모형을 기반으로 모형 선택의 기준들에 대한 LASSO 회귀 추정량의 편의가 어떠한 영향을 미치는지 모의실험을 통하여 수치적으로 연구하였고 편의의 보정의 필요성에 대하여 나타내었다. 실제 자료 분석에서의 영향을 나타내기 위하여, 폐암 환자의 유전자 발현량(gene expression) 자료를 기반으로 바이오마커 식별(biomarker identification) 문제에 적용하였다.

Selection of Data-adaptive Polynomial Order in Local Polynomial Nonparametric Regression

  • Jo, Jae-Keun
    • Communications for Statistical Applications and Methods
    • /
    • 제4권1호
    • /
    • pp.177-183
    • /
    • 1997
  • A data-adaptive order selection procedure is proposed for local polynomial nonparametric regression. For each given polynomial order, bias and variance are estimated and the adaptive polynomial order that has the smallest estimated mean squared error is selected locally at each location point. To estimate mean squared error, empirical bias estimate of Ruppert (1995) and local polynomial variance estimate of Ruppert, Wand, Wand, Holst and Hossjer (1995) are used. Since the proposed method does not require fitting polynomial model of order higher than the model order, it is simpler than the order selection method proposed by Fan and Gijbels (1995b).

  • PDF

Propensity Score Matching 방법을 이용한 간호중재 효과 평가 (The Use of Propensity Score Matching for Evaluation of the Effects of Nursing Interventions)

  • 이숙정;유지수;신미경;박창기;이현철;최은진
    • 대한간호학회지
    • /
    • 제37권3호
    • /
    • pp.414-421
    • /
    • 2007
  • Background: Nursing intervention studies often suffer from a selection bias introduced by failure of random assignment. Evaluation with selection bias could under or over-estimate any intervention's effects. PS matching (PSM) can reduce a selection bias through matching similar Propensity Scores (PS). PS is defined as the conditional probability of being treated given the individual's covariates and it can be reused to balance the covariates of two groups. Purpose: This study was done to assess the significance of PSM as an alternative evaluation method of nursing interventions. Method: An intervention study for patients with some baseline individual characteristic differences between two groups was used for this demonstration. The result of a t-test with PSM was compared with a t-test without matching. Results: The level of HbA1c at 12 months after baseline was different between the two groups in terms of matching or not. Conclusion: This study demonstrated the effects of a quasi-random assignment. Evaluation using PSM can reduce a selection bias impact that affects the result of the nursing intervention. Analyzing nursing research more objectively to reduce selection bias using PSM is needed.

Effects of preselection of genotyped animals on reliability and bias of genomic prediction in dairy cattle

  • Togashi, Kenji;Adachi, Kazunori;Kurogi, Kazuhito;Yasumori, Takanori;Tokunaka, Kouichi;Ogino, Atsushi;Miyazaki, Yoshiyuki;Watanabe, Toshio;Takahashi, Tsutomu;Moribe, Kimihiro
    • Asian-Australasian Journal of Animal Sciences
    • /
    • 제32권2호
    • /
    • pp.159-169
    • /
    • 2019
  • Objective: Models for genomic selection assume that the reference population is an unselected population. However, in practice, genotyped individuals, such as progeny-tested bulls, are highly selected, and the reference population is created after preselection. In dairy cattle, the intensity of selection is higher in males than in females, suggesting that cows can be added to the reference population with less bias and loss of accuracy. The objective is to develop formulas applied to any genomic prediction studies or practice with preselected animals as reference population. Methods: We developed formulas for calculating the reliability and bias of genomically enhanced breeding values (GEBV) in the reference population where individuals are preselected on estimated breeding values. Based on the formulas presented, deterministic simulation was conducted by varying heritability, preselection percentage, and the reference population size. Results: The number of bulls equal to a cow regarding the reliability of GEBV was expressed through a simple formula for the reference population consisting of preselected animals. The bull population was vastly superior to the cow population regarding the reliability of GEBV for low-heritability traits. However, the superiority of reliability from the bull reference population over the cow population decreased as heritability increased. Bias was greater for bulls than cows. Bias and reduction in reliability of GEBV due to preselection was alleviated by expanding reference population. Conclusion: Cows are easier in expanding reference population size compared with bulls and alleviate bias and reduction in reliability of GEBV of bulls which are highly preselected than cows by expanding the cow reference population.

Sensitivity analysis in Bayesian nonignorable selection model for binary responses

  • Choi, Seong Mi;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society
    • /
    • 제25권1호
    • /
    • pp.187-194
    • /
    • 2014
  • We consider a Bayesian nonignorable selection model to accommodate the selection bias. Markov chain Monte Carlo methods is known to be very useful to fit the nonignorable selection model. However, sensitivity to prior assumptions on parameters for selection mechanism is a potential problem. To quantify the sensitivity to prior assumption, the deviance information criterion and the conditional predictive ordinate are used to compare the goodness-of-fit under two different prior specifications. It turns out that the 'MLE' prior gives better fit than the 'uniform' prior in viewpoints of goodness-of-fit measures.

A Study on Split Variable Selection Using Transformation of Variables in Decision Trees

  • Chung, Sung-S.;Lee, Ki-H.;Lee, Seung-S.
    • Journal of the Korean Data and Information Science Society
    • /
    • 제16권2호
    • /
    • pp.195-205
    • /
    • 2005
  • In decision tree analysis, C4.5 and CART algorithm have some problems of computational complexity and bias on variable selection. But QUEST algorithm solves these problems by dividing the step of variable selection and split point selection. When input variables are continuous, QUEST algorithm uses ANOVA F-test under the assumption of normality and homogeneity of variances. In this paper, we investigate the influence of violation of normality assumption and effect of the transformation of variables in the QUEST algorithm. In the simulation study, we obtained the empirical powers of variable selection and the empirical bias of variable selection after transformation of variables having various type of underlying distributions.

  • PDF

의사결정나무에서 분리 변수 선택에 관한 연구 (A Study on Selection of Split Variable in Constructing Classification Tree)

  • 정성석;김순영;임한필
    • 응용통계연구
    • /
    • 제17권2호
    • /
    • pp.347-357
    • /
    • 2004
  • 의사결정나무에서 분리 변수를 선택하는 것은 매우 중요한 일이다. C4.5는 변수 선택에 있어 연속형 변수로의 변수 선택 편의가 심각하고, QUEST는 연속형 변수와 관련해서 정규성 가정이 위반될 경우 변수 선택력이 떨어진다. 본 논문에서는 통계적 로버스트 검정 알고리즘을 제안하고, 모의 실험을 통하여 C4.5, QUEST그러고 제안된 알고리즘의 효율성을 비교하였다. 실험 결과 제안된 알고리즘이 변수 선택 편의와 변수 선택력 측면에서 로버스트함을 알 수 있었다.

A Study on Bias Effect on Model Selection Criteria in Graphical Lasso

  • Choi, Young-Geun;Jeong, Seyoung;Yu, Donghyeon
    • Quantitative Bio-Science
    • /
    • 제37권2호
    • /
    • pp.133-141
    • /
    • 2018
  • Graphical lasso is one of the most popular methods to estimate a sparse precision matrix, which is an inverse of a covariance matrix. The objective function of graphical lasso imposes an ${\ell}_1$-penalty on the (vectorized) precision matrix, where a tuning parameter controls the strength of the penalization. The selection of the tuning parameter is practically and theoretically important since the performance of the estimation depends on an appropriate choice of tuning parameter. While information criteria (e.g. AIC, BIC, or extended BIC) have been widely used, they require an asymptotically unbiased estimator to select optimal tuning parameter. Thus, the biasedness of the ${\ell}_1$-regularized estimate in the graphical lasso may lead to a suboptimal tuning. In this paper, we propose a two-staged bias-correction procedure for the graphical lasso, where the first stage runs the usual graphical lasso and the second stage reruns the procedure with an additional constraint that zero estimates at the first stage remain zero. Our simulation and real data example show that the proposed bias correction improved on both edge recovery and estimation error compared to the single-staged graphical lasso.