• Title/Abstract/Keywords: Variable selection

A two-step approach for variable selection in linear regression with measurement error

  • Song, Jiyeon;Shin, Seung Jun
    • Communications for Statistical Applications and Methods / Vol. 26, No. 1 / pp. 47-55 / 2019
  • It is important to identify informative variables in high-dimensional data analysis; however, this becomes a challenging task when covariates are contaminated by measurement error, because the error induces bias. In this article, we present a two-step approach for variable selection in the presence of measurement error. In the first step, we directly select important variables from the contaminated covariates as if there were no measurement error. In the second step, we apply orthogonal regression to obtain unbiased estimates of the regression coefficients of the variables identified in the first step. In addition, we propose a modification of the two-step approach to further enhance the variable selection performance. Various simulation studies demonstrate the promising performance of the proposed method.
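The two steps described in this abstract can be sketched on a toy problem. This is only an illustration under stated assumptions, not the authors' procedure: step 1 uses the largest absolute marginal correlation as a stand-in for their selection method, step 2 uses the closed-form orthogonal regression slope for a single covariate, and the simulated data assume that the measurement error and the regression error have equal variance. All function names are hypothetical.

```python
import math
import random

def centered_moments(x, y):
    """Return (s_xx, s_yy, s_xy): centered sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return s_xx, s_yy, s_xy

def naive_slope(w, y):
    """Ordinary least squares slope of y on the contaminated covariate w."""
    s_ww, _, s_wy = centered_moments(w, y)
    return s_wy / s_ww

def orthogonal_slope(w, y):
    """Orthogonal regression slope, assuming equal error variances in w and y."""
    s_ww, s_yy, s_wy = centered_moments(w, y)
    d = s_yy - s_ww
    return (d + math.sqrt(d * d + 4.0 * s_wy ** 2)) / (2.0 * s_wy)

def select_by_correlation(columns, y):
    """Step 1 stand-in: index of the column most correlated with y."""
    def abs_corr(w):
        s_ww, s_yy, s_wy = centered_moments(w, y)
        return abs(s_wy) / math.sqrt(s_ww * s_yy)
    return max(range(len(columns)), key=lambda j: abs_corr(columns[j]))

random.seed(0)
n, p, beta = 500, 5, 2.0
# True covariates x, contaminated observations w = x + u, response from x[0] only.
x = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
w = [[v + random.gauss(0, 0.5) for v in col] for col in x]
y = [beta * x[0][i] + random.gauss(0, 0.5) for i in range(n)]

selected = select_by_correlation(w, y)        # step 1: ignore measurement error
naive = naive_slope(w[selected], y)           # attenuated toward zero
corrected = orthogonal_slope(w[selected], y)  # step 2: orthogonal regression
print(selected, round(naive, 2), round(corrected, 2))
```

In this setup the naive slope is pulled toward zero by the measurement error, while the orthogonal regression slope is never smaller in absolute value than the naive one.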

A Study for the Development of a Variable Wedding Dress Design

  • 전미진;문선정;정삼호
    • Journal of the Korean Society of Clothing Industry / Vol. 15, No. 5 / pp. 694-703 / 2013
  • A variable dress design can satisfy consumers' need for diverse expression and self-realization at a lower cost. For wedding dresses, the changing trend in wedding culture (which tends to demand more units of wedding dresses) makes cost a more important factor in the purchase decision. A variable design therefore has a clear advantage for wedding dresses and the wedding industry. This is the first research on variable design that focuses on wedding dresses. It develops a variable wedding dress design that respects consumer preferences, as distinct from a variable wedding dress design that presents a new silhouette shape or from the development of new wedding dress materials. A supply-side survey was conducted to examine market preferences by first browsing the Naver portal site and then checking the websites of major wedding dress suppliers. A questionnaire survey of 348 brides-to-be asked about wedding dress selection factors and purchase patterns. The survey shows that consumers prefer mermaid and A-line silhouettes, silk material, white-ivory color, and tube top necklines. This result conforms to the types commonly found in suppliers' designs. Based on the survey results, we applied a detachable design to a basic mermaid silhouette and implemented changes for 7 kinds of styles. We suggest a variable wedding dress design as a new means of addressing the cost concern and the customer need for diverse expression. The research represents a new lifestyle for wedding culture and facilitates the development of the wedding industry.

Penalized variable selection in mean-variance accelerated failure time models

  • 권지훈;하일도
    • The Korean Journal of Applied Statistics / Vol. 34, No. 3 / pp. 411-425 / 2021
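  • The accelerated failure time (AFT) model describes a linear relationship between the log survival time and covariates. In AFT models, it is of interest to infer covariate effects that influence not only the mean but also the variability of the survival time. This requires modeling the variance as well as the mean of the survival time, and such a model is called a mean-variance AFT model. In this paper, we propose a variable selection procedure for the regression parameters of the mean-variance AFT model using a penalized likelihood function. We study four penalty functions: LASSO, ALASSO, SCAD, and HL (hierarchical likelihood). The proposed procedure selects important covariates and estimates the regression parameters at the same time. The performance of the proposed method is evaluated through simulation studies, and the method is illustrated with a clinical example data set.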
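Three of the penalty functions named in this abstract have standard closed forms, sketched below for a single coefficient. This is a minimal illustration, not the paper's implementation: the SCAD form and the default a = 3.7 follow the usual textbook definition, the adaptive LASSO weight uses a hypothetical initial estimate `beta_init`, and the HL penalty is omitted because the abstract does not define it.

```python
def lasso_penalty(beta, lam):
    """LASSO penalty: lam * |beta|."""
    return lam * abs(beta)

def adaptive_lasso_penalty(beta, lam, beta_init, gamma=1.0):
    """Adaptive LASSO penalty: lam * |beta| / |beta_init|**gamma.

    beta_init is an assumed initial estimate (it must be nonzero).
    """
    weight = 1.0 / abs(beta_init) ** gamma
    return lam * weight * abs(beta)

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic in between, then constant."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2.0 * a * lam * b - b * b - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0

# Unlike LASSO, SCAD stops growing for large coefficients.
for value in (0.5, 2.0, 10.0):
    print(value, lasso_penalty(value, 1.0), scad_penalty(value, 1.0))
```

The flat tail of SCAD means that large coefficients are not shrunk further, which is the usual motivation for preferring it over LASSO when strong effects are present.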

Genomic Selection for Adjacent Genetic Markers of Yorkshire Pigs Using Regularized Regression Approaches

  • Park, Minsu;Kim, Tae-Hun;Cho, Eun-Seok;Kim, Heebal;Oh, Hee-Seok
    • Asian-Australasian Journal of Animal Sciences / Vol. 27, No. 12 / pp. 1678-1683 / 2014
  • This study considers the problem of genomic selection (GS) for adjacent genetic markers of Yorkshire pigs, which are typically correlated. GS has been widely used to efficiently estimate target variables such as molecular breeding values using markers across the entire genome. Recently, GS has been applied to animals as well as plants, especially to pigs. For efficient selection of variables with specific traits in pig breeding, a variable selection method should have the following properties: i) it produces a simple model by identifying insignificant variables; ii) it improves the accuracy of the prediction of future data; and iii) it can handle high-dimensional data in which the number of variables is larger than the number of observations. In this paper, we applied several variable selection methods, including the least absolute shrinkage and selection operator (LASSO), fused LASSO, and elastic net, to data with 47K single nucleotide polymorphisms and litter size for 519 observed sows. Based on our experiments, we observed that the fused LASSO outperforms the other approaches.
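The three penalties compared in this abstract differ in how they treat neighbouring coefficients, which the following sketch makes concrete. The function names are hypothetical, and the elastic net uses one common parameterization (a mixing weight `alpha` between the L1 term and half the squared L2 term); the paper may use a different one.

```python
def lasso_total(beta, lam):
    """LASSO penalty: lam * sum of |beta_j|."""
    return lam * sum(abs(b) for b in beta)

def fused_lasso_total(beta, lam1, lam2):
    """Fused LASSO penalty: L1 term plus a penalty on jumps between neighbours."""
    jumps = sum(abs(b1 - b0) for b0, b1 in zip(beta, beta[1:]))
    return lam1 * sum(abs(b) for b in beta) + lam2 * jumps

def elastic_net_total(beta, lam, alpha):
    """Elastic net penalty: lam * (alpha * L1 + (1 - alpha) / 2 * squared L2)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)

# Two coefficient patterns with the same L1 norm over six adjacent markers.
block = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]      # effects shared by neighbours
scattered = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]  # effects alternate along the genome

print(lasso_total(block, 1.0), lasso_total(scattered, 1.0))
print(fused_lasso_total(block, 1.0, 1.0), fused_lasso_total(scattered, 1.0, 1.0))
```

LASSO cannot tell the two patterns apart, whereas the fused LASSO charges less for the block pattern. This fits the abstract's setting, where adjacent markers are typically correlated.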

A Study on Variable Selection Bias in Data Mining Software Packages

  • 송문섭;윤영주
    • The Korean Journal of Applied Statistics / Vol. 14, No. 2 / pp. 475-486 / 2001
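  • We compared the variable selection methods of the classification tree algorithms CART, CHAID, QUEST, and C4.5 as implemented in data mining packages. It is well known that the exhaustive search method of CART is biased; here, we compared the bias and selection power of these algorithms in commercial packages through a simulation study. The commercial packages used were CART, Enterprise Miner, AnswerTree, and Clementine. According to the results of the limited simulation study in this paper, both C4.5 and CART show serious bias in variable selection, whereas CHAID and QUEST give relatively stable results.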
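The bias of exhaustive split search mentioned in this abstract can be reproduced with a small simulation. The sketch below is an illustration under assumed settings, not the paper's experimental design: the response is pure noise, one candidate variable is binary and the other is continuous, and the split variable is chosen by the largest Gini impurity reduction over all possible cut points. The sample size, replication count, and function names are arbitrary choices.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_gain(x, y):
    """Largest Gini impurity reduction over all binary splits of x."""
    n = len(y)
    pairs = sorted(zip(x, y))
    parent = gini(y)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between tied values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - children)
    return best

random.seed(1)
n, reps = 60, 300
wins_continuous = 0
for _ in range(reps):
    # Neither variable is related to y, so an unbiased rule would pick each
    # about half of the time.
    y = [random.randint(0, 1) for _ in range(n)]
    x_binary = [random.randint(0, 1) for _ in range(n)]
    x_continuous = [random.random() for _ in range(n)]
    if best_gain(x_continuous, y) > best_gain(x_binary, y):
        wins_continuous += 1

share_continuous = wins_continuous / reps
print(share_continuous)
```

The continuous variable offers many more candidate cut points than the binary one, so it wins far more often than half of the time even though both are uninformative.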

Variable selection for multiclassification by LS-SVM

  • Hwang, Hyung-Tae
    • Journal of the Korean Data and Information Science Society / Vol. 21, No. 5 / pp. 959-965 / 2010
  • For multiclassification, it is often the case that some variables are unimportant while others are more important. We propose a novel algorithm for selecting the relevant variables for multiclassification. The algorithm is based on the multiclass least squares support vector machine (LS-SVM) and uses the results of multiclass LS-SVM with the one-vs-all method. Experimental results are then presented which indicate the performance of the proposed method.

A Penalized Principal Components using Probabilistic PCA

  • Park, Chong-Sun;Wang, Morgan
    • Proceedings of the Korean Statistical Society Conference / Proceedings of the Korean Statistical Society 2003 Spring Conference / pp. 151-156 / 2003
  • A variable selection algorithm for principal component analysis using the penalized likelihood method is proposed. We adopt the probabilistic principal component idea in order to use a likelihood function for the problem, and we use the HARD penalty function to force the coefficients of irrelevant variables in each component to zero. Consistency and sparsity of the coefficient estimates are presented, along with results from small simulated and illustrative real examples.

Interval Regression Models Using Variable Selection

  • Choi, Seung-Hoe
    • Communications for Statistical Applications and Methods / Vol. 13, No. 1 / pp. 125-134 / 2006
  • This study confirms that, in the interval regression models proposed by Tanaka et al. (1987), the regression model for one endpoint of the interval outputs is not identical to that for the other endpoint, and it constructs interval regression models using the best regression model given by variable selection. This paper also suggests a method that minimizes the sum of the lengths of the symmetric differences between observed and predicted interval outputs in order to estimate the interval regression coefficients in the proposed model. Some examples show that the interval regression model proposed in this study is more accurate than that introduced by Inuiguchi et al. (2001).

Laplace-Metropolis Algorithm for Variable Selection in Multinomial Logit Model

  • 김혜중;이애경
    • Journal of the Korean Society for Quality Management / Vol. 29, No. 1 / pp. 11-23 / 2001
  • This paper suggests a Bayesian method for variable selection in the multinomial logit model. It is based upon an optimal rule, derived from the Bayes rule, that minimizes the risk induced by selecting the multinomial logit model. The rule is to find the subset of variables that maximizes the marginal likelihood of the model. We also propose a Laplace-Metropolis algorithm as a simple method for estimating the marginal likelihood of the model. The Bayesian method is illustrated and its efficiency is examined using two examples, one with artificial data and one with empirical data.
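The idea of estimating a marginal likelihood from Metropolis output with a Laplace approximation can be shown on a one-parameter toy model where the exact answer is known. The sketch below is not the paper's multinomial logit setting: it assumes a binomial likelihood with a Uniform(0, 1) prior, for which the marginal likelihood is exactly 1 / (n + 1). The sampler settings and function names are arbitrary, hypothetical choices.

```python
import math
import random

def log_joint(theta, y, n):
    """log of likelihood times prior: binomial likelihood, Uniform(0, 1) prior."""
    return (math.log(math.comb(n, y))
            + y * math.log(theta) + (n - y) * math.log(1.0 - theta))

def metropolis(y, n, draws=20000, burn_in=2000, step=0.1, seed=0):
    """Random-walk Metropolis sampler; returns (theta, log_joint) pairs."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_joint(theta, y, n)
    samples = []
    for i in range(draws + burn_in):
        proposal = theta + rng.gauss(0.0, step)
        if 0.0 < proposal < 1.0:  # proposals outside the support are rejected
            candidate = log_joint(proposal, y, n)
            if rng.random() < math.exp(min(0.0, candidate - current)):
                theta, current = proposal, candidate
        if i >= burn_in:
            samples.append((theta, current))
    return samples

def laplace_metropolis(samples):
    """Laplace approximation with mode and variance taken from the draws."""
    thetas = [t for t, _ in samples]
    mean = sum(thetas) / len(thetas)
    var = sum((t - mean) ** 2 for t in thetas) / (len(thetas) - 1)
    log_joint_at_mode = max(lj for _, lj in samples)  # best sampled point
    # One-dimensional case: sqrt(2 * pi * var) * likelihood * prior at the mode.
    return math.exp(log_joint_at_mode) * math.sqrt(2.0 * math.pi * var)

y, n = 30, 100
estimate = laplace_metropolis(metropolis(y, n))
exact = 1.0 / (n + 1)
print(estimate, exact)
```

For variable selection as described in the abstract, an estimate of this kind would be computed for each candidate subset of variables, and the subset with the largest estimated marginal likelihood would be preferred.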

Bayesian Variable Selection in the Proportional Hazard Model

  • Lee, Kyeong-Eun
    • Journal of the Korean Data and Information Science Society / Vol. 15, No. 3 / pp. 605-616 / 2004
  • In this paper we consider proportional hazard models for survival analysis of microarray data. For a given vector of response values and gene expressions (covariates), we address the issue of how to reduce the dimension by selecting the significant genes. In our approach, rather than fixing the number of selected genes, we assign a prior distribution to this number. To implement our methodology, we use a Markov chain Monte Carlo (MCMC) method.
