• Title/Abstract/Keywords: regression variable selection procedures

11 search results (processing time: 0.022 s)

Unified methods for variable selection and outlier detection in a linear regression

  • Seo, Han Son
    • Communications for Statistical Applications and Methods / Vol. 26, No. 6 / pp. 575-582 / 2019
  • The problem of selecting variables in the presence of outliers is considered. Variable selection and outlier detection are not separable problems because each observation affects the fitted regression equation differently and has a different influence on each variable. We suggest a simultaneous method for variable selection and outlier detection in a linear regression model. The suggested procedure uses a sequential method to detect outliers and uses all possible subset regressions for model selections. A simplified version of the procedure is also proposed to reduce the computational burden. The procedures are compared to other variable selection methods using real data sets known to contain outliers. Examples show that the proposed procedures are effective and superior to robust algorithms in selecting the best model.
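
The "all possible subset regressions" step mentioned in the abstract can be sketched as follows. This is only a minimal illustration, not the authors' procedure: it omits the sequential outlier detection entirely, and the choice of BIC as the comparison criterion, the function name `best_subset_bic`, and the simulated data are all assumptions made for the example.

```python
import itertools
import numpy as np

def best_subset_bic(X, y):
    """Fit OLS on every non-empty subset of columns; return (bic, subset) with the lowest BIC."""
    n, p = X.shape
    best_bic, best_subset = np.inf, ()
    for k in range(1, p + 1):
        for subset in itertools.combinations(range(p), k):
            # Design matrix: intercept plus the chosen columns.
            A = np.column_stack([np.ones(n), X[:, list(subset)]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ beta) ** 2))
            # Gaussian BIC up to an additive constant; k + 1 fitted coefficients.
            bic = n * np.log(rss / n) + (k + 1) * np.log(n)
            if bic < best_bic:
                best_bic, best_subset = bic, subset
    return best_bic, best_subset

# Simulated data (hypothetical): only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.5, size=100)
bic, subset = best_subset_bic(X, y)
print("selected columns:", subset)
```

The search visits 2^p - 1 models, which is why a simplified version becomes attractive as the number of candidate variables grows.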

Variable selection in partial linear regression using the least angle regression

  • 서한손;윤민;이학배
    • 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 34, No. 6 / pp. 937-944 / 2021
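  • This study addresses the problem of variable selection in partial linear models. Variable selection is difficult in these models because they combine nonparametric estimation, such as estimation of the smoothing parameter, with estimation for the linear explanatory variables. We present a variable selection method based on LARS, a fast forward selection method. The proposed method applies t-tests, a comparison of all possible regression models, or stepwise selection to the variables screened by LARS. To compare the efficiency of the proposed methods, an example using real data and the results of a simulation study are presented.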

Variable selection in Poisson HGLMs using h-likelihood

  • Ha, Il Do;Cho, Geon-Ho
    • Journal of the Korean Data and Information Science Society / Vol. 26, No. 6 / pp. 1513-1521 / 2015
  • Selecting relevant variables for a statistical model is very important in regression analysis. Recently, variable selection methods using a penalized likelihood have been widely studied in various regression models. The main advantage of these methods is that they select important variables and estimate the regression coefficients of the covariates simultaneously. In this paper, we propose a simple procedure based on a penalized h-likelihood (HL) for variable selection in Poisson hierarchical generalized linear models (HGLMs) for correlated count data. For this purpose, we consider three penalty functions (LASSO, SCAD and HL) and derive the corresponding variable-selection procedures. The proposed method is illustrated using a practical example.
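
Two of the three penalty functions named in the abstract have standard closed forms, sketched below. The function names and the default `a=3.7` (the value commonly suggested for SCAD) are choices made for this illustration; the HL penalty is left out because its form is not given in the abstract.

```python
import numpy as np

def lasso_penalty(beta, lam):
    """LASSO penalty: lam * |beta|."""
    return lam * np.abs(np.asarray(beta, dtype=float))

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic in between, constant for large |beta|."""
    b = np.abs(np.asarray(beta, dtype=float))
    linear = lam * b
    quadratic = (2 * a * lam * b - b ** 2 - lam ** 2) / (2 * (a - 1))
    constant = lam ** 2 * (a + 1) / 2
    return np.where(b <= lam, linear, np.where(b <= a * lam, quadratic, constant))

# Compare the two penalties on a small grid of coefficient values.
grid = np.array([0.0, 0.5, 1.0, 2.0, 5.0, 10.0])
print(lasso_penalty(grid, lam=1.0))
print(scad_penalty(grid, lam=1.0))
```

Unlike LASSO, the SCAD penalty stops growing once |beta| exceeds a * lam, so large coefficients are not shrunk further.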

Empirical Analysis of Relationship between Internet Communication Network Quality Characteristics and Customer Satisfaction using Regression Variable Selection Procedures

  • 박성민;박영준
    • 산업공학 (IE Interfaces) / Vol. 18, No. 3 / pp. 253-267 / 2005
  • Customer satisfaction has become one of the important managerial concerns associated with corporate competency in the current competitive environment for Internet communication service companies. Hence, there is a demand to improve a company's customer satisfaction from a total quality management perspective. In practice, engineers as well as management hope to find the major quality characteristics of the Internet communication network that are closely related to customer satisfaction, with the aim of raising their company's customer satisfaction. This paper presents an empirical analysis of the relationship between network quality characteristics and customer satisfaction in Internet communication. Methodologically, the analysis framework is based on regression variable selection procedures. The framework implements 1) iterative model building and 2) consistent application of criteria to the statistical tests used for selecting significant variables. A case study shows that 1) customer satisfaction with the network connection seems to be more closely related to the network quality characteristics than customer satisfaction with the network speed is, and 2) the download disconnection rate has a relatively evident relationship with customer satisfaction with the network connection.

Logistic Regression Classification by Principal Component Selection

  • Kim, Kiho;Lee, Seokho
    • Communications for Statistical Applications and Methods / Vol. 21, No. 1 / pp. 61-68 / 2014
  • We propose binary classification methods that modify logistic regression classification. We apply variable selection procedures to the principal components rather than to the original variables. We describe the resulting classifiers and discuss their properties. The performance of our proposals is illustrated numerically and compared with other existing classification methods using synthetic and real datasets.

Principal Component Regression by Principal Component Selection

  • Lee, Hosung;Park, Yun Mi;Lee, Seokho
    • Communications for Statistical Applications and Methods / Vol. 22, No. 2 / pp. 173-180 / 2015
  • We propose a procedure for selecting principal components in principal component regression. Our method selects principal components using variable selection procedures instead of taking a small subset of the leading principal components. The procedure consists of two steps to improve estimation and prediction. First, we reduce the number of principal components using conventional principal component regression to obtain a set of candidate principal components; we then select principal components from the candidate set using sparse regression techniques. The performance of our proposals is demonstrated numerically and compared with typical dimension reduction approaches (including principal component regression and partial least squares regression) using synthetic and real datasets.

Empirical analysis of relationship between Internet communication network quality characteristics and customer satisfaction using regression variable selection procedures

  • 박성민;박영준
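    • 한국경영과학회: 학술대회논문집 (Korean Operations Research and Management Science Society: Conference Proceedings) / Papers presented at the 2005 Joint Spring Conference of the Korean Operations Research and Management Science Society and the Korean Institute of Industrial Engineers / pp. 822-828 / 2005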
  • Customer satisfaction has become one of the important managerial concerns associated with corporate competency in the current competitive environment for Internet communication service companies. Hence, there is a demand to improve a company's customer satisfaction from a total quality management perspective. In practice, engineers as well as management hope to find the major quality characteristics of the Internet communication network that are closely related to customer satisfaction, with the aim of raising their company's customer satisfaction. This paper presents an empirical analysis of the relationship between network quality characteristics and customer satisfaction in Internet communication. Methodologically, the analysis framework is based on regression variable selection procedures. The framework implements 1) iterative model building and 2) consistent application of criteria to the statistical tests used for selecting significant variables. A case study shows that 1) customer satisfaction with the network connection seems to be more closely related to the network quality characteristics than customer satisfaction with the network speed is, and 2) the download disconnection rate has a relatively evident relationship with customer satisfaction with the network connection.

The Regional Homogeneity in the Presence of Heteroskedasticity

  • Chung, Kyoun-Sup;Lee, Sang-Yup
    • 한국시스템다이내믹스연구 (Korean System Dynamics Review) / Vol. 8, No. 2 / pp. 25-49 / 2007
  • An important assumption of the classical linear regression model is that the disturbances appearing in the population regression function are homoskedastic; that is, they all have the same variance. If we persist in using the usual testing procedures despite heteroskedasticity, whatever conclusions we draw or inferences we make may be very misleading. This paper contributes a concrete procedure for proper estimation when heteroskedasticity exists in the data, because the quality of dependent variable predictions, i.e., the estimated variance of the dependent variable, can be improved by considering regional homogeneity and/or heteroskedasticity across the research area. With respect to estimation, specific attention should be paid to selecting an appropriate strategy for the auxiliary regression model. The paper shows that testing for heteroskedasticity and using robust methods, whether or not heteroskedasticity is present, yields more efficient statistical inferences.
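
The idea of testing for heteroskedasticity through an auxiliary regression, and then using robust inference, can be sketched as below. This is an assumed stand-in, not the paper's procedure: it uses the n·R² form of a Breusch-Pagan-style test (squared OLS residuals regressed on the same regressors) and White's HC0 standard errors, with hypothetical function names and simulated data.

```python
import numpy as np

def ols(X, y):
    """OLS with intercept; returns the design matrix, coefficients and residuals."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A, beta, y - A @ beta

def lm_heteroskedasticity(X, y):
    """LM statistic n * R^2 from the auxiliary regression of squared residuals on X."""
    _, _, resid = ols(X, y)
    u2 = resid ** 2
    _, _, aux_resid = ols(X, u2)
    r2 = 1.0 - np.sum(aux_resid ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return len(y) * r2

def white_se(X, y):
    """Heteroskedasticity-robust (HC0) standard errors of the OLS coefficients."""
    A, _, resid = ols(X, y)
    bread = np.linalg.inv(A.T @ A)
    meat = A.T @ (A * resid[:, None] ** 2)
    return np.sqrt(np.diag(bread @ meat @ bread))

# Simulated data (hypothetical): the error standard deviation grows with x.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=500)
y = 2.0 + 0.5 * x + rng.normal(size=500) * x
lm = lm_heteroskedasticity(x, y)
se = white_se(x, y)
print("LM statistic:", lm, "robust SEs:", se)
```

With one regressor, the LM statistic is compared with a chi-squared distribution with one degree of freedom, whose 5% critical value is about 3.84.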

A note on standardization in penalized regressions

  • Lee, Sangin
    • Journal of the Korean Data and Information Science Society / Vol. 26, No. 2 / pp. 505-516 / 2015
  • We consider sparse high-dimensional linear regression models. Penalized regressions have been used as effective methods for variable selection and estimation in high-dimensional models. In penalized regressions, it is common practice to standardize the variables and then fit a penalized model to the standardized variables. Finally, the estimated coefficients are transformed back to the scale of the original variables. However, this procedure produces a slightly different solution from that of the corresponding original penalized problem. In this paper, we investigate issues in the standardization of variables in penalized regressions and formulate the definition of the standardized penalized estimator. In addition, we compare the original penalized estimator with the standardized penalized estimator through simulation studies and real data analysis.
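
The standardize, fit, and back-transform workflow described in the abstract can be illustrated with a small sketch. Ridge regression is swapped in here only because it has a closed-form solution; the paper concerns sparse penalties, and the function names and simulated data below are assumptions made for the example.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge slopes on centered data: argmin ||y - Xb||^2 + lam * ||b||^2."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

def ridge_standardized(X, y, lam):
    """Scale each column to unit standard deviation, fit ridge, then map slopes back."""
    scale = X.std(axis=0)
    b_std = ridge(X / scale, y, lam)
    return b_std / scale

# Simulated predictors (hypothetical) with very different scales.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3)) * np.array([1.0, 10.0, 0.1])
y = X @ np.array([1.0, 0.2, 5.0]) + rng.normal(size=50)

b_orig = ridge(X, y, lam=5.0)               # penalty applied on the original scale
b_back = ridge_standardized(X, y, lam=5.0)  # penalty applied on the standardized scale
print(b_orig)
print(b_back)
```

Without a penalty (`lam=0`) the two routes give the same least-squares slopes; with a penalty they need not coincide, because the penalty acts on coefficients measured in different units.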

Factors Influencing the Choices of Accounting Policies in Small and Medium Enterprises in Vietnam

  • PHAM, Cuong Duc;PHI, Trong Van
    • The Journal of Asian Finance, Economics and Business / Vol. 7, No. 10 / pp. 687-696 / 2020
  • Accounting policies are the principles and practices that an entity uses to recognize, measure and report economic transactions. Improper application of accounting policies can lead to misrepresentation of a firm's financial position and performance, which in turn gives users incorrect accounting information. This paper investigates the factors influencing the choice of accounting policies in small and medium enterprises (SMEs) in Vietnam, reviewing the relevant literature to build a research model. The research model comprises one dependent variable, income-decreasing accounting procedures, and six independent variables: firm size, financial leverage, incentives, auditor, accountants, and tax policies. The authors collected primary data from more than 200 questionnaires sent to directors and chief accountants of SMEs for the period 2018 to 2019 and analyzed the data using Ordinary Least Squares (OLS) regression. The results show that four factors influence the selection of accounting policies: auditors are associated with income-increasing accounting policies, while three factors (company size, tax and accountant) are associated with income-decreasing accounting policies. In particular, the results indicate that company size has a significant influence on the selection of accounting policies in SMEs. Based on the results, we propose suggestions for regulators and lawmakers to improve the choice of accounting policies in SMEs.