• Title/Abstract/Keywords: model selection


Simultaneous outlier detection and variable selection via difference-based regression model and stochastic search variable selection

  • Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
    • Communications for Statistical Applications and Methods / Vol. 26, No. 2 / pp.149-161 / 2019
  • In this article, we propose an approach to simultaneous variable selection and outlier detection. First, we identify candidate outliers using the properties of an intercept estimator in a difference-based regression model, and we incorporate this information into the multiple regression model by adding mean-shift parameters. Second, we use stochastic search variable selection to select the best model among those that include the outlier candidates as predictors. Finally, we evaluate the method through simulations and a real-data analysis, with promising results. Future work includes making the method produce robust estimates and extending it to the nonparametric regression model for simultaneous outlier detection and variable selection.
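The mean-shift step above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code. The outlier candidates are passed in by hand, because the difference-based intercept estimator that produces them is not reproduced here. Ordinary least squares also stands in for the stochastic search variable selection stage. Each candidate gets its own indicator column, so its coefficient estimates how far that observation is shifted from the regression line.

```python
import numpy as np

def mean_shift_fit(X, y, candidates):
    """Fit y = X beta + sum_k gamma_k e_{i_k} + error by least squares.

    X: (n, p) design matrix (include a column of ones for the intercept).
    candidates: indices of observations flagged as possible outliers.
    Returns (beta, gamma), where gamma[k] is the estimated mean shift
    of observation candidates[k].
    """
    n, p = X.shape
    # One indicator column per candidate: 1 in that row, 0 elsewhere.
    E = np.zeros((n, len(candidates)))
    for k, i in enumerate(candidates):
        E[i, k] = 1.0
    Z = np.hstack([X, E])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[:p], coef[p:]
```

In the full method, SSVS would then decide which of the original predictors and which indicator columns stay in the model. A candidate whose indicator is retained is treated as an outlier.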

A Decision-making Model for SCM System Selection

  • 서광규
    • Journal of the Korea Safety Management & Science / Vol. 7, No. 4 / pp.165-177 / 2005
  • A Supply Chain Management (SCM) system is a critical investment that can affect a company's future competitiveness and performance, so selecting the right SCM system is a critical issue. This paper identifies the characteristic factors of SCM system selection and proposes an SCM system evaluation and selection model based on the Analytic Hierarchy Process (AHP). The proposed model systematically structures the objectives of SCM system selection so that they support the business goals. An empirical example demonstrates the feasibility of the proposed model, which can help a company make better decisions when selecting an SCM system.
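The core AHP computation can be sketched as below. The sketch assumes Saaty's eigenvector method and his random consistency indices for matrices with n ≤ 9. The abstract does not give the paper's criteria or judgments, so any comparison matrix passed in is hypothetical. The weights come from the principal eigenvector of the pairwise comparison matrix. The consistency ratio flags contradictory judgments; by convention, values above about 0.1 are treated as too inconsistent.

```python
import numpy as np

# Saaty's random consistency index, keyed by matrix size n (n <= 9 supported).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(A):
    """Return (weights, consistency_ratio) for a pairwise comparison matrix.

    A[i][j] states how much more important criterion i is than criterion j,
    with A[j][i] == 1 / A[i][j].
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)              # principal (Perron) eigenvalue
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                          # normalise weights to sum to 1
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr
```

Suppose a perfectly consistent matrix is built as `A[i][j] = w[i] / w[j]`. The function then recovers `w` and reports a consistency ratio of zero. In a full AHP hierarchy, the same routine is applied at each criterion level, and the local weights are multiplied down the tree.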

Ensemble Gene Selection Method Based on Multiple Tree Models

  • Mingzhu Lou
    • Journal of Information Processing Systems / Vol. 19, No. 5 / pp.652-662 / 2023
  • Identifying highly discriminating genes is a critical step in tumor recognition tasks based on microarray gene expression data and machine learning. Gene selection based on tree models has been the subject of several studies. However, these methods rely on a single tree model, which is often not robust on ultra-high-dimensional microarray datasets. This leads to the loss of useful information and unsatisfactory classification accuracy. Motivated by these limitations, this study investigates ensemble gene selection methods based on multiple tree models to improve the classification performance of tumor identification. Specifically, we selected the three most representative tree models: ID3, random forest, and gradient boosting decision tree. Each tree model selects the top-n genes from the microarray dataset based on its intrinsic mechanism. Three ensemble gene selection methods were then investigated: multiple-tree model intersection, multiple-tree model union, and multiple-tree model cross-union. Experimental results on five public benchmark microarray gene expression datasets showed that the multiple-tree model union is significantly superior in classification accuracy to gene selection based on a single tree model and to other competitive gene selection methods.
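The intersection and union strategies above reduce to set operations on each model's top-n genes. The sketch below assumes the per-model importance scores are already available, for example from the `feature_importances_` attribute of scikit-learn tree models. The cross-union variant is omitted because the abstract does not define it, and any score values you supply are hypothetical.

```python
def top_n(importances, n):
    """Return the names of the n genes with the highest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:n])

def ensemble_select(importance_by_model, n, mode="union"):
    """Combine each model's top-n gene set by intersection or union.

    importance_by_model: {model_name: {gene_name: importance_score}}.
    """
    sets = [top_n(scores, n) for scores in importance_by_model.values()]
    if mode == "intersection":
        return set.intersection(*sets)   # genes every model ranks highly
    if mode == "union":
        return set.union(*sets)          # genes any model ranks highly
    raise ValueError(f"unknown mode: {mode}")
```

Intersection is conservative, keeping only genes on which all models agree. Union keeps any gene that at least one model ranks highly, and the abstract reports this as the more accurate strategy.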

Evaluating Variable Selection Techniques for Multivariate Linear Regression

  • 류나현;김형석;강필성
    • Journal of the Korean Institute of Industrial Engineers / Vol. 42, No. 5 / pp.314-326 / 2016
  • Variable selection techniques select a subset of relevant variables for a particular learning algorithm to improve the accuracy and efficiency of the prediction model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for the multiple linear regression model, one of the most commonly used regression models in practice. The techniques we apply are forward selection, backward elimination, stepwise selection, genetic algorithm (GA), ridge regression, lasso (Least Absolute Shrinkage and Selection Operator), and elastic net. In experiments on 49 regression datasets, GA yielded the lowest error rates, while lasso reduced the number of variables the most. In terms of computational efficiency, forward selection, backward elimination, and lasso required less time than the other techniques.
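Forward selection, the first technique listed, can be sketched as a greedy loop over OLS fits. This illustrative NumPy version does not reproduce the study's setup. Its stopping rule, which halts when the Gaussian AIC no longer improves, is an assumption; p-value thresholds and cross-validated error are equally common criteria.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of OLS on an intercept plus the given columns."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ coef
    return float(r @ r)

def aic(n, rss_value, k):
    # Gaussian AIC up to an additive constant: n * log(RSS / n) + 2k.
    return n * np.log(rss_value / n) + 2 * k

def forward_selection(X, y):
    """Greedily add the predictor that lowers AIC most; stop when none does."""
    n, p = X.shape
    selected = []
    best_aic = aic(n, rss(X, y, selected), 1)    # intercept-only model
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = [(aic(n, rss(X, y, selected + [j]), len(selected) + 2), j)
                  for j in candidates]
        cand_aic, j_best = min(scores)
        if cand_aic >= best_aic:
            break                                # no candidate improves AIC
        selected.append(j_best)
        best_aic = cand_aic
    return selected
```

Backward elimination runs the same loop in reverse: it starts from the full model and repeatedly drops the predictor whose removal lowers AIC most.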

Behrens-Fisher Problem from a Model Selection Point of View

  • Jeon, Jong-Woo;Lee, Kee-Won
    • Journal of the Korean Statistical Society / Vol. 20, No. 2 / pp.99-107 / 1991
  • The Behrens-Fisher problem is viewed from a model selection perspective. The normal distribution is regarded as an approximating model. A criterion called TIC is derived and compared with selection criteria such as AIC and a bootstrap estimator. Stochastic approximation is used because no closed-form expression is available for the bootstrap estimator.

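This sketch assumes TIC refers to Takeuchi's information criterion, TIC = −2 log L + 2 tr(J⁻¹I). Here J is the negative Hessian of the log-likelihood and I is the outer product of the score vectors. The code computes AIC and TIC for a single normal sample fitted by maximum likelihood, to show how TIC adjusts the penalty when the normal model is only an approximation. It does not reproduce the paper's two-sample Behrens-Fisher derivation or its bootstrap estimator.

```python
import math

def normal_aic_tic(x):
    """AIC and TIC for a normal model N(mu, sigma^2) fitted by maximum likelihood.

    For this model tr(J^-1 I) reduces to 1 + (b2 - 1) / 2, where
    b2 = m4 / m2**2 is the sample kurtosis, so TIC equals AIC when b2 == 3.
    """
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n       # MLE of sigma^2
    m4 = sum((v - mu) ** 4 for v in x) / n       # fourth central moment
    loglik = -0.5 * n * (math.log(2 * math.pi * m2) + 1)
    aic = -2 * loglik + 2 * 2                    # two parameters: mu, sigma^2
    penalty = 1 + (m4 / m2 ** 2 - 1) / 2         # tr(J^-1 I) at the MLE
    tic = -2 * loglik + 2 * penalty
    return aic, tic
```

For heavy-tailed data (sample kurtosis above 3), the TIC penalty is larger than AIC's. For light-tailed data it is smaller.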

A study of selection operator using distance information between individuals in genetic algorithm

  • Ito, Minoru;Sugisaka, Masanori
    • Institute of Control, Robotics and Systems: Conference Proceedings / Institute of Control, Robotics and Systems 2003 ICCAS / pp.1521-1524 / 2003
  • In this paper, we propose a "Distance Correlation Selection operator (DCS)" as a new selection operator. Many improvements have been proposed for the genetic algorithm (GA). The MGG (Minimal Generation Gap) model proposed by Satoh et al. performs well. It retains the advantages of conventional models while avoiding premature convergence and suppressing evolutionary stagnation. The proposed method extends the selection operator of the original MGG model. A GA generally has two types of selection operators: "selection for reproduction" and "selection for survival." The former chooses individuals for crossover, and the latter chooses the individuals that survive to the next generation. The proposed method extends the former by using distance information between individuals, with the aim of expanding the search area and improving the ability to find solutions. The performance of the proposed method is examined on several standard test functions, and the experimental results show that it performs better than the original MGG model.

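For context, the sketch below shows one generation of the baseline MGG model, with the goal of minimising a test function. The DCS operator itself is not reproduced because the abstract does not give its formula. DCS would replace the random parent draw marked "selection for reproduction" with a distance-aware choice. BLX-α crossover, the sphere test function, and the 1 / (1 + f) roulette weights are assumptions made for this sketch.

```python
import random

def sphere(x):
    """Test function to minimise; its minimum is 0 at the origin."""
    return sum(v * v for v in x)

def blx_crossover(p1, p2, rng, alpha=0.5):
    """BLX-alpha: sample each gene uniformly from an expanded parent interval."""
    child = []
    for a, b in zip(p1, p2):
        lo, hi = min(a, b), max(a, b)
        d = hi - lo
        child.append(rng.uniform(lo - alpha * d, hi + alpha * d))
    return child

def mgg_step(pop, f, rng, n_children=10):
    """One Minimal Generation Gap generation (minimisation), in place.

    Two random parents and their children form a family; the family's best
    member plus one roulette-wheel pick replace the two parents.
    """
    i, j = rng.sample(range(len(pop)), 2)        # selection for reproduction
    family = [pop[i], pop[j]] + [blx_crossover(pop[i], pop[j], rng)
                                 for _ in range(n_children)]
    family.sort(key=f)
    elite, rest = family[0], family[1:]
    weights = [1.0 / (1.0 + f(x)) for x in rest]  # smaller f -> larger weight
    chosen = rng.choices(rest, weights=weights, k=1)[0]
    pop[i], pop[j] = elite, chosen               # selection for survival
    return pop
```

Only the two parents are replaced in each generation, and the family's best member always survives. As a result, the population's best fitness never gets worse, which is part of why MGG resists premature convergence.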

Formwork System Selection Model for Tall Building Construction Using the Adaboost Algorithm

  • Shin, Yoon-Seok
    • Journal of the Korea Institute of Building Construction / Vol. 11, No. 5 / pp.523-529 / 2011
  • In tall reinforced-concrete building construction, selecting an appropriate formwork system is crucial to the success of the project. The choice affects the entire construction duration and cost, as well as subsequent construction activities. In practice, however, formwork selection has depended mainly on the intuitive and subjective opinions of working-level employees with limited experience. This study therefore proposes a formwork system selection model using the Adaboost algorithm to support the selection of a formwork system suited to the construction site conditions. To validate the applicability of the proposed model, both Adaboost and ANN selection models were applied to actual case data from tall building construction in Korea. The Adaboost model showed slightly better accuracy than the ANN model and can assist engineers in determining the appropriate formwork system at the inception of future projects.
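A bare-bones version of discrete AdaBoost with decision stumps is sketched below to show how the algorithm reweights cases. It is not the study's model. The abstract does not list the site-condition features or formwork classes, so the sketch assumes numeric features and a binary choice between two formwork systems encoded as −1/+1.

```python
import numpy as np

def stump_predict(X, j, t, polarity):
    """Predict +1 where polarity * (x_j - t) >= 0, else -1."""
    return np.where(polarity * (X[:, j] - t) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Find the (feature, threshold, polarity) stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for polarity in (1, -1):
                err = w[stump_predict(X, j, t, polarity) != y].sum()
                if err < best[0]:
                    best = (err, j, t, polarity)
    return best

def adaboost(X, y, n_rounds=20):
    """Discrete AdaBoost with stumps; labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # start with uniform case weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, t, pol = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)     # avoid log(0) for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)    # this stump's vote
        pred = stump_predict(X, j, t, pol)
        w = w * np.exp(-alpha * y * pred)        # up-weight misclassified cases
        w = w / w.sum()
        ensemble.append((alpha, j, t, pol))
    return ensemble

def predict(ensemble, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(X, j, t, p) for a, j, t, p in ensemble)
    return np.where(score >= 0, 1, -1)
```

Each round fits a stump to the current case weights and gives it a vote of alpha = ½ ln((1 − err) / err). It then increases the weight of misclassified cases, so the next stump concentrates on the cases the ensemble still gets wrong.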

Project Selection & Evaluation System Design and Implementation: Literature Review and Case Study

  • 용세중;최덕출;한종우;정용훈;이원영
    • Journal of Technology Innovation / Vol. 2, No. 1 / pp.116-141 / 1994
  • This paper presents a model for R&D project selection and evaluation system design, developed through a literature review. The model emphasizes the fit among the five elements of the system: evaluation phase and purpose, personnel and organization, evaluation criteria and decision model, evaluation form and procedure, and projects. The model was applied in a real situation as a test case. The main findings are twofold. First, a good project selection and evaluation model contributes only partially to the effectiveness of project selection. Second, system development and implementation is a dynamic, multifaceted learning process.


System Trading using Case-based Reasoning based on Absolute Similarity Threshold and Genetic Algorithm

  • 한현웅;안현철
    • Korean Society of Information Systems: The Journal of Information Systems / Vol. 26, No. 3 / pp.63-90 / 2017
  • Purpose: This study proposes a novel system trading model using case-based reasoning (CBR) based on an absolute similarity threshold. The proposed model uses a genetic algorithm (GA) to optimize the absolute similarity threshold, feature selection, and instance selection of CBR, with the aim of yielding higher returns from stock market trading. Design/Methodology/Approach: The proposed CBR model uses an absolute similarity threshold between 0 and 1 as the criterion for selecting appropriate neighbors in the nearest neighbor (NN) algorithm. Because neighbors are determined on an absolute basis, the model sometimes finds no suitable neighbors, and in system trading this is interpreted as a 'hold' signal. In other words, the proposed model makes 'buy' or 'sell' decisions only when it produces a clear signal for stock market prediction. To improve prediction accuracy and the rate of return, the model also adopts optimal feature selection and instance selection, which are known to be very effective in enhancing CBR performance. To validate the usefulness of the proposed model, we applied it to KOSPI200 index trading from 2009 to 2016. Findings: Experimental results showed that the proposed model with optimal feature or instance selection could yield higher returns than the benchmark and the various comparison models (logistic regression, multiple discriminant analysis, artificial neural network, support vector machine, and traditional CBR). In particular, the proposed model with optimal instance selection showed the best rate of return among all the models. This implies that, in terms of returns, CBR with an absolute similarity threshold and optimal instance selection may be effective for system trading.
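The 'hold' rule above can be sketched as a thresholded nearest-neighbour vote. The similarity function 1 / (1 + Euclidean distance), the majority vote, and the treatment of ties as 'hold' are assumptions made for illustration. The GA that tunes the threshold, feature weights, and case subset is not shown.

```python
import math
from collections import Counter

def similarity(a, b):
    """Hypothetical similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))

def cbr_signal(query, case_base, threshold):
    """Return 'buy', 'sell' or 'hold' for a query pattern.

    case_base: list of (features, label) pairs with label 'buy' or 'sell'.
    Only cases with similarity >= threshold count as neighbours; if none
    qualifies, or the vote is tied, the model abstains with 'hold'.
    """
    neighbours = [label for feats, label in case_base
                  if similarity(query, feats) >= threshold]
    if not neighbours:
        return "hold"                            # no sufficiently similar case
    counts = Counter(neighbours).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "hold"                            # no clear majority signal
    return counts[0][0]
```

Raising the threshold makes the model abstain more often, so it trades less frequently but on clearer signals. This is the parameter a GA would search over alongside feature and instance selection.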

Korean women wage analysis using selection models

  • 정미량;김미정
    • Journal of the Korean Data and Information Science Society / Vol. 28, No. 5 / pp.1077-1085 / 2017
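  • This study uses the Korea Labor Institute's 2015 Korean Labor & Income Panel Study (KLIPS) data to analyze the wage determinants of Korean women. Wage data are generally difficult to analyze because random sampling is not possible. The Heckman sample selection model is the best-known method for analyzing data with sample selection bias. Heckman proposed two main models: one is estimated by maximum likelihood, and the other is the two-step sample selection model. The Heckman two-step model consists of a main outcome model and a selection model that determines labor force participation. It is known to be less sensitive to the bivariate normality assumption on the errors than the maximum likelihood model. Nevertheless, bivariate normality of the errors is a fairly strong assumption. To address this weakness, Marchenko and Genton (2012) recently proposed the Heckman selection t model. We analyze and compare the wage determinants of Korean women using the Heckman two-step model and the Heckman selection t model.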
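The Heckman two-step procedure described above can be sketched in NumPy. It fits a probit selection equation, computes the inverse Mills ratio from it, and then runs an OLS outcome equation augmented with that ratio. This is a minimal illustration on arrays you supply, not the KLIPS analysis. Fitting the probit by Newton-Raphson is an implementation choice for this sketch, the standard errors are not corrected, and the Marchenko-Genton t model is not covered.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)

def norm_pdf(r):
    return np.exp(-0.5 * r ** 2) / math.sqrt(2 * math.pi)

def norm_cdf(r):
    # erfc keeps precision in the lower tail, unlike 1 + erf.
    return 0.5 * _erfc(-np.asarray(r) / math.sqrt(2))

def inverse_mills(r):
    """Inverse Mills ratio phi(r) / Phi(r)."""
    return norm_pdf(r) / norm_cdf(r)

def probit(Z, d, n_iter=50, tol=1e-10):
    """Probit MLE by Newton-Raphson. Z: (n, q) design; d: 0/1 selection indicator."""
    g = np.zeros(Z.shape[1])
    q = 2 * d - 1                                # +1 if selected, -1 otherwise
    for _ in range(n_iter):
        r = q * (Z @ g)
        lam = inverse_mills(r)
        grad = Z.T @ (q * lam)
        H = (Z * (lam * (lam + r))[:, None]).T @ Z   # minus the Hessian
        step = np.linalg.solve(H, grad)
        g = g + step
        if np.max(np.abs(step)) < tol:
            break
    return g

def heckman_two_step(X, y, Z, d):
    """Step 1: probit of d on Z. Step 2: OLS of y on [X, IMR] for selected rows.

    Returns (beta, coef_on_imr); the IMR coefficient estimates rho * sigma.
    """
    gamma = probit(Z, d)
    sel = d == 1
    imr = inverse_mills(Z[sel] @ gamma)
    W = np.column_stack([X[sel], imr])
    coef, *_ = np.linalg.lstsq(W, y[sel], rcond=None)
    return coef[:-1], coef[-1]
```

Identification is more reliable when `Z` contains at least one variable excluded from `X`, such as a determinant of labor force participation that does not directly affect wages. Without one, the inverse Mills ratio is nearly collinear with `X`.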