• Title/Summary/Keyword: Subset selection

An Exploration on the Use of Data Envelopment Analysis for Product Line Selection

  • Lin, Chun-Yu; Okudan, Gul E.
    • Industrial Engineering and Management Systems / v.8 no.1 / pp.47-53 / 2009
  • We define the product line (or mix) selection problem as selecting a subset of potential product variants that simultaneously minimizes product proliferation and maintains market coverage. Selecting the most efficient product mix is a complex problem that requires analysis of multiple criteria. This paper proposes a method based on Data Envelopment Analysis (DEA) for product line selection. DEA is a linear programming based technique commonly used for measuring the relative performance of a group of decision making units with multiple inputs and outputs. Although DEA has proved to be an effective evaluation tool in many fields, it has not been applied to the product line selection problem. In this study, we construct a five-step method that systematically adopts DEA to solve a product line selection problem. We then apply the proposed method to an existing line of staplers to provide quantitative evidence that helps managers make decisions that maximize company profits while fulfilling market demand.
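
As background for the method above, the following is a minimal sketch of the standard CCR (input-oriented, multiplier form) DEA model that such an approach builds on, solved with SciPy's linear programming routine. The stapler data and the paper's five-step procedure are not reproduced; the product variants, inputs, and outputs below are hypothetical placeholders.

```python
# CCR (input-oriented, multiplier form) DEA: for each decision making unit (DMU),
# maximize weighted outputs subject to weighted inputs of that DMU being 1 and
# no DMU having weighted outputs exceed weighted inputs.
import numpy as np
from scipy.optimize import linprog

def dea_ccr_efficiency(X, Y, o):
    """Efficiency of DMU `o` given inputs X (n x m) and outputs Y (n x s)."""
    n, m = X.shape
    _, s = Y.shape
    # Decision variables: output weights u (length s) followed by input weights v (length m).
    c = np.concatenate([-Y[o], np.zeros(m)])                    # maximize u . y_o
    A_ub = np.hstack([Y, -X])                                   # u . y_j - v . x_j <= 0 for all j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.zeros(s), X[o]]).reshape(1, -1)   # v . x_o = 1
    b_eq = np.array([1.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (s + m), method="highs")
    return -res.fun                                             # efficiency score in (0, 1]

# Hypothetical product variants: inputs = [unit cost, part count], outputs = [demand share, margin].
X = np.array([[4.0, 12], [3.5, 10], [5.0, 15], [2.8, 9]])
Y = np.array([[0.30, 1.2], [0.25, 1.0], [0.35, 1.5], [0.10, 0.6]])
scores = [dea_ccr_efficiency(X, Y, o) for o in range(len(X))]
print(scores)   # variants scoring 1.0 are efficient candidates to keep in the line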

Prediction model of osteoporosis using nutritional components based on association (연관성 규칙 기반 영양소를 이용한 골다공증 예측 모델)

  • Yoo, JungHun; Lee, Bum Ju
    • The Journal of the Convergence on Culture Technology / v.6 no.3 / pp.457-462 / 2020
  • Osteoporosis is a disease that occurs mainly in the elderly and increases the risk of fractures due to structural deterioration of bone mass and tissue. The purposes of this study are to assess the relationship between nutritional components and osteoporosis and to evaluate models for predicting osteoporosis based on nutrient components. In the experiments, association analysis was performed using binary logistic regression, and predictive models were generated using the naive Bayes algorithm and variable subset selection methods. The single-variable analysis indicated that, for predicting osteoporosis in men, food intake and vitamin B2 showed the highest values of the area under the receiver operating characteristic curve (AUC); in women, monounsaturated fatty acids showed the highest AUC value. Among the models for predicting female osteoporosis, those generated by the correlation-based feature subset and wrapper-based variable subset selection methods showed an AUC value of 0.662. In men, the model using all variables obtained an AUC of 0.626, and the other male models showed very low sensitivity and 1-specificity. These results are expected to serve as basic information for the treatment and prevention of osteoporosis.
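
For illustration, here is a minimal sketch of the kind of pipeline the abstract describes: a naive Bayes classifier evaluated by AUC, combined with a wrapper-style variable subset selection step (scikit-learn's forward sequential selector stands in for the paper's wrapper method). The nutrient variables and osteoporosis labels are synthetic placeholders, and the paper's correlation-based subset method is not reproduced.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                                   # hypothetical nutrient intakes
y = (X[:, 1] + 0.5 * X[:, 4] + rng.normal(size=500) > 1).astype(int)  # hypothetical osteoporosis label

nb = GaussianNB()
# Wrapper-based subset selection: greedily add variables that improve cross-validated AUC.
selector = SequentialFeatureSelector(nb, n_features_to_select=3,
                                     direction="forward", scoring="roc_auc", cv=5)
selector.fit(X, y)
X_sub = selector.transform(X)
# Selection and evaluation reuse the same data here purely for brevity.
auc = cross_val_score(nb, X_sub, y, scoring="roc_auc", cv=5).mean()
print("selected columns:", np.flatnonzero(selector.get_support()), "AUC:", round(auc, 3))
```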

Association-based Unsupervised Feature Selection for High-dimensional Categorical Data (고차원 범주형 자료를 위한 비지도 연관성 기반 범주형 변수 선택 방법)

  • Lee, Changki; Jung, Uk
    • Journal of Korean Society for Quality Management / v.47 no.3 / pp.537-552 / 2019
  • Purpose: The development of information technology makes it easy to utilize high-dimensional categorical data. In this regard, the purpose of this study is to propose a novel method for selecting the proper categorical variables in high-dimensional categorical data. Methods: The proposed feature selection method consists of three steps. (1) The first step defines the goodness-to-pick measure. In this paper, a categorical variable is considered relevant if it has relationships with other variables; according to this definition, the goodness-to-pick measure calculates the normalized conditional entropy with the other variables. (2) The second step finds the relevant feature subset from the original variable set, deciding whether each variable is relevant or not. (3) The third step eliminates redundant variables from the relevant feature subset. Results: Our experimental results showed that the proposed feature selection method generally yielded better classification performance than using no feature selection in high-dimensional categorical data, especially as the number of irrelevant categorical variables increases. Moreover, as the number of irrelevant categorical variables with imbalanced category values increases, the accuracy gap between the proposed method and the existing methods being compared widens. Conclusion: The experimental results confirmed that the proposed method consistently produces high classification accuracy in high-dimensional categorical data. Therefore, the proposed method is promising for effective use in high-dimensional settings.
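
The first step described above can be illustrated with a small sketch that scores each categorical variable by a normalized conditional entropy against the other variables, so that variables related to others come out as relevant. The paper's exact goodness-to-pick definition, relevance threshold, and redundancy-elimination step are not reproduced; the data are synthetic.

```python
import numpy as np
import pandas as pd

def entropy(series):
    p = series.value_counts(normalize=True).to_numpy()
    return -(p * np.log2(p)).sum()

def conditional_entropy(x, y):
    """H(x | y) for two categorical pandas Series."""
    h = 0.0
    for value, weight in y.value_counts(normalize=True).items():
        h += weight * entropy(x[y == value])
    return h

def relevance_score(df, col):
    """Average of 1 - H(col|other)/H(col) over the other columns (0 means unrelated)."""
    hx = entropy(df[col])
    if hx == 0:
        return 0.0
    others = [c for c in df.columns if c != col]
    return np.mean([1 - conditional_entropy(df[col], df[c]) / hx for c in others])

# Synthetic categorical data: A and B are related, C is irrelevant noise.
rng = np.random.default_rng(1)
a = rng.integers(0, 3, 1000)
df = pd.DataFrame({"A": a, "B": (a + rng.integers(0, 2, 1000)) % 3,
                   "C": rng.integers(0, 4, 1000)})
print({c: round(relevance_score(df, c), 3) for c in df.columns})
```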

The Credit Information Feature Selection Method in Default Rate Prediction Model for Individual Businesses (개인사업자 부도율 예측 모델에서 신용정보 특성 선택 방법)

  • Hong, Dongsuk; Baek, Hanjong; Shin, Hyunjoon
    • Journal of the Korea Society for Simulation / v.30 no.1 / pp.75-85 / 2021
  • In this paper, we present a deep neural network-based prediction model that processes and analyzes the corporate and personal credit information of individual business owners as a new method to predict the default rate of individual businesses more accurately. In modeling research across various fields, feature selection techniques have been actively studied as a way to improve performance, especially in predictive models with many features. In this paper, after statistically verifying the macroeconomic indicators (macro variables) and credit information (micro variables) used as input variables of the default rate prediction model, we additionally identify, through the proposed credit information feature selection method, the final feature set that improves prediction performance. The proposed method is an iterative, hybrid approach that combines filter-based and wrapper-based methods: it builds submodels, constructs subsets by extracting the important variables of the best-performing submodels, and determines the final feature set by analyzing the prediction performance of each subset and of the combined subsets.
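
A simplified sketch of an iterative filter-plus-wrapper loop in the spirit of the abstract: rank features with a filter score, build candidate submodels on top-ranked subsets, and keep the subset with the best held-out performance. The deep neural network, the actual credit variables, and the paper's subset-combination rules are not reproduced; a logistic regression on synthetic data stands in for the submodels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=30, n_informative=6, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

# Filter step: rank candidate features by mutual information with the target.
ranking = np.argsort(mutual_info_classif(X_tr, y_tr, random_state=0))[::-1]

# Wrapper step: build submodels on nested top-k subsets and keep the best-performing one.
best_auc, best_subset = -np.inf, None
for k in (5, 10, 15, 20, 30):
    subset = ranking[:k]
    model = LogisticRegression(max_iter=1000).fit(X_tr[:, subset], y_tr)
    auc = roc_auc_score(y_va, model.predict_proba(X_va[:, subset])[:, 1])
    if auc > best_auc:
        best_auc, best_subset = auc, subset
print("best subset size:", len(best_subset), "validation AUC:", round(best_auc, 3))
```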

Bankruptcy prediction using an improved bagging ensemble (개선된 배깅 앙상블을 활용한 기업부도예측)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.121-139 / 2014
  • Predicting corporate failure has been an important topic in accounting and finance. The costs associated with bankruptcy are high, so the accuracy of bankruptcy prediction is of great importance to financial institutions. Many researchers have dealt with bankruptcy prediction over the past three decades. The current research attempts to use ensemble models to improve the performance of bankruptcy prediction. Ensemble classification combines individually trained classifiers to obtain more accurate predictions than individual models, and ensemble techniques have been shown to be very useful for improving the generalization ability of a classifier. Bagging is the most commonly used method for constructing ensemble classifiers: different training data subsets are randomly drawn with replacement from the original training dataset, and base classifiers are trained on the different bootstrap samples. Instance selection selects critical instances while removing irrelevant and harmful instances from the original set. Instance selection and bagging are both well known in data mining, but few studies have dealt with their integration. This study proposes an improved bagging ensemble based on instance selection using genetic algorithms (GA) for improving the performance of SVM. GA is an efficient optimization procedure based on the theory of natural selection and evolution. GA uses the idea of survival of the fittest by progressively accepting better solutions to the problem, and it searches by maintaining a population of solutions from which better solutions are created, rather than by making incremental changes to a single solution. The initial solution population is generated randomly and evolves into the next generation through genetic operators such as selection, crossover, and mutation; the solutions, coded as strings, are evaluated by a fitness function. The proposed model consists of two phases: GA-based instance selection and instance-based bagging. In the first phase, GA is used to select the optimal instance subset, which is used as input data for the bagging model. The chromosome is encoded as a binary string representing the instance subset; the population size was set to 100, the maximum number of generations to 150, and the crossover and mutation rates to 0.7 and 0.1, respectively. The prediction accuracy of the model was used as the fitness function of the GA: the SVM model is trained on the training data set using the selected instance subset, and its prediction accuracy on the test data set is used as the fitness value in order to avoid overfitting. In the second phase, we used the optimal instance subset selected in the first phase as input data for the bagging model, with SVM as the base classifier and majority voting as the combining method. This study applies the proposed model to the bankruptcy prediction problem using a real data set of Korean companies. The research data contain 1,832 externally non-audited firms that filed for bankruptcy (916 cases) or did not (916 cases). Financial ratios categorized as stability, profitability, growth, activity, and cash flow were investigated through a literature review and basic statistical methods, and we selected 8 financial ratios as the final input variables. We separated the whole data set into training, test, and validation subsets. We compared the proposed model with several comparative models, including a simple individual SVM model, a simple bagging model, and an instance selection based SVM model. McNemar tests were used to examine whether the proposed model significantly outperforms the other models. The experimental results show that the proposed model outperforms the other models.
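
A compact sketch of the two-phase design described above: (1) a genetic algorithm selects a training-instance subset, using the test-set accuracy of an SVM trained on that subset as the fitness; (2) a bagging ensemble of SVMs with majority voting is built on the selected instances. The Korean firm data, the 8 financial ratios, and the paper's GA settings (population 100, 150 generations) are not reproduced; small synthetic data and fewer generations are used so the sketch runs quickly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_te, X_va, y_te, y_va = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

def fitness(mask):
    """Test-set accuracy of an SVM trained on the instances selected by `mask`."""
    if mask.sum() < 10:
        return 0.0
    return SVC().fit(X_tr[mask], y_tr[mask]).score(X_te, y_te)

# Phase 1: GA-based instance selection (binary chromosome = instance subset).
pop_size, n_gen, cx_rate, mut_rate = 30, 20, 0.7, 0.1
pop = rng.random((pop_size, len(X_tr))) < 0.5
for _ in range(n_gen):
    fit = np.array([fitness(ind) for ind in pop])
    new_pop = [pop[fit.argmax()].copy()]                      # elitism: keep the best individual
    while len(new_pop) < pop_size:
        i = rng.integers(pop_size, size=2)                    # tournament selection of parent 1
        j = rng.integers(pop_size, size=2)                    # tournament selection of parent 2
        p1, p2 = pop[i[fit[i].argmax()]], pop[j[fit[j].argmax()]]
        c1, c2 = p1.copy(), p2.copy()
        if rng.random() < cx_rate:                            # single-point crossover
            point = rng.integers(1, len(X_tr))
            c1[point:], c2[point:] = p2[point:], p1[point:]
        for c in (c1, c2):                                    # bit-flip mutation
            flip = rng.random(len(X_tr)) < mut_rate
            c[flip] = ~c[flip]
            new_pop.append(c)
    pop = np.array(new_pop[:pop_size])

best = pop[np.argmax([fitness(ind) for ind in pop])]

# Phase 2: bagging ensemble of SVMs (majority voting) trained on the selected instances.
bag = BaggingClassifier(SVC(), n_estimators=10, random_state=0)
bag.fit(X_tr[best], y_tr[best])
print("validation accuracy:", round(bag.score(X_va, y_va), 3))
```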

ON THE THEORY OF SELECTIONS

  • LEE, SEUNG WOO
    • Honam Mathematical Journal / v.19 no.1 / pp.125-130 / 1997
  • In this paper, we give a characterization of collectionwise normality using continuous functions. More precisely, using selection theory we give a new and short proof of Dowker's theorem that a $T_1$ space X is collectionwise normal if every continuous mapping of every closed subset F of X into a Banach space can be continuously extended over X. This is also a generalization of Tietze's extension theorem.
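
Stated explicitly (a restatement of the characterization in the abstract, written as an equivalence; the selection-theoretic proof is not reproduced):

$$\text{For a } T_1 \text{ space } X:\quad X \text{ is collectionwise normal} \iff \text{for every closed } F \subseteq X \text{ and every Banach space } B,\ \text{each continuous } f\colon F \to B \text{ extends to a continuous } \tilde f\colon X \to B.$$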

Selection Conditional on Associated Measurements

  • Yeo, Woon-Bang
    • Journal of the Korean Statistical Society / v.12 no.2 / pp.110-114 / 1983
  • In this paper, a random subset selection procedure for choosing the k best objects out of n, in terms of primary measurements $Y_t$, is considered when only the associated measurements $X_t$ are available. In contrast to Yeo and David (1992), where only the ranks of the X's are needed, the present procedure uses the observed X-values. The approach is illustrated numerically when X and Y are bivariate normal and the standard deviation of X is known.
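
As standard background for the bivariate normal illustration mentioned above (the paper's specific selection rule and constants are not reproduced here), the associated measurement carries information about the unobserved primary measurement through the conditional distribution

$$Y \mid X = x \;\sim\; N\!\left(\mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X),\ \sigma_Y^2\,(1 - \rho^2)\right),$$

so, with $\sigma_X$ known and $\rho > 0$, ordering objects by their observed X-values orders them by their conditional expected Y-values.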

Laplace-Metropolis Algorithm for Variable Selection in Multinomial Logit Model (Laplace-Metropolis알고리즘에 의한 다항로짓모형의 변수선택에 관한 연구)

  • 김혜중; 이애경
    • Journal of Korean Society for Quality Management / v.29 no.1 / pp.11-23 / 2001
  • This paper is concerned with suggesting a Bayesian method for variable selection in the multinomial logit model. It is based on an optimal rule, derived using Bayes' rule, that minimizes the risk induced by selecting a multinomial logit model; the rule is to find the subset of variables that maximizes the marginal likelihood of the model. We also propose a Laplace-Metropolis algorithm intended to provide a simple method for estimating the marginal likelihood of the model. The Bayesian method is illustrated and its efficiency examined using two examples, one with artificial data and one with empirical data.
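
For context, the generic Laplace-Metropolis estimator approximates the marginal likelihood (the quantity the selection rule maximizes) by

$$m(y) = \int f(y \mid \theta)\,\pi(\theta)\,d\theta \;\approx\; (2\pi)^{d/2}\,\bigl|\tilde{\Sigma}\bigr|^{1/2}\, f(y \mid \tilde{\theta})\,\pi(\tilde{\theta}),$$

where $d$ is the dimension of $\theta$, $\tilde{\theta}$ is the posterior mode (estimated from the Metropolis output, e.g., as the draw maximizing $f(y\mid\theta)\pi(\theta)$), and $\tilde{\Sigma}$ is the posterior covariance estimated from the Metropolis sample. This is the general form of the estimator only; the paper's multinomial logit likelihood and prior specification are not reproduced here.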

A Study on Nonparametric Selection Procedures for Scale Parameters

  • Song, Moon-Sup; Chung, Han-Young; Kim, Dong-Jae
    • Journal of the Korean Statistical Society / v.14 no.1 / pp.39-47 / 1985
  • In this paper, we propose some nonparametric subset selection procedures for scale parameters based on rank-like statistics. The proposed procedures are compared to the Gupta-Sobel parametric procedure through a small-sample Monte Carlo study. The results show that the nonparametric procedures are quite robust for heavy-tailed distributions, but they have somewhat lower efficiencies.
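
As a generic illustration of the kind of rule involved (not the paper's specific rank-like statistics or selection constants), the following sketch applies a Gupta-type subset selection rule with a squared-rank scale score: rank absolute deviations from the pooled median and retain every population whose average score is within a constant d of the largest. The constant d is chosen arbitrarily here.

```python
import numpy as np
from scipy.stats import rankdata

def subset_select_scale(samples, d):
    """Keep every population whose rank-based scale score is within d of the maximum."""
    pooled = np.concatenate(samples)
    dev = np.abs(pooled - np.median(pooled))
    scores_all = rankdata(dev) ** 2                     # squared-rank (Mood-type) scores
    scores, start = [], 0
    for s in samples:                                   # average score per population
        scores.append(scores_all[start:start + len(s)].mean())
        start += len(s)
    scores = np.array(scores)
    return np.flatnonzero(scores >= scores.max() - d)   # subset containing the largest-scale population

rng = np.random.default_rng(0)
samples = [rng.normal(0, sigma, 30) for sigma in (1.0, 1.1, 2.0)]
print("selected populations:", subset_select_scale(samples, d=200.0))
```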

Variable selection in the kernel Cox regression

  • Shim, Joo-Yong
    • Journal of the Korean Data and Information Science Society / v.22 no.4 / pp.795-801 / 2011
  • In machine learning and statistics it is often the case that some variables are not important, while some variables are more important than others. We propose a novel algorithm for selecting such relevant variables in the kernel Cox regression. We employ a weighted version of the ANOVA decomposition kernels to choose an optimal subset of relevant variables in the kernel Cox regression. Experimental results are then presented which indicate the performance of the proposed method.
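
As a hedged sketch of the kind of kernel referred to above (the paper's exact weighting and estimation scheme is not reproduced), an ANOVA decomposition kernel assembles the full kernel from per-variable component kernels, with a nonnegative weight on each component controlling whether that variable enters the fitted risk score:

$$K_{\theta}(x, x') \;=\; \prod_{j=1}^{p}\bigl(1 + \theta_j\, k_j(x_j, x'_j)\bigr) - 1, \qquad \theta_j \ge 0,$$

so that $\theta_j = 0$ removes variable $j$ from every main-effect and interaction term; in the Cox setting the weights can be chosen by optimizing a penalized partial likelihood.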