• Title/Summary/Keyword: Subset selection problem

Subset selection in multiple linear regression: An improved Tabu search

  • Bae, Jaegug;Kim, Jung-Tae;Kim, Jae-Hwan
    • Journal of Advanced Marine Engineering and Technology / v.40 no.2 / pp.138-145 / 2016
  • This paper proposes an improved tabu search method for subset selection in multiple linear regression models. Variable selection is a vital combinatorial optimization problem in multivariate statistics. Selecting the optimal subset of variables is necessary to construct a reliable multiple linear regression model. Its applications range widely, from machine learning, time-series prediction, and multi-class classification to noise detection. Because the problem is NP-complete, finding the optimal solution becomes more difficult as the number of variables increases. Two typical metaheuristic methods have been developed to tackle the problem: the tabu search algorithm and a hybrid genetic and simulated annealing algorithm. However, both methods have shortcomings: the tabu search method requires a large amount of computing time, and the hybrid algorithm produces less accurate solutions. To overcome these shortcomings, we propose an improved tabu search algorithm that reduces the number of neighborhood moves and adopts an effective move search strategy. To evaluate the performance of the proposed method, comparative studies are performed on small data sets from the literature and on large simulated data sets. Computational results show that the proposed method outperforms the two metaheuristic methods in terms of both computing time and solution quality.
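
The idea of a tabu search over variable subsets can be sketched as follows. This is a minimal, generic version and not the improved algorithm of the paper: it assumes a fixed subset size `k`, uses the residual sum of squares (RSS) of an ordinary least-squares fit as the objective, and explores swap moves (one variable out, one in). The function names, the `tenure` parameter, and the aspiration rule are illustrative assumptions.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes `a` is square and non-singular (a singular system raises
    ZeroDivisionError).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(X, y, subset):
    """RSS of an OLS fit (with intercept) using only the columns in `subset`."""
    Z = [[1.0] + [row[j] for j in sorted(subset)] for row in X]
    p = len(Z[0])
    gram = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    rhs = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    beta = solve_linear(gram, rhs)
    return sum((yi - sum(b * v for b, v in zip(beta, z))) ** 2
               for z, yi in zip(Z, y))


def tabu_search(X, y, k, iters=50, tenure=3, seed=0):
    """Search for a size-k variable subset with small RSS using swap moves."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    current = set(rng.sample(range(n_vars), k))
    best, best_rss = set(current), rss(X, y, current)
    tabu = {}  # variable index -> last iteration at which it is still tabu
    for it in range(iters):
        candidates = []
        for out in current:
            for inn in set(range(n_vars)) - current:
                cand = (current - {out}) | {inn}
                val = rss(X, y, cand)
                is_tabu = tabu.get(out, -1) >= it or tabu.get(inn, -1) >= it
                # Aspiration: a tabu move is allowed if it beats the best so far.
                if not is_tabu or val < best_rss:
                    candidates.append((val, out, inn, cand))
        if not candidates:
            break
        # Take the best admissible move, even if it worsens the objective.
        val, out, inn, current = min(candidates, key=lambda t: t[0])
        tabu[out] = tabu[inn] = it + tenure
        if val < best_rss:
            best, best_rss = set(current), val
    return sorted(best), best_rss
```

Each iteration here evaluates all `k * (n_vars - k)` swaps, which is the kind of neighborhood cost the abstract says the improved method reduces.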

Subset Selection in the Poisson Models - A Normal Predictors case - (포아송 모형에서의 설명변수 선택문제 - 정규분포 설명변수하에서 -)

  • 박종선
    • The Korean Journal of Applied Statistics / v.11 no.2 / pp.247-255 / 1998
  • In this paper, a new subset selection problem in the Poisson model is considered under normal predictors. It turns out that the subset model has a larger variance than the Poisson model with random predictors, and this fact is used to derive a new subset selection method similar to Mallows' $C_p$.
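
For context, the classical Mallows' $C_p$ for a linear regression subset model with $p$ parameters is commonly written as follows. This is the standard linear-model statistic, not the Poisson-model criterion derived in the paper, whose exact form is not given in the abstract.

$$ C_p = \frac{\mathrm{SSE}_p}{\hat{\sigma}^2} - n + 2p $$

Here $\mathrm{SSE}_p$ is the residual sum of squares of the subset model, $\hat{\sigma}^2$ is the error variance estimated from the full model, and $n$ is the number of observations. Subsets with $C_p$ close to $p$ are usually regarded as having little bias.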

The Performance Analysis and Comparison of The MIMO-OFDM Scheme Applied to Pre-coding, Antenna Subset Selection and AMC for 4G Communication System (4G 통신시스템 기반의 Pre-coding과 Antenna Subset Selection, AMC 기법을 적용한 각 MIMO-OFDM 기법의 성능 분석 및 비교)

  • Cho, In-Sik;Seo, Chang-Woo;Yoon, Gil-Sang;Lee, Jeong-Hwan;Hwang, In-Tae
    • Journal of the Institute of Electronics Engineers of Korea TC / v.47 no.3 / pp.31-38 / 2010
  • In this paper, we analyze and compare, through computer simulation, the BER and throughput performance of several MIMO schemes applied to the MIMO-OFDM system. We then analyze the throughput performance of the proposed system, Adaptive-MCM. The results show that the proposed MIMO-OFDM Adaptive-MCM system achieves a higher average data rate than the Non Adaptive-MCM system by improving the trade-off between throughput and SNR.

A two-stage damage detection approach based on subset selection and genetic algorithms

  • Yun, Gun Jin;Ogorzalek, Kenneth A.;Dyke, Shirley J.;Song, Wei
    • Smart Structures and Systems / v.5 no.1 / pp.1-21 / 2009
  • A two-stage damage detection method is proposed and demonstrated for structural health monitoring. In the first stage, the subset selection method is applied to identify multiple damage locations. In the second stage, the damage severities of the identified damaged elements are determined by applying SSGA to solve the optimization problem. In this method, the sensitivities of residual force vectors with respect to damage parameters are employed for the subset selection process. This approach is particularly efficient in detecting multiple damage locations. The SEREP is applied as needed to expand the identified mode shapes while using a limited number of sensors. Uncertainties in the stiffness of the elements are also considered as a source of modeling errors to investigate their effects on the performance of the proposed method in detecting damage in real-life structures. Through a series of illustrative examples, the proposed two-stage damage detection method is demonstrated to be a reliable tool for identifying and quantifying multiple damage locations within diverse structural systems.

An Exploration on the Use of Data Envelopment Analysis for Product Line Selection

  • Lin, Chun-Yu;Okudan, Gul E.
    • Industrial Engineering and Management Systems / v.8 no.1 / pp.47-53 / 2009
  • We define the product line (or mix) selection problem as selecting a subset of potential product variants that simultaneously minimizes product proliferation and maintains market coverage. Selecting the most efficient product mix is a complex problem that requires analysis of multiple criteria. This paper proposes a method based on Data Envelopment Analysis (DEA) for product line selection. DEA is a linear programming based technique commonly used for measuring the relative performance of a group of decision making units with multiple inputs and outputs. Although DEA has proven to be an effective evaluation tool in many fields, it has not been applied to the product line selection problem. In this study, we construct a five-step method that systematically adopts DEA to solve a product line selection problem. We then apply the proposed method to an existing line of staplers to provide quantitative evidence that helps managers make decisions that maximize company profits while also fulfilling market demands.

Performance Comparison between Genetic Algorithms and Dynamic Programming in the Subset-Sum Problem (부분집합 합 문제에서의 유전 알고리즘과 동적 계획법의 성능 비교)

  • Cho, Hwi-Yeon;Kim, Yong-Hyuk
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.8 no.4 / pp.259-267 / 2018
  • The subset-sum problem asks whether some subset of a finite set of numbers has an element sum equal to a given value. It is a well-known NP-complete problem, which is difficult to solve in polynomial time. A genetic algorithm is a method for finding the optimal solution of a given problem through operations such as selection, crossover, and mutation. Dynamic programming is a method of solving a given problem by combining the solutions of one or more subproblems. In this paper, we design and implement a genetic algorithm that solves the subset-sum problem, and experimentally compare the time it takes to find the answer with that of the dynamic programming method. We selected a total of 17 test cases of varying difficulty from a set of 63 positive numbers and compared the performance of the two algorithms. The presented genetic algorithm improved time performance by 84% on 13 of the 17 problems compared with dynamic programming.
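
The dynamic programming baseline mentioned above can be sketched as follows. This is a textbook formulation, assuming positive integer inputs, and is not taken from the paper; the function name `subset_sum` is illustrative.

```python
def subset_sum(numbers, target):
    """Return a list of elements of `numbers` summing to `target`, or None.

    Assumes all numbers are positive integers. Each element is used at most
    once. Time and memory grow with `len(numbers) * target`.
    """
    # reachable[s] holds one selection of elements whose sum is s.
    reachable = {0: []}
    for x in numbers:
        # Iterate over a snapshot so x is not reused within the same pass.
        for s, chosen in list(reachable.items()):
            t = s + x
            if t <= target and t not in reachable:
                reachable[t] = chosen + [x]
    return reachable.get(target)
```

The table of reachable sums grows with the target value, so the running time is pseudo-polynomial. That is one reason a heuristic such as a genetic algorithm can be competitive when the numbers are large.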

Comparisons of some subset selection procedures for K normal populations with unequal sample size (표본크기가 다른 정규모집단의 평균에 대한 부분집합선택절차론의 성질과 비교연구)

  • 손중권;김소연;김영훈
    • The Korean Journal of Applied Statistics / v.3 no.1 / pp.79-87 / 1990
  • The problem of selecting a nonempty subset of K (>2) normal means with unknown variances has been studied by many authors. However, the properties and efficiencies of the proposed subset selection procedures have not been compared. We therefore investigate the properties of the proposed procedures and compare their performance in various cases.

On a Robust Subset Selection Procedure for the Slopes of Regression Equations

  • Song, Moon-Sup;Oh, Chang-Hyuck
    • Journal of the Korean Statistical Society / v.10 / pp.105-121 / 1981
  • The problem of selecting a subset containing the largest of several slope parameters of regression equations is considered. The proposed selection procedure is based on the weighted median estimators for regression parameters and the median of rescaled absolute residuals for scale parameters. These estimators are compared with the classical least squares estimators in a simulation study. A Monte Carlo comparison is also made between the new procedure based on the weighted median estimators and the procedure based on the least squares estimators. The results show that the proposed procedure is quite robust with respect to the heaviness of distribution tails.

Feature Selection Using Submodular Approach for Financial Big Data

  • Attigeri, Girija;Manohara Pai, M.M.;Pai, Radhika M.
    • Journal of Information Processing Systems / v.15 no.6 / pp.1306-1325 / 2019
  • As the world moves towards digitization, data is generated from various sources at an ever faster rate. This data is becoming enormous and is termed big data. The financial sector is one domain that needs to leverage the big data being generated to identify financial risks, fraudulent activities, and so on. The design of predictive models for such financial big data is imperative for maintaining the health of a country's economy. Financial data has many features, such as transaction history, repayment data, purchase data, investment data, and so on. The main problem in predictive algorithms is finding the right subset of representative features from which the predictive model can be constructed for a particular task. This paper proposes a correlation-based method using submodular optimization for selecting the optimum number of features, thereby reducing the dimensions of the data for faster and better prediction. The key proposition is that the optimal feature subset should contain features that are highly correlated with the class label but not correlated with each other. Experiments are conducted to understand the effect of the various subsets on different classification algorithms for loan data. The IBM Bluemix BigData platform is used for experimentation along with the Spark notebook. The results indicate that the proposed approach achieves considerable accuracy with optimal subsets in significantly less execution time. The algorithm is also compared with existing feature selection and extraction algorithms.
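
The proposition that selected features should correlate with the class label but not with each other can be illustrated with a simple greedy heuristic. This is a sketch under stated assumptions, not the submodular algorithm of the paper: the score (absolute Pearson correlation with the label minus a penalty for correlation with already selected features), the `penalty` parameter, and the stopping rule are all illustrative choices.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def greedy_select(features, label, k, penalty=1.0):
    """Greedily pick up to k feature names from `features` (name -> values).

    Each step adds the feature with the largest marginal gain:
    relevance to the label minus redundancy with the features chosen so far.
    Selection stops early when no remaining feature has a positive gain.
    """
    selected = []
    remaining = set(features)

    def gain(name):
        relevance = abs(pearson(features[name], label))
        redundancy = sum(abs(pearson(features[name], features[s]))
                         for s in selected)
        return relevance - penalty * redundancy

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=gain)  # sorted for determinism
        if gain(best) <= 0:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two nearly identical features and one independent feature, this heuristic keeps only one of the near-duplicates, since the second contributes more redundancy than relevance.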

Operating characteristics of a subset selection procedure for selecting the best normal population with common unknown variance (최고의 정규 모집단을 뽑기 위한 부분집합선택절차론의 운용특성에 관한 연구)

  • ;Shanti S. Gupta
    • The Korean Journal of Applied Statistics / v.3 no.1 / pp.59-78 / 1990
  • The subset selection approach introduced by Gupta plays an important role in multiple decision procedures. For the normal means problem with common unknown variance, some operating characteristics of the selection procedure have been investigated via Monte Carlo simulation. Some properties of the selection procedure, including its efficiency, are also examined when the data are contaminated.
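
Gupta's rule for normal means with equal sample sizes and a common unknown variance is usually stated as: keep population i if its sample mean is within `d * s / sqrt(n)` of the largest sample mean, where `s` is the pooled standard deviation. The sketch below assumes that form. The constant `d` is taken as an input here; in practice it is chosen from tables so that the probability of including the best population is at least a prescribed level, and some texts scale it differently.

```python
from math import sqrt


def gupta_subset(means, pooled_sd, n, d):
    """Return indices of populations retained by a Gupta-type subset rule.

    means     -- sample means, one per population
    pooled_sd -- pooled estimate of the common standard deviation
    n         -- common sample size per population
    d         -- selection constant (assumed given, not computed here)
    """
    threshold = max(means) - d * pooled_sd / sqrt(n)
    return [i for i, m in enumerate(means) if m >= threshold]
```

The selected subset is never empty, because the population with the largest sample mean always meets the threshold when `d` is non-negative.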
