http://dx.doi.org/10.5916/jkosme.2016.40.2.138

Subset selection in multiple linear regression: An improved Tabu search  

Bae, Jaegug (Department of Data Information, Korea Maritime and Ocean University)
Kim, Jung-Tae (Department of Data Information, Korea Maritime and Ocean University)
Kim, Jae-Hwan (Department of Data Information, Korea Maritime and Ocean University)
Abstract
This paper proposes an improved tabu search method for subset selection in multiple linear regression models. Variable selection is a vital combinatorial optimization problem in multivariate statistics: selecting the optimal subset of variables is necessary to construct a reliable multiple linear regression model. Its applications range widely, from machine learning, time-series prediction, and multi-class classification to noise detection. Because the problem is NP-complete, finding the optimal solution becomes more difficult as the number of variables increases. Two typical metaheuristic methods have been developed to tackle the problem: a tabu search algorithm and a hybrid genetic and simulated annealing algorithm. Both have shortcomings: the tabu search method requires a large amount of computing time, and the hybrid algorithm produces less accurate solutions. To overcome these shortcomings, we propose an improved tabu search algorithm that reduces the number of moves in the neighborhood and adopts an effective move search strategy. To evaluate the proposed method, comparative studies are performed on small data sets from the literature and on large simulated data sets. Computational results show that the proposed method outperforms both existing metaheuristic methods in terms of computing time and solution quality.
Keywords
Metaheuristics; Improved tabu search; Subset selection problem
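
To make the idea of tabu search for subset selection concrete, the following is a minimal sketch of a basic tabu search, not the improved algorithm proposed in the paper. Several choices here are assumptions made only for illustration: the selection criterion (BIC of an ordinary least squares fit), the neighborhood (flipping one variable in or out of the subset), the tabu tenure, the aspiration rule, and all function and parameter names (`bic`, `tabu_search`, `n_iter`, `tenure`). The abstract does not specify any of these.

```python
import numpy as np


def bic(X, y, subset):
    """BIC of an OLS fit (with intercept) on the given column indices.

    The criterion is an assumption for this sketch, not taken from the paper.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X[:, sorted(subset)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return n * np.log(rss / n) + A.shape[1] * np.log(n)


def tabu_search(X, y, n_iter=50, tenure=3):
    """Basic tabu search over variable subsets using single-variable flips."""
    p = X.shape[1]
    current = frozenset()
    best, best_score = current, bic(X, y, current)
    tabu_until = {}  # variable index -> last iteration at which flipping it is tabu
    for it in range(n_iter):
        candidates = []
        for j in range(p):
            neighbor = current ^ {j}  # add j if absent, drop it if present
            score = bic(X, y, neighbor)
            is_tabu = tabu_until.get(j, -1) >= it
            # Aspiration: a tabu move is still allowed if it beats the best so far.
            if not is_tabu or score < best_score:
                candidates.append((score, j, neighbor))
        if not candidates:
            continue
        # Take the best admissible move even if it worsens the current solution.
        score, j, current = min(candidates, key=lambda c: (c[0], c[1]))
        tabu_until[j] = it + tenure
        if score < best_score:
            best, best_score = current, score
    return sorted(best), best_score


# Synthetic data (hypothetical): only columns 1, 4, and 6 influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 1.5 * X[:, 6] + rng.normal(scale=0.5, size=200)

subset, score = tabu_search(X, y)
print(subset, round(score, 2))
```

Each iteration evaluates all single-variable flips, so the cost per iteration grows with the number of variables. This is the kind of neighborhood evaluation cost that a reduced move set, as described in the abstract, would aim to cut down.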