Subset selection in multiple linear regression: An improved Tabu search

  • Bae, Jaegug (Department of Data Information, Korea Maritime and Ocean University) ;
  • Kim, Jung-Tae (Department of Data Information, Korea Maritime and Ocean University) ;
  • Kim, Jae-Hwan (Department of Data Information, Korea Maritime and Ocean University)
  • Received : 2016.01.25
  • Reviewed : 2016.02.01
  • Published : 2016.02.29

Abstract

This paper proposes an improved tabu search method for subset selection in multiple linear regression models. Variable selection is a vital combinatorial optimization problem in multivariate statistics. Selecting the optimal subset of variables is necessary to construct a reliable multiple linear regression model. Its applications range widely, from machine learning, time series prediction, and multi-class classification to noise detection. Because the problem is NP-complete in nature, finding the optimal solution becomes more difficult as the number of variables increases. Two typical metaheuristic methods have been developed to tackle the problem: the tabu search algorithm and a hybrid genetic and simulated annealing algorithm. However, both have shortcomings: the tabu search method requires a large amount of computing time, and the hybrid algorithm produces less accurate solutions. To overcome these shortcomings, we propose an improved tabu search algorithm that reduces the number of moves in the neighborhood and adopts an effective move search strategy. To evaluate the proposed method, comparative studies are performed on small data sets from the literature and on large simulated data sets. Computational results show that the proposed method outperforms the two existing metaheuristic methods in terms of both computing time and solution quality.
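
To make the setting concrete, the following is a minimal sketch of a generic, textbook-style tabu search for choosing a fixed-size subset of regressors. It is not the improved algorithm proposed in the paper, whose neighborhood reduction and move search strategy are not specified in the abstract. The function names (`rss`, `tabu_search`), the swap neighborhood, the fixed subset size `k`, the tabu tenure, the residual-sum-of-squares criterion, and the synthetic data are all assumptions made for illustration.

```python
import random

def rss(X, y, subset):
    """Residual sum of squares of the least-squares fit of y on the chosen
    columns of X plus an intercept (assumes the columns are not collinear)."""
    rows = [[1.0] + [row[j] for j in subset] for row in X]
    m = len(rows[0])
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
           + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(m)]
    for c in range(m):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, m), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(m):
            if i != c:
                f = aug[i][c] / aug[c][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    beta = [aug[i][m] / aug[i][i] for i in range(m)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, y))

def tabu_search(X, y, k, iters=50, tenure=3):
    """Generic tabu search over subsets of size k using swap moves
    (illustrative only; parameters are hypothetical defaults)."""
    p = len(X[0])
    current = set(range(k))  # arbitrary starting subset
    best, best_rss = set(current), rss(X, y, sorted(current))
    tabu = {}  # variable index -> last iteration at which it is still tabu
    for it in range(iters):
        candidates = []
        for out in current:
            for inn in set(range(p)) - current:
                trial = (current - {out}) | {inn}
                val = rss(X, y, sorted(trial))
                is_tabu = tabu.get(inn, -1) >= it or tabu.get(out, -1) >= it
                # Aspiration: accept a tabu move if it beats the best so far.
                if not is_tabu or val < best_rss:
                    candidates.append((val, out, inn, trial))
        if not candidates:
            continue  # every move is tabu; wait for tenures to expire
        # Take the best admissible neighbor, even if it is worse than current.
        val, out, inn, current = min(candidates, key=lambda c: c[0])
        tabu[out] = tabu[inn] = it + tenure
        if val < best_rss:
            best, best_rss = set(current), val
    return sorted(best), best_rss

# Synthetic data (an assumption): y depends only on columns 0 and 3.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(60)]
y = [3 * r[0] - 2 * r[3] + rng.gauss(0, 0.1) for r in X]
subset, value = tabu_search(X, y, k=2)
print(subset, round(value, 3))
```

On this synthetic data the search should recover columns 0 and 3, since they are the only ones that generate `y`. Each iteration of this sketch refits the regression for every swap in the neighborhood, which illustrates why the abstract describes plain tabu search as computationally expensive and why reducing the number of moves is a natural target for improvement.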

Cited by

  1. A comparative study of filter methods based on information entropy vol.40, pp.5, 2016, https://doi.org/10.5916/jkosme.2016.40.5.437
  2. A Generalized Additive Model Combining Principal Component Analysis for PM2.5 Concentration Estimation vol.6, pp.8, 2017, https://doi.org/10.3390/ijgi6080248
  3. Process Pattern-Based Near-Infrared Spectroscopy (NIRS) Fault Detection Using a Potential Function vol.73, pp.4, 2019, https://doi.org/10.1177/0003702818809996