Model selection algorithm in Gaussian process regression for computer experiments

  • Received : 2017.02.09
  • Accepted : 2017.06.30
  • Published : 2017.07.31

Abstract

The model in our approach assumes that computer responses are a realization of a Gaussian process superimposed on a regression model, called a Gaussian process regression model (GPRM). Selecting a subset of variables, or building a good reduced model, is an important step in classical regression for identifying variables that influence the responses and for further analyses such as prediction or classification. One reason to select variables for prediction is to prevent over-fitting or under-fitting the data. The same reasoning and approach apply to the GPRM; however, only a few studies have addressed variable selection in the GPRM. In this paper, we propose a new algorithm to build a good prediction model among candidate GPRMs. It is a post-processing step to the algorithm that includes the Welch method suggested by previous researchers. The proposed algorithms select a set of non-zero regression coefficients (β's) using forward and backward methods together with a Lasso-guided approach. During this process, the covariance parameters (θ's) are held fixed at the values pre-selected by the Welch algorithm. We illustrate the superiority of our proposed models over the Welch method and non-selection models using four test functions and one real data example. Future extensions are also discussed.
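To make the selection step concrete, the following is a minimal sketch (not the authors' code) of one forward-selection pass over the regression coefficients (β's) of a GPRM while the covariance parameters (θ's) stay fixed, e.g. at values pre-selected by the Welch algorithm. The Gaussian correlation function, the candidate linear regressors, and the profile log-likelihood selection criterion are illustrative assumptions, not the exact criterion used in the paper.

```python
# Forward selection of GPRM regression terms with theta held fixed (illustrative sketch).
import numpy as np

def corr_matrix(X, theta):
    """Gaussian correlation matrix: R_ij = exp(-sum_k theta_k (x_ik - x_jk)^2)."""
    d = X[:, None, :] - X[None, :, :]                    # pairwise differences, shape (n, n, p)
    return np.exp(-np.einsum('ijk,k->ij', d**2, theta))

def profile_loglik(y, F, R):
    """Profile log-likelihood of y = F beta + Z(x) for a fixed correlation matrix R."""
    n = len(y)
    Ri = np.linalg.inv(R + 1e-8 * np.eye(n))             # small jitter for numerical stability
    beta = np.linalg.solve(F.T @ Ri @ F, F.T @ Ri @ y)   # generalized least squares estimate
    r = y - F @ beta
    sigma2 = (r @ Ri @ r) / n                            # profiled process variance
    _, logdet = np.linalg.slogdet(R + 1e-8 * np.eye(n))
    return -0.5 * (n * np.log(sigma2) + logdet), beta

def forward_select(y, candidates, R):
    """Greedily add candidate regressors while the profile log-likelihood improves."""
    n = len(y)
    chosen = [np.ones((n, 1))]                           # always keep the intercept
    remaining = list(range(candidates.shape[1]))
    best, _ = profile_loglik(y, np.hstack(chosen), R)
    improved = True
    while improved and remaining:
        improved = False
        scores = [(profile_loglik(y, np.hstack(chosen + [candidates[:, [j]]]), R)[0], j)
                  for j in remaining]
        top, j = max(scores)
        if top > best:                                   # accept the best single addition, if any
            best = top
            chosen.append(candidates[:, [j]])
            remaining.remove(j)
            improved = True
    return np.hstack(chosen), best

# Toy usage: 20 runs of a 3-input computer model; theta is pre-selected and held fixed.
rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 3))
y = 2.0 + 3.0 * X[:, 0] + rng.normal(scale=0.1, size=20)
theta = np.array([1.0, 1.0, 1.0])
F_selected, ll = forward_select(y, X, corr_matrix(X, theta))
print(F_selected.shape, round(ll, 3))
```

A backward pass or a Lasso-guided ordering of the candidate terms can be layered on the same profile-likelihood machinery; only the rule for proposing the next model changes.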

Keywords

References

  1. Caballero JA and Grossmann IE (2008). Rigorous flowsheet optimization using process simulators and surrogate models, Computer Aided Chemical Engineering, 25, 551-556.
  2. Cox DD, Park JS, and Singer CE (2001). A statistical method for tuning a computer code to a data base, Computational Statistics & Data Analysis, 37, 77-92. https://doi.org/10.1016/S0167-9473(00)00057-8
  3. Deng H, Shao W, Ma Y, and Wei Z (2012). Bayesian metamodeling for computer experiments using the Gaussian Kriging models, Quality and Reliability Engineering International, 28, 455-466. https://doi.org/10.1002/qre.1259
  4. Dubourg V, Sudret B, and Deheeger F (2013). Metamodel-based importance sampling for structural reliability analysis, Probabilistic Engineering Mechanics, 33, 47-57. https://doi.org/10.1016/j.probengmech.2013.02.002
  5. Gomes MVC, Bogle IDL, Biscaia EC, and Odloak D (2008). Using Kriging models for real-time process optimisation, Computer Aided Chemical Engineering, 25, 361-366.
  6. James G, Witten D, Hastie T, and Tibshirani R (2013). An Introduction to Statistical Learning: with Applications in R, Springer, New York.
  7. Johnson JS, Gosling JP, and Kennedy MC (2011). Gaussian process emulation for second-order Monte Carlo simulations, Journal of Statistical Planning and Inference, 141, 1838-1848. https://doi.org/10.1016/j.jspi.2010.11.034
  8. Jung SY and Park C (2015). Variable selection with nonconcave penalty function on reduced-rank regression, Communications for Statistical Applications and Methods, 22, 41-54. https://doi.org/10.5351/CSAM.2015.22.1.041
  9. Kapoor A, Grauman K, Urtasun R, and Darrell T (2010). Gaussian processes for object categorization, International Journal of Computer Vision, 88, 169-188. https://doi.org/10.1007/s11263-009-0268-3
  10. Kennedy MC, Anderson CW, Conti S, and O'Hagan A (2006). Case studies in Gaussian process modelling of computer codes, Reliability Engineering & System Safety, 91, 1301-1309. https://doi.org/10.1016/j.ress.2005.11.028
  11. Kumar A (2015). Sequential tuning of complex computer models, Journal of Statistical Computation and Simulation, 85, 393-404. https://doi.org/10.1080/00949655.2013.823965
  12. Lee JH and Gard K (2014). Vehicle-soil interaction: testing, modeling, calibration and validation, Journal of Terramechanics, 52, 9-21. https://doi.org/10.1016/j.jterra.2013.12.001
  13. Lee S (2015). An additive sparse penalty for variable selection in high-dimensional linear regression model, Communications for Statistical Applications and Methods, 22, 147-157. https://doi.org/10.5351/CSAM.2015.22.2.147
  14. Linkletter C, Bingham D, Hengartner N, Higdon D, and Ye KQ (2006). Variable selection for Gaussian process models in computer experiments, Technometrics, 48, 478-490. https://doi.org/10.1198/004017006000000228
  15. Liu YJ, Chen T, and Yao Y (2013). Nonlinear process monitoring by integrating manifold learning with Gaussian process, Computer Aided Chemical Engineering, 32, 1009-1014.
  16. Marrel A, Iooss B, van Dorpe F, and Volkova E (2008). An efficient methodology for modeling complex computer codes with Gaussian processes, Computational Statistics and Data Analysis, 52, 4731-4744. https://doi.org/10.1016/j.csda.2008.03.026
  17. Moon H (2010). Design and analysis of computer experiments for screening input variables (Doctoral dissertation), Ohio State University, Columbus, OH.
  18. Morris MD and Mitchell TJ (1995). Exploratory designs for computational experiments, Journal of Statistical Planning and Inference, 43, 381-402. https://doi.org/10.1016/0378-3758(94)00035-T
  19. Park JS (1994). Optimal Latin-hypercube designs for computer experiments, Journal of Statistical Planning and Inference, 39, 95-111. https://doi.org/10.1016/0378-3758(94)90115-5
  20. Park JS and Baek J (2001). Efficient computation of maximum likelihood estimators in a spatial linear model with power exponential covariogram, Computers & Geosciences, 27, 1-7. https://doi.org/10.1016/S0098-3004(00)00016-9
  21. Rohmer J and Foerster E (2011). Global sensitivity analysis of large-scale numerical landslide models based on Gaussian-Process meta-modeling, Computers & Geosciences, 37, 917-927. https://doi.org/10.1016/j.cageo.2011.02.020
  22. Rojnik K and Naversnik K (2008). Gaussian process metamodeling in Bayesian value of information analysis: a case of the complex health economic model for breast cancer screening, Value in Health, 11, 240-250. https://doi.org/10.1111/j.1524-4733.2007.00244.x
  23. Sacks J, Welch WJ, Mitchell TJ, and Wynn HP (1989). Design and analysis of computer experiments, Statistical Science, 4, 409-423. https://doi.org/10.1214/ss/1177012413
  24. Santner TJ, Williams BJ, and Notz WI (2003). The Design and Analysis of Computer Experiments, Springer, New York.
  25. Silvestrini RT, Montgomery DC, and Jones B (2013). Comparing computer experiments for the Gaussian process model using integrated prediction variance, Quality Engineering, 25, 164-174. https://doi.org/10.1080/08982112.2012.758284
  26. Slonski M (2011). Bayesian neural networks and Gaussian processes in identification of concrete properties, Computer Assisted Mechanics and Engineering Science, 18, 291-302.
  27. Stevenson MD, Oakley J, and Chilcott JB (2004). Gaussian process modeling in conjunction with individual patient simulation modeling: a case study describing the calculation of cost-effectiveness ratios for the treatment of established osteoporosis, Medical Decision Making, 24, 89-100. https://doi.org/10.1177/0272989X03261561
  28. Surjanovic S and Bingham D (2015). Virtual Library of Simulation Experiments: test functions and datasets: emulation/prediction test problems, Retrieved January 20, 2017, from: https://www.sfu.ca/~ssurjano/emulat.html
  29. Tagade PM, Jeong BM, and Choi HL (2013). A Gaussian process emulator approach for rapid contaminant characterization with an integrated multizone-CFD model, Building and Environment, 70, 232-244. https://doi.org/10.1016/j.buildenv.2013.08.023
  30. Welch WJ, Buck RJ, Sacks J, Wynn HP, Mitchell TJ, and Morris MD (1992). Screening, predicting and computer experiments, Technometrics, 34, 15-25. https://doi.org/10.2307/1269548
  31. Zhang J, Taflanidis AA, and Medina JC (2017). Sequential approximate optimization for design under uncertainty problems utilizing Kriging metamodeling in augmented input space, Computer Methods in Applied Mechanics and Engineering, 315, 369-395. https://doi.org/10.1016/j.cma.2016.10.042
