Minimum Message Length and Classical Methods for Model Selection in Univariate Polynomial Regression

  • Received : 2004.12.04
  • Published : 2005.12.31

Abstract

The problem of choosing among competing models is a fundamental issue in statistical data analysis. A good fit to the data can be misleading, since it may result from properties of a model that have nothing to do with its being a close approximation to the source distribution of interest (overfitting, for example). In this study we focus on choosing among models from a family of polynomial regressors. Three decades of research have spawned a number of plausible model selection techniques, namely Akaike's Final Prediction Error (FPE) and Information Criterion (AIC), Schwarz's criterion (SCH), Generalized Cross-Validation (GCV), Wallace's Minimum Message Length (MML), Rissanen's Minimum Description Length (MDL), and Vapnik's Structural Risk Minimization (SRM). The fundamental similarity among these principles is that each attempts to strike an appropriate balance between the complexity of a model and its ability to explain the data. This paper presents an empirical study of the above principles in the context of model selection, where the candidate models are univariate polynomials. It includes a detailed empirical evaluation of the selection methods on six target functions, with varying sample sizes and added Gaussian noise. The results appear to provide strong evidence in support of the MML- and SRM-based methods over the other standard approaches (FPE, AIC, SCH, and GCV).
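The classical criteria compared above all score a candidate polynomial by penalizing its training error with a complexity term: for a least-squares fit of degree d with k = d + 1 parameters, n samples, and noise-variance estimate s² = RSS/n, the standard textbook forms are FPE = s²(n+k)/(n−k), AIC = n ln s² + 2k, SCH = n ln s² + k ln n, and GCV = s²/(1 − k/n)², each selecting the degree that minimizes its score. The following is a minimal sketch of such an experiment, assuming these textbook forms rather than the paper's exact variants; the helper names `fit_poly` and `select_degree` are illustrative, and the MML and SRM scores are omitted because they require a coding scheme and VC-dimension bounds beyond this sketch.

```python
import numpy as np

def fit_poly(x, y, degree):
    """Least-squares polynomial fit; returns the residual sum of squares."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return np.sum(residuals ** 2)

def select_degree(x, y, max_degree=9):
    """Score each candidate degree under four classical criteria and
    return the degree that minimizes each one."""
    n = len(x)
    scores = {"FPE": [], "AIC": [], "SCH": [], "GCV": []}
    for d in range(max_degree + 1):
        k = d + 1                      # number of free parameters
        rss = fit_poly(x, y, d)
        s2 = rss / n                   # ML estimate of the noise variance
        scores["FPE"].append(s2 * (n + k) / (n - k))
        scores["AIC"].append(n * np.log(s2) + 2 * k)
        scores["SCH"].append(n * np.log(s2) + k * np.log(n))  # Schwarz/BIC
        scores["GCV"].append(s2 / (1 - k / n) ** 2)
    return {name: int(np.argmin(vals)) for name, vals in scores.items()}

# Toy experiment in the spirit of the study: a cubic target function,
# added Gaussian noise, and a modest sample size.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)
y = 2 * x ** 3 - x + rng.normal(scale=0.2, size=x.size)
print(select_degree(x, y))  # e.g. {'FPE': 3, 'AIC': 3, 'SCH': 3, 'GCV': 3}
```

With ample data and mild noise all four criteria tend to agree; the differences the study measures emerge at small sample sizes and higher noise levels, where the weaker complexity penalties (e.g., FPE, GCV) are more prone to overfitting than SCH, MML, or SRM.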

References

  1. A. de Brauwere, F. De Ridder, R. Pintelon, M. Elskens, J. Schoukens, and W. Baeyens, "Model Selection through a Statistical Analysis of the Minimum of a Weighted Least Squares Cost Function," Chemometrics and Intelligent Laboratory Systems, vol. 76, no. 2.
  2. H. Akaike, "Fitting Autoregressive Models for Prediction," Annals of the Institute of Statistical Mathematics, vol. 21.
  3. H. Akaike, "Statistical Predictor Identification," Annals of the Institute of Statistical Mathematics, vol. 22.
  4. G. Schwarz, "Estimating the Dimension of a Model," Ann. Stat., vol. 6.
  5. P. Craven and G. Wahba, "Smoothing Noisy Data with Spline Functions: Estimating the Correct Degree of Smoothing by the Method of Generalized Cross-Validation," Numerische Mathematik, vol. 31.
  6. C.S. Wallace and D.M. Boulton, "An Information Measure for Classification," Computer Journal, vol. 11, no. 2.
  7. C.S. Wallace and P.R. Freeman, "Estimation and Inference by Compact Coding," J. R. Statist. Soc. B, vol. 49, no. 3.
  8. C.S. Wallace and D.L. Dowe, "Minimum Message Length and Kolmogorov Complexity," Computer J., vol. 42, no. 4.
  9. V.N. Vapnik, Estimation of Dependencies Based on Empirical Data.
  10. V.N. Vapnik, The Nature of Statistical Learning Theory.
  11. C.S. Wallace, On the Selection of the Order of a Polynomial Model, technical report.
  12. M. Viswanathan and C. Wallace, "A Note on the Comparison of Polynomial Selection Methods," Proc. 7th Int. Workshop on Artif. Intell. and Stats.
  13. V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods.
  14. Y. Sakamoto et al., Akaike Information Criterion Statistics.
  15. V.N. Vapnik, "Structure of Statistical Learning Theory," in Computational Learning and Probabilistic Reasoning.
  16. J. Rissanen, "Modeling by Shortest Data Description," Automatica, vol. 14.
  17. J. Rissanen, Stochastic Complexity in Statistical Inquiry.
  18. J. Rissanen, "Hypothesis Selection and Testing by the MDL Principle," The Computer J., vol. 42, no. 4.
  19. C.S. Wallace and D.M. Boulton, "An Invariant Bayes Method for Point Estimation," Classification Society Bulletin, vol. 3, no. 3.
  20. R.A. Baxter and D.L. Dowe, "Model Selection in Linear Regression Using the MML Criterion," Proc. 4th IEEE Data Compression Conf., J.A. Storer and M. Cohn, eds.
  21. TR 276
  22. A.N. Kolmogorov, "Three Approaches to the Quantitative Definition of Information," Problems of Information Transmission, vol. 1.
  23. G.J. Chaitin, "On the Length of Programs for Computing Finite Binary Sequences," J.A.C.M., vol. 13.
  24. C.S. Wallace and P.R. Freeman, "Single Factor Analysis by MML Estimation," J. of the Royal Statistical Society (Series B), vol. 54.
  25. C.S. Wallace, Multiple Factor Analysis by MML Estimation, Technical Report 95/218.
  26. C.S. Wallace and D.L. Dowe, MML Estimation of the von Mises Concentration Parameter, Technical Report TR 93/193.
  27. C.S. Wallace and D.L. Dowe, "MML Clustering of Multi-State, Poisson, von Mises Circular and Gaussian Distributions," Statistics and Computing, vol. 10, no. 1.
  28. D.L. Dowe, J.J. Oliver, and C.S. Wallace, "MML Estimation of the Parameters of the Spherical Fisher Distribution," Proc. 7th Conf. Algorithmic Learning Theory (ALT'96), LNAI 1160, A. Sharma et al., eds.
  29. C.S. Wallace, "Intrinsic Classification of Spatially Correlated Data," Computer J., vol. 41, no. 8.
  30. T. Edgoose and L. Allison, "MML Markov Classification of Sequential Data," Statistics and Computing, vol. 9.
  31. M. Viswanathan, C.S. Wallace, D.L. Dowe, and K. Korb, "Finding Cutpoints in Noisy Binary Sequences - a Revised Empirical Evaluation," Proc. 12th Australian Joint Conf. on Artificial Intelligence.
  32. D.L. Dowe, R.A. Baxter, J.J. Oliver, and C.S. Wallace, "Point Estimation Using the Kullback-Leibler Loss Function and MML," Proc. 2nd Pacific Asian Conf. on Knowledge Discovery and Data Mining (PAKDD'98).
  33. D.G.T. Denison, B.K. Mallick, and A.F.M. Smith, "Automatic Bayesian Curve Fitting," J. Roy. Statist. Soc. Series B, vol. 60.
  34. C. Cortes and V. Vapnik, "Support-Vector Networks," Machine Learning, vol. 20.
  35. V. Vapnik, S. Golowich, and A. Smola, "Support Vector Method for Function Approximation, Regression Estimation, and Signal Processing," Advances in Neural Information Processing Systems 9, M. Mozer, M. Jordan, and T. Petsche, eds.
  36. K. Lee and H. Park, "A New Similarity Measure Based on Intraclass Statistics for Biometric Systems," ETRI J., vol. 25, no. 5.
  37. R.M. Kil, "Function Approximation Based on a Network with Kernel Functions of Bounds and Locality: An Approach of Nonparametric Estimation," ETRI J., vol. 15, no. 2.