A small review and further studies on the LASSO

  • Kwon, Sunghoon (Department of Applied Statistics, Konkuk University) ;
  • Han, Sangmi (Department of Statistics, Seoul National University) ;
  • Lee, Sangin (Department of Statistics, Seoul National University)
  • Received : 2013.07.04
  • Accepted : 2013.08.23
  • Published : 2013.09.30

Abstract

High-dimensional data analysis arises in almost all scientific areas and has grown alongside advances in computing, encouraging penalized estimation methods that play important roles in statistical learning. Over the past years, various penalized estimators have been developed, and the least absolute shrinkage and selection operator (LASSO) proposed by Tibshirani (1996) has shown outstanding performance, standing first among penalized estimation methods. In this paper, we first introduce a number of recent advances in high-dimensional data analysis using the LASSO. The topics include statistical problems such as variable selection and grouped or structured variable selection under sparse high-dimensional linear regression models. Several unsupervised learning methods, including inverse covariance matrix estimation, are also presented. In addition, we discuss further studies on new applications, which may serve as a guideline on how to use the LASSO for the statistical challenges of high-dimensional data analysis.
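
For concreteness, the LASSO of Tibshirani (1996) estimates the coefficients of a linear model by penalized least squares, minimizing $\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$ over $\beta$; the $\ell_1$ penalty shrinks some coefficients exactly to zero and thereby performs variable selection. The sketch below is a minimal illustration of two of the uses surveyed here (sparse regression and sparse inverse covariance estimation), assuming scikit-learn is available; the simulated data and the penalty levels (alpha) are illustrative choices, not taken from the paper.

    # A minimal sketch (assumes NumPy and scikit-learn; penalty levels are illustrative).
    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(0)

    # Sparse high-dimensional linear regression: n = 50 observations, p = 200
    # predictors, and only the first five true coefficients are nonzero.
    n, p = 50, 200
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = [3.0, -2.0, 1.5, -1.0, 0.5]
    y = X @ beta + 0.5 * rng.standard_normal(n)

    # The l1 penalty sets many estimated coefficients exactly to zero,
    # so the fitted model selects variables as well as shrinking them.
    lasso = Lasso(alpha=0.1, max_iter=10_000).fit(X, y)
    print("selected variables:", np.flatnonzero(lasso.coef_))

    # Sparse inverse covariance estimation in the spirit of the graphical
    # lasso (Friedman et al., 2008): zeros in the estimated precision matrix
    # encode conditional independences between variables.
    Z = rng.standard_normal((200, 10))
    glasso = GraphicalLasso(alpha=0.2).fit(Z)
    print("nonzero precision entries:", np.count_nonzero(glasso.precision_))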

References

  1. Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proceedings of the Second International Symposium on Information Theory, 267-281.
  2. Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D. and Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences of the USA, 96, 6745-6750.
  3. Banerjee, O., El Ghaoui, L. and d'Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9, 485-516.
  4. Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. The Annals of Statistics, 37, 1705-1732. https://doi.org/10.1214/08-AOS620
  5. Breiman, L. (1996). Heuristics of instability and stabilization in model selection. The Annals of Statistics, 24, 2350-2383. https://doi.org/10.1214/aos/1032181158
  6. Choi, J., Zou, H. and Oehlert, G. (2010a). A penalized maximum likelihood approach to sparse factor analysis. Statistics and its Interface, 3, 429-436. https://doi.org/10.4310/SII.2010.v3.n4.a1
  7. Choi, N. H., Li, W. and Zhu, J. (2010b). Variable selection with the strong heredity constraint and its oracle property. Journal of the American Statistical Association, 105, 354-364. https://doi.org/10.1198/jasa.2010.tm08281
  8. Donoho, D. and Johnstone, I. (1994). Ideal spatial adaptation via wavelet shrinkages. Biometrika, 81, 425-455. https://doi.org/10.1093/biomet/81.3.425
  9. Dudoit, S., Fridlyand, J. and Speed, T. P. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97, 77-87. https://doi.org/10.1198/016214502753479248
  10. Edwards, D. (2000). Introduction to graphical modelling, Second edition, Springer, New York.
  11. Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32, 407-499. https://doi.org/10.1214/009053604000000067
  12. Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360. https://doi.org/10.1198/016214501753382273
  13. Fan, J. and Li, R. (2002). Variable selection for Cox's proportional hazards model and frailty model. The Annals of Statistics, 30, 74-99. https://doi.org/10.1214/aos/1015362185
  14. Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics, 32, 928-961. https://doi.org/10.1214/009053604000000256
  15. Fan, Y. and Tang, C. Y. (2013). Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society B, 75, 531-552.
  16. Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1, 302-332. https://doi.org/10.1214/07-AOAS131
  17. Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 432-441. https://doi.org/10.1093/biostatistics/kxm045
  18. Friedman, J., Hastie, T. and Tibshirani, R. (2010). A note on the group lasso and a sparse group lasso. arXiv preprint arXiv:1001.0736.
  19. Hastie, T., Tibshirani, R. and Friedman, J. (2001). The elements of statistical learning, Springer, New York.
  20. Hirose, K., Tateishi, S. and Konishi, S. (2012). Tuning parameter selection in sparse regression modeling. Computational Statistics and Data Analysis, 59, 28-40.
  21. Huang, J., Breheny, P. and Ma, S. (2012). A selective review of group selection in high-dimensional models. Statistical Science, 27, 481-499. https://doi.org/10.1214/12-STS392
  22. Huang, J., Horowitz, J. L. and Ma, S. (2008a). Asymptotic properties of bridge estimators in sparse high-dimensional regression models. The Annals of Statistics, 36, 587-613. https://doi.org/10.1214/009053607000000875
  23. Huang, J., Ma, S. and Zhang, C.-H. (2008b). Adaptive lasso for sparse high-dimensional regression models. Statistica Sinica, 18, 1603-1618.
  24. Hwang, C., Kim, M. S. and Shim, J. (2011). Variable selection in ${\ell}_1$ penalized censored regression. Journal of the Korean Data & Information Science Society, 22, 951-959.
  25. Jolliffe, I. T., Trendafilov, N. T. and Uddin, M. (2003). A modified principal component technique based on the lasso. Journal of Computational and Graphical Statistics, 12, 531-547. https://doi.org/10.1198/1061860032148
  26. Kim, Y., Jun, C.-H. and Lee, H. (2011). A new classification method using penalized partial least squares. Journal of the Korean Data & Information Science Society, 22, 931-940.
  27. Kim, Y. and Kwon, S. (2012). Global optimality of nonconvex penalized estimators. Biometrika, 99, 315-325. https://doi.org/10.1093/biomet/asr084
  28. Kim, Y., Kwon, S. and Choi, H. (2012). Consistent model selection criteria on high dimensions. The Journal of Machine Learning Research, 13, 1037-1057.
  29. Kwon, S. and Kim, Y. (2011). Large sample properties of the SCAD-penalized maximum likelihood estimation on high dimensions. Statistica Sinica, 22, 629-653.
  30. Lee, S. and Lee, K. (2012). Detecting survival related gene sets in microarray analysis. Journal of the Korean Data & Information Science Society, 23, 1-11. https://doi.org/10.7465/jkdi.2012.23.1.001
  31. Leng, C., Lin, Y. and Wahba, G. (2006). A note on the lasso and related procedures in model selection. Statistica Sinica, 16, 1273-1284.
  32. Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34, 1436-1462. https://doi.org/10.1214/009053606000000281
  33. Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representation for high-dimensional data. The Annals of Statistics, 37, 246-270. https://doi.org/10.1214/07-AOS582
  34. Park, M. and Hastie, T. (2007). L1-regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society B, 69, 659-667. https://doi.org/10.1111/j.1467-9868.2007.00607.x
  35. Peng, J., Wang, P., Zhou, N. and Zhu, J. (2009). Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association, 104, 735-746. https://doi.org/10.1198/jasa.2009.0126
  36. Pourahmadi, M. (2011). Covariance estimation: The GLM and regularization perspectives. Statistical Science, 26, 369-387. https://doi.org/10.1214/11-STS358
  37. Raskutti, G., Wainwright, M. J. and Yu, B. (2011). Minimax rates of estimation for high-dimensional linear regression over $L_q$-balls. IEEE Transactions on Information Theory, 57, 6979-6994.
  38. Rosset, S. and Zhu, J. (2007). Piecewise linear regularized solution paths. The Annals of Statistics, 35, 1012-1030. https://doi.org/10.1214/009053606000001370
  39. Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461-464. https://doi.org/10.1214/aos/1176344136
  40. Shao, J. (1997). An asymptotic theory for linear model selection. Statistica Sinica, 7, 221-242.
  41. Shen, X. and Ye, J. (2002). Adaptive model selection. Journal of the American Statistical Association, 97, 210-221. https://doi.org/10.1198/016214502753479356
  42. Tibshirani, R., Saunders, M., Rosset, S. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society B, 67, 91-108. https://doi.org/10.1111/j.1467-9868.2005.00490.x
  43. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58, 267-288.
  44. Tibshirani, R. J. and Taylor, J. (2011). The solution path of the generalized lasso. The Annals of Statistics, 39, 1335-1371. https://doi.org/10.1214/11-AOS878
  45. Wang, H., Li, B. and Leng, C. (2009). Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society B, 71, 671-683. https://doi.org/10.1111/j.1467-9868.2008.00693.x
  46. Wang, H., Li, R. and Tsai, C. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika, 94, 553-568. https://doi.org/10.1093/biomet/asm053
  47. Wang, T. and Zhu, L. (2011). Consistent tuning parameter selection in high dimensional sparse linear regression. Journal of Multivariate Analysis, 102, 1141-1151. https://doi.org/10.1016/j.jmva.2011.03.007
  48. Ye, J. (1998). On measuring and correcting the effects of data mining and model selection. Journal of the American Statistical Association, 93, 120-131. https://doi.org/10.1080/01621459.1998.10474094
  49. Yuan, M. (2008). Efficient computation of l1 regularized estimates in Gaussian graphical models. Journal of Computational and Graphical Statistics, 17, 809-826. https://doi.org/10.1198/106186008X382692
  50. Yuan, M. (2010). High dimensional inverse covariance matrix estimation via linear programming. Journal of Machine Learning Research, 11, 2261-2286.
  51. Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society B, 68, 49-67. https://doi.org/10.1111/j.1467-9868.2005.00532.x
  52. Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894-942. https://doi.org/10.1214/09-AOS729
  53. Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the lasso selection in high-dimensional linear regression. The Annals of Statistics, 36, 1567-1594. https://doi.org/10.1214/07-AOS520
  54. Zhang, C.-H. and Zhang, T. (2012). A general theory of concave regularization for high-dimensional sparse estimation problems. Statistical Science, 27, 576-593. https://doi.org/10.1214/12-STS399
  55. Zhang, T. (2009). Some sharp performance bounds for least squares regression with $L_1$ regularization. The Annals of Statistics, 37, 2109-2144. https://doi.org/10.1214/08-AOS659
  56. Zhao, P. and Yu, B. (2006). On model selection consistency of lasso. Journal of Machine Learning Research, 7, 2541-2563.
  57. Zhou, S., van de Geer, S. and Bühlmann, P. (2009). Adaptive lasso for high dimensional regression and Gaussian graphical modeling. arXiv preprint arXiv:0903.2515.
  58. Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418-1429. https://doi.org/10.1198/016214506000000735
  59. Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67, 301-320. https://doi.org/10.1111/j.1467-9868.2005.00503.x
  60. Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15, 265-286. https://doi.org/10.1198/106186006X113430
  61. Zou, H. and Zhang, H. (2009). On the adaptive elastic net with a diverging number of parameters. The Annals of Statistics, 37, 1733-1751. https://doi.org/10.1214/08-AOS625

Cited by

  1. A note on standardization in penalized regressions vol.26, pp.2, 2015, https://doi.org/10.7465/jkdi.2015.26.2.505
  2. BCDR algorithm for network estimation based on pseudo-likelihood with parallelization using GPU vol.27, pp.2, 2016, https://doi.org/10.7465/jkdi.2016.27.2.381
  3. A study on principal component analysis using penalty methods vol.28, pp.4, 2017, https://doi.org/10.7465/jkdi.2017.28.4.721