http://dx.doi.org/10.7465/jkdi.2017.28.6.1229

ADMM algorithms in statistics and machine learning  

Choi, Hosik (Department of Applied Statistics, Kyonggi University)
Choi, Hyunjip (Department of Applied Statistics, Kyonggi University)
Park, Sangun (Department of Management Information System, Kyonggi University)
Publication Information
Journal of the Korean Data and Information Science Society, v.28, no.6, 2017, pp. 1229-1244
Abstract
In recent years, as demand for data-driven analytical methodologies has grown across many fields, optimization methods have been developed to meet it. In particular, many of the constrained problems arising in statistics and machine learning can be solved by convex optimization. The alternating direction method of multipliers (ADMM) handles linear constraints effectively and lends itself to parallel optimization. ADMM solves a complex original problem by splitting it into subproblems that are easier to optimize than the original and then combining their solutions, which makes it useful for non-smooth or composite objective functions. It is widely used in statistics and machine learning because algorithms can be constructed systematically from duality theory and the proximal operator. In this paper, we examine applications of the ADMM algorithm in various fields related to statistics, focusing on two main points: (1) the splitting strategy for the objective function, and (2) the role of the proximal operator in explaining the Lagrangian method and its dual problem. We also introduce methodologies that utilize regularization. Simulation results are presented to demonstrate the effectiveness of the lasso.
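The two points above can be made concrete with the standard lasso example. For minimizing (1/2)||Ax - b||^2 + lambda*||x||_1, ADMM splits the smooth loss and the non-smooth penalty across two blocks linked by the constraint x = z; the z-update is then the proximal operator of the l1 norm, i.e. elementwise soft-thresholding. The following is a minimal NumPy sketch of this splitting, not the paper's own simulation code; the problem size, penalty parameter lam, step size rho, and iteration count are illustrative assumptions.

# Minimal ADMM sketch for the lasso under the splitting
#   minimize (1/2)||Ax - b||^2 + lam*||z||_1  subject to  x - z = 0.
import numpy as np

def soft_threshold(v, kappa):
    # Proximal operator of the l1 norm: elementwise soft-thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iter=200):
    n, p = A.shape
    x = np.zeros(p)   # block tied to the smooth least-squares loss
    z = np.zeros(p)   # block tied to the l1 penalty
    u = np.zeros(p)   # scaled dual variable for the constraint x - z = 0
    # Cache the Cholesky factor reused by every x-update.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(p))
    Atb = A.T @ b
    for _ in range(n_iter):
        # x-update: ridge-type linear system
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        # z-update: proximal step on the l1 term
        z = soft_threshold(x + u, lam / rho)
        # dual update: accumulate the constraint residual x - z
        u = u + x - z
    return z

# Toy usage: recover a sparse coefficient vector from noisy observations.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
beta = np.zeros(20); beta[:3] = [3.0, -2.0, 1.5]
b = A @ beta + 0.1 * rng.standard_normal(100)
print(np.round(admm_lasso(A, b, lam=5.0), 2))

The same pattern carries over to the other regularized problems surveyed in the paper: only the z-update changes, replacing soft-thresholding with the proximal operator of the chosen penalty.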
Keywords
Constraint; optimization; parallel computing; penalty function; regularization