http://dx.doi.org/10.5351/KJAS.2022.35.2.217

Model selection via Bayesian information criterion for divide-and-conquer penalized quantile regression  

Kang, Jongkyeong (Department of Information Statistics, Kangwon National University)
Han, Seokwon (Department of Mathematics, Korea Military Academy)
Bang, Sungwan (Department of Mathematics, Korea Military Academy)
Publication Information
The Korean Journal of Applied Statistics / v.35, no.2, 2022, pp. 217-227
Abstract
Quantile regression is widely used in many fields because it provides an efficient tool for examining complex information latent in variables. However, modern large-scale, high-dimensional data make it very difficult to estimate quantile regression models owing to limitations in computation time and storage space. Divide-and-conquer is a technique that divides the entire data set into several sub-datasets that are easy to compute on, and then reconstructs an estimate for the entire data set using only summary statistics from each sub-dataset. In this paper, we study a variable selection method that applies the divide-and-conquer technique to penalized quantile regression and selects the tuning parameter via the Bayesian information criterion. When the number of sub-datasets is chosen appropriately, the proposed method is efficient in terms of computational speed and is as consistent in variable selection as the classical quantile regression estimates computed from the entire data. The advantages of the proposed method are confirmed through analyses of simulated and real data.
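To make the divide-and-conquer recipe above concrete, the following is a minimal Python sketch, not the authors' implementation. It splits the data into sub-datasets, fits an L1-penalized quantile regression on each block with scikit-learn's QuantileRegressor, aggregates by simple averaging of the block estimates, and selects the penalty level with a check-loss-based BIC-type criterion. The averaging rule, the candidate penalty grid, and the exact form of the criterion are illustrative assumptions; the paper's definitions may differ.

import numpy as np
from sklearn.linear_model import QuantileRegressor

def check_loss(r, tau):
    # Quantile check (pinball) loss: rho_tau(r) = r * (tau - 1{r < 0}).
    return np.sum(r * (tau - (r < 0)))

def dc_penalized_qr(X, y, tau=0.5, n_blocks=10, alphas=(0.001, 0.01, 0.1), seed=0):
    # Divide: randomly partition the observation indices into sub-datasets.
    n = len(y)
    blocks = np.array_split(np.random.default_rng(seed).permutation(n), n_blocks)
    best_bic, best_alpha, best_beta = np.inf, None, None
    for alpha in alphas:
        coefs, loss = [], 0.0
        for idx in blocks:
            # Fit L1-penalized quantile regression on one sub-dataset and keep
            # only summary statistics: the coefficient vector and its check loss.
            fit = QuantileRegressor(quantile=tau, alpha=alpha, solver="highs")
            fit.fit(X[idx], y[idx])
            coefs.append(fit.coef_)
            loss += check_loss(y[idx] - fit.predict(X[idx]), tau)
        # Conquer: aggregate by averaging the sub-dataset estimates (assumption).
        beta = np.mean(coefs, axis=0)
        df = int(np.sum(np.abs(beta) > 1e-8))  # number of selected variables
        # BIC-type criterion built from the pooled per-block check losses
        # (assumption): log of the total check loss plus a log(n) complexity term.
        bic = np.log(loss) + df * np.log(n) / (2 * n)
        if bic < best_bic:
            best_bic, best_alpha, best_beta = bic, alpha, beta
    return best_alpha, best_beta

As a usage example, with 10,000 observations, 20 predictors, and 3 active variables, dc_penalized_qr(X, y, tau=0.5, n_blocks=10) returns the BIC-selected penalty and the averaged coefficient vector, in which the inactive coefficients should be shrunk to (near) zero. Note that the pooled loss is accumulated from the per-block fits rather than re-evaluated at the averaged estimate, so the whole procedure touches each sub-dataset only once.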
Keywords
Bayesian information criterion; divide-and-conquer; large-scale data; quantile regression; variable selection;