http://dx.doi.org/10.5351/KJAS.2020.33.5.569

Divide and conquer kernel quantile regression for massive dataset  

Bang, Sungwan (Department of Mathematics, Korea Military Academy)
Kim, Jaeoh (Center for Army Analysis and Simulation, HQs ROKA)
Publication Information
The Korean Journal of Applied Statistics, v.33, no.5, 2020, pp. 569-578
Abstract
By estimating conditional quantile functions of the response, quantile regression (QR) can provide comprehensive information about the relationship between the response and the predictors. In addition, kernel quantile regression (KQR) estimates a nonlinear conditional quantile function in a reproducing kernel Hilbert space generated by a positive definite kernel function. However, it is infeasible to use KQR to analyse a massive dataset because of the limitations of computer primary memory. We propose a divide-and-conquer based KQR (DC-KQR) method to overcome this limitation. The proposed DC-KQR divides the entire dataset into a few subsets, applies KQR to each subset, and then derives a final estimator by aggregating the results from all subsets. Simulation studies are presented to demonstrate the satisfactory performance of the proposed method.
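
The abstract describes the full algorithm: split the data, fit a KQR on each subset by solving a convex (quadratic) program, and aggregate the subset results. Below is a minimal Python sketch of that scheme, assuming a Gaussian kernel and simple averaging of the subset predictions as the aggregation rule; the helper names (fit_kqr, dc_kqr) and the cvxpy-based solver are illustrative choices, not the authors' implementation.

    import numpy as np
    import cvxpy as cp

    def rbf_kernel(X1, X2, sigma=1.0):
        # Gaussian (RBF) kernel matrix between the rows of X1 and X2.
        sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2.0 * sigma ** 2))

    def fit_kqr(X, y, tau=0.5, lam=0.01, sigma=1.0):
        # KQR on one subset: minimize the check loss
        #   rho_tau(r) = max(tau * r, (tau - 1) * r)
        # plus a ridge penalty alpha' K alpha on the RKHS norm, with
        # f(x) = b + sum_i alpha_i K(x, x_i) by the representer theorem.
        n = len(y)
        K = rbf_kernel(X, X, sigma) + 1e-8 * np.eye(n)  # jitter keeps K PSD
        alpha, b = cp.Variable(n), cp.Variable()
        r = y - (K @ alpha + b)
        check = cp.sum(cp.maximum(tau * r, (tau - 1.0) * r)) / n
        cp.Problem(cp.Minimize(check + lam * cp.quad_form(alpha, K))).solve()
        return (X, alpha.value, b.value, sigma)

    def predict_kqr(model, Xnew):
        Xtr, alpha, b, sigma = model
        return rbf_kernel(Xnew, Xtr, sigma) @ alpha + b

    def dc_kqr(X, y, n_subsets=4, seed=0, **kqr_args):
        # Divide and conquer: randomly split the data, fit KQR per subset.
        rng = np.random.default_rng(seed)
        parts = np.array_split(rng.permutation(len(y)), n_subsets)
        return [fit_kqr(X[i], y[i], **kqr_args) for i in parts]

    def dc_predict(models, Xnew):
        # Aggregate the subset estimators by averaging their predictions.
        return np.mean([predict_kqr(m, Xnew) for m in models], axis=0)

    # Toy usage: 90th-percentile curve of a nonlinear signal.
    rng = np.random.default_rng(1)
    X = rng.uniform(-1.0, 1.0, size=(400, 1))
    y = np.sin(np.pi * X[:, 0]) + 0.2 * rng.standard_normal(400)
    models = dc_kqr(X, y, n_subsets=4, tau=0.9, lam=0.01, sigma=0.5)
    print(dc_predict(models, X[:5]))

Averaging K subset estimators keeps each quadratic program at size roughly n/K, which is what makes the approach feasible when the full n-by-n kernel matrix no longer fits in primary memory.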
Keywords
divide and conquer; kernel; quadratic programming; quantile regression