Variable selection in censored kernel regression

  • Received: 2012.12.11
  • Accepted: 2013.01.02
  • Published: 2013.01.31

Abstract

In censored regression it is often the case that some input variables are not important, while others matter more than the rest. We propose a novel algorithm for selecting such important input variables in censored kernel regression. The algorithm is based on penalized regression with a weighted quadratic loss function for censored data, where the weight is computed from the empirical survival function of the censoring variable. We employ a weighted version of ANOVA decomposition kernels to choose an optimal subset of important input variables. Experimental results are presented that illustrate the performance of the proposed variable selection method.
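
A minimal sketch of the two ingredients described above, assuming Kaplan-Meier-based inverse-probability-of-censoring weights in the spirit of Koul, Susarla and Van Ryzin (1981) and a weighted, penalized kernel fit solved in dual variables as in Saunders et al. (1998). The function names, the RBF kernel, and the toy data are illustrative assumptions, not taken from the paper; in particular, the weighted ANOVA decomposition kernels used for the actual variable selection step are not reproduced here.

```python
# Sketch only: not the authors' implementation.
import numpy as np

def km_censoring_survival(y, delta):
    """Kaplan-Meier estimate G(y_i) = P(C > y_i) of the censoring variable's
    survival function: censored observations (delta == 0) are treated as 'events'.
    Assumes no tied observation times, for brevity."""
    n = len(y)
    order = np.argsort(y)
    cens = 1 - delta[order]                      # 1 where the observation is censored
    at_risk = n - np.arange(n)                   # risk-set size at each sorted time
    G_sorted = np.cumprod(1.0 - cens / at_risk)  # product-limit estimate at sorted times
    G = np.empty(n)
    G[order] = G_sorted
    return G

def ipc_weights(y, delta, eps=1e-8):
    """Inverse-probability-of-censoring weights: w_i = delta_i / G(y_i)."""
    G = km_censoring_survival(y, delta)
    return delta / np.maximum(G, eps)

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_weighted_kernel_ridge(X, y, w, lam=1.0, gamma=1.0):
    """Minimise sum_i w_i (y_i - f(x_i))^2 + lam * ||f||^2 over f in the RKHS;
    the dual coefficients solve (W K + lam I) alpha = W y."""
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    W = np.diag(w)
    return np.linalg.solve(W @ K + lam * np.eye(n), W @ y)

# Toy usage: censored responses with one irrelevant input variable.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                      # only the first column matters
t = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)   # latent failure times
c = rng.normal(loc=1.0, size=100)                  # censoring times
y = np.minimum(t, c)
delta = (t <= c).astype(float)                     # 1 = observed, 0 = censored
w = ipc_weights(y, delta)
alpha = fit_weighted_kernel_ridge(X, y, w, lam=0.5, gamma=0.5)
y_hat = rbf_kernel(X, X, 0.5) @ alpha
```

Under this weighting, censored observations receive zero weight, so only uncensored responses enter the quadratic loss, while the inverse survival probabilities compensate for the observations lost to censoring.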

Keywords

References

  1. Brooks, S. P. (1998). Markov Chain Monte Carlo method and its application. The Statistician, 47, 69-100.
  2. Buckley, J. and James, I. (1979). Linear regression with censored data. Biometrika, 66, 429-436. https://doi.org/10.1093/biomet/66.3.429
  3. Cho, D. H., Shim, J. and Seok, K. H. (2010). Doubly penalized kernel method for heteroscedastic autoregressive data. Journal of the Korean Data & Information Science Society, 21, 155-162.
  4. Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society B, 34, 187-220.
  5. Draper, N. and Smith, H. (1981). Applied regression analysis, 2nd Edition, John Wiley & Sons, Inc., New York.
  6. Gehan, E. A. (1965). Generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika, 52, 202-223.
  7. Ghosh, K. S. and Ghosal, S. (2006). Semiparametric accelerated failure time models for censored data. Bayesian Statistics and its Applications, 15, 213-229.
  8. Guyon, I., Weston, J., Barnhill, S. and Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46, 389-422. https://doi.org/10.1023/A:1012487302797
  9. Hu, S. and Rao, J. S. (2010). Sparse penalization with censoring constraints for estimating high dimensional AFT models with applications to microarray data analysis, Technical Report 07 of Division of Biostatistics, Case Western Reserve University, Ohio.
  10. Huang, J., Ma, S. and Xie, H. (2005). Regularized estimation in the accelerated failure time model with high dimensional covariates, Technical Report No. 349, Department of Statistics and Actuarial Science, The University of Iowa, Iowa.
  11. Hwang, H. (2010a). Fixed size LS-SVM for multiclassification problems of large datasets. Journal of the Korean Data & Information Science Society, 21, 561-567.
  12. Hwang, H. (2010b). Variable selection for multiclassification by LS-SVM. Journal of the Korean Data & Information Science Society, 21, 959-965.
  13. Jin, Z., Lin, D. Y., Wei, L. J. and Ying, Z. (2003). Rank-based inference for the accelerated failure time model. Biometrika, 90, 341-353. https://doi.org/10.1093/biomet/90.2.341
  14. Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53, 457-481. https://doi.org/10.1080/01621459.1958.10501452
  15. Kimeldorf, G. S. and Wahba, G. (1971). Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33, 82-95. https://doi.org/10.1016/0022-247X(71)90184-3
  16. Koo, J. Y., Sohn, I., Kim, S. and Lee, J. W. (2006). Structured polychotomous machine diagnosis of multiple cancer types using gene expression. Bioinformatics, 22, 950-958. https://doi.org/10.1093/bioinformatics/btl029
  17. Koul, H., Susarla, V. and Van Ryzin, J. (1981). Regression analysis with randomly right censored data. The Annals of Statistics, 9, 1276-1288. https://doi.org/10.1214/aos/1176345644
  18. Krall, J. M., Uthoff, V. A. and Harley, J. B. (1975). A step-up procedure for selecting variables associated with survival. Biometrics, 31, 49-57. https://doi.org/10.2307/2529709
  19. Mercer, J. (1909). Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London A, 209, 415-446.
  20. Orbe, J., Ferreira, E. and Nunez-Anton, V. (2003). Censored partial regression. Biostatistics, 4, 109-121. https://doi.org/10.1093/biostatistics/4.1.109
  21. Sauerbrei, W. and Schumacher, M. (1992). A bootstrap resampling procedure for model building: Application to the Cox regression model. Statistics in Medicine, 11, 2093-2099. https://doi.org/10.1002/sim.4780111607
  22. Saunders, C., Gammerman, A. and Vovk, V. (1998). Ridge regression learning algorithm in dual variables. Proceedings of the 15th International Conference on Machine Learning, 515-521.
  23. Shim, J. and Lee, J. T. (2009). Kernel method for autoregressive data. Journal of the Korean Data & Information Science Society, 20, 467-472.
  24. Shim, J., Kim, C. and Hwang, C. (2011). Semiparametric least squares support vector machine for accelerated failure time model. Journal of the Korean Statistical Society, 40, 75-83. https://doi.org/10.1016/j.jkss.2010.05.002
  25. Shim, J., Sohn, I., Kim, S., Lee, J. W., Green, P. E. and Hwang, C. (2009). Selecting marker genes for cancer classification using supervised weighted kernel clustering and the support vector machine. Computational Statistics and Data Analysis, 53, 1736-1742. https://doi.org/10.1016/j.csda.2008.04.028
  26. Suykens, J. A. K. and Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural Processing Letters, 9, 293-300. https://doi.org/10.1023/A:1018628609742
  27. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58, 267-288.
  28. Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine, 16, 385-395. https://doi.org/10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3
  29. Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences, 99, 6567-6572. https://doi.org/10.1073/pnas.082099299
  30. Vapnik, V. N. (1995). The nature of statistical learning theory, Springer, New York.
  31. Vapnik, V. N. (1998). Statistical learning theory, Springer, New York.
  32. Zhou, M. (1992). M-estimation in censored linear models. Biometrika, 79, 837-841. https://doi.org/10.1093/biomet/79.4.837
  33. Zhou, M. (1998). Regression with censored data: The synthetic data and least squares approach, Technical Report 374, University of Kentucky, Kentucky.

Cited by

  1. Robust minimum distance estimation of a linear regression model with correlated errors in the presence of outliers vol.50, pp.23, 2013, https://doi.org/10.1080/03610926.2020.1734831