http://dx.doi.org/10.5351/KJAS.2016.29.6.1117

Network-based regularization for analysis of high-dimensional genomic data with group structure  

Kim, Kipoong (Department of Statistics, Pusan National University)
Choi, Jiyun (Department of Statistics, Pusan National University)
Sun, Hokeun (Department of Statistics, Pusan National University)
Publication Information
The Korean Journal of Applied Statistics / v.29, no.6, 2016, pp. 1117-1128
Abstract
In genetic association studies with high-dimensional genomic data, regularization procedures based on penalized likelihood are often applied to identify genes or genetic regions associated with diseases or traits. Network-based regularization can incorporate biological network information (such as genetic pathways and signaling pathways) and has shown better selection performance than other regularization procedures such as the lasso and elastic-net. However, network-based regularization is limited in that it cannot be applied to high-dimensional genomic data with a group structure, such as multiple CpG sites within each gene. In this article, we propose combining data dimension reduction techniques, namely principal component analysis (PCA) and partial least squares (PLS), with network-based regularization for the analysis of high-dimensional genomic data with a group structure. The selection performance of the proposed method was evaluated through extensive simulation studies. The proposed method was also applied to real DNA methylation data generated with the Illumina Infinium HumanMethylation27K BeadChip, in which methylation beta values of about 20,000 CpG sites across 12,770 genes were compared between 123 ovarian cancer patients and 152 healthy controls. The analysis identified a few cancer-related genes.
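The abstract describes two steps: collapse each gene's group of CpG sites into a single gene-level summary with PCA or PLS, then apply a network-based penalty across genes. The exact estimator is not given here, so the following is only a minimal pure-Python sketch under stated assumptions: a squared-error loss (a case-control analysis like the one above would use a logistic loss instead), a network-constrained penalty of the form λ1·Σ|β_j| + (λ2/2)·βᵀLβ with the normalized graph Laplacian L, in the spirit of Li and Li's network-constrained regularization (their scaling of the loss and penalty differs), and the first principal component or first PLS component as each gene's summary. All function names, the sign convention for principal components, the toy data, the path-shaped gene network and the tuning values are hypothetical.

```python
import math
import random

def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    c = [v - mean for v in col]
    sd = math.sqrt(sum(v * v for v in c) / n)
    return [v / sd for v in c] if sd > 0 else c

def first_pc_scores(cols, iters=200):
    """Scores on the first principal component of one gene's sites (power iteration)."""
    p, n = len(cols), len(cols[0])
    cov = [[sum(cols[a][i] * cols[b][i] for i in range(n)) / n for b in range(p)]
           for a in range(p)]
    v = [1.0 / (a + 1) for a in range(p)]  # asymmetric start; contrived cases can still fail
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    if sum(v) < 0:  # hypothetical sign convention: loadings sum to >= 0
        v = [-x for x in v]
    return [sum(v[a] * cols[a][i] for a in range(p)) for i in range(n)]

def first_pls_scores(cols, y):
    """Scores on the first PLS component: weights proportional to X^T y.
    Columns are centered, so X^T y equals X^T (y - mean(y))."""
    w = [sum(ci * yi for ci, yi in zip(c, y)) for c in cols]
    norm = math.sqrt(sum(x * x for x in w)) or 1.0
    return [sum(w[a] / norm * cols[a][i] for a in range(len(cols))) for i in range(len(y))]

def gene_level_features(site_cols, groups, y=None, method="pca"):
    """Collapse each gene's CpG-site columns into one standardized gene-level score."""
    Z = []
    for g in groups:
        cols = [standardize(site_cols[c]) for c in g]
        s = first_pc_scores(cols) if method == "pca" else first_pls_scores(cols, y)
        Z.append(standardize(s))
    return Z

def normalized_laplacian(p, edges):
    """L = I - D^{-1/2} A D^{-1/2} for an unweighted network; isolated genes get L_jj = 0."""
    deg = [0] * p
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    L = [[1.0 if i == j and deg[j] > 0 else 0.0 for j in range(p)] for i in range(p)]
    for u, v in edges:
        L[u][v] -= 1.0 / math.sqrt(deg[u] * deg[v])
        L[v][u] -= 1.0 / math.sqrt(deg[u] * deg[v])
    return L

def soft_threshold(x, t):
    return math.copysign(max(abs(x) - t, 0.0), x)

def network_lasso(Z, y, L, lam1, lam2, n_iter=1000, tol=1e-10):
    """Coordinate descent for (1/2n)||y - Zb||^2 + lam1*||b||_1 + (lam2/2) b'Lb."""
    n, p = len(y), len(Z)
    ybar = sum(y) / n
    resid = [v - ybar for v in y]  # centering y absorbs the intercept
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j, zj in enumerate(Z):
            a_j = sum(v * v for v in zj) / n
            denom = a_j + lam2 * L[j][j]
            if denom == 0:
                continue
            # rho = z_j' (partial residual) / n, excluding gene j's own fit
            rho = sum(zi * ri for zi, ri in zip(zj, resid)) / n + a_j * beta[j]
            coupling = sum(L[j][k] * beta[k] for k in range(p) if k != j)
            delta = soft_threshold(rho - lam2 * coupling, lam1) / denom - beta[j]
            if delta != 0.0:
                resid = [ri - zi * delta for ri, zi in zip(resid, zj)]
                beta[j] += delta
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

if __name__ == "__main__":
    random.seed(0)
    n = 80
    # Hypothetical toy data: 4 genes x 2 CpG sites each; genes 0 and 1 drive y.
    latent = [[random.gauss(0, 1) for _ in range(n)] for _ in range(4)]
    sites = [[x + 0.3 * random.gauss(0, 1) for x in latent[g]] for g in range(4) for _ in range(2)]
    y = [latent[0][i] + latent[1][i] + random.gauss(0, 1) for i in range(n)]
    groups = [[0, 1], [2, 3], [4, 5], [6, 7]]
    L = normalized_laplacian(4, [(0, 1), (1, 2), (2, 3)])  # hypothetical path network
    Z = gene_level_features(sites, groups, method="pca")
    print([round(b, 3) for b in network_lasso(Z, y, L, lam1=0.1, lam2=0.5)])
```

One design point: the Laplacian term pulls the scaled coefficients of linked genes toward each other, and therefore toward the same sign, so the arbitrary sign of a principal component matters and the sketch fixes it with a made-up convention. With `method="pls"` the issue does not arise, because each gene-level score has non-negative covariance with y by construction (its weights are proportional to Xᵀy).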
Keywords
high-dimensional genomic data; network-based regularization; genetic network; principal component analysis (PCA); partial least squares (PLS)