http://dx.doi.org/10.3745/KTSDE.2019.8.10.397

A Node2Vec-Based Gene Expression Image Representation Method for Effectively Predicting Cancer Prognosis  

Choi, Jonghwan (Department of Computer Science, Yonsei University)
Park, Sanghyun (Department of Computer Science, Yonsei University)
Publication Information
KIPS Transactions on Software and Data Engineering / v.8, no.10, 2019, pp. 397-402
Abstract
Accurately predicting cancer prognosis so that patients can receive appropriate treatment is one of the critical challenges in bioinformatics. Many studies have proposed machine learning models that predict patient outcomes from gene expression data. Gene expression data is high-dimensional numerical data covering about 17,000 genes, so earlier studies used feature selection or dimensionality reduction to improve the performance of prognostic prediction models. These approaches, however, make it difficult for the predictive model to capture biological interactions among the selected genes, because feature selection and model training are performed independently. In this paper, we propose a novel approach that formats gene expression data as a two-dimensional image in order to carry out feature selection and prognostic prediction effectively. Node2Vec is used to integrate a biological interaction network with gene expression data, and a convolutional neural network learns from the resulting two-dimensional gene expression images to predict cancer prognosis. We evaluated the proposed model with double cross-validation and confirmed that it predicts prognosis more accurately than traditional machine learning models trained on raw gene expression data. Because the proposed approach can improve prediction models without the information loss caused by a separate feature selection step, we expect it to contribute to the development of personalized medicine.
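The double cross-validation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give fold counts, hyperparameter candidates, or the training routine, so `outer_k`, `inner_k`, `candidates`, and the `fit_and_score` callback below are hypothetical placeholders, not the authors' actual setup. The sketch only shows the general idea: hyperparameters are chosen in an inner loop that never sees the outer test fold.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def double_cross_validation(n_samples, candidates, fit_and_score,
                            outer_k=5, inner_k=3):
    """Return one held-out score per outer fold.

    fit_and_score(train_idx, test_idx, param) is a hypothetical callback that
    trains a model with `param` on train_idx and returns its score on test_idx
    (higher is better). All default values here are illustrative assumptions.
    """
    outer_scores = []
    outer_folds = k_fold_indices(n_samples, outer_k)
    for i, test_idx in enumerate(outer_folds):
        # Everything outside the current outer fold is available for tuning.
        train_idx = [j for f, fold in enumerate(outer_folds) if f != i
                     for j in fold]

        # Inner loop: split the outer training set again for model selection.
        inner_folds = k_fold_indices(len(train_idx), inner_k, seed=i + 1)

        def inner_score(param):
            scores = []
            for v, val_pos in enumerate(inner_folds):
                val = [train_idx[p] for p in val_pos]
                fit = [train_idx[p]
                       for w, fold in enumerate(inner_folds) if w != v
                       for p in fold]
                scores.append(fit_and_score(fit, val, param))
            return sum(scores) / len(scores)

        best_param = max(candidates, key=inner_score)

        # Outer loop: refit with the selected value, score on untouched data.
        outer_scores.append(fit_and_score(train_idx, test_idx, best_param))
    return outer_scores
```

Averaging the returned outer-fold scores gives a performance estimate that is not biased by the hyperparameter search, which is the usual motivation for nesting the two loops.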
Keywords
Bioinformatics; Gene Expression; Node2Vec; Cancer Prognostic Prediction; Personalized Medicine;