Comparison of the performance of various kernels for the survival prediction model

  • Seungyeoun Lee (Department of Mathematics and Statistics, Sejong University) ;
  • Nayeon Kim (Department of Mathematics and Statistics, Sejong University) ;
  • Beomseok Kim (Department of Statistics, Korea University) ;
  • Inyoung Kim (Department of Statistics, Virginia Tech)
  • Received : 2024.07.03
  • Accepted : 2024.09.30
  • Published : 2024.11.30

Abstract

With the development of high-throughput technologies for producing genomic data, more advanced statistical methods, such as regularization and machine learning techniques, have been adapted to survival analysis. However, clinical information such as age, gender, and medical history still plays a critical role in constructing a survival prediction model. Machine learning techniques such as the support vector machine (SVM) can improve the predictive ability of a survival model by using the available clinical information. For implementing an SVM in a predictive survival model, Daemen et al. proposed the clinical kernel, which equalizes the influence of the clinical variables and takes the range of each variable into account. However, this clinical kernel gives the same weight to all clinical variables without considering their different effects on survival time. In this study, we propose a simple kernel, called the ensemble kernel, which combines the clinical kernel with model fitting. Since the proposed ensemble kernel is based on model fitting, two different kernels are considered, using either the Cox model or the accelerated failure time (AFT) model. We compare the performance of these two ensemble kernels with that of the linear kernel and the clinical kernel in terms of the concordance index (C-index) on four real data sets. While both the linear and clinical kernels use all clinical variables in defining global kernels, the two proposed ensemble kernels can use only the significant variables from either a Cox model or an AFT model. The comparison shows that the two proposed ensemble kernels perform similarly to the existing clinical kernel and that the performance of the four kernels varies across data sets.
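To make the two central quantities of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the usual form of the clinical kernel of Daemen and De Moor (a range-scaled similarity for continuous or ordinal variables, an equality indicator for nominal ones, averaged over variables) and the usual pairwise definition of Harrell's C-index. The function names, the `is_nominal` argument, and the optional `weights` argument are hypothetical. In particular, `weights` only illustrates what unequal variable weighting could look like and is not the paper's definition of the ensemble kernel.

```python
import numpy as np


def clinical_kernel(X, is_nominal, weights=None):
    """Clinical-kernel sketch (assumed form, after Daemen and De Moor, 2009).

    X          : (n_samples, n_vars) array of numerically coded variables
    is_nominal : sequence of bools, True where a column is nominal
    weights    : hypothetical per-variable weights; None means equal weights
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if weights is None:
        w = np.full(p, 1.0 / p)          # equal influence for every variable
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()                  # normalise so that K[i, i] == 1
    K = np.zeros((n, n))
    for j in range(p):
        col = X[:, j]
        if is_nominal[j]:
            # Nominal variable: similarity 1 if the categories match, else 0
            Kj = (col[:, None] == col[None, :]).astype(float)
        else:
            # Continuous/ordinal variable: (range - |difference|) / range
            r = col.max() - col.min()
            diff = np.abs(col[:, None] - col[None, :])
            Kj = np.ones((n, n)) if r == 0 else (r - diff) / r
        K += w[j] * Kj
    return K


def harrell_c_index(time, event, risk):
    """Harrell's C-index: share of comparable pairs ranked concordantly.

    A pair (i, j) is comparable when subject i had an observed event and
    time[i] < time[j]; it is concordant when risk[i] > risk[j].
    Ties in risk count as 0.5. Assumes at least one comparable pair exists.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue                     # censored subjects cannot anchor a pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data (made up for illustration): age in years and a binary nominal variable
X = np.array([[50, 0], [60, 1], [70, 0]])
K = clinical_kernel(X, is_nominal=[False, True])
c = harrell_c_index(time=[5, 8, 12], event=[1, 1, 0], risk=[0.9, 0.5, 0.1])
```

The resulting matrix `K` is symmetric with ones on the diagonal, so it can be passed to any SVM implementation that accepts a precomputed kernel. A C-index of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of survival times.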

Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation (NRF) funded by the Ministry of Science (2022R1F1A1074343).

References

  1. Daemen A and De Moor B (2009). Development of a kernel function for clinical data. In Proceedings of 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Minneapolis, MN, 5913-5917.
  2. Daemen A, Timmerman D, Bosch T et al. (2012). Improved modeling of clinical data with kernel methods, Artificial Intelligence in Medicine, 54, 103-114.
  3. Harrell FE, Califf RM, Pryor DB, Lee KL, and Rosati RA (1982). Evaluating the yield of medical tests, Journal of the American Medical Association, 247, 2543-2546.
  4. Mok L, Kim Y, Lee S, Choi S, Lee S, Jang J-Y, and Park T (2019). HisCoM-PAGE: Hierarchical structural component models for pathway analysis of gene expression data, Genes, 10, 931.
  5. Kalbfleisch JD and Prentice RL (1980). The Statistical Analysis of Failure Time Data, Wiley, New York.
  6. Loprinzi C, Laurie J, Wieand H et al. (1994). Prospective evaluation of prognostic variables from patient-completed questionnaires: North Central Cancer Treatment Group, Journal of Clinical Oncology, 12, 601-607.
  7. Andersen P, Borgan O, Gill RD, and Keiding N (1993). Statistical Models Based on Counting Processes, Springer-Verlag, New York.