A comparison study of canonical methods: Application to -Omics data

  • Seungsoo Lee (Department of Biomedicine & Health Sciences, The Catholic University of Korea)
  • Eun Jeong Min (Department of Biomedicine & Health Sciences, The Catholic University of Korea)
  • Received : 2023.10.23
  • Accepted : 2023.10.31
  • Published : 2024.04.30

Abstract

Integrative analysis is gaining attention as a way to better understand complex biological systems. Observing subjects from various perspectives and jointly analyzing the resulting datasets enables a deeper understanding of those subjects. In this paper, we compare two methods that simultaneously consider two datasets gathered from the same subjects: canonical correlation analysis (CCA) and co-inertia analysis (CIA). Since CCA cannot handle high-dimensional data, two strategies were considered instead: adding a ridge constant (CCA-ridge), and replacing the covariance matrix of each dataset with the identity matrix and then applying a penalized singular value decomposition (CCA-PMD). To illustrate the methods, both extensions of CCA and CIA were applied to NCI60 cell line data. Both approaches yield biologically meaningful and significant results, identifying important genes that improve our understanding of the data. Their results show some dissimilarities, which arise from the different criteria each method uses to measure the relationship between the two datasets. Additionally, the CIA results vary depending on the weight matrices employed.

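The importance of integrative analysis for understanding the complex systems underlying biological phenomena is steadily growing. Integrative analysis of multiple datasets, obtained by observing a single subject of study from various perspectives, enables a deeper understanding of that subject. In this study, we compared co-inertia analysis and canonical correlation analysis, two methods that can be used when two high-dimensional datasets are generated from the same samples. Because canonical correlation analysis cannot handle high-dimensional data, we considered two approaches to overcome this limitation: a method using a ridge constant (CCA-ridge), and a method that assumes the covariance matrix of each dataset to be the identity matrix and applies a penalized singular value decomposition (CCA-PMD). Each method was applied to RNA sequencing data and protein sequencing data obtained from the NCI60 cell line panel. The results confirm that canonical correlation analysis focuses more on the correlation between the two canonical variates, whereas co-inertia analysis considers not only the correlation between the linear combinations of each dataset but also the variability of each linear combination. In addition, for co-inertia analysis, we considered several weight matrices, compared the resulting outcomes, and drew out the main implications.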
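The difference between the criteria described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`ridge_cca_first_pair`, `cia_first_pair`, `sparse_svd_first_pair`), the default ridge constants, and the threshold fraction `frac` are all assumptions made for illustration. The CIA sketch assumes identity column weights and uniform row weights, in which case the first co-inertia axes are the leading singular vectors of the cross-covariance matrix. The sparse variant uses a fixed soft-threshold fraction as a simplified stand-in for the penalized decomposition, rather than tuning the threshold to meet an L1 bound.

```python
import numpy as np


def _center(M):
    """Subtract column means so that cross-products are covariances."""
    return M - M.mean(axis=0)


def _inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs / np.sqrt(vals)) @ vecs.T


def _soft(x, t):
    """Elementwise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def ridge_cca_first_pair(X, Y, lam_x=1.0, lam_y=1.0):
    """First canonical pair with ridge constants added to both covariances.

    lam_x and lam_y must be positive so the regularized covariance
    matrices are invertible even when variables outnumber samples.
    """
    Xc, Yc = _center(X), _center(Y)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + lam_x * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + lam_y * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Rx, Ry = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Rx @ Sxy @ Ry, full_matrices=False)
    # Map back to loading vectors; s[0] is the regularized correlation.
    return Rx @ U[:, 0], Ry @ Vt[0], s[0]


def cia_first_pair(X, Y):
    """First co-inertia axes under identity weights (assumption).

    Maximizes the covariance between X a and Y b for unit-norm a and b,
    so the variance of each score contributes alongside their correlation.
    """
    Xc, Yc = _center(X), _center(Y)
    C = Xc.T @ Yc / (X.shape[0] - 1)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, 0], Vt[0], s[0]


def sparse_svd_first_pair(X, Y, frac=0.5, n_iter=50):
    """Sparse loadings from soft-thresholded power iterations on X'Y.

    Covariances are treated as identity, as in CCA-PMD; frac (0 <= frac < 1)
    is a hypothetical tuning knob that zeroes entries below frac * max.
    """
    Xc, Yc = _center(X), _center(Y)
    C = Xc.T @ Yc / (X.shape[0] - 1)
    v = np.linalg.svd(C, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = C @ v
        u = _soft(u, frac * np.abs(u).max())
        u /= np.linalg.norm(u)
        v = C.T @ u
        v = _soft(v, frac * np.abs(v).max())
        v /= np.linalg.norm(v)
    return u, v, u @ C @ v
```

Each function takes two sample-by-variable arrays with matching rows and returns the two loading vectors and the attained criterion value. Comparing the loadings from `ridge_cca_first_pair` and `cia_first_pair` on the same data shows how a correlation-based and a covariance-based criterion can rank variables differently.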

Acknowledgement

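This work was carried out as a basic research project supported by the National Research Foundation of Korea with funding from the Korean government in 2021 (NRF-2021R1F1A1058613).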
