http://dx.doi.org/10.29220/CSAM.2022.29.4.453

A guideline for the statistical analysis of compositional data in immunology  

Yoo, Jinkyung (Department of Statistics, Kyungpook National University)
Sun, Zequn (Department of Preventive Medicine - Biostatistics, Northwestern University)
Greenacre, Michael (Department of Economics and Business, Universitat Pompeu Fabra, and Barcelona School of Management)
Ma, Qin (Department of Biomedical Informatics, The Ohio State University)
Chung, Dongjun (Department of Biomedical Informatics, The Ohio State University)
Kim, Young Min (Department of Statistics, Kyungpook National University)
Publication Information
Communications for Statistical Applications and Methods / v.29, no.4, 2022, pp. 453-469
Abstract
The study of immune cellular composition has attracted great scientific interest in immunology, driven by the generation of multiple large-scale datasets. From a statistical point of view, such immune cellular data should be treated as compositional. In compositional data, each element is positive and all elements sum to a constant, which can generally be set to one. Standard statistical methods are not directly applicable to compositional data because they do not appropriately handle the correlations between compositional elements. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio transformations and on the alternative approach of Dirichlet regression, discuss their theoretical foundations, and illustrate their application with immune cellular fraction data from colorectal cancer patients.
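As a minimal sketch of the definitions above (not the authors' code), the following Python snippet rescales a vector of positive parts to sum to one and then applies two common log-ratio transformations, the additive (alr) and centered (clr) log-ratios. The function names `closure`, `alr`, and `clr` and the three-part input are assumptions chosen for illustration; the numbers are made up and do not come from the paper's data.

```python
import math


def closure(parts, total=1.0):
    """Rescale strictly positive parts so that they sum to `total`."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    s = sum(parts)
    return [total * p / s for p in parts]


def alr(composition, ref=-1):
    """Additive log-ratio: log of each part over a chosen reference part."""
    ref_index = ref % len(composition)
    denom = composition[ref_index]
    return [
        math.log(p / denom)
        for i, p in enumerate(composition)
        if i != ref_index
    ]


def clr(composition):
    """Centered log-ratio: log of each part over the geometric mean."""
    log_parts = [math.log(p) for p in composition]
    mean_log = sum(log_parts) / len(log_parts)
    return [lp - mean_log for lp in log_parts]


# Hypothetical cell-type abundances for one sample (illustrative only).
raw = [30.0, 50.0, 20.0]
x = closure(raw)   # fractions that sum to one
alr_x = alr(x)     # two coordinates, last part as reference
clr_x = clr(x)     # three coordinates that sum to zero
```

Because log-ratios depend only on ratios between parts, `alr(raw)` and `alr(x)` give the same values, which is why the choice of the constant sum does not affect the transformed data.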
Keywords
compositional data; compositional regression; Dirichlet regression; immunology; immuno-oncology; log-ratio transformation