Microbiota Data: A Statistical Analysis Workflow

Authors

  • Chiara Maura Ciniselli, Unit of Bioinformatics and Biostatistics, Fondazione IRCCS Istituto Nazionale dei Tumori
  • Valeria Duroni, Unit of Bioinformatics and Biostatistics, Fondazione IRCCS Istituto Nazionale dei Tumori
  • Adriana Blanda, Unit of Bioinformatics and Biostatistics, Fondazione IRCCS Istituto Nazionale dei Tumori
  • Giancarla Bernardo, Unit of Microenvironment and Biomarkers of Solid Tumors, Fondazione IRCCS Istituto Nazionale dei Tumori
  • Armando Licata, Unit of Integrated Biology of Rare Tumors, Fondazione IRCCS Istituto Nazionale dei Tumori
  • Loris De Cecco, Unit of Integrated Biology of Rare Tumors, Fondazione IRCCS Istituto Nazionale dei Tumori
  • Lucia Sfondrini, Unit of Microenvironment and Biomarkers of Solid Tumors, Fondazione IRCCS Istituto Nazionale dei Tumori
  • Elda Tagliabue, Unit of Microenvironment and Biomarkers of Solid Tumors, Fondazione IRCCS Istituto Nazionale dei Tumori
  • Gabriella Sozzi, Unit of Epigenomics & Biomarkers of Solid Tumors, Fondazione IRCCS Istituto Nazionale dei Tumori
  • Paolo Verderio, Unit of Bioinformatics and Biostatistics, Fondazione IRCCS Istituto Nazionale dei Tumori

DOI:

https://doi.org/10.54103/2282-0930/29363

Abstract

Introduction

The microbiota is the collection of microorganisms that colonize our body, playing a pivotal role in maintaining homeostasis and modulating immune responses. An imbalance in the composition of these microbes is associated with various diseases, including oncological conditions [1]. From a methodological perspective, several aspects should be considered during microbiome analysis, such as the compositional nature of the data, its high dimensionality, overdispersion, and the prevalence of zero-inflated count data [2]. The institutional microbiome project currently active at the Fondazione IRCCS Istituto Nazionale dei Tumori was designed to assess the feasibility of investigating the microbiota in different tumor settings and biological matrices, with the final aim of identifying microbial communities capable of distinguishing patients with a worse prognosis from those with less aggressive disease. The cancer settings included in the project are lung, breast and prostate cancer, as well as pseudomyxoma peritonei.
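As a minimal illustration of the data characteristics listed above, the sketch below computes the fraction of zero counts, a variance-to-mean ratio as a crude overdispersion indicator, and a centred log-ratio (CLR) transform, one common way to handle compositional data. The toy count table, the function names and the pseudocount of 0.5 are assumptions made for illustration; none of them comes from the project described here.

```python
import math

# Hypothetical toy count table: rows are samples, columns are taxa.
toy_table = [
    [10, 0, 5, 1],
    [0, 0, 8, 2],
    [20, 1, 0, 0],
]

def zero_fraction(table):
    """Proportion of cells in the count table that are exactly zero."""
    cells = [c for row in table for c in row]
    return sum(1 for c in cells if c == 0) / len(cells)

def dispersion_index(values):
    """Sample variance divided by the mean; values well above 1
    suggest overdispersion relative to a Poisson model."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return var / mean

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample.
    The pseudocount (an assumption) avoids taking log(0)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

zeros = zero_fraction(toy_table)
first_taxon = [row[0] for row in toy_table]
dispersion = dispersion_index(first_taxon)
clr_first_sample = clr(toy_table[0])

print(f"zero fraction: {zeros:.3f}")
print(f"variance-to-mean ratio of first taxon: {dispersion:.1f}")
print("CLR of first sample:", [round(v, 3) for v in clr_first_sample])
```

By construction the CLR values of a sample sum to zero, which is a quick sanity check on the transform.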

Objectives

Given the unique characteristics of microbiome data, we propose a unified workflow designed for analysing data from various cancer settings to ensure consistency and comparability across the different cancer types.

Methods

The biological samples analysed in this study were disease-specific: tumor tissue and matched normal counterpart for lung cancer; tumor tissue for breast and prostate cancer; and matched tumor tissue, feces and mucin for pseudomyxoma peritonei. Samples were processed using the same standardized analytical pipeline across all pathologies, comprising bacterial DNA extraction, enrichment, library preparation and sequencing of the variable regions of the bacterial 16S rRNA gene. Microbial communities were first characterized in terms of alpha- and beta-diversity: the former quantifies the diversity within a given sample in terms of richness or evenness [3], whereas the latter assesses differences in diversity between groups of samples. Specifically, for alpha-diversity we used a set of indices: Chao1, Hill and Observed for richness, and Gini-Simpson and Shannon for evenness [3]. For beta-diversity, Principal Coordinate Analysis (PCoA) [4,5] was adopted together with the PERmutational Multivariate ANalysis Of VAriance (PERMANOVA) [4] and PERmutational Multivariate analysis of DISPersion (PERMDISP) [6] tests, using the Bray-Curtis dissimilarity [5]. The workflow then incorporates tests and statistical models for both continuous and categorical data to identify bacterial taxa that are differentially abundant or present in distinct biological matrices, or associated with the clinical-pathological characteristics investigated in each cancer context. Tests for paired and unpaired group comparisons were included for both continuous and categorical data.
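The diversity measures named above can be sketched in a few lines of plain Python. The example below is an illustration under stated assumptions, not the software used in the project (which the abstract does not specify): the toy counts, group labels and function names are hypothetical, Chao1 is shown in its bias-corrected form, and PERMANOVA is reduced to a one-factor pseudo-F with unrestricted label permutations. Hill numbers, PCoA and PERMDISP are omitted for brevity.

```python
import math
import random

def observed_richness(counts):
    """Number of taxa with a non-zero count."""
    return sum(1 for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 estimate from singletons (f1) and doubletons (f2)."""
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed_richness(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))

def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two count vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0

def pseudo_f(dist, groups):
    """One-factor PERMANOVA pseudo-F from a distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(
            dist[i][j] ** 2 for pos, i in enumerate(idx) for j in idx[pos + 1:]
        ) / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (len(labels) - 1)) / (ss_within / (n - len(labels)))

def permanova(dist, groups, n_perm=999, seed=0):
    """Permutation p-value for the pseudo-F statistic (labels shuffled freely)."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: six samples (rows) by five taxa, two groups.
samples = [
    [10, 0, 5, 1, 2],
    [12, 1, 4, 0, 2],
    [9, 0, 6, 1, 1],
    [0, 8, 1, 10, 0],
    [1, 9, 0, 12, 2],
    [0, 7, 2, 9, 1],
]
groups = ["A", "A", "A", "B", "B", "B"]

alpha = [
    (observed_richness(s), chao1(s), shannon(s), gini_simpson(s)) for s in samples
]
dist = [[bray_curtis(a, b) for b in samples] for a in samples]
f_obs, p_value = permanova(dist, groups)

print("alpha-diversity of first sample:", alpha[0])
print(f"PERMANOVA pseudo-F = {f_obs:.2f}, permutation p = {p_value:.3f}")
```

With only three samples per group there are few distinct relabellings, so the permutation p-value cannot become very small; real analyses rely on far larger sample sizes. A significant PERMANOVA result can also reflect differences in dispersion rather than location, which is why it is typically read alongside a dispersion test such as PERMDISP.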

Results

Using lung cancer as a model setting, matched tumor and normal counterpart samples from 155 lung cancer patients (stage I-III) were profiled using 16S rRNA gene sequencing, for a total of 310 observations and 63 identified bacterial taxa at the genus level. The analysis revealed distinct microbial compositions, in terms of beta-diversity, according to histology. Moreover, by modelling both the count data and the presence/absence of each taxon, a specific subset of bacteria was found to be significantly associated with tumor progression and aggressiveness; specific bacteria were also found to be differentially abundant between tissue types (tumor or normal counterpart). We are now applying this workflow to the other settings, with the final aim of better understanding the bacteria that characterize tumor aggressiveness.
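For a matched tumor/normal design, a paired comparison of the presence/absence of a single taxon could look like the sketch below. The abstract does not name the tests used, so the exact McNemar test shown here is an assumption chosen as a standard option for paired binary data; the toy vectors and the function name are hypothetical.

```python
from math import comb

def exact_mcnemar(tumor_present, normal_present):
    """Exact two-sided McNemar test on paired presence/absence calls.

    Returns (b, c, p_value), where b counts pairs with the taxon present
    only in tumor and c counts pairs with it present only in normal tissue.
    Concordant pairs do not contribute to the test.
    """
    pairs = list(zip(tumor_present, normal_present))
    b = sum(1 for t, n in pairs if t and not n)
    c = sum(1 for t, n in pairs if n and not t)
    n_discordant = b + c
    if n_discordant == 0:
        return b, c, 1.0
    # Under the null hypothesis, b follows Binomial(n_discordant, 0.5).
    k = min(b, c)
    tail = sum(comb(n_discordant, i) for i in range(k + 1)) / 2 ** n_discordant
    return b, c, min(1.0, 2 * tail)

# Hypothetical toy data: 1 = taxon detected, 0 = not detected, one entry per patient.
tumor = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1]
normal = [0, 0, 1, 0, 0, 0, 0, 1, 0, 1]

b, c, p_value = exact_mcnemar(tumor, normal)
print(f"tumor-only pairs = {b}, normal-only pairs = {c}, p = {p_value:.4f}")
```

In a genus-level analysis such a test would be repeated for each taxon, so the resulting p-values would normally be adjusted for multiple testing.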

Conclusion

The developed workflow allows: (i) the use of a shared pre-analytical and analytical workflow across the different tumor settings under investigation, (ii) the characterization of the microbial community within and between samples, and (iii) the evaluation of associations between specific bacterial taxa and tumor characteristics.

References

[1] Derosa L, Iebba V, Silva CAC, et al. Custom scoring based on ecological topology of gut microbiota associated with cancer immunotherapy outcome. Cell. 2024 Jun;187(13):3373-3389.e16 DOI: https://doi.org/10.1016/j.cell.2024.05.029

[2] Jonsson V, Österlund T, Nerman O, et al. Modelling of zero-inflation improves inference of metagenomic gene count data. Stat Methods Med Res. 2019 Dec;28(12):3712-3728 DOI: https://doi.org/10.1177/0962280218811354

[3] Koh H, Tuddenham S, Sears CL, et al. Meta-analysis methods for multiple related markers: Applications to microbiome studies with the results on multiple α-diversity indices. Stat Med. 2021 May;40(12):2859-2876 DOI: https://doi.org/10.1002/sim.8940

[4] Goodrich JK, Di Rienzi SC, Poole AC, et al. Conducting a microbiome study. Cell. 2014 Jul;158(2):250-262 DOI: https://doi.org/10.1016/j.cell.2014.06.037

[5] Peterson CB, Saha S, Do KA. Analysis of Microbiome Data. Annu Rev Stat Appl. 2024 Apr;11(1):483-504 DOI: https://doi.org/10.1146/annurev-statistics-040522-120734

[6] Anderson MJ, Santana-Garcon J. Measures of precision for dissimilarity-based multivariate analysis of ecological communities. Ecol Lett. 2015 Jan;18(1):66-73 DOI: https://doi.org/10.1111/ele.12385

Published

2025-09-08

How to Cite

1.
Ciniselli CM, Duroni V, Blanda A, Bernardo G, Licata A, De Cecco L, et al. Microbiota Data: A Statistical Analysis Workflow. ebph [Internet]. 2025 [cited 2026 Feb. 28];. Available from: https://riviste.unimi.it/index.php/ebph/article/view/29363

Issue

Section

Congress Abstract - Section 3: Metodi Biostatistici