The Vaginal Microbiome is Associated with Endometrial Cancer Grade and Histology

The human microbiome has been strongly correlated with disease pathology and outcomes, yet it remains relatively underexplored in patients with malignant endometrial disease. In this study, vaginal microbiome samples were prospectively collected at the time of hysterectomy from 61 racially and ethnically diverse patients in three disease categories: (i) benign gynecologic disease (controls, n = 11), (ii) low-grade endometrial carcinoma (n = 30), and (iii) high-grade endometrial carcinoma (n = 20). Extracted DNA underwent shotgun metagenomic sequencing, and microbial α- and β-diversity were calculated. Hierarchical clustering was used to define community state types (CST), which were then compared by microbial diversity and tumor grade. Differential abundance was calculated, and machine learning was used to assess the predictive value of bacterial abundance in distinguishing grade and histology. Both α- and β-diversity were associated with tumor grade. Four vaginal CSTs were identified, and they were associated with grade of disease. Different histologies also demonstrated variation in CST within tumor grades. Using supervised learning algorithms, species-level microbiome markers were used to build models that predicted benign disease versus carcinoma, high-grade carcinoma versus benign disease, and high-grade versus low-grade carcinoma with high accuracy. These results confirm that the vaginal microbiome segregates not just benign disease from endometrial cancer, but is also predictive of histology and grade. Further characterization of these findings in large, prospective studies is needed to elucidate their potential clinical applications. Significance: The vaginal microbiome reliably segregates not just benign gynecologic conditions from endometrial cancer, but also predicts cancer grade and histology.
Patterns of microbial abundance and gene expression should be increasingly considered as a factor in the evolution of precision medicine approaches, especially as they relate to cancer screening, disease pathogenesis, and patient-centered outcomes.


Introduction
Endometrial cancer is the most common gynecologic malignancy in the United States (1). The incidence of this disease has been increasing, and it is now listed genomic stability in host cells, and producing oncometabolites (7). Defining the microbiome by community state types (CST), which are groups of microbial communities with similar phyla and abundance, has been useful for describing differences across groups of women, but the association of CSTs with clinical and pathologic features in patients with endometrial cancer has not been described previously. Because endometrial cancer is a heterogeneous disease composed of differing histologies and biologic drivers of malignant transformation, microbial communities may vary with specific histologies and grades, and such comparisons may suggest additional unexplored pathways of disease pathogenesis and propagation. Our primary objective was to conduct an exploratory analysis using metagenomic sequencing to characterize the preoperative vaginal microbiome in women undergoing surgery for endometrial cancer. The secondary objective was to identify patterns, guided by CSTs, that would reliably segregate not just benign from malignant disease, but also low-grade (LG) from high-grade (HG) tumors. Such data may identify opportunities where further exploration of the microbiome in relation to disease pathogenesis or early detection is needed.

Ethical Approval and Consent
Approval for this study was provided by the Institutional Review Board at the University of Miami (Miami, FL; protocol no. 20170660). Written informed consent was obtained from all participating patients, with forms provided in English, Spanish, and Haitian Creole, and the study was conducted in accordance with the Declaration of Helsinki. This cross-sectional study is reported in accordance with the Strengthening the Reporting of Genetic Association Studies reporting guideline (8). Patients were consented sequentially between February 2018 and October 2018, without any preplanned stratification or matching. The initial protocol called for oversampling of uterine serous carcinoma (planned accrual n = 10).

Population for Study and Patient-related Information
Three groups of patients were recruited for the study: (i) Women with benign gynecologic disease undergoing elective surgery for nonmalignant conditions, such as fibroids or endometriosis, and all with normal or inactive endometrium (controls); (ii) Women with LG endometrial carcinoma (EC), defined as endometrial intraepithelial neoplasia (EIN, preinvasive disease), grade 1 or grade 2 endometrioid adenocarcinoma on preoperative endometrial biopsy or uterine curettage; (iii) Women with HG endometrial carcinoma, defined as grade 3 endometrioid, serous, small-cell, clear-cell, undifferentiated, or dedifferentiated carcinoma, or uterine carcinosarcoma, on preoperative endometrial biopsy or uterine curettage.
Women were required to be ≥18 years of age, able to provide written consent, and able to read and understand English, Spanish, or Haitian Creole. All patients underwent surgery at one of the hospitals affiliated with the physician

Specimen Collection and Processing
On the day of surgery, after induction of anesthesia and before both vaginal preparation with betadine/chlorhexidine and administration of prophylactic antibiotics, the attending physician placed a vaginal swab (4N6FLOQSwab, Thermo Fisher Scientific, #4473979) into the vagina, taking care to ensure contact of the swab with the cervix, posterior fornix, and vaginal sidewalls. The swab was immediately transferred to a bead tube, which was snap frozen and stored at −80°C. Microbial DNA was extracted with the PureLink Microbiome DNA Purification Kit (Thermo Fisher Scientific, Invitrogen, #A29790) following the manufacturer's protocol. DNA was eluted in 50 μL of AE buffer and quantified using a NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific). Additional details regarding DNA library construction and sequencing can be found in Supplementary Data S1.

Statistical Analyses
Statistical analyses were performed using custom scripts written in R (R Foundation for Statistical Computing). To avoid bias, all patients were included in the analyses, even those missing specific data points, and all available data were used. Summary statistics were used to describe the entire cohort. Differences in patient clinical characteristics were tested using the Kruskal-Wallis and Wilcoxon signed-rank tests. All tests were two-sided, with significance set at P < 0.05. An explanation of the power calculation can be found in Supplementary Data S1.
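The study ran these tests in R. As a hedged illustration of what the three-group comparison computes, the Python sketch below implements the Kruskal-Wallis H statistic with the standard tie correction. It assumes exactly three groups (e.g., benign, LG, HG), so the large-sample χ² reference distribution has 2 degrees of freedom, whose upper-tail probability is exactly exp(−H/2). The function names are illustrative and are not taken from the study's scripts.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(group_a, group_b, group_c):
    """Tie-corrected Kruskal-Wallis H and its asymptotic P value for three groups."""
    groups = [list(group_a), list(group_b), list(group_c)]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum over groups of (rank sum)^2 / group size
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n)
    tie_sizes = {}
    for x in pooled:
        tie_sizes[x] = tie_sizes.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction

    # Chi-square with 2 df: P(X >= h) = exp(-h / 2)
    return h, math.exp(-h / 2)
```

For example, `kruskal_wallis_3([1, 2, 3], [4, 5, 6], [7, 8, 9])` gives H ≈ 7.2 and P ≈ 0.027. With small groups, an exact or permutation P value may be preferable to the χ² approximation.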

Alpha and Beta Microbial Diversity
Alpha (α) and beta (β) diversity are standard ecological measures of microbial diversity representing, respectively, the number and distribution of taxa within a sample and the similarity in composition between samples. After rarefaction, we calculated the observed number of operational taxonomic units as an α-diversity measure for each sample within the tumor type groups. We also calculated the Shannon index as our main α-diversity metric, which was generally concordant with the observed number of species. We then fitted linear models for independent samples and used t tests to determine statistical significance. For β-diversity, we rarefied the data before calculating the various distance measures. To test the association between covariates and β-diversity measures, we used PERMANOVA, a permutation-based, distance-based analysis of variance. An omnibus test, a permutation test that takes the minimum of the P values of the individual β-diversity measures as its test statistic, was used to combine the evidence of association provided by the different β-diversity measures, and an overall association P value was reported. Ordination plots were generated using classic multidimensional scaling. Analyses of the effects of covariates are provided in Supplementary Data S2. total within-cluster variance. We used the gap statistic to determine the optimum number of clusters in the dataset. Considering the sample size, we used k = 4 as the optimum number of clusters.
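A minimal sketch of the diversity calculations described above, written in Python rather than the R used in the study: rarefaction, observed richness, the Shannon index, one β-diversity distance, and a one-way PERMANOVA (Anderson's pseudo-F) with a permutation P value. The paper does not name its distance measures, so Bray-Curtis here is an assumption; the omnibus minimum-P combination and the ordination step are not shown. All function names are illustrative.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's taxon read counts to `depth` reads without replacement."""
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds the sample's read count")
    out = [0] * len(counts)
    for taxon in random.Random(seed).sample(pool, depth):
        out[taxon] += 1
    return out


def observed_richness(counts):
    """Alpha diversity as the number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors (0 = identical)."""
    denom = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / denom if denom else 0.0


def permanova(dist, labels, n_perm=999, seed=0):
    """One-way PERMANOVA pseudo-F on a square distance matrix, with a permutation P value."""
    n = len(labels)

    def pseudo_f(labs):
        levels = sorted(set(labs))
        a = len(levels)
        ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
        ss_within = 0.0
        for level in levels:
            idx = [i for i in range(n) if labs[i] == level]
            ss_within += sum(
                dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:]
            ) / len(idx)
        return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

    observed = pseudo_f(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so float rounding does not hide ties with the observed F
        if pseudo_f(shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

In practice, R's vegan package (e.g., `adonis2` for PERMANOVA) provides tested versions of these calculations; the pure-Python version is only meant to make the arithmetic visible.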

Differential Abundance Analysis
We performed microbiome-wide analysis to identify phyla, families, genera, and species that were differentially abundant between samples with different tumor grades and histologies. Using phyloseq_to_deseq2 from the phyloseq package (11), we transformed microbial relative abundance data into a DESeq dataset with estimated dispersions. We then identified differentially abundant species using Wald tests from the R package DESeq2. Species abundances were used without rarefying, to account for variability in read depth between samples. P values were adjusted for the FDR using the Benjamini-Hochberg procedure, with significance set at P adj < 0.05.
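The Wald tests themselves come from DESeq2, which reports Benjamini-Hochberg-adjusted values (`padj`) by default; its independent filtering step can make those differ from a plain adjustment. The sketch below shows only the BH arithmetic, matching R's `p.adjust(method = "BH")`, plus a hypothetical helper that applies the P adj < 0.05 cutoff to a list of taxa.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P down, enforcing monotonicity with a running minimum
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_taxa(taxa, pvalues, alpha=0.05):
    """Hypothetical helper: names of taxa whose BH-adjusted P value is below `alpha`."""
    return [t for t, q in zip(taxa, benjamini_hochberg(pvalues)) if q < alpha]
```

For raw P values `[0.01, 0.04, 0.03, 0.5]`, the adjusted values are about `[0.04, 0.053, 0.053, 0.5]`, so only the first taxon passes the 0.05 cutoff.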

Gene Expression and Pathway Analysis
We used VIRGO (9) to identify and quantify community gene content, or gene richness, defined as the abundance of nonredundant genes. Nonredundant genes were also annotated with a rich set of functional descriptions. For gene set enrichment analysis (GSEA; ref. 12), we constructed gene sets and tested for overrepresentation and underrepresentation across pathologies: benign, LG endometrial carcinoma, HG endometrial carcinoma, and tumor versus benign. Genes were ranked by their fold change (FC) between two sample groups, computed with DESeq2 (13). Using the fgsea R package, we then performed GSEA with three gene set collections: Kyoto Encyclopedia of Genes and Genomes (KEGG; ref. 14), Gene Ontology (15), and EggNOG (v5; ref. 16). Significantly enriched gene sets were selected using a cutoff of q < 0.01.
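To show what a GSEA enrichment score measures once genes are ranked by fold change, here is a hedged Python sketch of the weighted running-sum statistic from the original GSEA method, which fgsea also computes. It covers only the score for one gene set; the permutation-based P values, normalization, and q values that fgsea reports are omitted. Gene identifiers in the usage note are hypothetical.

```python
def enrichment_score(ranked_genes, stats, gene_set, weight=1.0):
    """Running-sum GSEA enrichment score for one gene set.

    ranked_genes: gene IDs sorted by decreasing statistic (e.g., log2 fold change)
    stats: the statistic for each gene, in the same order
    gene_set: set of gene IDs belonging to one pathway
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_norm = sum(abs(s) ** weight for s, hit in zip(stats, in_set) if hit)
    if hit_norm == 0:
        raise ValueError("all statistics in the gene set are zero")

    running, best = 0.0, 0.0
    for s, hit in zip(stats, in_set):
        # Step up by the gene's weighted statistic on a hit, down by a constant on a miss
        running += abs(s) ** weight / hit_norm if hit else -1.0 / n_miss
        # Keep the running-sum value with the largest deviation from zero
        if abs(running) > abs(best):
            best = running
    return best
```

With hypothetical genes `g1` to `g5` ranked by log2 FC `[5, 4, 3, 2, 1]`, the set `{g1, g2}` scores +1.0 (concentrated at the top), whereas `{g4, g5}` scores −1.0 (concentrated at the bottom).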

Machine Learning for Biomarker Discovery
Machine learning models based on microbial species were constructed and evaluated using SIAMCAT (17). Read counts at the species level were converted to relative abundances, and species with an overall abundance lower than 0.01 were removed. To quantify associations between the vaginal microbiome and tumor grade, we computed, for each species, a Wilcoxon test P value and effect sizes for the association (e.g., AUC or FC). Feature selection used the microbial relative abundances remaining after low-abundance features were filtered. P values were corrected for multiple testing using the FDR.
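SIAMCAT performs these steps internally. The standalone Python sketch below only illustrates the per-species screening logic: convert counts to relative abundances, drop low-abundance species, then compute an AUC effect size and a two-sided Wilcoxon rank-sum P value. Two assumptions are labeled: "overall abundance" is read as each species' maximum relative abundance across samples, and the P value uses the normal approximation without tie correction. The generalized fold change is not shown.

```python
import math


def relative_abundance(counts):
    """Convert one sample's species read counts to proportions summing to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def filter_low_abundance(profiles, cutoff=0.01):
    """Keep species whose maximum relative abundance across samples reaches `cutoff`.

    profiles: dict mapping species name -> list of per-sample relative abundances.
    Assumption: "overall abundance" is read here as the per-species maximum.
    """
    return {sp: v for sp, v in profiles.items() if max(v) >= cutoff}


def auc(cases, controls):
    """Empirical AUC: P(case value > control value), counting ties as 0.5."""
    wins = 0.0
    for x in cases:
        for y in controls:
            wins += 1.0 if x > y else (0.5 if x == y else 0.0)
    return wins / (len(cases) * len(controls))


def wilcoxon_p(cases, controls):
    """Two-sided rank-sum P value via the normal approximation (no tie correction)."""
    n1, n2 = len(cases), len(controls)
    u = auc(cases, controls) * n1 * n2  # Mann-Whitney U statistic for the cases
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))
```

The per-species P values would then be FDR-adjusted before being used for feature selection.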

Data and Materials Availability
All data associated with this study are available upon request and have been uploaded to Gene Expression Omnibus. SRA Submission ID: SUB9784683

Demographics
The clinical and demographic characteristics of the studied cohort are displayed in Table 1.
Patients with HG-EC were older than those with LG-EC or benign disease (q = 0.024).
BMI differed significantly among the benign, HG-EC, and LG-EC groups (q = 0.041). The HG-EC cohort included more non-Hispanic patients, whereas the LG-EC cohort included more women of Hispanic ethnicity (q = 0.036). There were no differences in tobacco use, HPV status, or race across the three groups (all P > 0.05).

Composition of the Vaginal Microbiome
Of the approximately 7.

CST Composition and Structure
Four major CSTs were identified, with significant differences in microbiome composition, diversity, and structure. Each of the four CSTs comprised communities composed disproportionately of different phyla (Fig 2A). Bacteroidetes was absent in CST2, and Fusobacteria was absent in CST1.
Actinobacteria and Firmicutes were variably present across all four CSTs. CST4 was the most diverse and taxonomically rich cluster; CST2 was the least.
Samples clustered into CSTs significantly by both grade and histology (Fig 2B and C). Benign disease predominantly clustered in CST1, LG disease in CST2, and HG disease in both CST3 and CST4 (P = 0.036). CST clustering also varied by histology (P = 0.017). When clinical characteristics and CSTs were evaluated against microbial diversity, only grade and histology had significant associations (benign vs. HG, P adj = 0.019; benign vs. carcinosarcoma, P adj = 0.037; benign vs. EIN, P adj = 0.037; Table 2).

Differential Abundance Analysis
Differential abundance (DA) analyses were conducted to determine the vaginal microbial species enriched or depleted consistently in EC communities.

Gene Expression and Pathway Analyses
The metagenomic approach allowed us to investigate gene abundance and thus to perform pathway analyses of the microbiota observed across endometrial pathologies and endometrial cancer histotypes. HG communities typically had a low gene count: 73.8% had fewer than 1,000 genes.
Benign communities commonly displayed a high gene count: 65% had more than 1,000 genes. Hierarchical clustering of the profiles was performed using Ward linkage based on their Euclidean distance, the result of

tumors versus benign, and (iii) HG tumors versus LG tumors (Fig 4). To detect useful species markers of tumor, we performed fivefold cross-validation of a random forest (RF) model comparing case and control samples in the discovery phase. For each model, a different set of species was identified as the optimum microbiome signature, with a varying number of features; model performance was assessed by the area under the ROC curve (Fig 4A-E). The tumor versus benign model selected three important species, and the discriminant model based on their abundance effectively distinguished tumor from benign disease (mean prediction AUC = 0.878; Fig 4B). Two other RF models built on additional species abundances distinguished HG from benign and LG from HG disease, with AUCs of 0.80 and 0.77, respectively (Fig 4D-F).
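The fivefold cross-validation was run inside SIAMCAT with a random forest. As a hedged, library-independent sketch of only the splitting step (not SIAMCAT's own implementation), the code below assigns samples to stratified folds so that each held-out fold keeps roughly the same benign-to-tumor ratio; a classifier would be trained on each training split and its held-out AUC averaged across folds. Function names are illustrative.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that roughly preserve class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal samples round-robin; carrying the offset spreads leftovers across folds
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds


def cv_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k stratified folds."""
    folds = stratified_folds(labels, k, seed)
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, sorted(test)
```

With the study's group sizes (11 benign, 50 tumor), each held-out fold receives two or three benign samples, which is why stratification matters for the minority class here.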
We also examined the performance of models trained on samples labeled by histologic subtype (e.g., serous, endometrioid; Supplementary Data S9). The highest prediction performance was obtained from the model trained to distinguish benign samples from serous endometrial carcinoma (mean AUC = 0.826), followed by models distinguishing benign from endometrioid samples (mean AUC = 0.795) and serous from endometrioid samples (mean AUC = 0.776). These three histologic classifier models are based on 50, 60, and 65 biomarker species, respectively.

Discussion
Among patients with endometrial carcinoma, the vaginal microbiome varies significantly with tumor pathologic characteristics. This exploratory investigation establishes that not only prominent species, but also microbial abundance and CST, vary by grade. These findings offer a novel perspective on the microbial content of the vagina, and its confluence with the uterus may provide opportunities to explore its role as an indicator of endometrial carcinoma and to better understand disease development and propagation.
There have been few studies of the vaginal microbiome in patients with endometrial carcinoma. In 2016, Walther-Antonio and colleagues (18)   status (19). While we did not assess vaginal pH in the current study, we found no association between age or BMI and microbial diversity (Table 2). Our methodology, however, differed in that our data were segregated categorically to represent clinically meaningful groups (i.e., BMI following World Health Organization categories; age 50 serving as a surrogate for menopause). This difference in analysis may account for our findings, but the discrepancy could also reflect differences between the Walsh study population and our own, as 97% of the Walsh cohort was White and only 10 patients had HG cancers. Because microbial diversity in the current study was associated only with tumor factors (grade and histology) and not with categorical clinical factors, patient-specific factors may not necessarily need to be included in a predictive model for screening.
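The categorical coding described above can be sketched as follows. The BMI cut points are the standard WHO adult categories; the paper does not state whether obesity was further subdivided (classes I to III), so pooling it into a single bin is an assumption, as is assigning patients aged exactly 50 to the older group.

```python
def who_bmi_category(bmi):
    """WHO adult BMI category (kg/m^2); obesity classes I-III are pooled here (assumption)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal weight"
    if bmi < 30.0:
        return "overweight"
    return "obese"


def age_group(age, cutoff=50):
    """Dichotomize age at `cutoff` years as a rough surrogate for menopausal status."""
    return f">= {cutoff}" if age >= cutoff else f"< {cutoff}"
```

Coding covariates this way trades resolution for clinical interpretability, which is one reason continuous-variable analyses in other cohorts may reach different conclusions.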
While the differential phyla abundance between benign disease and tumor provides some insight into the local vaginal environment, differences in species abundance may also be meaningful for tumor pathogenesis. Prevotella bivia, with more than 6-fold higher abundance in HG than in LG disease, is associated with pelvic inflammatory disease and bacterial vaginosis. P. bivia has been shown to upregulate proinflammatory genes (LAMP3, STAT1, and TAP1) in cervical cancer (22). Furthermore, Lactobacillus spp., which were underrepresented in HG versus benign and HG versus LG disease, are known to inhibit P. bivia (23).
Bifidobacterium longum was the most strongly depleted species in HG versus LG disease. B. longum has been shown to have low relative abundance in patients with the most aggressive forms of gastric cancer, suggesting it may be protective (24). It has also been shown to improve immune-mediated tumor control (25). Fusobacterium ulcerans also demonstrated higher abundance in HG disease. This species is associated with cellular ulceration through secretion of high levels of butyrate (26); very little is known about its role in cancer pathogenesis. Fusobacterium nucleatum was not among the most abundant species contributing to the predictive models, but it showed greater than 4-fold higher abundance in HG versus benign disease; it has been found to promote tumor growth (27), to be associated with high microsatellite instability (28), and to induce chemotherapy resistance (29). Patients with cervical cancer who have high levels of intratumoral F. nucleatum have worse progression-free and overall survival (30). In colorectal cancer, the bacterium expresses the adhesin Fap2, which binds to galactose N-acetyl-D-galactosamine (Gal-GalNAc), facilitating enrichment of the bacterium in tumor tissue (29). Gal-GalNAc levels have been shown to be higher in uterine adenocarcinomas than in benign endometrium (31), and overexpression of the transferases that mediate Gal-GalNAc glycosylation is strongly associated with histologic grade and myometrial invasion (32). In colorectal cancer cells in vitro, a high abundance of intratumoral F. nucleatum also activates autophagy, thereby inducing resistance to platinum-based chemotherapy (29). The role of all these bacteria in the pathogenesis and treatment of endometrial carcinoma, and specifically of high-grade histologies, requires further investigation.
The mechanisms by which the microbiome influences endometrial carcinoma pathogenesis have yet to be determined but are likely multifactorial in the context of tumor stromal function and alterations in cancer cell signaling pathways.
AACRJournals.org Cancer Res Commun; 2(6) June 2022
Lu and colleagues recently reported that the presence of specific bacteria in the endometrium is associated with variable levels of the proinflammatory cytokines IL6, IL8, and IL17 (33). These molecules are known to modify the local microenvironment and have been implicated in gynecologic cancer development through increased angiogenesis, cellular proliferation, and modification of the local immune response (34–36). In patients with colorectal cancer, the presence of F. nucleatum may activate the Wnt/β-catenin signaling pathway (37).
In the endometrium, this pathway is important for normal physiologic cellular proliferation during the menstrual cycle, but its oncogenic activation is also associated with the development of endometrial carcinoma (38,39). Consideration should also be given to environmental mediators of microbial content, as practices such as douching have been shown to alter the genital tract in ways that favor pathogens (40).
There are several limitations to our study. Our population was drawn from a single institution, so the results may not be applicable in other settings. Nonetheless, the population was racially and ethnically diverse, which may increase generalizability. Although our sample was small, we were still able to identify statistically significant associations between CSTs and histology, with >90% power (Supplementary Data S1). Additionally, these relationships were maintained across our analyses, including composition and DA. We designed the study to include more patients with serous carcinoma, and this oversampling approach allowed greater representation of understudied, high-risk endometrial histologies relative to other reports (18,19). Moreover, these analyses used shotgun metagenomic sequencing rather than 16S rRNA sequencing to assess endometrial carcinoma-associated microbiomes, which allowed a more robust evaluation of relative microbial abundance and diversity. While others have advocated using one or two species to discriminate between benign and malignant disease (18,19), this study used multiple bacterial species to define clusters of organisms that collectively predicted not just malignancy, but also subsets of disease. Such an approach may increase the accuracy of these models. Model accuracy may be improved further by including tumor-specific factors that can affect the bacterial milieu, such as tumor size/volume, degree of myometrial invasion, and amount of necrosis, which were not used as covariates in the current investigation.

Conclusions
In this exploratory analysis, the vaginal microbiome reasonably segregated not only endometrial carcinoma from benign disease, but also showed strong potential predictive value for grade and histology. Further study in larger populations is needed to validate our findings, with continued attention to diverse populations to capture variations that may arise from differences associated with clinically relevant demographic factors (race, ethnicity, immigrant status, etc.). The role of the microbiome as a biomarker of disease requires additional exploration, especially because no tool currently exists for screening or early detection of endometrial carcinoma. It will also be important to further characterize the relationships between the microbiome and the tumor microenvironment, be they symbiotic or simply associative, and how these may contribute to disease etiology, tumor propagation, and potential novel therapeutic approaches.