Abstract
There is potential for fecal microbiome profiling to improve colorectal cancer screening. This has been demonstrated by research studies, but it has not been quantified at scale using samples collected and processed routinely by a national screening program.
Between 2016 and 2019, the largest of the NHS Bowel Cancer Screening Programme hubs prospectively collected processed guaiac fecal occult blood test (gFOBT) samples with subsequent colonoscopy outcomes: blood-negative [n = 491 (22%)]; colorectal cancer [n = 430 (19%)]; adenoma [n = 665 (30%)]; colonoscopy-normal [n = 300 (13%)]; nonneoplastic [n = 366 (16%)]. Samples were transported and stored at room temperature. DNA underwent 16S rRNA gene V4 amplicon sequencing. Taxonomic profiling was performed to provide features for classification via random forests (RF).
Samples provided 16S amplicon-based microbial profiles, which confirmed previously described colorectal cancer–microbiome associations. Microbiome-based RF models showed potential as a first-tier screen, distinguishing colorectal cancer or neoplasm (colorectal cancer or adenoma) from blood-negative with AUC 0.86 (0.82–0.89) and AUC 0.78 (0.74–0.82), respectively. Microbiome-based models also showed potential as a second-tier screen, distinguishing from among gFOBT blood-positive samples, colorectal cancer or neoplasm from colonoscopy-normal with AUC 0.79 (0.74–0.83) and AUC 0.73 (0.68–0.77), respectively. Models remained robust when restricted to 15 taxa, and performed similarly during external validation with metagenomic datasets.
Microbiome features can be assessed using gFOBT samples collected and processed routinely by a national colorectal cancer screening program to improve accuracy as a first- or second-tier screen. The models required as few as 15 taxa, raising the potential of an inexpensive qPCR test. This could reduce the number of colonoscopies in countries that use fecal occult blood test screening.
To assess the utility of microbiome profiles for national-scale colorectal cancer screening, we assessed 2,252 routinely processed NHS Bowel Cancer Screening Programme guaiac fecal occult blood test (gFOBT) samples. We generated four microbiome-based random forest classification models, each showing potential to improve accuracy. Two distinguished either colorectal cancer or neoplasm (colorectal cancer or adenoma) from gFOBT blood-negative samples (equivalent to first-tier screening). Two distinguished colorectal cancer or neoplasm from samples that had tested positive for blood by gFOBT, with participants referred for colonoscopy, but at colonoscopy no lesion was found (second-tier screening to rule out gFOBT false positives). Each model remained robust to validation and when restricted to 15 taxa, raising the possibility of an inexpensive qPCR test. The models performed favorably compared with existing microbiome studies, fecal immunochemical test, and Cologuard. These results suggest that microbiome analysis could be integrated into national colorectal cancer screening to improve accuracy and reduce the number of unnecessary screening colonoscopies.
Introduction
Globally, colorectal cancer is the third most common cause of cancer deaths (1). Screening reduces mortality by detecting asymptomatic adenomas or early-stage colorectal cancer (2). Countries have adopted different screening approaches. In England, the NHS Bowel Cancer Screening Programme (NHSBCSP) tests for occult fecal blood; if detected, participants are referred for colonoscopy. Until June 2019, the NHSBCSP used the guaiac fecal occult blood test (gFOBT). Specificity is limited, with only 40% of screening colonoscopies detecting adenoma and 10% colorectal cancer (3, 4); this represents a significant cost, resource, and patient burden.
Research suggests that fecal microbiome analysis may serve as an improvement or adjunct to current colorectal cancer screening (5). However, previous studies have not yet bridged the gap between preclinical, basic scientific discovery and the population scale necessary for translation to a national screening program. These limitations were outlined in a systematic review: many had small numbers of participants (the largest had 490, of which 120 were patients with colorectal cancer); many collected samples in a manner incompatible with national screening (refrigerated/frozen samples); some used post-colonoscopy samples (bowel preparation alters the microbiome); and few had the opportunity to externally validate their models (5).
We aimed to quantify the utility of integrating microbiome analysis into a national colorectal cancer screening program by analyzing microbiome features from large numbers of routinely processed NHSBCSP gFOBT samples. Technical studies have shown that it is possible to measure a subset of clinically relevant microbiome features from gFOBT stored at room temperature (6–13). Two studies have analyzed large numbers of bowel-preparation naïve individuals, but neither performed microbiome analysis directly from screening samples; one study has performed preliminary analysis of screening fecal immunochemical test (FIT) samples, but did not determine diagnostic performance of the microbiome (14–16). To our knowledge, our study is the first to analyze microbiome features from large numbers of routinely processed gFOBT screening samples.
To reflect the aims of the NHSBCSP, we explored the potential of microbiome-based random forest (RF) models to detect colorectal cancer alone, or to detect colorectal cancer and adenoma (a group we term “neoplasm”). We investigated the potential to use these microbiome-based RF models as a first-tier screen, equivalent to the use of gFOBT; we used gFOBT blood-negative samples as the control group, as 98% of screening gFOBT yield a blood-negative result. In addition, we explored the potential to use the microbiome-based RF models as a second-tier screen; a second-tier represents an opportunity to triage those samples with a blood-positive gFOBT result, to reduce the number of unnecessary screening colonoscopies. As a second-tier screen, we explored the potential of microbiome-based RF models to distinguish gFOBT blood-positive samples associated with colorectal cancer or neoplasm, from gFOBT blood-positive samples associated with a normal colonoscopy result. We used “colonoscopy-normal” samples as the control group, as although a proportion of screening colonoscopies yield a “nonneoplastic” diagnosis (e.g., diverticulosis, nondysplastic polyp), this is a heterogeneous group. We found that microbiome-based RF models show potential as a first-tier screen for the detection of colorectal cancer [AUC 0.86 (0.82–0.89)] or neoplasm [AUC 0.78 (0.74–0.82)], and as a second-tier screen, for the detection of colorectal cancer [AUC 0.79 (0.74–0.83)] or neoplasm [AUC 0.73 (0.68–0.77)].
Materials and Methods
Study design and participants
The NHSBCSP Southern Hub prospectively collected a convenience series of routinely processed gFOBT between October 2016 and August 2019. This included all “blood-positive” gFOBT (blue discoloration affecting five or six squares) processed by the Southern Hub (n = 3,700), and a random sample of “blood-negative” (no blue discoloration) gFOBT (n = 530). Of the samples collected, 3,601 (85%) had complete basic clinical data recorded on the NHSBCSP database at the time of the final data extract. From this group, we selected samples to achieve sample sizes that were approximately equal across the different clinical groups (Fig. 1; Supplementary Materials and Methods).
Figure 1. Microbiome taxonomic profiling demonstrates potential to improve colorectal cancer screening accuracy. A, Overview of the NHSBCSP and the design of this study. Briefly, we used 16S amplicon-based microbiome profiling from routinely collected gFOBT specimens to supplement first-tier (colorectal cancer/neoplasm vs. blood-negative) or second-tier (colorectal cancer/neoplasm vs. colonoscopy-normal) opportunities for early cancer screening. B, Microbiome profiles improve colorectal cancer or neoplasm classification versus blood-negative gFOBT samples (first-tier screening application) or blood-positive colonoscopy-normal samples (second-tier screening application) relative to purely clinical characteristics (age and sex). Classification used RF models and shows the performance of the “total” RF models bootstrapped from the total datasets. Shading represents the 95% CI. Clinical = RF models based on age and sex. Bacteria = RF models based on relative abundances of genera. Neoplasm = a group comprising an approximately equal ratio of colorectal cancer, low-risk adenoma, intermediate-risk adenoma, and high-risk adenoma samples.
This enabled profiling of 2,252 samples: those in which hemoglobin was not detected, that is, “blood-negative” [n = 491 (22%)], and “blood-positive” [n = 1,761 (78%)]. Blood-positive samples had the following colonoscopy diagnoses: colorectal cancer [n = 430 (19%)], adenoma [n = 665 (30%)], colonoscopy-normal [n = 300 (13%)], nonneoplastic condition [n = 366 (16%)]. While the composition of our overall study group does not reflect the composition of the NHSBCSP population (2% of gFOBT are blood-positive; 10% of screening colonoscopies reveal colorectal cancer, 40% adenoma, and 50% reveal a normal colon or nonneoplastic condition), we required these respective sample numbers to adequately profile the colorectal cancer and neoplasm-associated microbiome and to train RF models (3, 4). Test statistics that are affected by disease prevalence would be different in the NHSBCSP population; for example, positive predictive value would be lower.
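The dependence of positive predictive value (PPV) on prevalence follows directly from Bayes' rule. The sketch below is illustrative only: the sensitivity and specificity are hypothetical values, not results from this study, and the screening prevalence is a rough assumption obtained by multiplying the 2% blood-positive rate by the 10% colorectal cancer yield quoted above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical operating point (assumption, not a study result).
SENS, SPEC = 0.80, 0.80

# Enriched study mix: 430 colorectal cancer vs. 491 blood-negative samples.
study_prev = 430 / (430 + 491)

# Rough screening-population prevalence (assumption): 2% blood-positive,
# of which 10% have colorectal cancer at colonoscopy.
screen_prev = 0.02 * 0.10

ppv_study, _ = predictive_values(SENS, SPEC, study_prev)
ppv_screen, _ = predictive_values(SENS, SPEC, screen_prev)
print(f"PPV in study mix: {ppv_study:.3f}")
print(f"PPV at screening prevalence: {ppv_screen:.3f}")
```

Sensitivity and specificity are unchanged between the two settings; only the case mix differs, which is why AUC is reported here rather than predictive values.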
Samples were transported to the University of Leeds (Leeds, United Kingdom) at room temperature, and stored at room temperature prior to DNA extraction. The NHSBCSP asks participants to record the date of fecal collection; this information was available for 2,167 samples. Of these, 1,363 recorded 3 consecutive days; 95 recorded a single date (implying a single stool), and the maximum duration between collections was 16 days. Time between fecal collection and DNA extraction was 46–706 days (median 374 days; Supplementary Materials and Methods). To determine whether prolonged storage at room temperature prior to DNA extraction altered results, a set of DNA extraction replicates was created. Three squares were dissected and combined to make a sample and, after a period of time (6–23 months), the alternate three squares were dissected and combined to make a replicate (n = 26 pairs). For comparison, a set of “same-day” DNA extraction replicates was created, whereby three squares of fecally loaded card were dissected and combined to make a sample and, at the same time, the alternate three squares were dissected and combined to make a replicate (n = 48 pairs).
Data were extracted from the NHSBCSP database: age, sex, screening-round, episode-outcome, and for blood-positive gFOBT: diagnosis [normal, adenoma (low-, intermediate-, or high-risk; ref. 17), colorectal cancer, nonneoplastic condition], and lesion location. In cases of more than one lesion, only the most advanced was recorded. Data are based on information collected and quality assured by Public Health England (PHE) Population Screening Programmes. Access to the data was facilitated by the PHE Office for Data Release.
The screening age is 60–74 inclusive. People older than 74 years can self-refer to the program. The study cohort contained 35 older participants (ages 75–89) and one younger participant (age 59, 1 week before their birthday).
A power calculation was performed using the R package pwr (based on a variance-stabilized linear model) using effect sizes from the Human Microbiome Project (RRID:SCR_012956) with Bonferroni correction (18). Assuming 900 samples with 50,000 reads per sample, we anticipated a power of 0.95 to detect a 0.055-unit difference in common taxa (0.003 relative abundance), and a 0.022-unit difference in rare taxa (0.0004 relative abundance).
Ethical approval
Approval was obtained from Tyne & Wear South REC (IRAS:188007; REC:16/NE/0210), the BCSP Research Committee (BCSPID_160), and the Office for Data Release (ODR1617_126). Patients and the public were not involved in the study design but have since been involved in the study and will be involved in the dissemination of results.
Laboratory methods
From each developed gFOBT (Hema Screen, Immunostics, Inc), three alternate squares of fecally loaded card were dissected and processed as a combined sample. This approach subsamples a larger volume of stool, ensuring adequate material even from thinly smeared cards, and leaves three residual squares for alternative analysis or extraction replicates. DNA was extracted using a modified version of the QIAamp DNA Mini Kit protocol (Qiagen; detailed in Supplementary Materials and Methods). DNA extraction was performed in batches of up to 24 samples; to limit batch effects, batches were designed to contain samples representing the different clinical groups. Library preparation was according to the Earth Microbiome Project 16S Illumina Amplicon methodology with single PCR reactions of 20 ng DNA per sample and additional indexes to increase multiplexing capacity (19). Samples were pooled and sequenced across two runs, each comprising one lane of an Illumina HiSeq3000, for 2 × 150 bp sequencing, with a 10 bp single index read.
Bioinformatic and statistical analysis
During quality control, 16 samples had fewer than 10,000 reads and were removed from analysis. With these samples removed, read count per sample was 14,635–555,465 (median 123,265).
Reads were stripped of adaptors using cutadapt and trimmed to a maximum of 145 bp (20). Pairs were merged, denoised, and representative sequences chosen using DADA2 (21). Further processing was conducted in QIIME2 (version 2019.4; ref. 22). Differences in Shannon index were assessed by Kruskal–Wallis test. Taxa were assigned by the QIIME2 feature classifier using the BLAST+ algorithm (23, 24) and the SILVA version 132 99% similarity database (RRID:SCR_006423; ref. 25). Principal coordinate analysis of Bray–Curtis distances was performed. Further analysis was performed using R (version 3.5.1). Differences in beta diversity were assessed by PERMANOVA analysis of Bray–Curtis distances using Adonis (26). Differences in beta diversity between sample groups were further explored by PERMANOVA analysis of Bray–Curtis distances performed using the beta-group-significance function within QIIME2 (27). Taxa differing significantly between groups were obtained using LEfSe (linear discriminant analysis effect size; RRID:SCR_014609; ref. 28).
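For readers unfamiliar with the two diversity measures named above, the following minimal sketch computes a Shannon index and a Bray–Curtis dissimilarity from per-taxon counts. It is not the QIIME2 implementation: the logarithm base and the normalization to relative abundances before computing Bray–Curtis are assumptions made here for illustration, and the count vectors are invented.

```python
import math


def shannon_index(counts, base=2.0):
    """Shannon diversity of one sample; the log base is an assumption."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in proportions)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples.

    Counts are first converted to relative abundances (an assumption made
    here so that samples with different read depths are comparable).
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    rel_a = [c / total_a for c in counts_a]
    rel_b = [c / total_b for c in counts_b]
    numerator = sum(abs(x - y) for x, y in zip(rel_a, rel_b))
    denominator = sum(x + y for x, y in zip(rel_a, rel_b))
    return numerator / denominator


# Invented genus-level counts for two samples (same taxon order in both).
sample_1 = [500, 300, 150, 50, 0]
sample_2 = [200, 200, 400, 100, 100]

print(f"Shannon (sample 1): {shannon_index(sample_1):.3f}")
print(f"Shannon (sample 2): {shannon_index(sample_2):.3f}")
print(f"Bray-Curtis: {bray_curtis(sample_1, sample_2):.3f}")
```

A dissimilarity of 0 indicates identical relative composition and 1 indicates no shared taxa; PERMANOVA then tests whether between-group distances exceed within-group distances.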
RF models and AUC were generated using randomForest (RRID:SCR_015718) and pROC (29–31). For the neoplasm models, the neoplasm group contained an approximately equal number of randomly selected low-, intermediate-, and high-risk adenomas and colorectal cancer. Alternate samples were assigned to test or validation models (Supplementary Table S3); when used, total sample sets were also bootstrapped by RF during training. Each forest was built with 1,000 trees. Mtry was determined on the basis of the lowest out-of-bag error. The 95% confidence intervals (CI) for the ROC curves and AUC were created using 2,000 stratified bootstrap replicates. AUCs were compared using roc.test, using the method of DeLong (32). Confusion matrices were created using the predict function of randomForest using the default vote proportion cutoff of 50%.
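The alternate-sample split and the stratified bootstrap CI for the AUC can be sketched as follows. This is a standard-library illustration rather than the pROC procedure itself: the percentile method for the interval, the function names, and the simulated classifier scores are all assumptions introduced here.

```python
import random


def alternate_split(samples):
    """Assign alternate samples to two sets (even positions, odd positions)."""
    return samples[0::2], samples[1::2]


def auc(scores_pos, scores_neg):
    """AUC as the probability that a case outscores a control (ties = 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def stratified_bootstrap_ci(scores_pos, scores_neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the AUC, resampling cases and controls separately."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        boot_pos = rng.choices(scores_pos, k=len(scores_pos))
        boot_neg = rng.choices(scores_neg, k=len(scores_neg))
        replicates.append(auc(boot_pos, boot_neg))
    replicates.sort()
    lower = replicates[int(n_boot * alpha / 2)]
    upper = replicates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Simulated vote fractions for 40 cases and 40 controls (invented data).
rng = random.Random(42)
pos = [rng.gauss(0.6, 0.15) for _ in range(40)]
neg = [rng.gauss(0.4, 0.15) for _ in range(40)]

point = auc(pos, neg)
low, high = stratified_bootstrap_ci(pos, neg)
print(f"AUC {point:.2f} ({low:.2f}-{high:.2f})")
```

Resampling within each class keeps the case-to-control ratio fixed in every replicate, which is what "stratified" refers to.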
Taxa were compared with nine colorectal cancer fecal metagenomic datasets (33–40), processed using MetaPhlAn version 3.0 (RRID:SCR_004915; refs. 41–43). The majority of the datasets have been comprehensively profiled in two recent meta-analyses (33, 34). Datasets were collapsed to genus level for comparison. The Thomas_c (34) and Yachida (35) datasets were merged as they originated from the same cohort. RF models were built as above, using taxa present in all datasets. For within-dataset comparisons, each study was randomly split 20 times into equal sized training and validation sets, and mean AUC recorded. For the leave-one-dataset-out (LODO), models were built using all but one dataset, and validated on the missing dataset. For each test/validation pair of cohorts, confusion matrices were created using the predict function of randomForest using the default vote proportion cutoff of 50%. Sensitivity was calculated as the proportion of colorectal cancer samples called as colorectal cancer within the validation dataset, based on the test dataset RF model. Specificity was calculated as the proportion of control samples called as control. For the self-validation comparisons, the mean sensitivity and specificity of the 20 repetitions was recorded.
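The leave-one-dataset-out scheme and the sensitivity and specificity definitions above can be sketched as follows. The dataset names, labels, and vote fractions are hypothetical, and treating a vote fraction strictly above the cutoff as a colorectal cancer call is an assumption about how ties are handled.

```python
def leave_one_dataset_out(datasets):
    """Yield (held_out_name, training_names) for every dataset in turn."""
    names = sorted(datasets)
    for held_out in names:
        training = [name for name in names if name != held_out]
        yield held_out, training


def sensitivity_specificity(true_labels, vote_fractions, cutoff=0.5):
    """Sensitivity and specificity at a given vote-proportion cutoff.

    `vote_fractions` holds the fraction of trees voting "CRC" per sample.
    """
    tp = fn = tn = fp = 0
    for label, votes in zip(true_labels, vote_fractions):
        called_crc = votes > cutoff  # tie handling is an assumption
        if label == "CRC":
            if called_crc:
                tp += 1
            else:
                fn += 1
        else:
            if called_crc:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohorts; values stand in for genus-level abundance tables.
cohorts = {"cohort_A": None, "cohort_B": None, "cohort_C": None}
for held_out, training in leave_one_dataset_out(cohorts):
    print(f"train on {training}, validate on {held_out}")

# Invented validation labels and RF vote fractions.
labels = ["CRC", "CRC", "control", "control"]
votes = [0.9, 0.4, 0.2, 0.7]
sens, spec = sensitivity_specificity(labels, votes)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```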
To compare our gFOBT-derived biomarker with microbial taxonomic biomarkers from existing datasets, we used the genus-summarized profiles to calculate a single, meta-analyzed biomarker. This used the “metafor” R package with a random effects model incorporating standardized mean differences from these taxonomic profiles and sample sizes from all 10 datasets (including either gFOBT colorectal cancer vs. blood-negative or colorectal cancer vs. colonoscopy-normal).
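To make the pooling step concrete, the sketch below computes a standardized mean difference per dataset and combines them under a random-effects model. The text does not state which between-study variance estimator was used, so the DerSimonian–Laird estimator is substituted here for brevity; the summary statistics are invented and the function names are hypothetical.

```python
import math


def smd_and_variance(mean_case, sd_case, n_case, mean_ctrl, sd_ctrl, n_ctrl):
    """Standardized mean difference (Cohen's d) and its sampling variance."""
    pooled_sd = math.sqrt(
        ((n_case - 1) * sd_case ** 2 + (n_ctrl - 1) * sd_ctrl ** 2)
        / (n_case + n_ctrl - 2)
    )
    d = (mean_case - mean_ctrl) / pooled_sd
    variance = (n_case + n_ctrl) / (n_case * n_ctrl) + d ** 2 / (2 * (n_case + n_ctrl))
    return d, variance


def random_effects_pool(effects, variances):
    """Pooled effect, standard error, and tau^2 (DerSimonian-Laird)."""
    k = len(effects)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, se, tau2


# Invented per-dataset summaries for one genus:
# (mean_case, sd_case, n_case, mean_ctrl, sd_ctrl, n_ctrl)
summaries = [
    (0.030, 0.020, 50, 0.020, 0.020, 50),
    (0.025, 0.015, 80, 0.020, 0.015, 75),
    (0.040, 0.030, 40, 0.018, 0.025, 60),
]
pairs = [smd_and_variance(*s) for s in summaries]
pooled, se, tau2 = random_effects_pool([d for d, _ in pairs], [v for _, v in pairs])
print(f"pooled SMD {pooled:.3f} (SE {se:.3f}, tau^2 {tau2:.3f})")
```

When the between-study variance is estimated as zero, the random-effects weights reduce to the fixed-effect inverse-variance weights.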
Data are available at PRJEB37635 (http://www.ebi.ac.uk/ena/data/view/PRJEB37635).
Role of the funding source
The funders had no role in study design, data collection, analysis, interpretation, or writing. The corresponding author had full access to all the data and final responsibility for the decision to submit for publication.
Results
Summary of population characteristics and microbiome profiling
We profiled the fecal microbiomes of 2,252 NHSBCSP participants using gFOBT samples, confirming that NHSBCSP gFOBT contained adequate material for V4 16S rRNA gene amplicon sequencing. Samples retained after quality control represented phenotypes of blood-negative gFOBT [n = 491 (22%)] and blood-positive [n = 1,761 (78%)]. The blood-positive samples were grouped according to subsequent colonoscopy diagnosis: colorectal cancer [n = 430 (19%)], adenoma [n = 665 (30%)], colonoscopy-normal [n = 300 (13%)], nonneoplastic diagnosis [n = 366 (16%); Table 1]. The male preponderance of colorectal cancer and adenoma samples (67% and 65%) likely reflects the male-preponderance of colorectal neoplasia (44); in later analysis, we show that sex has minimal effect on overall microbiome structure.
Table 1. Participant characteristics.
Clinical group | Mean age, years (SD) | Total samples, n (%) | Male, n (%) | Female, n (%) |
---|---|---|---|---|
gFOBT blood-negative | 67.0 (4.5) | 491 (22%) | 205 (42%) | 286 (58%) |
gFOBT blood-positive, with the following diagnosis at colonoscopy: | | | | |
Colorectal cancer | 68.1 (5.0) | 430 (19%) | 289 (67%) | 141 (33%) |
Adenoma | 66.3 (4.7) | 665 (30%) | 432 (65%) | 233 (35%) |
Normal colonoscopy | 66.6 (4.3) | 300 (13%) | 155 (52%) | 145 (48%) |
Nonneoplastic diagnosis | 66.7 (4.7) | 366 (16%) | 188 (51%) | 178 (49%) |
Of the colorectal cancer samples, lesion data were available for 359 of 430 (83%), corresponding to 378 colorectal cancers [342 (95%) samples resulted in a single colorectal cancer being detected at colonoscopy; 17 (5%) samples resulted in more than one synchronous colorectal cancer being detected at colonoscopy]. Where type was recorded [n = 298 (79%)], the majority were adenocarcinoma [n = 297 (99%)]; and one rectal tumor was a squamous cell carcinoma (<1%). Where grade was recorded [n = 253 (67%)], the majority were well/moderately differentiated [n = 224 (89%)]; 29 (11%) were poorly differentiated. The commonest tumor location was sigmoid/rectum (Table 2). Unfortunately, tumor stage was not available. Of the nonneoplastic samples, lesion data were available for 333 of 366 (91%). Many had more than one diagnosis, the commonest being “diverticulosis” (Supplementary Materials and Methods).
Table 2. Colorectal cancer locations.
CRC tumor location | Number (%) |
---|---|
Ileum | 1 (<1%) |
Cecum | 43 (11%) |
Ascending colon | 40 (11%) |
Hepatic flexure | 21 (6%) |
Transverse colon | 32 (8%) |
Splenic flexure | 15 (4%) |
Descending colon | 12 (3%) |
Sigmoid | 90 (24%) |
Rectosigmoid | 27 (7%) |
Rectum | 96 (25%) |
Anus | 1 (<1%) |
Pairs of technical DNA extraction replicates extracted after prolonged storage had similar microbiome structures, equivalent to “same-day” DNA extraction replicates, confirming that time until DNA extraction has minimal effect on results (Supplementary Fig. S1).
Gut microbiome profiles of the NHSBCSP cohort
While the amount of biomass and the resolution of amplicon-based taxonomic profiling from these samples were limited, they were more than sufficient to establish overall fecal microbiome structure, as well as to subsequently classify by phenotype. As expected, microbial structure was dominated by a gradient trade-off between Bacteroidetes and Firmicutes phylum members, with beta diversity minimally influenced by clinical group (∼1% variation in microbiome structure, by Bray–Curtis PERMANOVA), and even less by sex and age (Supplementary Table S1; Fig. 2). Microbiome structure differed significantly between individual clinical groups by Bray–Curtis PERMANOVA (Supplementary Table S2). Similarly, alpha diversity was significantly higher in blood-negative and colorectal cancer samples, although with a very small effect size difference between groups (Kruskal–Wallis P = 4.50 × 10⁻²⁵; Supplementary Table S3; Fig. 2). This suggested a combination of both global and taxon-specific differences in the microbiome during colorectal cancer, in agreement with previous studies (45).
Figure 2. Microbiome-based gFOBT colorectal cancer/neoplasm classification requires as few as 15 taxa and compares favorably with models built using external shotgun metagenomic datasets. A, Genus-level bacteria only “total” RF classification models were built using an increasing number of taxa of decreasing RF importance score. Shading represents the 95% CI of the AUC. Neoplasm = a group comprising an approximately equal ratio of colorectal cancer, low-risk adenoma, intermediate-risk adenoma, and high-risk adenoma samples. For each model, the AUC plateaus at approximately 15 taxa. B, Performance of the amplicon-based “colorectal cancer versus blood-negative” total RF model compared with models built using external fecal shotgun metagenomic datasets. The matrix displays cross-prediction AUCs. LODO denotes AUC generated by training a model using all but the dataset of the associated column and testing it using the dataset of that column. Within-study and cross-study performance of the “colorectal cancer versus blood-negative” model falls within the range of performances of the external models, indicating a degree of generalizability. C, Specific taxa prioritized by gFOBT amplicon-based regression models (at the genus level) are strikingly similar to genera prioritized from shotgun metagenomic taxonomic profiles in complementary populations.
We thus went on to identify specific taxa that were significantly enriched/depleted between clinical groups, which proved to include colorectal cancer–microbiome associations described in the existing literature. Both inflammation-associated and oral microbes were enriched, such as Escherichia-Shigella, Peptostreptococcus, Porphyromonas, Fusobacterium, and Parvimonas (Supplementary Fig. S3). Interestingly, 43 taxa were significantly enriched and 43 depleted in the blood-negative group compared with the blood-positive colonoscopy-normal group. Existing studies usually compare colorectal cancer with either healthy volunteers (equivalent to the blood-negative group) or controls with a normal colonoscopy; it is rare for both groups to be available within a study. Thus, notably, choice of control group was shown to affect which taxa were colorectal cancer–enriched relative to controls (Supplementary Fig. S3). Of the colorectal cancer–enriched taxa, seven featured in both comparisons (including Porphyromonas, Parvimonas, and Peptostreptococcus), and of the colorectal cancer–depleted taxa, only one featured in both comparisons (Anaerotruncus). An inverse association with colorectal cancer was shown for 25 taxa between the two choices of control group (including Fusobacterium and Escherichia-Shigella). These findings indicate that choice of control group can have an important bearing on results, and suggest that certain taxa (especially typically oral taxa, e.g., Porphyromonas, Parvimonas, and Peptostreptococcus) may have an association with colorectal cancer that is independent of the presence of fecal blood (at least at the level detectable by gFOBT), whereas others (Fusobacterium and Escherichia-Shigella) may not.
Microbiome analysis of NHSBCSP samples has the potential to improve colorectal cancer screening
To determine whether microbiome profiles from NHSBCSP gFOBT samples could improve screening accuracy, we created RF classifiers using relative abundances of genera (Fig. 1). While LEfSe indicates taxa which are significantly enriched or depleted between groups, RF classifiers identify taxa which have predictive associations (28–30). We assessed four models, the first two of which investigated whether microbiome analysis could be used as a first-tier screen—that is, to distinguish colorectal cancer or neoplasm from blood-negative gFOBT. On the basis of a randomly selected 50% training-validation split, colorectal cancer outcomes were separated from blood-negative gFOBTs (colorectal cancer vs. blood-negative) with AUC 0.86 (0.82–0.89; Supplementary Tables S4–S6). The second model distinguished neoplasm (a group comprising an approximately equal ratio of colorectal cancer, low-, intermediate-, and high-risk adenoma) from blood-negative gFOBTs (neoplasm vs. blood-negative) with AUC 0.78 (0.74–0.82; Supplementary Tables S5 and S6). Neither model showed a significant difference between AUCs of the test or validation sets (Supplementary Table S5).
The next two models assessed whether microbiome profiles could distinguish, strictly among the blood-positive samples, colorectal cancer or neoplasm from subsequently colonoscopy-normal samples (i.e., a second-tier screen, to identify gFOBT false positives). As expected, these more biologically similar outcomes were more difficult to differentiate, but were still accessible via microbiome measures. The third model distinguished colorectal cancer from colonoscopy-normal gFOBT (colorectal cancer vs. colonoscopy-normal) with AUC 0.79 (0.74–0.83; Supplementary Tables S5 and S6; Supplementary Fig. S4). The last model differentiated neoplasms from colonoscopy-normal gFOBT (neoplasm vs. colonoscopy-normal) with AUC 0.73 (0.68–0.77; Supplementary Tables S5 and S6; Supplementary Fig. S4). Again, neither model showed a significant difference between AUCs of the test or validation sets (Supplementary Table S5).
All of the models performed significantly better than models generated for comparison which used age and sex. Combining age and sex with relative abundances of genera led to a small improvement in AUC for three of the models (Supplementary Table S5). Model performance remained similar after restricting the models to a small number of taxa, mimicking what might be possible by qPCR; for all four models, AUC increased as the number of taxa increased up to 15, after which the AUC approximately stabilized (Fig. 2; Supplementary Table S5; Supplementary Fig. S4). Interestingly, the 15 most important taxa for the “colorectal cancer versus blood-negative” and “colorectal cancer versus colonoscopy-normal” models featured eight of the same taxa, including Fusobacterium, Peptostreptococcus, Parvimonas, Gemella, Odoribacter, and Faecalibacterium, and three taxa (Faecalibacterium, Akkermansia, and Escherichia-Shigella) were shared between the “neoplasm versus blood-negative” and “neoplasm versus colonoscopy-normal” models (Supplementary Fig. S4). Several of the same taxa appeared in the 15 taxa most important to the “colorectal cancer versus blood-negative” and “neoplasm versus blood-negative,” and “colorectal cancer versus colonoscopy-normal” and “neoplasm versus colonoscopy-normal” models, respectively (Supplementary Fig. S4).
Finally, we compared the performance of these 16S-based RF models to similar models using existing fecal shotgun metagenomic datasets (Fig. 2; Supplementary Fig. S5; refs. 33–40). As the majority of these existing studies had only profiled colorectal cancer, we restricted the comparison with the two colorectal cancer RF models. Within-study cross-validation of the “colorectal cancer versus blood-negative” model produced an AUC of 0.86, which compared favorably with the AUCs of the external datasets (range, 0.59–0.95; Fig. 2; Supplementary Fig. S5). Between-study performance of the model also fell within the range of performances of the models built using the external datasets, and the majority of the most important taxa paralleled those of the external studies, indicating a degree of generalizability. The “colorectal cancer versus colonoscopy-normal” model had a within-study cross-validation AUC that was within the range of the models built using external datasets, but between-study validation performance was lower (Fig. 2; Supplementary Fig. S5). Taxa which were of highest importance to the model were shared by many of the models built using external datasets, indicating both their potential underlying biological importance and their ability to be consistently detected by a variety of assays.
For completeness, we also explored the ability of microbial RF models to detect adenoma. Performance was generally comparable; models distinguished colorectal cancer from adenoma with AUC 0.71 (0.66–0.76), adenoma from colonoscopy-normal with AUC 0.72 (0.67–0.77), and adenoma from blood-negative with AUC 0.84 (0.80–0.87; Supplementary Tables S7–S10). The taxa of greatest importance to the RF models included several “colorectal cancer–associated” taxa. Finally, we investigated the performance of microbial RF models using a “colonoscopy-control” group, comprising approximately equal numbers of nonneoplastic and colonoscopy-normal samples (Supplementary Tables S7–S10). Colorectal cancer was detected with an AUC of 0.76 (0.72–0.80), similar to the RF model that used colonoscopy-normal samples alone as the control group. However, the models designed to detect adenoma and neoplasm performed worse than RF models built using colonoscopy-normal samples alone. This could reflect the heterogeneous nature of the nonneoplastic group, or greater microbiome similarity between the adenoma and nonneoplastic groups.
Discussion
To our knowledge, this is the first study to profile the microbiome of large numbers of colorectal cancer screening samples, collected and processed routinely by a national screening program, and to demonstrate the potential of microbiome analysis as an accurate adjunct to early screening. We profiled the fecal microbiome of 2,252 processed NHSBCSP gFOBT samples, representing blood-negative results, colonoscopy-normal outcomes, colorectal cancer, adenomas, and nonneoplastic diagnoses. Using RF models as a simple classification method, microbiome taxonomic profiles were able to serve as accurate first- and second-tier screens, the former separating colorectal cancer/neoplasm from blood-negative results, and the latter separating colorectal cancer/neoplasm from colonoscopy-normal results. All four microbiome-based models performed significantly better than models built using the only clinical data available (age and sex), and were robust to hold-out validation and in comparison with external data.
As a baseline for translational applications, the first-tier “colorectal cancer versus blood-negative” model performed similarly to existing screening methods. This includes those that rely on low-dimensional or high-dimensional biomarkers. For example, a meta-analysis of FIT and a separate study of FIT for colorectal cancer screening reported an AUC for the detection of colorectal cancer as high as 0.95 (46, 47). Separately, a trial of the FDA-approved Cologuard reached an AUC of 0.94 for the discrimination of colorectal cancer versus “nonadvanced neoplasia/lesser findings,” and with FIT an AUC of 0.89 (48). Our microbiome-based “neoplasm versus blood-negative” model again performed similarly to (and possibly better than) existing methods [AUCs from the aforementioned studies of 0.72 (FIT), 0.67 (FIT), and 0.73 (Cologuard); refs. 47, 48], although differences in the composition of the case and control groups between the studies should be borne in mind. Importantly, in comparison with Cologuard, which requires whole stool and costs approximately $600/test, amplicon-based microbiome profiling requires very little biomaterial and would be easier to translate to a national screening program. The fact that model performance required as few as 15 taxa, in agreement with existing studies, raises the potential of a rapid qPCR-based test that could be integrated into a screening program at low cost (34, 49–52). Although we were not able to assess it in our study, it has been shown that microbiome analysis is able to detect lesions missed by FIT, suggesting a potential role as an adjunct to FIT for the detection of nonbleeding colorectal cancer (53).
The second-tier models perhaps showed the greatest clinical potential, as they were able to identify colorectal cancer and neoplasms from among the blood-positive gFOBT cohort. Currently, all NHSBCSP participants with a blood-positive gFOBT are referred for colonoscopy, yet 50% of these colonoscopies reveal a normal bowel or a nonneoplastic condition. The high number of unnecessary colonoscopies carries associated risks and strains endoscopy capacity. There are limited examples of second-tier screens in the existing literature. A study from the NHSBCSP demonstrated second-tier performance for the detection of neoplasm by FIT with AUC 0.63, improved to 0.66 by incorporating screening data (54). A similar study reported an equivalent AUC of 0.69 (FIT), improved to 0.76 by questionnaire-collected data (55). The advantage of a microbiome-based second-tier screen that could be performed using existing screening samples is that it would not require additional tests, nor would it place extra burden on screening participants, which could otherwise jeopardize screening uptake.
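The potential effect of a second-tier screen on colonoscopy numbers can be made concrete with simple arithmetic. The sketch below is illustrative only: the cohort size and the sensitivity and specificity at the chosen operating point are hypothetical inputs, not results from this study, and the function name `second_tier_triage` is an assumption. Only the figure of roughly 50% of blood-positive referrals without neoplasia comes from the text above.

```python
def second_tier_triage(n_blood_positive, frac_without_neoplasm,
                       sensitivity, specificity):
    """Expected outcomes if only second-tier-positive participants
    proceed to colonoscopy.

    sensitivity -- fraction of participants with neoplasm that the test flags
    specificity -- fraction of participants without neoplasm that it clears
    """
    n_without = n_blood_positive * frac_without_neoplasm
    n_with = n_blood_positive - n_without
    avoided = n_without * specificity        # unnecessary colonoscopies spared
    missed = n_with * (1.0 - sensitivity)    # neoplasms not referred
    performed = n_blood_positive - avoided - missed
    return {
        "colonoscopies_performed": performed,
        "unnecessary_avoided": avoided,
        "neoplasms_missed": missed,
    }


# Hypothetical operating point: 90% sensitivity, 40% specificity,
# applied to 1,000 blood-positive participants.
outcome = second_tier_triage(1000, 0.5, 0.9, 0.4)
```

Under these assumed inputs, about 200 unnecessary colonoscopies would be avoided at the cost of about 50 neoplasms not being referred, which shows why the choice of operating threshold matters as much as the AUC.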
Given that we profiled the microbiome directly from gFOBT screening samples, we were interested to compare the performance of our models with the existing microbiome literature, most of which has used shotgun metagenomics and/or frozen whole stool. Performance compared favorably: meta-analyses and a systematic review reported AUCs of 0.68–0.95 for the detection of colorectal cancer and 0.59–0.94 for the detection of neoplasm (5, 33, 34, 49, 50, 56–59). Many studies, like ours, report inferior detection of neoplasms compared with colorectal cancer, due to the reduced discriminatory power of microbiome-based models to detect adenomas. It is remarkable that our models performed so well given that samples were prepared routinely by screening participants in their own homes (in the majority of instances over 3 days), transported through the routine post, and stored at room temperature (for an average of 1 year prior to DNA extraction), and that the following variables, all of which affect the microbiome, were unknown: antibiotic/medication use, diet, comorbidities, smoking status, and body mass index (60). While this technical variability and missing information will unavoidably affect the precision of microbiome measurements feasible from gFOBT, and their applicability to general microbiome epidemiology, it is noteworthy that they do not impede gFOBT microbiome use for colorectal cancer screening. We further confirmed this quantitatively by comparing the performance of our colorectal cancer models with models built using nine external metagenomic datasets. Validation of the gFOBT-based models among studies showed similar performance and, interestingly, identification of many of the same discriminatory taxa.
These taxa included those previously described as colorectal cancer associated, including Fusobacterium, Escherichia-Shigella, Peptostreptococcus, Porphyromonas, Parvimonas, Alistipes, and Gemella, and those that have previously been shown to be inversely associated with colorectal cancer, including Faecalibacterium (61) and Lactobacillus (49). Although we limited ourselves to analysis at the genus level for simplicity, these genera contain species which have been associated with colorectal cancer, including inflammation-associated and oral taxa: Fusobacterium nucleatum (49), pks+ Escherichia coli (62), Peptostreptococcus stomatis (36), Peptostreptococcus anaerobius (35), Porphyromonas asaccharolytica (49), Porphyromonas somerae (33), Porphyromonas uenonis (33), Parvimonas micra (49), Alistipes finegoldii (49), and Gemella morbillorum (33). It is hypothesized that oral taxa may increase colonic mucosal permeability, allowing bacterial invasion, with resulting inflammation, and subsequent epithelial proliferation (63–65). Certain taxa have also been shown to be capable of inducing and/or promoting tumorigenesis: colibactin, produced by pks+ Escherichia coli, is able to damage DNA (62), while Fusobacterium nucleatum promotes tumor proliferation and a protumor inflammatory state (66). It was interesting that some (but not all) of these taxa remained colorectal cancer–enriched even in comparisons with the blood-positive colonoscopy-normal group, suggesting that certain colorectal cancer–microbiome associations may act independently of the presence of fecal blood.
Among this study's potential limitations, two stand out. The first is that participants in the blood-negative group did not undergo colonoscopy, as this would disrupt routine screening. As the sensitivity of gFOBT for colorectal cancer is estimated to be 50%, the blood-negative group may have included undiagnosed adenomas or colorectal cancers (67–69). However, because the incidence of colorectal cancer is low, the absolute number of undiagnosed colorectal cancers is predicted to have been small, with little effect on the performance of the RF models, except perhaps to have made the result more conservative. This leads to an arguably minor, but still systematic, difference between these controls and a broader population: the specific models evaluated here will underpredict nonbleeding cancers and should be further generalized prior to application. The second is that the majority of the blood-negative samples were collected within a short time frame at the beginning of the study. However, any effect due to prolonged storage prior to DNA extraction is likely to have been minimal, as DNA extraction replicates created after 6–23 months of storage at room temperature demonstrated similar microbiome structures, equivalent to “same-day” DNA extraction replicates.
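The argument that few undiagnosed cancers are expected in the blood-negative group follows from Bayes' rule, and a minimal sketch makes the reasoning explicit. The 50% gFOBT sensitivity is the estimate cited above and 491 is the size of the blood-negative group given in the abstract; the prevalence and specificity values are hypothetical placeholders, not figures from this study, and the function name `expected_missed_cancers` is an assumption. The sketch covers colorectal cancer only, not adenomas.

```python
def expected_missed_cancers(n_negative, prevalence, sensitivity, specificity):
    """Expected number of cancers among participants who tested blood-negative.

    P(cancer | negative) = p(1 - sens) / [p(1 - sens) + (1 - p) * spec]
    """
    false_negative = prevalence * (1.0 - sensitivity)   # cancer, test negative
    true_negative = (1.0 - prevalence) * specificity    # no cancer, test negative
    p_cancer_given_negative = false_negative / (false_negative + true_negative)
    return n_negative * p_cancer_given_negative


# Hypothetical prevalence (0.5%) and specificity (98%); sensitivity from the text.
expected = expected_missed_cancers(
    n_negative=491, prevalence=0.005, sensitivity=0.5, specificity=0.98
)
```

With these assumed inputs the expected count is on the order of one participant, consistent with the statement that such contamination would have little effect on model performance.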
In addition to the refinements that would be necessary to translate these results into a screening product, including investigation of sensitivity, consistency, and cost-effectiveness, future work aims to replicate the study using NHSBCSP FIT samples. The advantage of having performed the current study is that, should microbiome analysis of FIT (which collects a much smaller volume of feces) not produce adequate accuracy, a gFOBT-based microbiome screening test could still be used as an adjunct to the NHSBCSP. We also plan to investigate whether screening accuracy could be improved further by the incorporation of additional clinical data, FIT concentration, and fecal mutation, bacterial virulence-factor, or toxin testing (33, 34, 49, 52, 70, 71). In conclusion, this study has confirmed that microbiome analysis can be performed on samples collected and processed routinely by a national colorectal cancer screening program to improve accuracy. Models required as few as 15 taxa, making this practical to implement as an inexpensive qPCR-based test. This could reduce the number of unnecessary colonoscopies in countries that use fecal occult blood test screening.
Authors' Disclosures
C. Young reports grants from Wellcome Trust, Cancer Research UK, and Pathological Society of Great Britain & Ireland during the conduct of the study. A. Fuentes Balaguer reports grants from Wellcome Trust, Cancer Research UK, and Pathological Society of Great Britain & Ireland during the conduct of the study. D. Bottomley reports grants from Wellcome Trust, Cancer Research UK, and Pathological Society of Great Britain & Ireland during the conduct of the study. N. Gallop reports grants from Wellcome Trust, Cancer Research UK, and Pathological Society of Great Britain & Ireland during the conduct of the study. L. Wilkinson reports grants from Wellcome Trust, Cancer Research UK, and Pathological Society of Great Britain & Ireland during the conduct of the study. C. Huttenhower reports personal fees from Seres Therapeutics outside the submitted work. P. Quirke reports grants from Wellcome Trust, Cancer Research UK, and Pathological Society of Great Britain & Ireland, and an NIH senior investigator award during the conduct of the study; in addition, P. Quirke has a patent for IHC detection of markers of response to anti-EGFr therapy pending to Roche/Leeds University, is clinical adviser to the National Health Service Bowel Cancer Screening Programme, and is a member of an advisory body to the National Health Service Bowel Cancer Screening Programme. No disclosures were reported by the other authors.
Authors' Contributions
C. Young: Conceptualization, formal analysis, funding acquisition, investigation, methodology, writing–original draft, project administration, writing–review and editing. H.M. Wood: Conceptualization, data curation, formal analysis, supervision, investigation, writing–original draft, writing–review and editing. A. Fuentes Balaguer: Investigation, writing–review and editing. D. Bottomley: Investigation, writing–review and editing. N. Gallop: Investigation, writing–review and editing. L. Wilkinson: Investigation, writing–review and editing. S.C. Benton: Resources, data curation, writing–review and editing. M. Brealey: Resources, data curation, writing–review and editing. C. John: Resources, data curation, writing–review and editing. C. Burtonwood: Resources, data curation, writing–review and editing. K.N. Thompson: Formal analysis, writing–original draft, writing–review and editing. Y. Yan: Formal analysis, writing–original draft, writing–review and editing. J.H. Barrett: Formal analysis, writing–review and editing. E.J.A. Morris: Conceptualization, supervision, writing–review and editing. C. Huttenhower: Formal analysis, supervision, writing–original draft, writing–review and editing. P. Quirke: Conceptualization, supervision, funding acquisition, writing–review and editing.
Acknowledgments
This work was funded by a Wellcome Trust Clinical Research Training Fellowship (203524/Z/16/Z) to C. Young, a Pathological Society of Great Britain & Ireland “Visiting Fellowship” (2234) to C. Young, and a Cancer Research UK Grand Challenge Initiative (OPTIMISTICC C10674/A27140) to P. Quirke and C. Huttenhower. P. Quirke is a National Institute of Health Research Senior Investigator.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.