Purpose:

Fecal microbiome profiling has the potential to improve colorectal cancer screening. Research studies have demonstrated this, but the benefit has not been quantified at scale using samples collected and processed routinely by a national screening program.

Experimental Design:

Between 2016 and 2019, the largest of the NHS Bowel Cancer Screening Programme hubs prospectively collected processed guaiac fecal occult blood test (gFOBT) samples with subsequent colonoscopy outcomes: blood-negative [n = 491 (22%)]; colorectal cancer [n = 430 (19%)]; adenoma [n = 665 (30%)]; colonoscopy-normal [n = 300 (13%)]; nonneoplastic [n = 366 (16%)]. Samples were transported and stored at room temperature. DNA underwent 16S rRNA gene V4 amplicon sequencing. Taxonomic profiling was performed to provide features for classification via random forests (RF).

Results:

Samples provided 16S amplicon-based microbial profiles, which confirmed previously described colorectal cancer–microbiome associations. Microbiome-based RF models showed potential as a first-tier screen, distinguishing colorectal cancer or neoplasm (colorectal cancer or adenoma) from blood-negative with AUC 0.86 (0.82–0.89) and AUC 0.78 (0.74–0.82), respectively. Microbiome-based models also showed potential as a second-tier screen, distinguishing, among gFOBT blood-positive samples, colorectal cancer or neoplasm from colonoscopy-normal with AUC 0.79 (0.74–0.83) and AUC 0.73 (0.68–0.77), respectively. Models remained robust when restricted to 15 taxa, and performed similarly during external validation with metagenomic datasets.

Conclusions:

Microbiome features can be assessed using gFOBT samples collected and processed routinely by a national colorectal cancer screening program to improve accuracy as a first- or second-tier screen. The models required as few as 15 taxa, raising the potential of an inexpensive qPCR test. This could reduce the number of colonoscopies in countries that use fecal occult blood test screening.

Translational Relevance

To assess the utility of microbiome profiles for national-scale colorectal cancer screening, we profiled 2,252 routinely processed NHS Bowel Cancer Screening Programme guaiac fecal occult blood test (gFOBT) samples. We generated four microbiome-based random forest classification models, each showing potential to improve accuracy. Two distinguished either colorectal cancer or neoplasm (colorectal cancer or adenoma) from gFOBT blood-negative samples (equivalent to first-tier screening). Two distinguished colorectal cancer or neoplasm from samples that had tested positive for blood by gFOBT but for which no lesion was found at the subsequent colonoscopy (second-tier screening to rule out gFOBT false positives). Each model remained robust to validation and when restricted to 15 taxa, raising the possibility of an inexpensive qPCR test. The models performed favorably compared with existing microbiome studies, fecal immunochemical test, and Cologuard. These results suggest that microbiome analysis could be integrated into national colorectal cancer screening to improve accuracy and reduce the number of unnecessary screening colonoscopies.

Globally, colorectal cancer is the third most common cause of cancer deaths (1). Screening reduces mortality by detecting asymptomatic adenomas or early-stage colorectal cancer (2). Countries have adopted different screening approaches. In England, the NHS Bowel Cancer Screening Programme (NHSBCSP) tests for occult fecal blood; if detected, participants are referred for colonoscopy. Until June 2019, the NHSBCSP used the guaiac fecal occult blood test (gFOBT). Specificity is limited, with only 40% of screening colonoscopies detecting adenoma and 10% colorectal cancer (3, 4); this represents a significant cost, resource, and patient burden.

Research suggests that fecal microbiome analysis may serve as an improvement or adjunct to current colorectal cancer screening (5). However, previous studies have not yet bridged the gap between preclinical, basic scientific discovery and the population scale necessary for translation to a national screening program. These limitations were outlined in a systematic review: many had small numbers of participants (the largest had 490, of which 120 were patients with colorectal cancer); many collected samples in a manner incompatible with national screening (refrigerated/frozen samples); some used post-colonoscopy samples (bowel preparation alters the microbiome); and few had the opportunity to externally validate their models (5).

We aimed to quantify the utility of integrating microbiome analysis into a national colorectal cancer screening program by analyzing microbiome features from large numbers of routinely processed NHSBCSP gFOBT samples. Technical studies have shown that it is possible to measure a subset of clinically relevant microbiome features from gFOBT stored at room temperature (6–13). Two studies have analyzed large numbers of bowel-preparation naïve individuals, but neither performed microbiome analysis directly from screening samples; one study has performed preliminary analysis of screening fecal immunochemical test (FIT) samples, but did not determine diagnostic performance of the microbiome (14–16). To our knowledge, our study is the first to analyze microbiome features from large numbers of routinely processed gFOBT screening samples.

To reflect the aims of the NHSBCSP, we explored the potential of microbiome-based random forest (RF) models to detect colorectal cancer alone, or to detect colorectal cancer and adenoma (a group we term “neoplasm”). We investigated the potential to use these microbiome-based RF models as a first-tier screen, equivalent to the use of gFOBT; we used gFOBT blood-negative samples as the control group, as 98% of screening gFOBT yield a blood-negative result. In addition, we explored the potential to use the microbiome-based RF models as a second-tier screen; a second-tier represents an opportunity to triage those samples with a blood-positive gFOBT result, to reduce the number of unnecessary screening colonoscopies. As a second-tier screen, we explored the potential of microbiome-based RF models to distinguish gFOBT blood-positive samples associated with colorectal cancer or neoplasm, from gFOBT blood-positive samples associated with a normal colonoscopy result. We used “colonoscopy-normal” samples as the control group, as although a proportion of screening colonoscopies yield a “nonneoplastic” diagnosis (e.g., diverticulosis, nondysplastic polyp), this is a heterogeneous group. We found that microbiome-based RF models show potential as a first-tier screen for the detection of colorectal cancer [AUC 0.86 (0.82–0.89)] or neoplasm [AUC 0.78 (0.74–0.82)], and as a second-tier screen, for the detection of colorectal cancer [AUC 0.79 (0.74–0.83)] or neoplasm [AUC 0.73 (0.68–0.77)].

Study design and participants

The NHSBCSP Southern Hub prospectively collected a convenience series of routinely processed gFOBT between October 2016 and August 2019; this included all “blood-positive” gFOBT (blue discoloration affecting five or six squares) processed by the Southern Hub (n = 3,700), and a random sample of “blood-negative” (no blue discoloration) gFOBT (n = 530). Of the samples collected, 3,601 (85%) had complete basic clinical data recorded on the NHSBCSP database at the time of the final data extract. From this group, we selected samples to achieve sample sizes that were approximately equal across the different clinical groups (Fig. 1; Supplementary Materials and Methods).

Figure 1.

Microbiome taxonomic profiling demonstrates potential to improve colorectal cancer screening accuracy. A, Overview of the NHSBCSP and the design of this study. Briefly, we used 16S amplicon-based microbiome profiling from routinely collected gFOBT specimens to supplement first-tier (colorectal cancer/neoplasm vs. blood-negative) or second-tier (colorectal cancer/neoplasm vs. colonoscopy-normal) opportunities for early cancer screening. B, Microbiome profiles improve colorectal cancer or neoplasm classification versus blood-negative gFOBT samples (first-tier screening application) or blood-positive colonoscopy-normal samples (second-tier screening application) relative to purely clinical characteristics (age and sex). Classification used RF models and shows the performance of the “total” RF models bootstrapped from the total datasets. Shading represents the 95% CI. Clinical = RF models based on age and sex. Bacteria = RF models based on relative abundances of genera. Neoplasm = a group comprising an approximately equal ratio of colorectal cancer, low-risk adenoma, intermediate-risk adenoma, and high-risk adenoma samples.

This enabled profiling of 2,252 samples: “blood-negative” samples, in which hemoglobin was not detected [n = 491 (22%)], and “blood-positive” samples [n = 1,761 (78%)]. Blood-positive samples had the following colonoscopy diagnoses: colorectal cancer [n = 430 (19%)], adenoma [n = 665 (30%)], colonoscopy-normal [n = 300 (13%)], nonneoplastic condition [n = 366 (16%)]. While the composition of our overall study group does not reflect the composition of the NHSBCSP population (2% of gFOBT are blood-positive; 10% of screening colonoscopies reveal colorectal cancer, 40% adenoma, and 50% reveal a normal colon or nonneoplastic condition), we required these respective sample numbers to adequately profile the colorectal cancer and neoplasm-associated microbiome and to train RF models (3, 4). Test statistics that are affected by disease prevalence would differ in the NHSBCSP population; for example, positive predictive value would be lower.
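The dependence of positive predictive value on prevalence can be illustrated with a short calculation. The sketch below applies Bayes' rule at two prevalences; the sensitivity, specificity, and screening-population prevalence are hypothetical values chosen for illustration, not results from this study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of disease given a positive test."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point (assumption, not a study result).
SENS, SPEC = 0.80, 0.80

# Proportion of colorectal cancer in the enriched first-tier comparison
# (430 colorectal cancer vs. 491 blood-negative samples).
study_prevalence = 430 / (430 + 491)

# Illustrative low prevalence for a screening population (assumed value).
screening_prevalence = 0.002

ppv_study = positive_predictive_value(SENS, SPEC, study_prevalence)
ppv_screening = positive_predictive_value(SENS, SPEC, screening_prevalence)

print(f"PPV at study prevalence:     {ppv_study:.3f}")
print(f"PPV at screening prevalence: {ppv_screening:.3f}")
```

With sensitivity and specificity held fixed, the same test yields a much lower positive predictive value when cases are rare, which is why values estimated from the enriched study set cannot be carried over directly to the screening population.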

Samples were transported to the University of Leeds (Leeds, United Kingdom) at room temperature, and stored at room temperature prior to DNA extraction. The NHSBCSP asks participants to record the date of fecal collection; this information was available for 2,167 samples. Of these, 1,363 recorded 3 consecutive days and 95 recorded a single date (implying a single stool); the maximum duration between collections was 16 days. Time between fecal collection and DNA extraction was 46–706 days (median 374 days; Supplementary Materials and Methods). To determine whether prolonged storage at room temperature prior to DNA extraction altered results, a set of DNA extraction replicates was created. Three squares were dissected and combined to make a sample and, after a period of time (6–23 months), the alternate three squares were dissected and combined to make a replicate (n = 26 pairs). For comparison, a set of “same-day” DNA extraction replicates was created, whereby three squares of fecally loaded card were dissected and combined to make a sample and, at the same time, the alternate three squares were dissected and combined to make a replicate (n = 48 pairs).

Data were extracted from the NHSBCSP database: age, sex, screening-round, episode-outcome, and for blood-positive gFOBT: diagnosis [normal, adenoma (low-, intermediate-, or high-risk; ref. 17), colorectal cancer, nonneoplastic condition], and lesion location. In cases of more than one lesion, only the most advanced was recorded. Data are based on information collected and quality assured by Public Health England (PHE) Population Screening Programmes. Access to the data was facilitated by the PHE Office for Data Release.

The screening age is 60–74 inclusive. People older than 74 years can self-refer to the program. The study cohort contained 35 older participants (ages 75–89) and one younger participant (age 59, 1 week before their birthday).

A power calculation was performed using the R package pwr (based on a variance-stabilized linear model) using effect sizes from the Human Microbiome Project (RRID:SCR_012956) with Bonferroni correction (18). Assuming 900 samples with 50,000 reads per sample, we anticipated power 0.95 to detect a 0.055-unit difference in common taxa (0.003 relative abundance), and a 0.022-unit difference in rare taxa (0.0004 relative abundance).

Ethical approval

Approval was obtained from the Tyne & Wear South REC (IRAS:188007; REC:16/NE/0210), the BCSP Research Committee (BCSPID_160), and the Office for Data Release (ODR1617_126). Patients and the public were not involved in the study design but have since been involved in the study and will be involved in the dissemination of results.

Laboratory methods

From each developed gFOBT (Hema Screen, Immunostics, Inc), three alternate squares of fecally loaded card were dissected and processed as a combined sample. This approach subsamples a larger volume of stool, ensuring adequate material even from thinly smeared cards, and leaves three residual squares for alternative analysis or extraction replicates. DNA was extracted using a modified version of the QIAamp DNA Mini Kit protocol (Qiagen; detailed in Supplementary Materials and Methods). DNA extraction was performed in batches of up to 24 samples; to limit batch effects, batches were designed to contain samples representing the different clinical groups. Library preparation was according to the Earth Microbiome Project 16S Illumina Amplicon methodology with single PCR reactions of 20 ng DNA per sample and additional indexes to increase multiplexing capacity (19). Samples were pooled and sequenced across two runs, each comprising one lane of an Illumina HiSeq3000, for 2 × 150 bp sequencing, with a 10 bp single index read.

Bioinformatic and statistical analysis

During quality control, 16 samples had fewer than 10,000 reads and were removed from analysis. With these samples removed, read count per sample was 14,635–555,465 (median 123,265).

Reads were stripped of adaptors using cutadapt and trimmed to maximum 145 bp (20). Pairs were merged, denoised, and representative sequences chosen using DADA2 (21). Further processing was conducted in QIIME2 (version 2019.4; ref. 22). Differences in Shannon index were assessed by Kruskal–Wallis test. Taxa were assigned by the QIIME2 feature classifier using the BLAST+ algorithm (23, 24) using the SILVA version 132 99% similarity database (RRID:SCR_006423; ref. 25). Principal coordinate analysis of Bray–Curtis distances was performed. Further analysis was performed using R (version 3.5.1). Differences in beta diversity were assessed by PERMANOVA analysis of Bray–Curtis distances using Adonis (26). Differences in beta diversity between sample groups were further explored by PERMANOVA analysis of Bray–Curtis distances performed using the beta-group-significance function within QIIME2 (27). Taxa differing significantly between groups were obtained using LEfSe (linear discriminant analysis effect size; RRID:SCR_014609; ref. 28).

RF models and AUC were generated using randomForest (RRID:SCR_015718) and pROC (29–31). For the neoplasm models, the neoplasm group contained an approximately equal number of randomly selected low-, intermediate-, and high-risk adenomas and colorectal cancer. Alternate samples were assigned to test or validation models (Supplementary Table S3); when used, total sample sets were also bootstrapped by RF during training. Each forest was built with 1,000 trees. Mtry was determined on the basis of the lowest out-of-bag error. The 95% confidence intervals (CI) for the ROC curves and AUC were created using 2,000 stratified bootstrap replicates. AUCs were compared using roc.test, using the method of DeLong (32). Confusion matrices were created using the predict function of randomForest using the default vote proportion cutoff of 50%.
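The idea behind a stratified bootstrap CI for the AUC can be sketched in a few lines. This is a plain-Python illustration, not the pROC implementation: the vote fractions are synthetic, the function names are hypothetical, and a simple percentile interval is assumed. "Stratified" here means that cases and controls are resampled separately, so every replicate keeps the original group sizes.

```python
import random


def auc(case_scores, control_scores):
    """Mann-Whitney estimate of the AUC; ties contribute one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def stratified_bootstrap_ci(case_scores, control_scores,
                            n_boot=2000, level=0.95, seed=0):
    """Percentile CI; cases and controls are resampled separately."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        cases = rng.choices(case_scores, k=len(case_scores))
        controls = rng.choices(control_scores, k=len(control_scores))
        replicates.append(auc(cases, controls))
    replicates.sort()
    alpha = (1.0 - level) / 2.0
    lower = replicates[int(alpha * n_boot)]
    upper = replicates[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return lower, upper


# Synthetic vote fractions standing in for classifier output (assumption).
data_rng = random.Random(42)
case_votes = [min(1.0, max(0.0, data_rng.gauss(0.65, 0.15))) for _ in range(30)]
control_votes = [min(1.0, max(0.0, data_rng.gauss(0.40, 0.15))) for _ in range(30)]

point = auc(case_votes, control_votes)
low, high = stratified_bootstrap_ci(case_votes, control_votes)
print(f"AUC {point:.2f} ({low:.2f}-{high:.2f})")
```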

Taxa were compared with nine colorectal cancer fecal metagenomic datasets (33–40), processed using MetaPhlAn version 3.0 (RRID:SCR_004915; refs. 41–43). The majority of the datasets have been comprehensively profiled in two recent meta-analyses (33, 34). Datasets were collapsed to genus level for comparison. The Thomas_c (34) and Yachida (35) datasets were merged as they originated from the same cohort. RF models were built as above, using taxa present in all datasets. For within-dataset comparisons, each study was randomly split 20 times into equal sized training and validation sets, and mean AUC recorded. For the leave-one-dataset-out (LODO), models were built using all but one dataset, and validated on the missing dataset. For each test/validation pair of cohorts, confusion matrices were created using the predict function of randomForest using the default vote proportion cutoff of 50%. Sensitivity was calculated as the proportion of colorectal cancer samples called as colorectal cancer within the validation dataset, based on the test dataset RF model. Specificity was calculated as the proportion of control samples called as control. For the self-validation comparisons, the mean sensitivity and specificity of the 20 repetitions was recorded.
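The leave-one-dataset-out scheme and the sensitivity/specificity calculation can be outlined as follows. This is a minimal sketch under stated assumptions: the cohorts and feature values are hypothetical, and a one-feature threshold rule stands in for the random forest purely to keep the example self-contained. Treating a vote fraction strictly above 0.5 as a colorectal cancer call is also an assumption about how ties are handled.

```python
def lodo_splits(datasets):
    """Yield (held_out_name, training_rows, validation_rows) per dataset."""
    for held_out in datasets:
        training = [row for name, rows in datasets.items()
                    if name != held_out for row in rows]
        yield held_out, training, datasets[held_out]


def fit_threshold_model(training_rows):
    """Stand-in for a random forest: threshold midway between class means."""
    cases = [x for x, label in training_rows if label == 1]
    controls = [x for x, label in training_rows if label == 0]
    threshold = (sum(cases) / len(cases) + sum(controls) / len(controls)) / 2
    # Return a pseudo vote fraction: 1.0 above the threshold, otherwise 0.0.
    return lambda x: 1.0 if x > threshold else 0.0


def sensitivity_specificity(vote_fractions, labels, cutoff=0.5):
    """Sensitivity and specificity when votes above the cutoff call a case."""
    tp = fn = tn = fp = 0
    for vote, label in zip(vote_fractions, labels):
        called_case = vote > cutoff
        if label == 1:
            tp += called_case
            fn += not called_case
        else:
            fp += called_case
            tn += not called_case
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohorts: (feature value, label), 1 = colorectal cancer.
datasets = {
    "cohort_a": [(0.9, 1), (0.8, 1), (0.2, 0), (0.1, 0)],
    "cohort_b": [(0.7, 1), (0.6, 1), (0.3, 0), (0.4, 0)],
    "cohort_c": [(0.8, 1), (0.4, 1), (0.3, 0), (0.2, 0)],
}

results = {}
for name, training, validation in lodo_splits(datasets):
    model = fit_threshold_model(training)
    votes = [model(x) for x, _ in validation]
    labels = [label for _, label in validation]
    results[name] = sensitivity_specificity(votes, labels)
    print(name, results[name])
```

Each cohort is scored only by a model that never saw it during training, which is what distinguishes this scheme from the within-dataset random splits.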

To compare our gFOBT-derived biomarker with microbial taxonomic biomarkers from existing datasets, we used the genus-summarized profiles to calculate a single, meta-analyzed biomarker. This used the “metafor” R package with a random effects model incorporating standardized mean differences from these taxonomic profiles and sample sizes from all 10 datasets (including either gFOBT colorectal cancer vs. blood-negative or colorectal cancer vs. colonoscopy-normal).
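As a rough illustration of pooling standardized mean differences under a random effects model, the sketch below computes a per-dataset effect size for one taxon and combines them. Several choices here are assumptions rather than the method used in the study: the effect size is an uncorrected Cohen's d, the between-study variance uses the DerSimonian–Laird estimator (chosen for brevity; the text does not state which estimator was used), and all abundance summaries are hypothetical.

```python
import math


def standardized_mean_difference(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d and its approximate sampling variance."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    d = (mean1 - mean2) / pooled_sd
    variance = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return d, variance


def random_effects_pool(effects, variances):
    """DerSimonian-Laird pooled effect, its standard error, and tau^2."""
    weights = [1.0 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    standard_error = math.sqrt(1.0 / sum(re_weights))
    return pooled, standard_error, tau2


# Hypothetical per-dataset summaries for one genus:
# (case mean, case SD, case n, control mean, control SD, control n).
studies = [
    (0.030, 0.020, 60, 0.018, 0.015, 65),
    (0.025, 0.018, 120, 0.020, 0.016, 110),
    (0.040, 0.025, 430, 0.022, 0.020, 491),
]

pairs = [standardized_mean_difference(*study) for study in studies]
effects = [d for d, _ in pairs]
variances = [v for _, v in pairs]

pooled, se, tau2 = random_effects_pool(effects, variances)
print(f"pooled SMD {pooled:.2f} (SE {se:.2f}), tau^2 {tau2:.3f}")
```

A positive pooled value would indicate a taxon that is more abundant in colorectal cancer across datasets, with larger and more precise studies weighted more heavily.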

Data are available at PRJEB37635 (http://www.ebi.ac.uk/ena/data/view/PRJEB37635).

Role of the funding source

The funders had no role in study design, data collection, analysis, interpretation, or writing. The corresponding author had full access to all the data and final responsibility for the decision to submit for publication.

Summary of population characteristics and microbiome profiling

We profiled the fecal microbiomes of 2,252 NHSBCSP participants using gFOBT samples, confirming that NHSBCSP gFOBT contained adequate material for V4 16S rRNA gene amplicon sequencing. Samples retained after quality control represented phenotypes of blood-negative gFOBT [n = 491 (22%)] and blood-positive [n = 1,761 (78%)]. The blood-positive samples were grouped according to subsequent colonoscopy diagnosis: colorectal cancer [n = 430 (19%)], adenoma [n = 665 (30%)], colonoscopy-normal [n = 300 (13%)], nonneoplastic diagnosis [n = 366 (16%); Table 1]. The male preponderance of colorectal cancer and adenoma samples (67% and 65%) likely reflects the male preponderance of colorectal neoplasia (44); in later analysis, we show that sex has minimal effect on overall microbiome structure.

Table 1.

Table of participant characteristics.

Clinical group | Mean age (SD) | Samples: total | Samples: male (%) | Samples: female (%)
gFOBT blood-negative | 67.0 (4.5) | 491 (22%) | 205 (42%) | 286 (58%)
gFOBT blood-positive, with the following diagnosis at colonoscopy: | | | |
Colorectal cancer | 68.1 (5.0) | 430 (19%) | 289 (67%) | 141 (33%)
Adenoma | 66.3 (4.7) | 665 (30%) | 432 (65%) | 233 (35%)
Normal colonoscopy | 66.6 (4.3) | 300 (13%) | 155 (52%) | 145 (48%)
Nonneoplastic diagnosis | 66.7 (4.7) | 366 (16%) | 188 (51%) | 178 (49%)

Of the colorectal cancer samples, lesion data were available for 359 of 430 (83%), corresponding to 378 colorectal cancers [342 (95%) samples resulted in a single colorectal cancer being detected at colonoscopy; 17 (5%) samples resulted in more than one synchronous colorectal cancer being detected at colonoscopy]. Where type was recorded [n = 298 (79%)], the majority were adenocarcinoma [n = 297 (99%)]; and one rectal tumor was a squamous cell carcinoma (<1%). Where grade was recorded [n = 253 (67%)], the majority were well/moderately differentiated [n = 224 (89%)]; 29 (11%) were poorly differentiated. The commonest tumor location was sigmoid/rectum (Table 2). Unfortunately, tumor stage was not available. Of the nonneoplastic samples, lesion data were available for 333 of 366 (91%). Many had more than one diagnosis, the commonest being “diverticulosis” (Supplementary Materials and Methods).

Table 2.

Table of colorectal cancer locations.

CRC tumor location | Number (%)
Ileum | 1 (<1%)
Cecum | 43 (11%)
Ascending colon | 40 (11%)
Hepatic flexure | 21 (6%)
Transverse colon | 32 (8%)
Splenic flexure | 15 (4%)
Descending colon | 12 (3%)
Sigmoid | 90 (24%)
Rectosigmoid | 27 (7%)
Rectum | 96 (25%)
Anus | 1 (<1%)

Pairs of technical DNA extraction replicates extracted after prolonged storage had similar microbiome structures, equivalent to “same-day” DNA extraction replicates, confirming that time until DNA extraction has minimal effect on results (Supplementary Fig. S1).

Gut microbiome profiles of the NHSBCSP cohort

While the amount of biomass and resolution of amplicon-based taxonomic profiling from these samples was limited, it was more than sufficient to establish overall fecal microbiome structure, as well as to subsequently classify by phenotype. As expected, microbial structure was dominated by a gradient trade-off between Bacteroidetes versus Firmicutes phylum members, with beta diversity minimally influenced by clinical group (∼1% variation in microbiome structure, by Bray–Curtis PERMANOVA), and even less by sex and age (Supplementary Table S1; Fig. 2). Microbiome structure differed significantly between individual clinical groups by Bray–Curtis PERMANOVA (Supplementary Table S2). Similarly, alpha diversity was significantly higher in blood-negative and colorectal cancer samples, although with very small effect size difference between groups (Kruskal–Wallis P = 4.50 × 10⁻²⁵; Supplementary Table S3; Fig. 2). This suggested a combination of both global and taxon-specific differences in the microbiome during colorectal cancer, in agreement with previous studies (45).

Figure 2.

Microbiome-based gFOBT colorectal cancer/neoplasm classification requires as few as 15 taxa and compares favorably with models built using external shotgun metagenomic datasets. A, Genus-level bacteria only “total” RF classification models were built using an increasing number of taxa of decreasing RF importance score. Shading represents the 95% CI of the AUC. Neoplasm = a group comprising an approximately equal ratio of colorectal cancer, low-risk adenoma, intermediate-risk adenoma, and high-risk adenoma samples. For each model, the AUC plateaus at approximately 15 taxa. B, Performance of the amplicon-based “colorectal cancer versus blood-negative” total RF model compared with models built using external fecal shotgun metagenomic datasets. The matrix displays cross-prediction AUCs. LODO denotes AUC generated by training a model using all but the dataset of the associated column and testing it using the dataset of that column. Within-study and cross-study performance of the “colorectal cancer versus blood-negative” model falls within the range of performances of the external models, indicating a degree of generalizability. C, Specific taxa prioritized by gFOBT amplicon-based regression models (at the genus level) are strikingly similar to genera prioritized from shotgun metagenomic taxonomic profiles in complementary populations.

We thus went on to identify specific taxa that were significantly enriched/depleted between clinical groups, which proved to include colorectal cancer–microbiome associations described in the existing literature. Both inflammation-associated and oral microbes were enriched, such as Escherichia-Shigella, Peptostreptococcus, Porphyromonas, Fusobacterium, and Parvimonas (Supplementary Fig. S3). Interestingly, 43 taxa were significantly enriched and 43 depleted in the blood-negative group compared with the blood-positive colonoscopy-normal group. Existing studies usually compare colorectal cancer with either healthy volunteers (equivalent to the blood-negative group) or controls with a normal colonoscopy; it is rare for both groups to be available within a study. Thus, notably, choice of control group was shown to affect which taxa were colorectal cancer–enriched relative to controls (Supplementary Fig. S3). Of the colorectal cancer–enriched taxa, seven featured in both comparisons (including Porphyromonas, Parvimonas, and Peptostreptococcus), and of the colorectal cancer–depleted taxa, only one featured in both comparisons (Anaerotruncus). An inverse association with colorectal cancer was shown for 25 taxa between the two choices of control group (including Fusobacterium and Escherichia-Shigella). These findings indicate that choice of control group can have an important bearing on results, and suggest that certain taxa (especially typically oral taxa, e.g., Porphyromonas, Parvimonas, and Peptostreptococcus) may have an association with colorectal cancer that is independent of the presence of fecal blood (at least at the level detectable by gFOBT), whereas others (Fusobacterium and Escherichia-Shigella) may not.

Microbiome analysis of NHSBCSP samples has the potential to improve colorectal cancer screening

To determine whether microbiome profiles from NHSBCSP gFOBT samples could improve screening accuracy, we created RF classifiers using relative abundances of genera (Fig. 1). While LEfSe indicates taxa which are significantly enriched or depleted between groups, RF classifiers identify taxa which have predictive associations (28–30). We assessed four models, the first two of which investigated whether microbiome analysis could be used as a first-tier screen—that is, to distinguish colorectal cancer or neoplasm from blood-negative gFOBT. On the basis of a randomly selected 50% training-validation split, colorectal cancer outcomes were separated from blood-negative gFOBTs (colorectal cancer vs. blood-negative) with AUC 0.86 (0.82–0.89; Supplementary Tables S4–S6). The second model distinguished neoplasm (a group comprising an approximately equal ratio of colorectal cancer, low-, intermediate-, and high-risk adenoma) from blood-negative gFOBTs (neoplasm vs. blood-negative) with AUC 0.78 (0.74–0.82; Supplementary Tables S5 and S6). Neither model showed a significant difference between AUCs of the test or validation sets (Supplementary Table S5).

The next two models assessed whether microbiome profiles could distinguish, strictly among the blood-positive samples, colorectal cancer or neoplasm from subsequently colonoscopy-normal samples (i.e., a second-tier screen, to identify gFOBT false positives). As expected, these more biologically similar outcomes were more difficult to differentiate, but were still accessible via microbiome measures. The third model distinguished colorectal cancer from colonoscopy-normal gFOBT (colorectal cancer vs. colonoscopy-normal) with AUC 0.79 (0.74–0.83; Supplementary Tables S5 and S6; Supplementary Fig. S4). The last model differentiated neoplasms from colonoscopy-normal gFOBT (neoplasm vs. colonoscopy-normal) with AUC 0.73 (0.68–0.77; Supplementary Tables S5 and S6; Supplementary Fig. S4). Again, neither model showed a significant difference between AUCs of the test or validation sets (Supplementary Table S5).

All of the models performed significantly better than models generated for comparison which used age and sex. Combining age and sex with relative abundances of genera led to a small improvement in AUC for three of the models (Supplementary Table S5). Model performance remained similar after restricting the models to a small number of taxa, mimicking what might be possible by qPCR; for all four models, AUC increased as the number of taxa increased up to 15, after which the AUC approximately stabilized (Fig. 2; Supplementary Table S5; Supplementary Fig. S4). Interestingly, the 15 most important taxa for the “colorectal cancer versus blood-negative” and “colorectal cancer versus colonoscopy-normal” models featured eight of the same taxa, including Fusobacterium, Peptostreptococcus, Parvimonas, Gemella, Odoribacter, and Faecalibacterium, and three taxa (Faecalibacterium, Akkermansia, and Escherichia-Shigella) were shared between the “neoplasm versus blood-negative” and “neoplasm versus colonoscopy-normal” models (Supplementary Fig. S4). Several of the same taxa appeared in the 15 taxa most important to the “colorectal cancer versus blood-negative” and “neoplasm versus blood-negative,” and “colorectal cancer versus colonoscopy-normal” and “neoplasm versus colonoscopy-normal” models, respectively (Supplementary Fig. S4).

Finally, we compared the performance of these 16S-based RF models to similar models using existing fecal shotgun metagenomic datasets (Fig. 2; Supplementary Fig. S5; refs. 33–40). As the majority of these existing studies had only profiled colorectal cancer, we restricted the comparison with the two colorectal cancer RF models. Within-study cross-validation of the “colorectal cancer versus blood-negative” model produced an AUC of 0.86, which compared favorably with the AUCs of the external datasets (range, 0.59–0.95; Fig. 2; Supplementary Fig. S5). Between-study performance of the model also fell within the range of performances of the models built using the external datasets, and the majority of the most important taxa paralleled those of the external studies, indicating a degree of generalizability. The “colorectal cancer versus colonoscopy-normal” model had a within-study cross-validation AUC that was within the range of the models built using external datasets, but between-study validation performance was lower (Fig. 2; Supplementary Fig. S5). Taxa which were of highest importance to the model were shared by many of the models built using external datasets, indicating both their potential underlying biological importance and their ability to be consistently detected by a variety of assays.

For completeness, we also explored the ability of microbial RF models to detect adenoma. Performance was generally comparable; models distinguished colorectal cancer from adenoma with AUC 0.71 (0.66–0.76), adenoma from colonoscopy-normal with AUC 0.72 (0.67–0.77), and adenoma from blood-negative with AUC 0.84 (0.80–0.87; Supplementary Tables S7–S10). The taxa of greatest importance to the RF models included several “colorectal cancer–associated” taxa. Finally, we investigated the performance of bacterial RF models using a “colonoscopy-control” group, comprising an approximately equal ratio of nonneoplastic and colonoscopy-normal samples (Supplementary Tables S7–S10). Colorectal cancer was detected with an AUC of 0.76 (0.72–0.80), similar to the RF model which used colonoscopy-normal samples alone as the control group. However, the models designed to detect adenoma and neoplasm performed worse than RF models built using colonoscopy-normal samples alone. This could reflect the heterogeneous nature of the nonneoplastic group, or greater microbiome similarity between the adenoma and nonneoplastic groups.
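The AUCs and intervals quoted above can be read as follows: the AUC is the probability that a randomly chosen case receives a higher model score than a randomly chosen control. The minimal Python sketch below computes that quantity directly and attaches a percentile bootstrap interval. It is an illustration only: the scores are toy values rather than study data, the function names are hypothetical, and the bootstrap is used here for simplicity and is not necessarily the interval method used in this study.

```python
import random


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def bootstrap_ci(case_scores, control_scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample each group with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        cases = [rng.choice(case_scores) for _ in case_scores]
        controls = [rng.choice(control_scores) for _ in control_scores]
        stats.append(auc(cases, controls))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy model scores (hypothetical, not study data).
cases = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4]
controls = [0.7, 0.5, 0.45, 0.3, 0.2, 0.1]

point = auc(cases, controls)
low, high = bootstrap_ci(cases, controls)
print(f"AUC {point:.2f} ({low:.2f}-{high:.2f})")
```

With real data, `cases` and `controls` would hold the held-out or cross-validated RF class probabilities for the two groups being compared.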

To our knowledge, this is the first study to profile the microbiome of large numbers of colorectal cancer screening samples, collected and processed routinely by a national screening program, and to demonstrate the potential of microbiome analysis as an accurate adjunct to early screening. We profiled the fecal microbiome of 2,252 processed NHSBCSP gFOBT samples, representing blood-negative results, colonoscopy-normal outcomes, colorectal cancer, adenomas, and nonneoplastic diagnoses. Using RF models as a simple classification method, microbiome taxonomic profiles were able to serve as accurate first- and second-tier screens, the former separating colorectal cancer/neoplasm from blood-negative results, and the latter separating colorectal cancer/neoplasm from colonoscopy-normal results. All four microbiome-based models performed significantly better than models built using the only clinical data available (age and sex), and were robust to hold-out validation and in comparison with external data.

As a baseline for translational applications, the first-tier “colorectal cancer versus blood-negative” model (AUC 0.86) performed similarly to existing screening methods, including those that rely on low-dimensional or high-dimensional biomarkers. For example, a meta-analysis of FIT and a separate study of FIT for colorectal cancer screening reported an AUC for the detection of colorectal cancer as high as 0.95 (46, 47). Separately, a trial of the FDA-approved Cologuard reached an AUC of 0.94 for the discrimination of colorectal cancer versus “nonadvanced neoplasia/lesser findings,” and an AUC of 0.89 with FIT (48). Our microbiome-based “neoplasm versus blood-negative” model (AUC 0.78) again performed similarly to, and possibly better than, existing methods [AUCs from the aforementioned studies of 0.72 (FIT), 0.67 (FIT), and 0.73 (Cologuard); refs. 47, 48], although differences in the composition of the case and control groups between the studies should be borne in mind. Importantly, in comparison with Cologuard, which requires whole stool and costs approximately $600/test, amplicon-based microbiome profiling requires very little biomaterial and would be easier to translate to a national screening program. The fact that the models required as few as 15 taxa, in agreement with existing studies, raises the potential of a rapid qPCR-based test which could be integrated into a screening program at low cost (34, 49–52). Although we were not able to assess it in our study, it has been shown that microbiome analysis is able to detect lesions missed by FIT, suggesting a potential role as an adjunct to FIT for the detection of nonbleeding colorectal cancer (53).

The second-tier models perhaps showed the greatest clinical potential, as they were able to identify colorectal cancer and neoplasms from among the blood-positive gFOBT cohort. Currently, all NHSBCSP participants with a blood-positive gFOBT are referred for colonoscopy, yet 50% of these colonoscopies reveal a normal bowel or a nonneoplastic condition. The high number of unnecessary colonoscopies carries associated risks and strains endoscopy capacity. There are limited examples of second-tier screens in the existing literature. A study from the NHSBCSP demonstrated second-tier performance for the detection of neoplasm by FIT with AUC 0.63, improved to 0.66 by incorporating screening data (54). A similar study reported an equivalent AUC of 0.69 (FIT), improved to 0.76 by questionnaire-collected data (55). The advantage of a microbiome-based second-tier screen that could be performed using existing screening samples is that it would not require additional tests, nor would it place extra burden on screening participants, something which can potentially jeopardize screening uptake.
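To make the potential effect on colonoscopy demand concrete, the sketch below works through the arithmetic of applying a second-tier test to a blood-positive cohort. Only the 50% share of normal or nonneoplastic colonoscopies comes from the text above; the cohort size and the sensitivity and specificity of the second-tier test are hypothetical operating points chosen for illustration, not results of this study, and the remaining 50% is treated as neoplasm for simplicity.

```python
def triage_outcomes(n_referred, frac_neoplasm, sensitivity, specificity):
    """Expected counts when a second-tier test gates referral to colonoscopy."""
    n_neoplasm = n_referred * frac_neoplasm
    n_other = n_referred - n_neoplasm          # normal bowel or nonneoplastic
    detected = n_neoplasm * sensitivity        # neoplasms still sent to colonoscopy
    spared = n_other * specificity             # non-neoplastic participants not referred
    return {
        "colonoscopies_performed": detected + (n_other - spared),
        "unnecessary_colonoscopies_avoided": spared,
        "neoplasms_missed": n_neoplasm - detected,
    }


# Hypothetical operating point: 90% sensitivity, 40% specificity.
example = triage_outcomes(
    n_referred=1000, frac_neoplasm=0.50, sensitivity=0.90, specificity=0.40
)
for name, value in example.items():
    print(f"{name}: {value:.0f}")
```

The trade-off is explicit in the return value: any gain in avoided colonoscopies has to be weighed against the neoplasms that a given operating point would miss.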

Given that we profiled the microbiome directly from gFOBT screening samples, we were interested to compare the performance of our models with the existing microbiome literature, most of which has used shotgun metagenomics and/or frozen whole stool. Performance compared favorably: meta-analyses and a systematic review reported AUCs of 0.68–0.95 for the detection of colorectal cancer and 0.59–0.94 for the detection of neoplasm (5, 33, 34, 49, 50, 56–59). Many studies, like ours, report inferior detection of neoplasms compared with colorectal cancer, due to the reduced discriminatory power of microbiome-based models to detect adenomas. It is remarkable that our models performed so well given that samples were prepared routinely by screening participants in their own homes (in the majority of instances over 3 days), transported through the routine post, and stored at room temperature (on average for 1 year prior to DNA extraction), and that the following variables, all of which affect the microbiome, were unknown: antibiotic/medication use, diet, comorbidities, smoking status, and body mass index (60). While this technical variability and missing information will unavoidably affect the precision of microbiome measurements feasible from gFOBT, and their applicability to general microbiome epidemiology, it is noteworthy that they do not impede the use of the gFOBT microbiome for colorectal cancer screening. We further confirmed this quantitatively by comparing the performance of our colorectal cancer models with models built using nine external metagenomic datasets. Cross-study validation of the gFOBT-based models showed similar performance and, interestingly, identified many of the same discriminatory taxa.

These taxa included those previously described as colorectal cancer associated, including Fusobacterium, Escherichia-Shigella, Peptostreptococcus, Porphyromonas, Parvimonas, Alistipes, and Gemella, and those that have previously been shown to be inversely associated with colorectal cancer, including Faecalibacterium (61) and Lactobacillus (49). Although we limited ourselves to analysis at the genus level for simplicity, these genera contain species which have been associated with colorectal cancer, including inflammation-associated and oral taxa: Fusobacterium nucleatum (49), pks+ Escherichia coli (62), Peptostreptococcus stomatis (36), Peptostreptococcus anaerobius (35), Porphyromonas asaccharolytica (49), Porphyromonas somerae (33), Porphyromonas uenonis (33), Parvimonas micra (49), Alistipes finegoldii (49), and Gemella morbillorum (33). It is hypothesized that oral taxa may increase colonic mucosal permeability, allowing bacterial invasion, with resulting inflammation, and subsequent epithelial proliferation (63–65). Certain taxa have also been shown to be capable of inducing and/or promoting tumorigenesis: colibactin, produced by pks+ Escherichia coli, is able to damage DNA (62), while Fusobacterium nucleatum promotes tumor proliferation and a protumor inflammatory state (66). It was interesting that some (but not all) of these taxa remained colorectal cancer–enriched even in comparisons with the blood-positive colonoscopy-normal group, suggesting that certain colorectal cancer–microbiome associations may act independently of the presence of fecal blood.

Among this study's potential limitations, two stand out. The first is that participants in the blood-negative group did not undergo colonoscopy, as this would disrupt routine screening. As the sensitivity of gFOBT for colorectal cancer is estimated to be 50%, the blood-negative group may have included undiagnosed adenomas or colorectal cancers (67–69). However, because the incidence of colorectal cancer is low, the absolute number of undiagnosed colorectal cancers is predicted to have been small, with little effect on the performance of the RF models, except perhaps to have made the result more conservative. This leads to an arguably minor, but still systematic, difference between these controls and a broader population: the specific models evaluated here will underpredict nonbleeding cancers and should be further generalized prior to application. The second is that the majority of the blood-negative samples were collected within a short time frame at the beginning of the study. However, any effect due to prolonged storage prior to DNA extraction is likely to have been minimal, as DNA extraction replicates created after 6–23 months of storage at room temperature demonstrated similar microbiome structures, equivalent to “same-day” DNA extraction replicates.
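The expected number of undiagnosed cancers in the blood-negative group can be estimated with Bayes' rule. In the sketch below, the group size (n = 491) and the 50% gFOBT sensitivity are taken from this study; the prevalence and specificity are hypothetical placeholder values, not figures reported here, and should be replaced with program-specific estimates.

```python
def p_cancer_given_negative(prevalence, sensitivity, specificity):
    """P(cancer | blood-negative gFOBT) from Bayes' rule."""
    false_negative = prevalence * (1.0 - sensitivity)   # cancer present, test negative
    true_negative = (1.0 - prevalence) * specificity    # cancer absent, test negative
    return false_negative / (false_negative + true_negative)


n_blood_negative = 491   # blood-negative samples in this study
sensitivity = 0.50       # gFOBT sensitivity for colorectal cancer, as cited in the text
prevalence = 0.005       # HYPOTHETICAL prevalence in the screened population
specificity = 0.98       # HYPOTHETICAL gFOBT specificity

p_missed = p_cancer_given_negative(prevalence, sensitivity, specificity)
expected_missed = n_blood_negative * p_missed
print(f"Expected undiagnosed cancers among blood-negative samples: {expected_missed:.2f}")
```

Under these placeholder inputs the expected count is roughly one sample, which is consistent with the argument that contamination of the control group by undiagnosed cancers is small in absolute terms.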

In addition to the refinements that would be necessary to translate these results into a screening product, including investigation of sensitivity, consistency, and cost-effectiveness, future work aims to replicate the study using NHSBCSP FIT samples. The advantage of having performed the current study is that, should microbiome analysis of FIT (which collects a much smaller volume of feces) not produce adequate accuracy, a gFOBT-based microbiome screening test could still be used as an adjunct to the NHSBCSP. We also plan to investigate whether screening accuracy could be improved further by the incorporation of additional clinical data, FIT concentration, and fecal mutation, bacterial virulence-factor, or toxin testing (33, 34, 49, 52, 70, 71). In conclusion, this study has confirmed that microbiome analysis can be performed on samples collected and processed routinely by a national colorectal cancer screening program to improve accuracy. Models required as few as 15 taxa, raising the potential of an inexpensive qPCR-based test. This could reduce the number of unnecessary colonoscopies in countries which use fecal occult blood test screening.

C. Young reports grants from Wellcome Trust, Cancer Research UK, and Pathological Society of Great Britain & Ireland during the conduct of the study. A. Fuentes Balaguer reports grants from Wellcome Trust, Cancer Research UK, and Pathological Society of Great Britain & Ireland during the conduct of the study. D. Bottomley reports grants from Wellcome Trust, Cancer Research UK, and Pathological Society of Great Britain & Ireland during the conduct of the study. N. Gallop reports grants from Wellcome Trust, Cancer Research UK, and Pathological Society of Great Britain & Ireland during the conduct of the study. L. Wilkinson reports grants from Wellcome Trust, Cancer Research UK, and Pathological Society of Great Britain & Ireland during the conduct of the study. C. Huttenhower reports personal fees from Seres Therapeutics outside the submitted work. P. Quirke reports grants from Wellcome Trust, Cancer Research UK, and Pathological Society of Great Britain & Ireland, and NIH senior investigator award during the conduct of the study; in addition, P. Quirke has a patent for IHC detection of markers of response to anti-EGFR therapy pending to Roche/Leeds University, is clinical adviser to the National Health Service Bowel Cancer Screening Programme, and is a member of an advisory body to the National Health Service Bowel Cancer Screening Programme. No disclosures were reported by the other authors.

C. Young: Conceptualization, formal analysis, funding acquisition, investigation, methodology, writing–original draft, project administration, writing–review and editing. H.M. Wood: Conceptualization, data curation, formal analysis, supervision, investigation, writing–original draft, writing–review and editing. A. Fuentes Balaguer: Investigation, writing–review and editing. D. Bottomley: Investigation, writing–review and editing. N. Gallop: Investigation, writing–review and editing. L. Wilkinson: Investigation, writing–review and editing. S.C. Benton: Resources, data curation, writing–review and editing. M. Brealey: Resources, data curation, writing–review and editing. C. John: Resources, data curation, writing–review and editing. C. Burtonwood: Resources, data curation, writing–review and editing. K.N. Thompson: Formal analysis, writing–original draft, writing–review and editing. Y. Yan: Formal analysis, writing–original draft, writing–review and editing. J.H. Barrett: Formal analysis, writing–review and editing. E.J.A. Morris: Conceptualization, supervision, writing–review and editing. C. Huttenhower: Formal analysis, supervision, writing–original draft, writing–review and editing. P. Quirke: Conceptualization, supervision, funding acquisition, writing–review and editing.

This work was funded by a Wellcome Trust Clinical Research Training Fellowship (203524/Z/16/Z) to C. Young, a Pathological Society of Great Britain & Ireland “Visiting Fellowship” (2234) to C. Young, and a Cancer Research UK Grand Challenge Initiative (OPTIMISTICC C10674/A27140) to P. Quirke and C. Huttenhower. P. Quirke is a National Institute for Health Research Senior Investigator.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Ferlay J, Ervik M, Lam F, Colombet M, Mery L, Piñeros M, et al. Global cancer observatory: cancer today. Lyon, France: International Agency for Research on Cancer; 2018. Available from: https://gco.iarc.fr/today.
2. Koo S, Neilson LJ, Von Wagner C, Rees CJ. The NHS Bowel Cancer Screening Program: current perspectives on strategies for improvement. Risk Manag Healthc Policy 2017;10:177–87.
3. Public Health England. Bowel cancer screening: the facts (FOB test kit). Available from: https://www.gov.uk/government/publications/bowel-cancer-screening-benefits-and-risks.
4. Scottish Bowel Screening Programme Statistics: for invitations between 1 May 2016 and 30 April 2018. 2019. Available from: https://www.isdscotland.org/Health-Topics/Cancer/Publications/2019-02-05/2019-02-05-Bowel-Screening-Publication-Summary.pdf.
5. Amitay EL, Krilaviciute A, Brenner H. Systematic review: gut microbiota in fecal samples and detection of colorectal neoplasms. Gut Microbes 2018;9:293–307.
6. Vogtmann E, Chen J, Amir A, Shi J, Abnet CC, Nelson H, et al. Comparison of collection methods for fecal samples in microbiome studies. Am J Epidemiol 2017;185:115–23.
7. Sinha R, Chen J, Amir A, Vogtmann E, Shi J, Inman KS, et al. Collecting fecal samples for microbiome analyses in epidemiology studies. Cancer Epidemiol Biomarkers Prev 2016;25:407–16.
8. Dominianni C, Wu J, Hayes RB, Ahn J. Comparison of methods for fecal microbiome biospecimen collection. BMC Microbiol 2014;14:103.
9. Wong WSW, Clemency N, Klein E, Provenzano M, Iyer R, Niederhuber JE, et al. Collection of non-meconium stool on fecal occult blood cards is an effective method for fecal microbiota studies in infants. Microbiome 2017;5:114.
10. Taylor M, Wood HM, Halloran SP, Quirke P. Examining the potential use and long-term stability of guaiac faecal occult blood test cards for microbial DNA 16S rRNA sequencing. J Clin Pathol 2017;70:600–6.
11. Vogtmann E, Chen J, Kibriya MG, Chen Yu, Islam T, Eunes M, et al. Comparison of fecal collection methods for microbiota studies in Bangladesh. Appl Environ Microbiol 2017;83:e00361-17.
12. von Huth S, Thingholm LB, Bang C, Rühlemann MC, Franke A, Holmskov U. Minor compositional alterations in faecal microbiota after five weeks and five months storage at room temperature on filter papers. Sci Rep 2019;9:19008.
13. Byrd DA, Sinha R, Hoffman KL, Chen J, Hua X, Shi J, et al. Comparison of methods to collect fecal samples for microbiome studies using whole-genome shotgun metagenomic sequencing. mSphere 2020;5:e00827-19.
14. Amitay EL, Werner S, Vital M, Pieper DH, Höfler D, Gierse I-J, et al. Fusobacterium and colorectal cancer: Causal factor or passenger? Results from a large colorectal cancer screening study. Carcinogenesis 2017;38:781–8.
15. Eklöf V, Löfgren-Burström A, Zingmark C, Edin S, Larsson P, Karling P, et al. Cancer-associated fecal microbial markers in colorectal cancer detection. Int J Cancer 2017;141:2528–36.
16. Grobbee EJ, Lam SY, Fuhler GM, Blakaj B, Konstantinov SR, Bruno MJ, et al. First steps towards combining faecal immunochemical testing with the gut microbiome in colorectal cancer screening. United European Gastroenterol J 2020;8:293–302.
17. Logan RFA, Patnick J, Nickerson C, Coleman L, Rutter MD, von Wagner C, et al. Outcomes of the Bowel Cancer Screening Programme (BCSP) in England after the first 1 million tests. Gut 2012;61:1439–46.
18. Human Microbiome Project Consortium. A framework for human microbiome research. Nature 2012;486:215–21.
19. Earth microbiome project. Available from: http://www.earthmicrobiome.org.
20. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 2011;17:10–2.
21. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJoA, Holmes SP. DADA2: high-resolution sample inference from Illumina amplicon data. Nat Methods 2016;13:581–3.
22. Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol 2019;37:852–7.
23. Bokulich NA, Kaehler BD, Rideout JR, Dillon M, Bolyen E, Knight R, et al. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin. Microbiome 2018;6:90.
24. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics 2009;10:421.
25. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res 2012;41:D590–D96.
26. Oksanen J, Guillaume Blanchet F, Friendly M, Kindt R, Legendre P, McGlinn D, et al. vegan: community ecology package; 2018. Available from: https://CRAN.R-project.org/package=vegan.
27. Anderson MJ. A new method for non-parametric multivariate analysis of variance. Austral Ecol 2001;26:32–46.
28. Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, et al. Metagenomic biomarker discovery and explanation. Genome Biol 2011;12:R60.
29. Breiman L. Random forests. Mach Learn 2001;45:5–32.
30. Liaw A, Wiener M. Classification and regression by randomForest. R News 2002;2:18–22.
31. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011;12:77.
32. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837–45.
33. Wirbel J, Pyl PT, Kartal E, Zych K, Kashani A, Milanese A, et al. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer. Nat Med 2019;25:679–89.
34. Thomas AM, Manghi P, Asnicar F, Pasolli E, Armanini F, Zolfo M, et al. Metagenomic analysis of colorectal cancer datasets identifies cross-cohort microbial diagnostic signatures and a link with choline degradation. Nat Med 2019;25:667–78.
35. Yachida S, Mizutani S, Shiroma H, Shiba S, Nakajima T, Sakamoto T, et al. Metagenomic and metabolomic analyses reveal distinct stage-specific phenotypes of the gut microbiota in colorectal cancer. Nat Med 2019;25:968–76.
36. Gupta A, Dhakan DB, Maji A, Saxena R, PK VP, Mahajan S, et al. Association of Flavonifractor plautii, a flavonoid-degrading bacterium, with the gut microbiome of colorectal cancer patients in India. mSystems 2019;4:e00438-19.
37. Feng Q, Liang S, Jia H, Stadlmayr A, Tang L, Lan Z, et al. Gut microbiome development along the colorectal adenoma-carcinoma sequence. Nat Commun 2015;6:6528.
38. Vogtmann E, Hua X, Zeller G, Sunagawa S, Voigt AY, Hercog R, et al. Colorectal cancer and the human gut microbiome: reproducibility with whole-genome shotgun sequencing. PLoS One 2016;11:e0155362.
39. Yu J, Feng Q, Wong SH, Zhang D, Liang QYi, Qin Y, et al. Metagenomic analysis of faecal microbiome as a tool towards targeted non-invasive biomarkers for colorectal cancer. Gut 2017;66:70–8.
40. Zeller G, Tap J, Voigt AY, Sunagawa S, Kultima JR, Costea PI, et al. Potential of fecal microbiota for early-stage detection of colorectal cancer. Mol Syst Biol 2014;10:766.
41. Segata N, Waldron L, Ballarini A, Narasimhan V, Jousson O, Huttenhower C. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat Methods 2012;9:811–14.
42. Pasolli E, Truong DT, Malik F, Waldron L, Segata N. Machine learning meta-analysis of large metagenomic datasets: tools and biological insights. PLoS Comput Biol 2016;12:e1004977.
43. Bernau C, Riester M, Boulesteix A-L, Parmigiani G, Huttenhower C, Waldron L, et al. Cross-study validation for the assessment of prediction algorithms. Bioinformatics 2014;30:i105–12.
44. White A, Ironmonger L, Steele RJC, Ormiston-Smith N, Crawford C, Seims A. A review of sex-related differences in colorectal cancer incidence, screening uptake, routes to diagnosis, cancer stage and survival in the UK. BMC Cancer 2018;18:906.
45. Yan Y, Drew DA, Markowitz A, Lloyd-Price J, Abu-Ali G, Nguyen LH, et al. Structure of the mucosal and stool microbiome in Lynch syndrome. Cell Host Microbe 2020;27:585–600.
46. Lee JK, Liles EG, Bent S, Levin TR, Corley DA. Accuracy of fecal immunochemical tests for colorectal cancer: systematic review and meta-analysis. Ann Intern Med 2014;160:171.
47. Brenner H, Chen H. Fecal occult blood versus DNA testing: indirect comparison in a colorectal cancer screening population. Clin Epidemiol 2017;9:377–84.
48. Imperiale TF, Ransohoff DF, Itzkowitz SH, Levin TR, Lavin P, Lidgard GP, et al. Multitarget stool DNA testing for colorectal-cancer screening. N Engl J Med 2014;370:1287–97.
49. Dai Z, Coker OO, Nakatsu G, Wu WKK, Zhao L, Chen Z, et al. Multi-cohort analysis of colorectal cancer metagenome identified altered bacteria across populations and universal bacterial markers. Microbiome 2018;6:70.
50. Sze MA, Schloss PD. Leveraging existing 16S rRNA gene surveys to identify reproducible biomarkers in individuals with colorectal tumors. mBio 2018;9:e00630-18.
51. Ai D, Pan H, Li X, Gao Y, Liu G, Xia LiC. Identifying gut microbiota associated with colorectal cancer using a zero-inflated lognormal model. Front Microbiol 2019;10:826.
52. Gao R, Wang Z, Li H, Cao Z, Gao Z, Chen H, et al. Gut microbiota dysbiosis signature is associated with the colorectal carcinogenesis sequence and improves the diagnosis of colorectal lesions. J Gastroenterol Hepatol 2020;35:2109–21.
53. Baxter NT, Ruffin MT, Rogers MAM, Schloss PD. Microbiota-based model improves the sensitivity of fecal immunochemical test for detecting colonic lesions. Genome Med 2016;8:37.
54. Cooper JA, Parsons N, Stinton C, Mathews C, Smith S, Halloran SP, et al. Risk-adjusted colorectal cancer screening using the FIT and routine screening data: development of a risk prediction model. Br J Cancer 2018;118:285–93.
55. Stegeman I, de Wijkerslooth TR, Stoop EM, van Leerdam ME, Dekker E, van Ballegooijen M, et al. Combining risk factors with faecal immunochemical test outcome for selecting CRC screenees for colonoscopy. Gut 2014;63:466–71.
56. Zhang B, Xu S, Xu W, Chen Q, Chen Z, Yan C, et al. Leveraging fecal bacterial survey data to predict colorectal tumors. Front Genet 2019;10:447.
57. Shah MS, DeSantis TZ, Weinmaier T, McMurdie PJ, Cope JL, Altrichter A, et al. Leveraging sequence-based faecal microbial community survey data to identify a composite biomarker for colorectal cancer. Gut 2018;67:882–91.
58. Huang Q, Peng Y, Xie F. Fecal Fusobacterium nucleatum for detecting colorectal cancer: a systematic review and meta-analysis. Int J Biol Markers 2018;33:345–52.
59. Zhang X, Zhu X, Cao Y, Fang J-Y, Hong J, Chen H. Fecal Fusobacterium nucleatum for the diagnosis of colorectal tumor: a systematic review and meta-analysis. Cancer Med 2019;8:480–91.
60. Zhernakova A, Kurilshikov A, Bonder MJ, Tigchelaar EF, Schirmer M, Vatanen T, et al. Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity. Science 2016;352:565–9.
61. Guo S, Li L, Xu B, Li M, Zeng Q, Xiao H, et al. A simple and novel fecal biomarker for colorectal cancer: ratio of Fusobacterium nucleatum to probiotics populations, based on their antagonistic effect. Clin Chem 2018;64:1327–37.
62. Pleguezuelos-Manzano C, Puschhof J, Rosendahl Huber A, van Hoeck A, Wood HM, Nomburg J, et al. Mutational signature in colorectal cancer caused by genotoxic pks+ E. coli. Nature 2020;580:269–73.
63. Dejea CM, Wick EC, Hechenbleikner EM, White JR, Mark Welch JL, Rossetti BJ, et al. Microbiota organization is a distinct feature of proximal colorectal cancers. Proc Natl Acad Sci U S A 2014;111:18321–6.
64. Drewes JL, White JR, Dejea CM, Fathi P, Iyadorai T, Vadivelu J, et al. High-resolution bacterial 16S rRNA gene profile meta-analysis and biofilm status reveal common colorectal cancer consortia. NPJ Biofilms Microbiomes 2017;3:34.
65. Tomkovich S, Dejea CM, Winglee K, Drewes JL, Chung L, Housseau F, et al. Human colon mucosal biofilms from healthy or colon cancer hosts are carcinogenic. J Clin Invest 2019;130:1699–712.
66. Brennan CA, Garrett WS. Fusobacterium nucleatum - symbiont, opportunist and oncobacterium. Nat Rev Microbiol 2019;17:156–66.
67. Moss S, Mathews C, Day TJ, Smith S, Seaman HE, Snowball J, et al. Increased uptake and improved outcomes of bowel cancer screening with a faecal immunochemical test: results from a pilot study within the national screening programme in England. Gut 2017;66:1631–44.
68. Blanks R, Burón Pust A, Alison R, He E, Barnes I, Patnick J, et al. Screen-detected and interval colorectal cancers in England: associations with lifestyle and other factors in women in a large UK prospective cohort. Int J Cancer 2019;145:728–34.
69. Morris EJA, Whitehouse LE, Farrell T, Nickerson C, Thomas JD, Quirke P, et al. A retrospective observational study examining the characteristics and outcomes of tumours diagnosed within and without of the English NHS Bowel Cancer Screening Programme. Br J Cancer 2012;107:757–64.
70. Zhao D, Liu H, Zheng Y, He Y, Lu D, Lyu C. A reliable method for colorectal cancer prediction based on feature selection and support vector machine. Med Biol Eng Comput 2019;57:901–12.
71. Zhai R-L, Xu F, Zhang P, Zhang W-Li, Wang H, Wang Ji-L, et al. The diagnostic performance of stool DNA testing for colorectal cancer: a systematic review and meta-analysis. Medicine 2016;95:e2129.

Supplementary data