Background:

Human microbiota have many functions that could contribute to cancer initiation and/or progression at local sites, yet the relation of the lung microbiota to lung cancer prognosis has not been studied.

Methods:

In a pilot study, 16S rRNA gene sequencing was performed on paired lung tumor and remote normal samples from the same lobe/segment in 19 patients with non–small cell lung cancer (NSCLC). We explored associations of tumor or normal tissue microbiome diversity and composition with recurrence-free (RFS) and disease-free survival (DFS), and compared microbiome diversity and composition between paired tumor and normal samples.

Results:

Higher richness and diversity in normal tissue were associated with reduced RFS (richness P = 0.08, Shannon index P = 0.03) and DFS (richness P = 0.03, Shannon index P = 0.02), as was normal tissue overall microbiome composition (Bray–Curtis P = 0.09 for RFS and P = 0.02 for DFS). In normal tissue, greater abundance of family Koribacteraceae was associated with increased RFS and DFS, whereas greater abundance of families Bacteroidaceae, Lachnospiraceae, and Ruminococcaceae was associated with reduced RFS or DFS (P < 0.05). Tumor tissue diversity and overall composition were not associated with RFS or DFS. Tumor tissue had lower richness and diversity (P ≤ 0.0001) than paired normal tissue, though overall microbiome composition did not differ between the paired samples.

Conclusions:

We demonstrate, for the first time, a potential relationship between the normal lung microbiota and lung cancer prognosis, which requires confirmation in a larger study.

Impact:

Definition of bacterial biomarkers of prognosis may lead to improved survival outcomes for patients with lung cancer.

Lung cancer is the most common cancer, excluding nonmelanoma skin cancer, and the most common cause of cancer-related death in the world, with approximately 1.8 million diagnoses and 1.6 million deaths per year (1). Although incidence rates for lung cancer have been declining in the United States due to reductions in smoking, challenges in early detection have left lung cancer as the leading cause of U.S. cancer-related deaths (5-year survival rate 18%, on average, in the United States; ref. 2). Non–small cell lung cancer (NSCLC), the most common form of lung cancer, is typically treated at the early stages with surgical resection, with or without chemotherapy or chemoradiotherapy (3). These early-stage cancers have better 5-year survival rates (50%–90%); however, a substantial proportion of patients still die of disease recurrence (4). Improvements in early detection with low-dose CT (5) will inevitably increase the identification of early-stage lung cancers and offer more opportunities for curative resection, making it timely to investigate factors contributing to long-term disease-free survival (DFS) following resection. Better identification of patients with early-stage disease who are at high risk of recurrence can improve survival by indicating which patients may benefit from increased surveillance and/or adjuvant therapy.

The healthy human lung is host to a unique and dynamic bacterial community, determined by bidirectional movement of nonsterile air and mucus in and out of the airways (6). In lung disease, regional changes in the lung environment create permissive niches for bacterial growth, resulting in significant differences in community composition between healthy and diseased lungs (7). Studies have explored the oral or airway microbiome in lung cancer cases compared with controls (8–12), noting lower microbial diversity and altered abundance of specific bacterial groups in cases. However, few studies have characterized the microbiome in lung tumor tissue (13, 14), and no studies have explored the relationship between the microbiome of resected lung tissue and lung cancer prognosis. Bacteria have many functions that could contribute to cancer initiation and/or progression at local sites, including genotoxic pathways, bacterial metabolite signaling, and induction of host inflammatory pathways (15). Investigation of potential bacterial involvement in lung cancer prognosis may lead to new biomarkers and therapies to improve survival outcomes for patients with lung cancer.

We conducted a pilot study of paired tumor and remote normal lung tissue samples from 19 patients with NSCLC at NYU Langone Health (New York, NY), 17 of them with prospective follow up. Using 16S rRNA gene sequencing, we explored whether the tumor or normal lung microbiome was associated with recurrence-free survival (RFS) and DFS, and compared the lung microbiome of paired tumor and normal samples.

Patients and sample collection

Samples were selected from the NYU Thoracic Surgery Archives (NTSA). Established in 2006, the NTSA has prospectively collected serum, plasma, buffy coat, and peripheral blood mononuclear cells, along with lung cancer and matching normal lung specimens, under the Institutional Review Board–approved 8896 protocol. Patients identified on preoperative workup as having a pulmonary nodule suspicious for lung cancer were consented for collection of blood and snap-frozen tissues (tumor and remote lung from the same lobe/segment) in the operating room at the time of their resection. Lung and matching tumor are sterilely cut at the operating room table, transferred to prelabeled Nunc vials, and snap frozen in liquid nitrogen within 10 minutes of resection. Samples are deidentified for storage at −80°C until use. Because these specimens remain sterile and are immediately frozen, they are ideal for microbiome analysis, as immediate freezing does not impact microbiome composition (16). Less is known regarding long-term (i.e., years) storage at −80°C, which may impact certain aspects of the microbiome (17, 18); however, the length of storage time in our samples was not associated with overall microbiome diversity and composition (α- and β-diversity).

Clinical and pathologic demographics are recorded in an encrypted Research Electronic Data Capture spreadsheet. Patients are seen at 3-month intervals for 2 years, and then at 6-month intervals for 1 year, and then annually, with CT scans performed for surveillance to document any systemic and loco-regional recurrences, or the development of a second primary tumor. The 19 patients' samples included in this study were originally chosen as pilot samples to test whether sufficient material was available for DNA extraction from tumor and matching normal lung. The samples were also chosen to represent patients with different stages of NSCLC and patients with recurrence, to explore the lung microbiome in relation to these factors.

Definitions

Endpoints were defined according to the consensus agreement in Punt and colleagues (19). DFS includes recurrences (loco-regional and systemic), new primaries (same or other cancer), and death from any cause as events. RFS includes recurrences (loco-regional and systemic) and death from any cause as events, ignoring new primaries as events. For both endpoints, person time is defined as time from surgery to event or loss to follow-up (censored).
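The two endpoint definitions can be made concrete with a short sketch. It is illustrative only, not the study's analysis code: the `FollowUp` record and its field names are hypothetical, and all times are taken to be days since surgery.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FollowUp:
    """Follow-up for one patient (hypothetical fields; times in days since surgery)."""
    last_contact: float                   # end of follow-up if no event occurred
    recurrence: Optional[float] = None    # loco-regional or systemic recurrence
    new_primary: Optional[float] = None   # new primary (same or other cancer)
    death: Optional[float] = None         # death from any cause


def endpoint(p: FollowUp, kind: str) -> Tuple[float, bool]:
    """Return (person_time, event_observed) for kind 'DFS' or 'RFS'."""
    if kind not in ("DFS", "RFS"):
        raise ValueError("kind must be 'DFS' or 'RFS'")
    candidates = [p.recurrence, p.death]   # events counted by both endpoints
    if kind == "DFS":
        candidates.append(p.new_primary)   # DFS also counts new primaries
    times = [t for t in candidates if t is not None]
    if times:
        return min(times), True            # time to the first qualifying event
    return p.last_contact, False           # censored at loss to follow-up


# A patient with a new primary at day 400 and a recurrence at day 900:
patient = FollowUp(last_contact=2000, new_primary=400, recurrence=900)
print(endpoint(patient, "DFS"))  # (400, True)
print(endpoint(patient, "RFS"))  # (900, True)
```

In this toy case the new primary ends disease-free time at day 400, whereas RFS ignores it and runs to the recurrence at day 900.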

Microbiome assay

Lung tissue samples underwent 16S rRNA gene sequencing at the Environmental Sample Preparation and Sequencing Facility at Argonne National Laboratory. DNA extraction and amplification steps occurred in two batches (batch 1: 10 samples and batch 2: 28 samples; tumor–normal pairs from the same patient were kept together), but all samples were sequenced in the same batch. DNA was extracted from tissue using the Mo Bio PowerSoil DNA Isolation Kit, following the manufacturer's protocol. This protocol uses mechanical bead beating and chemical methods to achieve sample homogenization and cell lysis, ensuring that sample features do not interfere with DNA extraction. The V4 region of the 16S rRNA gene was PCR amplified with the 515F/806R primer pair, which included sequencer adapter sequences used in the Illumina flowcell and sample-specific barcodes (20, 21). Each 25-μL PCR reaction contained 9.5 μL of Mo Bio PCR Water (Certified DNA-Free), 12.5 μL of QuantaBio's AccuStart II PCR ToughMix (2× concentration, 1× final), 1 μL Golay barcode-tagged Forward Primer (5 μmol/L concentration, 200 nmol/L final), 1 μL Reverse Primer (5 μmol/L concentration, 200 nmol/L final), and 1 μL of template DNA. The conditions for PCR were as follows: 94°C for 3 minutes to denature the DNA, followed by 35 cycles of 94°C for 45 seconds, 50°C for 60 seconds, and 72°C for 90 seconds, with a final extension of 10 minutes at 72°C. PCR products were quantified using PicoGreen (Invitrogen) and a plate reader (Infinite 200 PRO, Tecan). Each batch included two extraction blanks and 10 amplification blanks, none of which amplified. In addition, amplification levels for samples were in the same range for both batches. Sample PCR products were then pooled in equimolar amounts, purified using AMPure XP Beads (Beckman Coulter), and quantified using a fluorometer (Qubit, Invitrogen). The pool was then diluted to 2 nmol/L, denatured, and diluted to a final concentration of 6.75 pmol/L with a 10% PhiX spike for sequencing on the Illumina MiSeq. Amplicons were sequenced on a 151 bp × 12 bp × 151 bp MiSeq run (21).
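As a quick check on the reaction mix, final concentrations follow from the dilution relation C1·V1 = C2·V2. The sketch below uses only the volumes and stock concentrations stated above; the function and variable names are illustrative.

```python
def final_concentration(stock: float, volume_added_ul: float, total_volume_ul: float) -> float:
    """Dilution relation C2 = C1 * V1 / V2; the result is in the same unit as `stock`."""
    return stock * volume_added_ul / total_volume_ul


# Component volumes (μL) of one PCR reaction, as listed in the text
components_ul = {
    "PCR water": 9.5,
    "ToughMix (2x)": 12.5,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "template DNA": 1.0,
}
total_ul = sum(components_ul.values())  # 25.0 μL, matching the stated reaction volume

# 1 μL of 5 μmol/L primer in 25 μL -> 0.2 μmol/L, i.e. 200 nmol/L
primer_final_umol_per_l = final_concentration(5.0, components_ul["forward primer"], total_ul)

# 12.5 μL of 2x master mix in 25 μL -> 1x
mix_final_x = final_concentration(2.0, components_ul["ToughMix (2x)"], total_ul)
```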

Sequence read processing

Sequence reads were processed using QIIME 2 (22). Briefly, sequence reads were demultiplexed and paired-end reads were joined, followed by quality filtering as described in Bokulich and colleagues (23). Next, the Deblur workflow was applied, which uses sequence error profiles to obtain putative error-free sequences, referred to as “sub” operational taxonomic units (s-OTU; ref. 24). s-OTUs were assigned taxonomy using a naïve Bayes classifier pretrained on the Greengenes (25) 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the 16S V4 region, bound by the 515F/806R primer pair. A phylogenetic tree was constructed via sequence alignment with MAFFT (26), filtering the alignment, and applying FastTree (27) to generate the tree. One tumor sample without detectable s-OTUs was dropped, leaving 37 samples (19 normal, 18 tumor) from 19 patients for final analysis. The number of sequence reads per sample prior to the Deblur workflow was similar in tumor and normal tissue samples (Wilcoxon signed-rank P = 0.61), and marginally higher in batch 1 than in batch 2 (Wilcoxon rank-sum P = 0.08; Supplementary Fig. S1). Because human mitochondrial DNA was amplified in these tissue samples, the majority of sequence reads belonged to human mitochondria and were dropped during Deblur because they did not match the bacterial 16S database. The number of sequence reads per sample after the Deblur workflow was marginally lower in tumor than in normal tissue samples (Wilcoxon signed-rank P = 0.09), and higher in batch 1 than in batch 2 (Wilcoxon rank-sum P = 0.001; Supplementary Table S1; Supplementary Fig. S1).

α-diversity

α-diversity (within-sample microbiome diversity) was assessed using richness (number of s-OTUs) and the Shannon diversity index, calculated in 100 iterations for rarefied s-OTU tables [63 sequence reads per sample (lowest sequencing depth among samples)] using the QIIME 2 diversity plugin. Rarefaction curves suggested that this depth reflected the general ranking of community richness and diversity of the samples (Supplementary Fig. S2). We used Cox proportional hazard models to determine whether α-diversity was associated with RFS and DFS. We examined whether α-diversity differed between paired tumor and normal samples using the Wilcoxon signed-rank test.
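The rarefied α-diversity calculation can be sketched in plain Python. This is a minimal illustration, not the QIIME 2 implementation: averaging the metrics over the 100 rarefactions, the natural-log base for the Shannon index, and the fixed seed are all assumptions, and the function names are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import mean


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a {taxon: read count} table."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    return Counter(rng.sample(pool, depth))


def richness(counts):
    """Number of taxa (s-OTUs) observed at least once."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon index H = -sum(p * ln p); the log base is an assumption of this sketch."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def rarefied_alpha(counts, depth=63, iterations=100, seed=0):
    """Mean richness and mean Shannon index over repeated rarefactions."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    draws = [rarefy(counts, depth, rng) for _ in range(iterations)]
    return mean(richness(d) for d in draws), mean(shannon(d) for d in draws)
```

A sample that has exactly 63 reads is returned unchanged by every rarefaction, so its averaged metrics equal the metrics of the raw counts; deeper samples are repeatedly thinned to the same depth before comparison.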

β-diversity

β-diversity (between-sample microbiome diversity) was assessed using unweighted and weighted UniFrac distances (28), the Bray–Curtis dissimilarity, and the Jaccard index. Principal coordinate analysis (29) was used for visualization. The community-level test of association between the microbiota and survival times (MiRKAT-S; ref. 30) and the optimal microbiome-based survival analysis test (OMiSA; ref. 31) were used to test the association of overall bacterial composition with RFS and DFS. We also assigned samples to clusters by applying Ward's hierarchical agglomerative clustering method (32) to the distance matrices, and then tested whether these clusters were related to RFS and DFS using log-rank tests. Permutational multivariate ANOVA (33) was used to test whether overall bacterial composition differed between paired tumor and normal samples, using patient ID as strata. We also compared distances in overall bacterial composition between tumor and normal samples from the same patient with distances for all possible pairings of tumor and normal samples from different patients (i.e., true pairs vs. not-true pairs) using the Wilcoxon rank-sum test, to determine whether true sample pairs are more similar to each other than random pairings. These analyses were performed with and without rarefying s-OTU tables to an even depth (63 sequence reads per sample), as β-diversity can be sensitive to sequencing depth (34).
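Two of the non-phylogenetic measures, and the true-pair versus not-true-pair comparison, can be sketched as follows. The code assumes each sample is a {taxon: count} dictionary keyed by a hypothetical patient ID; UniFrac is omitted because it needs the phylogenetic tree.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two {taxon: abundance} samples."""
    taxa = set(u) | set(v)
    numerator = sum(abs(u.get(t, 0) - v.get(t, 0)) for t in taxa)
    denominator = sum(u.get(t, 0) + v.get(t, 0) for t in taxa)
    return numerator / denominator


def jaccard(u, v):
    """Jaccard distance on presence/absence: 1 - |shared taxa| / |all taxa|."""
    a = {t for t, n in u.items() if n > 0}
    b = {t for t, n in v.items() if n > 0}
    return 1 - len(a & b) / len(a | b)


def pair_distances(normal, tumor, dist):
    """Split tumor-normal distances into same-patient and different-patient pairings.

    `normal` and `tumor` map patient ID -> sample; patients missing a tumor
    sample contribute no true pair.
    """
    true_pairs = [dist(normal[p], tumor[p]) for p in normal if p in tumor]
    other_pairs = [dist(normal[p], tumor[q]) for p in normal for q in tumor if p != q]
    return true_pairs, other_pairs
```

The two returned lists would then be compared with a rank-sum test; with 19 normal and 18 tumor samples this gives 18 true pairs and 19 × 18 − 18 = 324 not-true pairs.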

Differential abundance

Relative abundance of s-OTUs (total sum scaling) was calculated, and s-OTUs were agglomerated to phylum, class, order, family, genus, and species levels. We included in the analysis only taxa present in more than 25% of the samples. We used Cox proportional hazard models to assess whether centered log-ratio (clr)-transformed (35, 36) taxon abundance or taxon carriage (presence/absence) was associated with RFS and DFS. We used the Wilcoxon signed-rank test and McNemar test to assess differences in taxon relative abundance and carriage, respectively, between paired tumor and normal samples. P values were adjusted for the false discovery rate.
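The preprocessing steps in this paragraph can be sketched as below. Several details are assumptions rather than the study's stated choices: the strict "more than 25%" prevalence comparison (following the Figure 5 caption), the pseudocount of 1 used to handle zeros before the clr transform, and the Benjamini–Hochberg procedure for the FDR adjustment. All function names are illustrative.

```python
import math


def relative_abundance(counts):
    """Total sum scaling: divide each count by the sample total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def prevalent_taxa(samples, min_fraction=0.25):
    """Taxa present (count > 0) in more than `min_fraction` of samples."""
    taxa = {t for s in samples for t in s}
    n = len(samples)
    return sorted(t for t in taxa if sum(1 for s in samples if s.get(t, 0) > 0) / n > min_fraction)


def clr(counts, taxa, pseudocount=1.0):
    """Centered log ratio: log(x_i) minus the mean of log(x) over the listed taxa.

    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    logs = [math.log(counts.get(t, 0) + pseudocount) for t in taxa]
    center = sum(logs) / len(logs)
    return {t: value - center for t, value in zip(taxa, logs)}


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Because clr values are centered within each sample, they sum to zero across the retained taxa, which is why they describe abundance relative to the rest of the community rather than absolute load.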

Sensitivity analyses

We checked whether results for overall α-diversity and β-diversity were consistent when restricting to adenocarcinoma cases only, restricting to patients in the larger extraction batch (batch 2), excluding stage III and IV cases, excluding current smokers, and excluding samples with low sequencing depths (≤124 reads/sample). We did not perform analyses within other histology groups or the smaller extraction batch due to small sample size (n = 4 squamous cell carcinoma, n = 1 sarcomatoid carcinoma, n = 5 in batch 1).

Patient characteristics

Demographic and clinical characteristics of the 19 patients are presented in Table 1. The mean patient age was 71.6 years; 37% were male, 100% were white, and 95% were former or current smokers. The majority of patients had lung adenocarcinomas (74%), whereas a minority had other histologic types (squamous cell carcinoma 21%; sarcomatoid carcinoma 5%). Two patients with no follow-up due to postoperative death were excluded from survival analysis; of the remaining 17 patients, 3 had new primaries and 9 had recurrences (loco-regional or systemic) during follow-up (follow-up times ranged from 1 to 12 years).

Table 1.

Characteristics of 19 patients with lung cancer

Characteristic              All (n = 19)   Disease-free (n = 5)   Recurrence (n = 9)   New primary (n = 3)
Age, mean ± SD              71.6 ± 6.7     73.6 ± 6.3             73.6 ± 6.3           64.3 ± 6.3
Male, n (%)                 7 (36.8)       3 (60.0)               2 (22.2)             0 (0)
White, n (%)                19 (100.0)     5 (100.0)              9 (100.0)            3 (100.0)
Smoking status, n (%)
  Never                     1 (5.3)        1 (20.0)               0 (0)                0 (0)
  Former                    16 (84.2)      4 (80.0)               8 (88.9)             2 (66.7)
  Current                   2 (10.5)       0 (0)                  1 (11.1)             1 (33.3)
Histology, n (%)
  Adenocarcinoma            14 (73.7)      4 (80.0)               9 (100.0)            1 (33.3)
  Squamous cell carcinoma   4 (21.1)       0 (0)                  0 (0)                2 (66.7)
  Sarcomatoid carcinoma     1 (5.3)        1 (20.0)               0 (0)                0 (0)
Stage, n (%)
  I                         10 (52.6)      3 (60.0)               4 (44.4)             2 (66.7)
  II                        5 (26.3)       2 (40.0)               2 (22.2)             1 (33.3)
  III                       2 (10.5)       0 (0)                  1 (11.1)             0 (0)
  IV                        2 (10.5)       0 (0)                  2 (22.2)             0 (0)

Normal lung tissue microbiome diversity and composition is associated with RFS and DFS

Patients with recurrence or a new primary during follow-up had greater bacterial richness (P = 0.01) and diversity (P = 0.06) in their normal lung tissue than disease-free patients, at the evenly rarefied depth of 63 sequences per sample (Fig. 1; Supplementary Table S2). Consistently, higher richness and diversity in normal tissue were associated with reduced RFS and DFS in Cox proportional hazard models, although the association of richness with RFS was only marginal (RFS: P = 0.08 for richness, P = 0.03 for Shannon index; DFS: P = 0.03 for richness, P = 0.02 for Shannon index; Supplementary Table S2). Results remained largely consistent in the sensitivity analyses (Supplementary Table S3).

Figure 1.

α-diversity in normal lung tissue and survival. A, Distribution of number of OTUs at an even depth of 63 sequence reads per sample in normal lung tissue by recurrence status of patients (P values are from Kruskal–Wallis tests). B and C, RFS and DFS curves for patients grouped in tertiles of number of OTUs at an even depth of 63 sequence reads per sample in normal lung tissue (P values are from log-rank tests for trend). D, Distribution of the Shannon index at an even depth of 63 sequence reads per sample in normal lung tissue by recurrence status of patients (P values are from Kruskal–Wallis tests). E and F, RFS and DFS survival curves for patients grouped in tertiles of the Shannon index at an even depth of 63 sequence reads per sample in normal lung tissue (P values are from log-rank tests for trend).

Overall microbiome composition in normal lung tissue was associated with RFS and DFS according to several distance measures with the MiRKAT-S test (RFS P ≤ 0.09 and DFS P ≤ 0.04 for unweighted and weighted UniFrac distances, Bray–Curtis dissimilarity, and Jaccard index; Supplementary Table S4), though not with the OMiSA test (RFS P = 0.20, DFS P = 0.12). Results were similar in the sensitivity analyses (Supplementary Table S3). Results were also similar when rarefying to an even depth for the UniFrac distances, but somewhat attenuated for the Bray–Curtis dissimilarity and Jaccard index (Supplementary Table S4). Principal coordinate analysis of the Bray–Curtis dissimilarity from the normal tissue revealed clustering of patients by recurrence status (Fig. 2A and B); results were similar for the unweighted and weighted UniFrac distances and the Jaccard index (Supplementary Fig. S3). We grouped patients into four discrete clusters based on the Bray–Curtis dissimilarity in normal tissue (Fig. 2C), and observed that these clusters were significantly related to RFS and DFS as well (RFS P = 0.03, DFS P = 0.015; Fig. 2D and E).

Figure 2.

β-diversity in normal lung tissue and survival. Principal coordinate analysis of the Bray–Curtis dissimilarity, with samples annotated according to recurrence status, histology, and person days: nonrarefied (A), rarefied to an even depth of 63 sequence reads per sample (B). C, Unsupervised clustering (ward.D2 method) of the Bray–Curtis dissimilarity grouped patients into four clusters. These clusters were significantly related to RFS (D; log-rank P = 0.031) and DFS (E; log-rank P = 0.015).

We observed several taxa in normal tissue for which relative abundance and/or carriage were associated with both RFS and DFS in Cox proportional hazard models at P < 0.05 (Supplementary Table S5; Fig. 3); these taxa were not significant after FDR adjustment. Greater abundance of family Koribacteraceae in normal tissue was associated with increased RFS and DFS, whereas greater abundance of family Lachnospiraceae, and genera Faecalibacterium and Ruminococcus (from Ruminococcaceae family), and Roseburia and Ruminococcus (from Lachnospiraceae family) were associated with reduced RFS and DFS. Taxa associated only with RFS (P < 0.05) included family S24-7 (increased RFS), and family Bacteroidaceae and genus Bacteroides (reduced RFS). Taxa associated only with DFS (P < 0.05) included family Sphingomonadaceae and genus Sphingomonas (increased DFS), and family Ruminococcaceae (reduced DFS). A heatmap of these 12 taxa in normal tissue clustered patients somewhat by recurrence status (Fig. 3).

Figure 3.

Taxa in normal lung tissue associated with RFS or DFS. Heatmap shows relative abundance of families and genera (F, family; G, genus) with P ≤ 0.05 from Cox proportional hazard models of clr-transformed abundance or carriage (Supplementary Table S5). Heatmap was generated with average linkage clustering and the Manhattan distance method; samples are annotated with recurrence status and Bray–Curtis cluster (from Fig. 2).

Lung tumor tissue microbiome is not associated with survival

Tumor tissue richness and diversity were not associated with recurrence status or with RFS and DFS (Supplementary Fig. S4; Supplementary Table S2), and this was consistent in the sensitivity analyses (Supplementary Table S3). In addition, tumor overall microbiome composition was not associated with RFS or DFS (Supplementary Table S4; Supplementary Fig. S5), and this was consistent when rarefying to an even depth and in the sensitivity analyses (Supplementary Table S3; Supplementary Table S4). In tumor tissue, only families Koribacteraceae and Lachnospiraceae were associated with reduced RFS and DFS (P < 0.05; Supplementary Table S5).

Lung tumor tissue microbiome is less diverse than, but compositionally similar to, paired normal tissue microbiome

Tumor tissue samples had significantly lower bacterial richness (observed OTUs; P = 0.0001) and diversity (Shannon index; P < 0.0001) than paired normal tissue samples at the evenly rarefied depth of 63 sequences per sample (Fig. 4; Supplementary Table S6). Significance remained at higher sequencing depths despite dropped samples with lower depths (15 tumor/normal pairs at 124 sequence reads per sample: P = 0.02 for number of OTUs, P < 0.0001 for Shannon index). Results were consistent when restricting to adenocarcinoma histology, restricting to patients in batch 2, excluding stage III and IV cases, or excluding current smokers (P < 0.05).
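For readers unfamiliar with the paired test behind these comparisons, the following is a minimal sketch of an exact two-sided Wilcoxon signed-rank test. It is not the software used in the study; dropping zero differences and using average ranks for ties are choices of this sketch, and the example values in the usage note are invented.

```python
from typing import Sequence


def wilcoxon_signed_rank_exact(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided exact P value for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n  # twice the average rank, so tied ranks stay integers
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        for k in range(start, end + 1):
            ranks2[order[k]] = (start + 1) + (end + 1)
        start = end + 1
    observed = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total = sum(ranks2)
    # Null distribution of the positive-rank sum: each rank is positive with probability 1/2
    ways = {0: 1}
    for r in ranks2:
        updated = {}
        for s, c in ways.items():
            updated[s] = updated.get(s, 0) + c
            updated[s + r] = updated.get(s + r, 0) + c
        ways = updated
    extreme = sum(c for s, c in ways.items()
                  if abs(2 * s - total) >= abs(2 * observed - total))
    return extreme / 2 ** n
```

With five invented pairs in which the normal value always exceeds the tumor value, for example `wilcoxon_signed_rank_exact([10, 12, 9, 14, 11], [4, 7, 5, 6, 8])`, only 2 of the 32 sign patterns are as extreme as the observed one, giving P = 0.0625. This illustrates why a consistent direction across many pairs is needed to reach very small P values.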

Figure 4.

α-diversity in relation to lung tissue type (tumor vs. normal). A, Number of OTUs for tumor/normal pairs by patient ID at an even depth of 63 sequence reads per sample. B, Distribution of number of OTUs at 63 sequence reads per sample for normal and tumor samples (P values are from Wilcoxon signed-rank test). C, Shannon index for tumor/normal pairs by patient ID at an even depth of 63 sequence reads per sample. D, Distribution of the Shannon index at 63 sequence reads per sample for normal and tumor samples (P values are from Wilcoxon signed-rank test).

Overall microbiome composition did not differ significantly between paired lung tumor and normal samples according to unweighted and weighted UniFrac distance, Bray–Curtis dissimilarity, or the Jaccard index (Supplementary Fig. S6; Supplementary Table S6). Results were consistent when rarefying to an even depth and when restricting to adenocarcinoma histology, restricting to patients in batch 2, excluding stage III and IV cancers, excluding current smokers, or excluding samples with low sequencing depth. Moreover, paired tumor and normal samples were significantly more alike than random pairings of tumor and normal samples from different patients, according to the unweighted and weighted UniFrac distance, Bray–Curtis dissimilarity, and Jaccard index (all P ≤ 0.02). Lung tumor samples had higher abundance of family Veillonellaceae, lower abundance of genus Cloacibacterium, and lower carriage of family Erysipelotrichaceae, than paired normal samples (P < 0.05; Fig. 5; Supplementary Table S7); these taxa were not significant after FDR adjustment.

Figure 5.

Taxa associated with lung tissue type (tumor vs. normal). Heatmap shows relative abundance of families (A), genera (B), and species (C) in paired normal (N) and tumor (T) samples (only taxa present in >25% of samples are shown). Normal and tumor samples are sorted left to right by patient ID. Taxa with * indicate P < 0.05 from Wilcoxon signed-rank test for pair difference in relative abundance or McNemar test for pair difference in carriage (Supplementary Table S7).

In this pilot study of the lung microbiome and lung cancer prognosis, we showed, for the first time, that increased diversity and altered composition of the normal lung tissue microbiome were associated with reduced DFS and RFS. This novel observation suggests that the microbiome of normal lung tissue may be used as a biomarker of lung cancer prognosis, which could guide clinical practice to improve survival outcomes for patients with lung cancer. We also observed a clear reduction in bacterial richness and diversity in lung tumor samples compared with paired normal tissue samples, indicating dysbiosis of the lung tumor microbiome.

Few studies have reported on the microbiome in lung cancer, and even fewer characterized the microbiome in actual lung tumor tissue. We have reported that lower airway brushes of patients with lung cancer (n = 39) were enriched in Veillonella and Streptococcus compared with patients with benign lung disease (n = 36) and healthy controls (n = 10; ref. 9). A study of lung cancer attributed to household coal burning in China found that sputum samples of lung cancer cases (n = 8) had lower diversity and enrichment of Granulicatella, Abiotrophia, and Streptococcus compared with healthy controls (n = 8; ref. 10). Similarly, another study from China reported decreased diversity and increased Streptococcus abundance in bronchial brush specimens from cancerous lung sites compared to paired noncancerous lung sites (n = 24) and healthy controls (n = 18; ref. 8). A third report from China found family Veillonellaceae and genera Veillonella, Capnocytophaga, and Selenomonas were more abundant in saliva of patients with lung cancer (n = 20) compared with controls (n = 10; ref. 12). A study in Korea observed that Veillonella and Megasphaera were more abundant in bronchoalveolar lavage fluid from patients with lung cancer (n = 20) compared with patients with benign lung mass-like lesions (n = 8; ref. 11). A study of Italian patients with lung cancer found lower bacterial diversity in lung tumor tissue samples (n = 31) compared with nonmalignant lung tissue (n = 165), and no differences in overall composition (β-diversity) between the tumor and nonmalignant samples (13). Finally, a recent study of lung tissue samples from patients with lung cancer (tumor and adjacent normal) and hospital controls observed increased bacterial diversity in tumor and adjacent normal tissue from patients with lung cancer compared with the controls (14).

From this previous literature, it is apparent that the airway and lung microbiome is perturbed in patients with lung cancer, which may have implications for prognosis. We observed that greater bacterial diversity and greater abundance of families Bacteroidaceae, Lachnospiraceae, and Ruminococcaceae, and genera Bacteroides, Faecalibacterium, Roseburia, and Ruminococcus (members assigned to both Lachnospiraceae and Ruminococcaceae) in normal lung tissue were associated with reduced survival, whereas greater abundance of Koribacteraceae and Sphingomonadaceae was associated with increased survival. Interestingly, the majority of our findings were similar for the RFS and DFS outcomes; this may suggest that the normal lung microbiome is related to both recurrences and new primary cancers. Members of Lachnospiraceae and Ruminococcaceae, particularly Roseburia and Faecalibacterium, are known to produce antiinflammatory short-chain fatty acids (e.g., butyrate; ref. 37), making the association of these bacteria with reduced survival unexpected. Bacteroides abundance in the gut has been associated with impaired antitumor immune responses in patients with melanoma (38), and may play a similar cancer-promoting role in the lungs. Though our conclusions are limited by small sample size, these preliminary results suggest that bacteria in resected normal lung tissue may serve as biomarkers of recurrence risk in early-stage NSCLC. Moreover, if these identified microbiota are determined to be causally related to cancer recurrence in future investigations, they may serve as novel targets for therapeutic intervention (7) to improve RFS in patients with lung cancer.

The results of our analysis comparing paired tumor and normal samples are similar to the previous literature in that we observed significant reductions in bacterial diversity and enrichment of Veillonellaceae in lung tumor compared with normal lung tissue, as has been observed by many (8–13), but not all (14), studies comparing lung cancer cases with controls. It is not clear from our observational study whether the identified bacterial differences are causally related to lung carcinogenesis, or are merely reflective of disease processes in the lung. However, there are several mechanisms by which the lung microbiota could contribute to lung carcinogenesis, including genotoxic pathways, bacterial metabolite effects, and induction of host inflammatory pathways (15). For example, intranasal administration of lipopolysaccharide (a membrane component of Gram-negative bacteria) in a mouse model of lung cancer significantly enhanced pulmonary inflammation and lung tumorigenesis (39). We previously showed in a human study that airway Veillonella and Streptococcus were associated with upregulation of ERK and PI3K signaling pathways in the airway, pathways regulating cell proliferation, survival, and differentiation, which are upregulated in patients with lung cancer (9). Interestingly, we have previously reported that these two genera are enriched in the mouths of current smokers compared with never smokers (40), suggesting a further mechanism by which smoking causes lung cancer. Taken together, there is accumulating support for specific bacteria as biomarkers of lung cancer presence; further study of the causal role of these bacteria in lung carcinogenesis may provide therapeutic targets for lung cancer prevention.

In summary, we showed in a small pilot study that diversity and composition of the normal lung tissue microbiome may be associated with RFS and DFS, and observed differential microbiome signatures between lung tumor and normal tissue that were consistent with previous research. The strengths of our study include the availability of fresh-frozen tumor and normal lung tissue for paired analysis, and prospective long-term follow-up for survival analysis. However, our conclusions are limited by small sample size and the lack of a replication dataset, and the findings will therefore require confirmation in a larger study. In addition, though the 16S rRNA gene sequencing assay provides a snapshot of which bacteria are present in the normal lung and lung tumor samples, localization of bacteria in these tissue samples (e.g., using fluorescence in situ hybridization; ref. 41) could provide additional insight into bacterial mechanisms of action in lung cancer. Continued study of the role of the lung microbiome in lung cancer may yield several promising future applications, including biomarkers of lung cancer risk, recurrence, and prognosis, and therapeutic targets for lung cancer primary and tertiary prevention.

No potential conflicts of interest were disclosed.

The funders had no involvement in the study design; the collection, analysis, and interpretation of data; the writing of this report; or the decision to submit for publication.

Conception and design: B.A. Peters, H.I. Pass, J. Ahn

Development of methodology: J. Ahn

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C. Reid, H.I. Pass, J. Ahn

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): B.A. Peters, R.B. Hayes, H.I. Pass, J. Ahn

Writing, review, and/or revision of the manuscript: B.A. Peters, R.B. Hayes, H.I. Pass, J. Ahn

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): C. Goparaju, C. Reid, H.I. Pass

Study supervision: H.I. Pass, J. Ahn

Lung tissue samples underwent 16S rRNA gene sequencing at the Environmental Sample Preparation and Sequencing Facility at Argonne National Laboratory. This work was supported by the NIH NCI Early Detection Research Network grant 5U01CA111295-07 (to H.I. Pass), and NCI grant R01CA164964 (to J. Ahn).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Ferlay J, Soerjomataram I, Dikshit R, Eser S, Mathers C, Rebelo M, et al. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int J Cancer 2015;136:E359–86.
2. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2017. CA Cancer J Clin 2017;67:7–30.
3. American Cancer Society. Cancer Facts & Figures 2017. Atlanta, GA: American Cancer Society; 2017.
4. Detterbeck FC, Chansky K, Groome P, Bolejack V, Crowley J, Shemanski L, et al. The IASLC lung cancer staging project: methodology and validation used in the development of proposals for revision of the stage classification of NSCLC in the forthcoming (eighth) edition of the TNM classification of lung cancer. J Thorac Oncol 2016;11:1433–46.
5. Aberle DR, Adams AM, Berg CD, Black WC, Clapp JD, Fagerstrom RM, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365:395–409.
6. Dickson RP, Huffnagle GB. The lung microbiome: new principles for respiratory bacteriology in health and disease. PLoS Pathog 2015;11:e1004923.
7. Dickson RP, Erb-Downward JR, Huffnagle GB. The role of the bacterial microbiome in lung disease. Expert Rev Respir Med 2013;7:245–57.
8. Liu HX, Tao LL, Zhang J, Zhu YG, Zheng Y, Liu D, et al. Difference of lower airway microbiome in bilateral protected specimen brush between lung cancer patients with unilateral lobar masses and control subjects. Int J Cancer 2018;142:769–78.
9. Tsay JJ, Wu BG, Badri MH, Clemente JC, Shen N, Meyn P, et al. Airway microbiota is associated with up-regulation of the PI3K pathway in lung cancer. Am J Respir Crit Care Med 2018;198:1188–98.
10. Hosgood HD 3rd, Sapkota AR, Rothman N, Rohan T, Hu W, Xu J, et al. The potential role of lung microbiota in lung cancer attributed to household coal burning exposures. Environ Mol Mutagen 2014;55:643–51.
11. Lee SH, Sung JY, Yong D, Chun J, Kim SY, Song JH, et al. Characterization of microbiome in bronchoalveolar lavage fluid of patients with lung cancer comparing with benign mass like lesions. Lung Cancer 2016;102:89–95.
12. Yan X, Yang M, Liu J, Gao R, Hu J, Li J, et al. Discovery and validation of potential bacterial biomarkers for lung cancer. Am J Cancer Res 2015;5:3111–22.
13. Yu G, Gail MH, Consonni D, Carugno M, Humphrys M, Pesatori AC, et al. Characterizing human lung tissue microbiota and its relationship to epidemiological and clinical features. Genome Biol 2016;17:163.
14. Greathouse KL, White JR, Vargas AJ, Bliskovsky VV, Beck JA, von Muhlinen N, et al. Interaction between the microbiome and TP53 in human lung cancer. Genome Biol 2018;19:123.
15. Mao Q, Jiang F, Yin R, Wang J, Xia W, Dong G, et al. Interplay between the lung microbiome and lung cancer. Cancer Lett 2018;415:40–8.
16. Fouhy F, Deane J, Rea MC, O'Sullivan Ó, Ross RP, O'Callaghan G, et al. The effects of freezing on faecal microbiota as determined using MiSeq sequencing and culture-based investigations. PLoS One 2015;10:e0119355.
17. Shaw AG, Sim K, Powell E, Cornwell E, Cramer T, McClure ZE, et al. Latitude in sample handling and storage for infant faecal microbiota studies: the elephant in the room? Microbiome 2016;4:40.
18. Kia E, Wagner Mackenzie B, Middleton D, Lau A, Waite DW, Lewis G, et al. Integrity of the human faecal microbiota following long-term sample storage. PLoS One 2016;11:e0163666.
19. Punt CJ, Buyse M, Kohne CH, Hohenberger P, Labianca R, Schmoll HJ, et al. Endpoints in adjuvant treatment trials: a systematic review of the literature in colon cancer and proposed definitions for future trials. J Natl Cancer Inst 2007;99:998–1003.
20. Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Lozupone CA, Turnbaugh PJ, et al. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc Natl Acad Sci U S A 2011;108:4516–22.
21. Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Huntley J, Fierer N, et al. Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J 2012;6:1621–4.
22. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods 2010;7:335–6.
23. Bokulich NA, Subramanian S, Faith JJ, Gevers D, Gordon JI, Knight R, et al. Quality-filtering vastly improves diversity estimates from Illumina amplicon sequencing. Nat Methods 2013;10:57–9.
24. Amir A, McDonald D, Navas-Molina JA, Kopylova E, Morton JT, Zech Xu Z, et al. Deblur rapidly resolves single-nucleotide community sequence patterns. mSystems 2017;2:pii: e00191-16.
25. DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 2006;72:5069–72.
26. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 2002;30:3059–66.
27. Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol 2009;26:1641–50.
28. Lozupone C, Lladser ME, Knights D, Stombaugh J, Knight R. UniFrac: an effective distance metric for microbial community comparison. ISME J 2011;5:169–72.
29. Gower JC. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 1966;53:325–38.
30. Plantinga A, Zhan X, Zhao N, Chen J, Jenq RR, Wu MC. MiRKAT-S: a community-level test of association between the microbiota and survival times. Microbiome 2017;5:17.
31. Koh H, Livanos AE, Blaser MJ, Li H. A highly adaptive microbiome-based association test for survival traits. BMC Genomics 2018;19:210.
32. Murtagh F, Legendre P. Ward's hierarchical agglomerative clustering method: which algorithms implement Ward's criterion? J Classification 2014;31:274–95.
33. Anderson MJ. A new method for non-parametric multivariate analysis of variance. Austral Ecol 2001;26:32–46.
34. Weiss S, Xu ZZ, Peddada S, Amir A, Bittinger K, Gonzalez A, et al. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome 2017;5:27.
35. Kurtz ZD, Muller CL, Miraldi ER, Littman DR, Blaser MJ, Bonneau RA. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput Biol 2015;11:e1004226.
36. Fernandes AD, Reid JN, Macklaim JM, McMurrough TA, Edgell DR, Gloor GB. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome 2014;2:15.
37. Louis P, Flint HJ. Diversity, metabolism and microbial ecology of butyrate-producing bacteria from the human large intestine. FEMS Microbiol Lett 2009;294:1–8.
38. Gopalakrishnan V, Spencer CN, Nezi L, Reuben A, Andrews MC, Karpinets TV, et al. Gut microbiome modulates response to anti-PD-1 immunotherapy in melanoma patients. Science 2018;359:97–103.
39. Melkamu T, Qian X, Upadhyaya P, O'Sullivan MG, Kassie F. Lipopolysaccharide enhances mouse lung tumorigenesis: a model for inflammation-driven lung cancer. Vet Pathol 2013;50:895–902.
40. Wu J, Peters BA, Dominianni C, Zhang Y, Pei Z, Yang L, et al. Cigarette smoking and the oral microbiome in a large study of American adults. ISME J 2016;10:2435–46.
41. Dejea CM, Wick EC, Hechenbleikner EM, White JR, Mark Welch JL, Rossetti BJ, et al. Microbiota organization is a distinct feature of proximal colorectal cancers. Proc Natl Acad Sci U S A 2014;111:18321–6.

Supplementary data