Abstract
Purpose: Cell-free DNA (cfDNA) sequencing provides a noninvasive method for obtaining actionable genomic information to guide personalized cancer treatment, but the presence of multiple alterations in circulation related to treatment and tumor heterogeneity complicates the interpretation of the observed variants.
Experimental Design: We describe the somatic mutation landscape of 70 cancer genes from cfDNA deep-sequencing analysis of 21,807 patients with treated, late-stage cancers across >50 cancer types. To facilitate interpretation of the genomic complexity of circulating tumor DNA in advanced, treated cancer patients, we developed methods to identify cfDNA copy-number driver alterations and cfDNA clonality.
Results: Patterns and prevalence of cfDNA alterations in major driver genes for non–small cell lung, breast, and colorectal cancer largely recapitulated those from tumor tissue sequencing compendia (The Cancer Genome Atlas and COSMIC; r = 0.90–0.99), with the principal differences in alteration prevalence being due to patient treatment. This highly sensitive cfDNA sequencing assay revealed numerous subclonal tumor-derived alterations, expected as a result of clonal evolution, but leading to an apparent departure from the mutual exclusivity seen in treatment-naïve tumors. Upon applying novel cfDNA clonality and copy-number driver identification methods, robust mutual exclusivity was observed among predicted truncal driver cfDNA alterations (FDR = 5 × 10⁻⁷ for EGFR and ERBB2), in effect distinguishing tumor-initiating alterations from secondary alterations. Treatment-associated resistance, including both novel alterations and parallel evolution, was common in the cfDNA cohort and was enriched in patients with targetable driver alterations (>18.6% of patients).
Conclusions: Together, these retrospective analyses of a large cfDNA sequencing data set reveal subclonal structures and emerging resistance in advanced solid tumors. Clin Cancer Res; 24(15); 3528–38. ©2018 AACR.
This study describes genomic alterations from the largest cell-free circulating tumor DNA cohort to date, as derived from regular clinical practice. The high prevalence of resistance alterations found in advanced, treated cancer patients necessitated accurate methods for determining mutation clonality and driver/resistance status from plasma. We provide such methods, thereby extending the utility of cell-free DNA sequencing analysis. Our finding of an association between estimated circulating tumor DNA (ctDNA) levels and tumor mutational burden ascertained from plasma suggests that ctDNA level is likely an important variable to consider for immunotherapy applications of ctDNA analysis. Although cell-free DNA can provide a summary of tumor heterogeneity across multiple metastatic sites in a patient, our findings of high variability in ctDNA levels across patients, and its impact on variant detection, highlight the need for an improved understanding of factors influencing ctDNA levels and safe methods for maximizing them at the time of ctDNA testing.
Introduction
Genomic analysis of cell-free DNA (cfDNA) from advanced cancer patients allows the identification of actionable alterations shed into the circulation and may provide a global summary of tumor heterogeneity without an invasive biopsy (1). Plasma cfDNA analysis can provide insights from genomic information shed from multiple lesions within a patient, but this broader view can introduce added complexity. This is especially relevant because most clinical cfDNA sequencing is performed on patients with advanced or metastatic disease, often at the second line of therapy or later. Retrospective analyses of large-scale cfDNA sequencing data obtained in the clinical setting may open avenues for learning how to navigate these complexities.
As a recently developed testing method, clinical cfDNA sequencing has repeatedly been benchmarked against tissue sequencing, but these performance comparisons are confounded by temporal and spatial heterogeneity in tumors (2–6). In addition, circulating tumor DNA (ctDNA) may be undetectable when shedding of tumor DNA is minimal, such as when therapy stabilizes tumor growth (7, 8). Recent efforts to globally characterize tumor heterogeneity using both plasma cfDNA analysis and multiregion tumor sequencing highlight the complementary nature of the two approaches (9–11). However, there is a paucity of cfDNA data sets large enough to evaluate whether the tumor-initiating alterations (“truncal drivers”) of solid tumors resemble those found in the cfDNA of advanced cancer patients. Among the various methods available for cfDNA analysis, targeted panel deep-sequencing assays that use extensive error correction provide the depth (sensitivity) and genomic breadth necessary to optimally survey tumor-derived genomic alterations in plasma cfDNA, even at low allelic fractions (12–15).
To elucidate the landscape of truncal driver mutations in cfDNA, we first evaluated the extent of detectable tumor heterogeneity in a large cohort of patients whose cfDNA was deep-sequenced, and assessed how differences in ctDNA levels across patients might affect cfDNA variant detection. To distinguish truncal driver mutations from secondary resistance mutations, we developed methods to infer the clonality and driver status of tumor mutations from cfDNA. We then compared the cfDNA patterns of common driver alterations in a large cohort of advanced, previously treated solid tumor patients with those found in treatment-naïve tumor tissue compendia. Finally, we explored the landscape of resistance variants to targeted therapies in the large cfDNA cohort.
Materials and Methods
Characteristics of the cfDNA cohort
A summary of the cfDNA cohort used in this study is provided in Table 1. The cohort was assembled from 21,807 consecutive cancer patients (25,578 total samples) tested on the Guardant Health cfDNA sequencing platform as part of clinical care (in a Clinical Laboratory Improvement Amendments (CLIA)-certified, College of American Pathologists (CAP)-accredited, New York State Department of Health-approved clinical laboratory at Guardant Health, Inc.). As such, this was an observational, noninterventional study and was conducted in accordance with recognized ethical guidelines. All patient data were deidentified as per an institutional review board-approved protocol (Quorum Review IRB Protocol 30-001: Research Use of De-Identified Specimens and Data), which waived the requirement for individual patient consent. Tests were ordered by 3,283 oncologists across the United States, Europe, Asia, and the Middle East from June 2014 to September 2016 (data freeze at September 22, 2016). Disease stage was confirmed for each patient by the provider to be advanced disease (stage III/IV). Treatment histories and survival information were generally not available. Over 50 solid tumor types were represented [the most common were non–small cell lung cancer (NSCLC, 37%), breast cancer (16%), colorectal cancer (9%)], although some cancer types were represented by only a small number of samples. See Supplementary Tables S3 and S4 for comparisons of the cfDNA cohort characteristics with those of the pan-cancer TCGA (The Cancer Genome Atlas) cohort. The clinical cfDNA sequencing platform (Guardant360) used in this study has been previously described (13), and additional information on the assay and variant calling is provided in the Supplementary Methods and in the accompanying paper by Odegaard and colleagues (16). 
For analysis purposes, variants were categorized as clinically actionable if their presence would reasonably be expected to inform standard-of-care treatment decisions, as judged by the clinical oncologists and/or molecular pathologist participating in the study. Descriptions of the methods for cfDNA clonality estimation, cfDNA copy-number driver analysis, comparisons of cfDNA with TCGA alterations, resistance alterations and longitudinal analysis, and statistical analysis can be found in the Supplementary Methods. The reported cfDNA alteration data and associated analysis code have been deposited in an open-access GitHub repository and are available at https://github.com/guardant/Zill_2018.
Table 1. Summary of the cfDNA cohort used in this study
| Clinical characteristic | Statistic |
|---|---|
| Number of patients | 21,807 |
| Number of samples | 25,578 |
| Number of patients with alterations | 18,503 |
| Number of patients with multiple tests | 2,222 |
| Number of cancer types | >50 |
| Alterations per sample | 3 (median); 0–166 (range) |
| Days from diagnosis to blood draw | 738 (mean); 335 (median) |
| Gender proportion | 56% female; 44% male |
| Age, years | 64 (median); 23–92 (range) |
NOTE: No treatment, follow-up, or outcome data were available.
Results
Somatic genomic alterations in cfDNA across 21,807 patients
Somatic cfDNA alterations were detected in 85% (18,503/21,807) of patients across all cancer types, ranging from 51% for glioblastoma to 93% for small cell lung cancer (Fig. 1A). Half of the reported somatic cfDNA alterations had a variant allele fraction (VAF) below 0.41% (range, 0.03%–97.6%; Fig. 1B). Alteration-positive samples had on average three or four alterations detected (median = 3; mean = 4.3; range, 1–166), including copy-number amplifications (CNA; Supplementary Fig. S1B). For subsequent analyses, we assumed that the fraction of tumor-derived cfDNA molecules within the total population of circulating cfDNA molecules (“ctDNA level”) in a sample was proportional to the copy-number–adjusted maximum somatic VAF. We then examined the distribution of estimated ctDNA levels per indication (Fig. 1C). Although most of the major cancers (bladder, liver, prostate, gastric, NSCLC, melanoma, breast) had similar average estimated ctDNA levels, brain cancers had significantly lower levels (Wilcoxon P = 0.006 in comparison with renal), putatively owing to the blood–brain barrier, whereas colorectal cancer and SCLC had significantly higher levels than all other indications examined (Wilcoxon P < 0.008 in comparison with bladder). (Similar patterns have been reported previously by Bettegowda and colleagues (17), but without sample sizes sufficient to determine statistical significance.) This variation in ctDNA levels suggested that variant detection, and therefore the ability to estimate tumor mutational burden from cfDNA, might vary between indications.
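A minimal sketch of the ctDNA-level estimate follows. The text specifies only that ctDNA level was taken as proportional to the copy-number–adjusted maximum somatic VAF; the adjustment used here (scaling the VAF of a variant in an amplified gene by 2 / copy number) and the names `Variant` and `estimate_ctdna_level` are illustrative assumptions, not the study's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    gene: str
    vaf: float                           # variant allele fraction (0-1)
    copy_number: Optional[float] = None  # plasma copy number of the gene, if called


def estimate_ctdna_level(variants: List[Variant], amp_threshold: float = 2.0) -> float:
    """Copy-number-adjusted maximum somatic VAF (hypothetical adjustment rule).

    Assumption: a variant whose gene has plasma copy number above
    `amp_threshold` has its VAF scaled by amp_threshold / copy_number,
    so that amplification of the mutant allele does not inflate the estimate.
    """
    if not variants:
        return 0.0
    adjusted = []
    for v in variants:
        if v.copy_number is not None and v.copy_number > amp_threshold:
            adjusted.append(v.vaf * amp_threshold / v.copy_number)
        else:
            adjusted.append(v.vaf)
    return max(adjusted)


sample = [Variant("TP53", 0.05), Variant("EGFR", 0.30, copy_number=6.0)]
print(estimate_ctdna_level(sample))  # EGFR is down-weighted to ~0.10
```

Without some adjustment of this kind, a single amplified mutant allele would dominate the maximum and overstate the tumor fraction of the sample.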
Figure 1. CfDNA alteration detection and estimated ctDNA levels in 21,807 advanced-stage cancer patients. A, Somatic cfDNA alteration detection rates per cancer type in the 21,807-patient cfDNA cohort. Percentages of alteration-positive samples are indicated. Note that the last 16,939 consecutive samples (November 2015–September 2016) were analyzed with version 2.9 of the cfDNA test, whereas the previous 8,639 samples were analyzed with earlier versions of the cfDNA test (see Supplementary Table S3). SCLC, small cell lung cancer; CUP, cancer of unknown primary; GBM, glioblastoma. B, VAF distribution for all somatic SNVs, indels, and fusions detected by the cfDNA test. C, Distributions of estimated ctDNA level per indication. CtDNA levels were significantly higher in colorectal cancer and SCLC and significantly lower in glioma/GBM (“Glioma*”) than in the other cancers shown (Wilcoxon rank sum test). Numbers of SNV/indel/fusion-positive samples per indication are colorectal, 1,991; SCLC, 267; bladder, 210; liver, 210; prostate, 909; gastric, 260; NSCLC, 8,078; melanoma, 410; breast, 3,301; ovarian, 594; pancreas, 867; renal, 220; glioma/GBM, 107. D, Number of somatic cfDNA SNVs per sample versus estimated ctDNA level, which is binned on the x-axis, in alteration-positive NSCLC samples (n = 8,078). Asterisks indicate significance levels from pairwise comparisons using the Wilcoxon rank sum test (“ns,” not significant).
Relationship between estimated ctDNA level and tumor mutational burden
We tested whether ctDNA levels affected tumor mutation burden estimates by examining the number of SNVs per sample across cancer indications in the cfDNA cohort. Notably, the ordering of indications by average tumor mutation burden established with TCGA/International Cancer Genome Consortium (ICGC) data was not recapitulated in this cfDNA cohort (Supplementary Fig. S1B; refs. 18, 19). Additionally, across all cancer indications, the mutation burden distributions were shifted upward, and their dynamic ranges compressed, relative to mutation burdens derived from TCGA/ICGC whole-exome sequencing. These differences were likely due to several combined factors: the relatively narrow cfDNA panel being heavily biased toward exons with known cancer mutations, differences in cohort characteristics such as disease stage and treatment, and differences in variant detection between tumor tissue sequencing and cfDNA sequencing. When the mutations from TCGA NSCLC cases were filtered to those lying within the cfDNA panel regions (107 kb of sequence for reported variants), the average mutation burden of this cohort was 18 mutations/Mb rather than the 9 mutations/Mb derived from whole-exome analysis, consistent with the expectation that a panel enriched for recurrently mutated cancer exons captures more mutations per base pair than the whole exome does.
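Restricting tissue-derived mutations to panel regions, as in the TCGA NSCLC comparison above, amounts to an interval-overlap count divided by the panel size in megabases. The sketch below assumes sorted, non-overlapping, half-open intervals per chromosome; the function name and the input layout are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


def panel_mutation_burden(mutations: List[Tuple[str, int]],
                          panel: Dict[str, List[Interval]]) -> float:
    """Mutations per Mb counted only inside panel intervals.

    `panel` maps chromosome -> sorted, non-overlapping intervals (hypothetical BED-like input).
    """
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in panel.items()}
    n_in_panel = 0
    for chrom, pos in mutations:
        ivs = panel.get(chrom)
        if not ivs:
            continue
        # Index of the rightmost interval starting at or before `pos`
        i = bisect.bisect_right(starts[chrom], pos) - 1
        if i >= 0 and ivs[i][0] <= pos < ivs[i][1]:
            n_in_panel += 1
    panel_mb = sum(end - start for ivs in panel.values() for start, end in ivs) / 1e6
    return n_in_panel / panel_mb


toy_panel = {"chr1": [(100, 600_100)], "chr2": [(0, 400_000)]}  # 1.0 Mb in total
toy_mutations = [("chr1", 100), ("chr1", 600_100), ("chr2", 5), ("chr3", 10), ("chr1", 50)]
print(panel_mutation_burden(toy_mutations, toy_panel))  # 2.0
```

Because the denominator is the panel footprint rather than the exome, the same mutations yield a higher per-Mb value whenever the panel is enriched for mutated exons.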
Nonetheless, the median tumor mutation burden estimated from cfDNA steadily increased from 18.7 mutations/Mb (2 SNVs per sample) in low-ctDNA NSCLC samples to 37.4 mutations/Mb (4 SNVs per sample) in high-ctDNA samples (Fig. 1D; Supplementary Fig. S1D). (The overall median was 28 mutations/Mb, or 3 SNVs per sample.) The positive and statistically significant relationship between the number of cfDNA alterations per sample and ctDNA level held in breast cancer and colorectal cancer (Supplementary Fig. S1E and S1F), likely reflecting improved detection of genomic alterations when tumors shed more DNA into circulation. As described below, at least part of the observed increase in mutation burdens in high-ctDNA samples was due to increased detection of subclonal variants (Supplementary Fig. S2).
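The per-Mb values quoted above follow from dividing SNV counts by the 107-kb panel footprint (for example, 2 / 0.107 ≈ 18.7 mutations/Mb). A sketch of that conversion, together with a per-bin median over estimated ctDNA levels, is below; the bin edges and the tuple input format are assumptions for illustration.

```python
import statistics
from typing import Dict, List, Tuple

PANEL_MB = 0.107  # 107 kb of panel sequence for reported variants (from the text)


def snvs_to_mut_per_mb(n_snvs: float, panel_mb: float = PANEL_MB) -> float:
    return n_snvs / panel_mb


def median_tmb_by_ctdna_bin(samples: List[Tuple[float, int]],
                            bin_edges: List[float]) -> Dict[Tuple[float, float], float]:
    """samples: (estimated ctDNA level, SNV count) pairs; bins are [lo, hi)."""
    result = {}
    for lo, hi in zip(bin_edges, bin_edges[1:]):
        counts = [n for level, n in samples if lo <= level < hi]
        if counts:
            result[(lo, hi)] = snvs_to_mut_per_mb(statistics.median(counts))
    return result


print(round(snvs_to_mut_per_mb(2), 1), round(snvs_to_mut_per_mb(4), 1))  # 18.7 37.4
```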
Comparisons of alteration patterns across alteration types in cfDNA versus TCGA
To determine whether alteration patterns found in cfDNA recapitulated those found in published tissue sequencing studies, we compared the frequencies of SNVs and indels in commonly mutated driver genes with the frequencies found in TCGA. Highly similar mutation patterns were observed for TP53 and EGFR (Pearson r = 0.94 and r = 0.78, respectively; Fig. 2A and B), as well as for KRAS, BRAF, and PIK3CA (r = 0.99, 0.99, and 0.94, respectively). The treatment-induced resistance mutations EGFR T790M and EGFR C797S were more frequent in the heavily pretreated cfDNA NSCLC cases (10%) than in the untreated TCGA NSCLC cases (0.3%). Excluding these resistance alterations, the Pearson correlation for mutation frequencies in the tyrosine kinase domain of EGFR (exons 18–24) rose from 0.78 to 0.90.
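The effect of excluding treatment-associated codons on a per-codon correlation can be illustrated with a small Pearson sketch. The counts are toy values chosen only to show the mechanism (codon 790 inflated in cfDNA), not data from this study, and the function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def codon_frequency_correlation(cfdna: Dict[int, int], tissue: Dict[int, int],
                                exclude: Iterable[int] = ()) -> float:
    """Pearson r between per-codon mutation frequencies, optionally dropping codons."""
    codons = sorted((set(cfdna) | set(tissue)) - set(exclude))
    tot_c = sum(cfdna.get(c, 0) for c in codons)
    tot_t = sum(tissue.get(c, 0) for c in codons)
    x = [cfdna.get(c, 0) / tot_c for c in codons]
    y = [tissue.get(c, 0) / tot_t for c in codons]
    return pearson(x, y)


# Toy counts (not study data): codon 790 appears only in the treated cfDNA cohort.
cf = {719: 5, 746: 50, 790: 30, 858: 40}
tis = {719: 5, 746: 50, 858: 40}
print(codon_frequency_correlation(cf, tis),                 # below 1
      codon_frequency_correlation(cf, tis, exclude=[790]))  # ~1.0
```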
Figure 2. Comparison of cfDNA alteration patterns to tumor tissue alteration patterns in TCGA and COSMIC. A, Per-codon mutation frequencies for SNVs in the TP53 coding sequence [cfDNA n = 14,696 SNVs (10,574 samples); tissue/TCGA n = 1,951 SNVs (1,845 samples)]. B, Per-codon mutation frequencies for the EGFR tyrosine kinase domain (exons 18–24) [cfDNA n = 3,098 SNVs/indels (2,095 samples); tissue/TCGA n = 112 SNVs/indels (96 samples)]. C, Rank-by-rank comparison of amplification frequencies in breast cancer from the cfDNA cohort (1,010 patients with amplifications out of 2,808 patients) versus the tissue/TCGA cohort (413 samples with amplifications out of 816 profiled samples). D, Comparison of EML4–ALK fusion breakpoints for cfDNA versus tissue (COSMIC). Top, Schematic showing breakpoints versus VAF, expressed as cfDNA percentage, for EML4–ALK fusions detected in cfDNA. Bottom, Breakpoint frequency per EML4 intron; tissue data were compiled by the COSMIC database (http://cancer.sanger.ac.uk/cosmic) from various literature sources.
Breast cancer often harbors therapeutically targetable CNAs in ERBB2 (HER2). We compared the ranks of amplification frequencies in breast cancer patients for the 18 genes assessed for CNAs by the cfDNA assay with those of the same genes in TCGA (amplification status determined by GISTIC) and found a high rank correlation (Spearman ρ = 0.86; Fig. 2C). Similarly, 5% to 10% of lung adenocarcinomas (LUAD) are driven by targetable kinase gene fusions. To compare the patterns of gene fusions found in cfDNA with those found in tissue, we determined the per-intron breakpoint frequencies for the three fusions most commonly observed among lung cancer patients in the cfDNA cohort: EML4–ALK, CCDC6–RET, and KIF5B–RET. For all three fusions, breakpoint frequencies were strongly correlated with those in published tissue data (r = 0.98; Fig. 2D; Supplementary Table S5).
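A rank correlation such as the one used for breast cancer amplification frequencies can be computed as a Pearson correlation on ranks, with tied values given their average rank. The sketch assumes the coefficient is Spearman's ρ; the function names are hypothetical, and the inputs would be per-gene amplification frequencies for the same genes in each cohort.

```python
import math
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: List[float], y: List[float]) -> float:
    """Spearman's rho as Pearson r on average ranks (handles ties)."""
    return pearson(average_ranks(x), average_ranks(y))


print(average_ranks([5, 1, 5]))  # [2.5, 1.0, 2.5]
```

Working on ranks makes the comparison insensitive to the different absolute amplification rates expected between an advanced-stage plasma cohort and a tissue cohort.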
A more detailed analysis of cfDNA CNAs in 18 genes across four major cancer indications (lung, breast, colorectal, prostate) revealed amplification patterns consistent with known driver alterations in each indication (Supplementary Fig. S3). For example, EGFR was the most commonly amplified gene in lung cancers, MYC and FGFR1 were the most commonly amplified genes in breast cancer, and AR was the most commonly amplified gene in prostate cancer. Notably, some established driver genes tended to have higher per-sample amplification levels than other genes, in patterns that reflected indication-specific biology. For instance, ERBB2 (HER2) had the highest average amplification levels in breast cancer and colorectal cancer but only intermediate amplification levels in lung and prostate cancers.
Estimated cfDNA clonality and driver alteration prevalence in cfDNA versus TCGA
The abundance of advanced, treated cancer cases in the cfDNA cohort was expected to contribute additional subclonal variants whenever variant detection in cfDNA was not limited by low ctDNA levels. The trend toward increased numbers of cfDNA variants in high-ctDNA samples (Fig. 1D) and the observation of frequent resistance alterations (Fig. 2B) suggested that comparisons of this cfDNA cohort with large tumor tissue cohorts such as TCGA should account for a higher level of mutational heterogeneity in the cfDNA cases. To compare the prevalences of common alterations between the large tissue cohorts of earlier-stage tumors (TCGA) and the advanced-stage tumors in the cfDNA cohort while accounting for this potentially increased heterogeneity, we derived an estimated cfDNA clonality metric, based on the ratio of each variant's VAF to the maximum VAF in the sample, from which the likely cancer-cell fraction of each mutation in the tumor can be inferred (see Materials and Methods). Because mutated oncogenes such as EGFR can subsequently be amplified, inflating the cfDNA VAF, a simple VAF ratio could yield inaccurate clonality estimates. Closer examination of the VAF/copy-number relationship revealed two distinct nonlinear behaviors: log-linear behavior of amplified driver mutations at high VAF and high copy number, and clearly subclonal low-VAF alterations that arose after, or in a separate subclone from, the amplification (Fig. 3A; Supplementary Figs. S6–S9). Our model therefore accounts for these nonlinearities by normalizing the VAF by log-transformed copy number for driver variants and by holding out variants that initially appear subclonal from the normalization procedure. We then examined the cfDNA clonality distributions for the most frequently mutated genes in LUAD, breast cancer, and colorectal cancer to understand how this metric related to well-known biological properties of cancer mutations (Supplementary Fig. S4).
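The clonality model described above can be sketched in a few lines. This is a hypothetical illustration, not the published model: the log2 form of the copy-number normalization, the copy-number cutoff of 2, and the 0.1 hold-out ratio used to define “initially subclonal” variants are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional

HOLDOUT_RATIO = 0.1  # assumption: raw VAF / max VAF below this counts as "initially subclonal"


@dataclass
class Call:
    name: str
    vaf: float
    copy_number: Optional[float] = None  # plasma copy number of the gene, if amplified


def estimate_clonality(calls: List[Call]) -> Dict[str, float]:
    """Hypothetical sketch of a copy-number-aware VAF / max-VAF clonality metric."""
    raw_max = max(c.vaf for c in calls)
    normalized = {}
    for c in calls:
        amplified = c.copy_number is not None and c.copy_number > 2
        initially_subclonal = c.vaf / raw_max < HOLDOUT_RATIO
        if amplified and not initially_subclonal:
            # Assumed form: divide by log2(CN), which exceeds 1 whenever CN > 2
            normalized[c.name] = c.vaf / math.log2(c.copy_number)
        else:
            # Held-out low-VAF variants and non-amplified variants keep their raw VAF
            normalized[c.name] = c.vaf
    top = max(normalized.values())
    return {name: v / top for name, v in normalized.items()}


calls = [Call("EGFR L858R", 0.30, 4.0), Call("TP53 R273H", 0.14), Call("EGFR T790M", 0.01, 4.0)]
print(estimate_clonality(calls))
```

In this toy sample, normalization stops the amplified L858R allele from making the unamplified TP53 variant look subclonal, while the low-VAF T790M call on the same amplicon is held out and remains subclonal.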
Figure 3. Estimated cfDNA clonality reveals trends consistent with indication-specific biology. A, Copy number versus VAF for mutant EGFR alleles with amplifications in NSCLC. Note the distinct population of low-VAF/high CNAs, and the log-linear behavior among high-VAF alterations. B, Estimated cfDNA clonality is plotted for all EGFR mutations in LUAD (blue) or colorectal cancer (pink). Thresholds for clonality filtering are indicated by vertical gray lines. The median clonality and percentage of mutations that were clonal (clonality > 0.9) or subclonal (clonality < 0.1) for all EGFR mutations, for L858R alone in LUAD, or for recurrent ectodomain mutations (“ecto”) in colorectal cancer are shown below the histogram. C, CfDNA clonality for all APC SNVs in colorectal cancer cases. Nonsense variants are colored red, missense variants green, and synonymous variants gray. D, CfDNA clonality of a canonical EGFR driver alteration (L858R) and two resistance alterations (T790M, C797S) in cfDNA from 1,119 NSCLC samples. Note that the relationship between variant clonality and the presumed order of treatment is consistent with the sequential emergence of each resistance alteration (erlotinib is given to patients with EGFR L858R; EGFR T790M confers resistance to erlotinib, and patients with EGFR L858R/T790M can then be given osimertinib; EGFR C797S confers resistance to osimertinib).
EGFR mutations were among the most prevalent alterations across the cfDNA cohort, but had different expected cohort-level behaviors in LUAD versus colorectal cancer. In LUAD, EGFR-activating mutations should occur frequently as drivers and should therefore tend to be clonal. In colorectal cancer, recurrent EGFR extracellular domain mutations would generally be expected to be acquired resistance alterations in patients treated with anti-EGFR antibodies such as cetuximab and, therefore, should tend to be subclonal. As predicted, cfDNA EGFR alterations in LUAD were predominantly clonal, whereas in colorectal cancer they were predominantly subclonal (Fig. 3B). Direct comparison of the clonality distributions of EGFR L858R in LUAD versus EGFR ectodomain mutations in colorectal cancer showed an even more striking dichotomy (Fig. 3B). In colorectal cancer, alterations in the common driver genes APC, TP53, and KRAS were predominantly clonal (Supplementary Fig. S4). Strikingly, nonsense mutations in APC had a strong tendency to be clonal (median clonality = 0.72) and had significantly higher clonality than APC missense or synonymous alterations (median clonality = 0.07; Wilcoxon P < 10⁻⁶), further confirming that the cfDNA clonality metric reflected the expected behaviors of tumor-derived alterations (Fig. 3C). In breast cancer, alterations in the common driver genes PIK3CA, AKT1, and TP53 showed strong tendencies toward clonality (Supplementary Fig. S4). Additionally, we compared the clonality distributions in LUAD of known EGFR driver alterations (exon 19 deletions, L858R, etc.) and EGFR resistance alterations (T790M, C797S). Again, as expected, the driver alterations showed a strong tendency toward clonality and the resistance alterations a strong tendency toward subclonality (Fig. 3D; Supplementary Fig. S4).
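Per-group summaries of the kind shown beneath the Fig. 3B histogram (median clonality and the percentages of clonal and subclonal calls) can be computed as below. The thresholds (clonality > 0.9 clonal, < 0.1 subclonal) come from the Fig. 3 legend; the grouping input, the toy values, and the function name are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple

CLONAL, SUBCLONAL = 0.9, 0.1  # thresholds from the Fig. 3 legend


def summarize_by_group(values: List[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """values: (group label, cfDNA clonality) pairs, e.g. variant class for APC SNVs."""
    groups = defaultdict(list)
    for label, clonality in values:
        groups[label].append(clonality)
    summary = {}
    for label, vals in groups.items():
        n = len(vals)
        summary[label] = {
            "median": statistics.median(vals),
            "pct_clonal": 100 * sum(v > CLONAL for v in vals) / n,
            "pct_subclonal": 100 * sum(v < SUBCLONAL for v in vals) / n,
        }
    return summary


toy = [("nonsense", 0.95), ("nonsense", 1.0), ("nonsense", 0.5), ("nonsense", 0.05),
       ("missense", 0.02)]
print(summarize_by_group(toy))
```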
Comparisons of mutation prevalence per gene between the cfDNA and TCGA cohorts, accounting for cfDNA clonality, revealed that the prevalences of most major driver alterations in NSCLC, breast cancer, and colorectal cancer were broadly similar (Pearson r = 0.85; Supplementary Fig. S5 and Supplementary Table S6). Some genes showed significant differences in mutation prevalence between cohorts (χ² test), which largely reflected differences in patient characteristics (i.e., prior treatment). Notable differences included EGFR and KRAS alterations in NSCLC (cfDNA: 43% EGFR-mutant, 16% KRAS-mutant; TCGA/tissue: 14% EGFR-mutant, 33% KRAS-mutant), a much higher frequency of ESR1 mutations in cfDNA breast cancer samples (14%) than in TCGA samples (0.5%), and a substantially higher frequency of TP53 mutations in cfDNA colorectal cancer samples than in TCGA samples.
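Per-gene prevalence differences between two cohorts can be tested with a 2 × 2 Pearson χ² test, sketched below without continuity correction (whether the study applied one is not stated here). The counts in the usage line are toy values, not the study's data.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) for
    [[a, b], [c, d]] = [[mutant, wild-type] in cohort 1, [mutant, wild-type] in cohort 2].
    Every row and column total must be nonzero.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of chi-square with 1 df equals P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


print(chi_square_2x2(30, 70, 10, 90))  # toy counts: chi2 = 12.5, p ≈ 4.1e-4
```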
Mutual exclusivity analysis of driver alterations in cfDNA
To determine whether truncal driver alterations followed the patterns of mutual exclusivity established in early-stage disease (i.e., TCGA studies), we performed mutual exclusivity analysis on common cfDNA alterations in LUAD, breast cancer, and colorectal cancer samples. Because driver status for CNAs is often unclear or inconsistently reported across studies, we developed and applied a cohort-level CNA driver identification method that retains statistical outliers relative to background aneuploidy, enriching the initial set of CNA calls for likely driver alterations (Supplementary Fig. S10; see Materials and Methods). Similarly, cfDNA SNVs, indels, and fusions were filtered to clonal alterations (clonality > 0.9) to enrich for likely truncal drivers (see Materials and Methods).
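The CNA driver filter is described only as retaining statistical outliers relative to background aneuploidy. One plausible sketch, assuming a per-gene cutoff of the cohort median plus k scaled median absolute deviations (MAD), is shown below; the cutoff form, k = 3, and the function name are assumptions, not the study's procedure.

```python
import statistics
from collections import defaultdict
from typing import List, Tuple

MAD_SCALE = 1.4826  # makes the MAD consistent with a normal standard deviation


def likely_driver_cnas(calls: List[Tuple[str, str, float]],
                       k: float = 3.0) -> List[Tuple[str, str, float]]:
    """calls: (sample_id, gene, plasma copy number). Keeps per-gene high outliers."""
    by_gene = defaultdict(list)
    for _, gene, cn in calls:
        by_gene[gene].append(cn)
    cutoff = {}
    for gene, cns in by_gene.items():
        med = statistics.median(cns)
        mad = MAD_SCALE * statistics.median([abs(x - med) for x in cns])
        cutoff[gene] = med + k * mad  # assumed outlier rule
    return [(s, g, cn) for s, g, cn in calls if cn > cutoff[g]]


toy_calls = [("s1", "ERBB2", 2.2), ("s2", "ERBB2", 2.3), ("s3", "ERBB2", 2.4),
             ("s4", "ERBB2", 2.3), ("s5", "ERBB2", 2.2), ("s6", "ERBB2", 12.0),
             ("s1", "MYC", 3.0), ("s2", "MYC", 3.1), ("s3", "MYC", 2.9)]
print(likely_driver_cnas(toy_calls))  # [('s6', 'ERBB2', 12.0)]
```

A per-gene cutoff lets modest, broadly shared copy gains (here, MYC) be treated as background while a focal high-level ERBB2 amplification is retained.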
In LUAD, strong evidence for mutual exclusivity was observed in cfDNA across several pairs of genes (Fig. 4). Importantly, the tendency toward mutual exclusivity increased when comparing the post-clonality-filtering alterations with the pre-filtering alterations (Supplementary Figs. S11 and S12; Supplementary Tables S7–S12). Of note, KRAS and EGFR alterations were highly mutually exclusive both before and after filtering, but the proportion of double-mutant [KRAS-alt; EGFR-alt] genotypes dropped 30-fold after filtering to clonal alterations. For MET and EGFR, a tendency toward co-occurrence before filtering (FDR = 6 × 10⁻⁶, log OR = 0.6) was reversed to one of exclusivity after filtering (FDR = 0.03, log OR = −1.0), suggesting that the co-occurrence before filtering was caused by subclonal resistance alterations rather than by co-occurring truncal mutations (the pre-filtered data had a high prevalence of MET amplifications and EGFR T790M, both of which are associated with resistance to erlotinib). A similar reversal from co-occurrence (nonsignificant) to exclusivity was seen for ERBB2 and EGFR alterations (post-filtering FDR = 2 × 10⁻⁵, log OR = −1.9).
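Pairwise exclusivity statistics of the kind reported above (log OR and FDR) can be sketched with a one-sided Fisher exact test on each gene-pair 2 × 2 table, followed by Benjamini-Hochberg correction. This is a generic, swapped-in technique; the test used in the study may differ, and all names here are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple


def fisher_exclusivity_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact P(X <= a): chance of this few or fewer co-altered patients.

    a = both genes altered, b = gene 1 only, c = gene 2 only, d = neither.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c  # patients altered in gene 1 / gene 2
    lo = max(0, row1 + col1 - n)
    total = math.comb(n, col1)
    return sum(math.comb(row1, x) * math.comb(n - row1, col1 - x)
               for x in range(lo, a + 1)) / total


def log_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    # Haldane-Anscombe correction (+0.5 per cell) keeps the ratio finite when a cell is zero
    return math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def pairwise_exclusivity(altered: Dict[str, Set[str]],
                         genes: List[str]) -> List[Tuple[str, str, float, float, float]]:
    """altered: patient -> set of altered genes. Returns (gene1, gene2, log OR, p, FDR)."""
    rows = []
    for g1, g2 in combinations(genes, 2):
        a = b = c = d = 0
        for hit in altered.values():
            in1, in2 = g1 in hit, g2 in hit
            if in1 and in2:
                a += 1
            elif in1:
                b += 1
            elif in2:
                c += 1
            else:
                d += 1
        rows.append((g1, g2, log_odds_ratio(a, b, c, d), fisher_exclusivity_p(a, b, c, d)))
    fdr = benjamini_hochberg([r[3] for r in rows])
    return [(g1, g2, lor, p, q) for (g1, g2, lor, p), q in zip(rows, fdr)]
```

Running this once on all calls and once on clonality-filtered calls would show, in the same way as the MET/EGFR comparison above, whether removing subclonal events moves a gene pair toward exclusivity (negative log OR).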
Figure 4. Mutual exclusivity analysis of cfDNA alterations for LUAD before and after filtering for clonality and CNA driver status. Numbers of patients are indicated at bottom right of each plot. The top oncoprint shows all alterations in alteration-positive patients, whereas the bottom oncoprint shows truncal SNVs, indels, and fusions (clonality > 0.9), and likely driver CNAs across patients that have at least one clonal driver alteration. Gray boxes indicate the absence of alterations, and the color/shape combinations corresponding to the various alteration types are indicated below each oncoprint. Frequencies of gene alterations within each plot are indicated at left (samples lacking clonal alterations in the selected genes were omitted).
In breast cancer, five driver genes (ERBB2, FGFR1, BRCA1, BRCA2, and AKT1) showed tendencies toward mutual exclusivity with PIK3CA after clonality filtering. Exclusivity was not necessarily expected for PIK3CA alterations, except with respect to AKT1 mutations (20), reflecting the more complementary nature of driver alterations in this disease (Supplementary Figs. S11 and S12; Supplementary Tables S7–S12). In colorectal cancer, mutual exclusivity was observed between KRAS and each of BRAF, ERBB2, and NRAS, similar to reports in tumor tissue. In the pre-filtered data, KRAS and PIK3CA alterations tended to co-occur, but after filtering they showed a weak trend toward exclusivity, suggesting the presence of subclonal KRAS resistance alterations in the cfDNA colorectal cancer samples. We also noted that filtered FGFR1 and ERBB2 (HER2) amplifications showed a weak trend toward exclusivity in both breast cancer and colorectal cancer, although in colorectal cancer significance could not be readily assessed owing to the small numbers in certain genotype classes.
The landscape of actionable resistance alterations in cfDNA
The large cfDNA cohort provided a unique opportunity to explore, qualitatively and quantitatively, the evolution of resistance alterations in patients who have progressed on targeted therapies in routine clinical practice. To estimate the frequency of actionable resistance alterations (defined as alterations that might influence a physician's choice of therapy at disease progression) in advanced, previously treated cancer patients, we identified known resistance alterations in the cfDNA cohort across six cancer types: NSCLC, breast cancer, colorectal cancer, prostate cancer, melanoma, and GIST. A total of 3,397 of the 14,998 samples analyzed (22.6%) had at least one of the 134 known resistance alterations identified by the cfDNA test. The proportion of samples harboring likely resistance alterations increased when each indication was limited to samples harboring driver alterations with associated FDA-approved targeted drugs (hereafter, “on-label targetable driver alterations”), consistent with the resistance alterations having arisen due to therapy (Fig. 5A and B; Supplementary Table S13). The most common resistance alterations were EGFR T790M and MET CNAs in NSCLC, AR ligand-binding-domain SNVs in prostate cancer, KRAS G12/G13/Q61 mutations in colorectal cancer, and ESR1 L536/Y537/D538 mutations in breast cancer (Fig. 5B; Supplementary Table S13).
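The overall proportion above and the per-indication enrichment test reported in the Fig. 5 legend (χ2 test) can be sketched as below. The 3,397/14,998 counts come from the text; the 2 × 2 table in the usage lines is hypothetical, chosen only to show the calculation, and the function name is illustrative. The test shown is an uncorrected Pearson χ2 with 1 degree of freedom; whether the study applied a continuity correction is an open assumption here.

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for [[a, b], [c, d]].

    Rows: actionable vs. other samples; columns: with vs. without a resistance alteration.
    Returns (statistic, P value).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))

overall = 3397 / 14998
print(f"{overall:.1%}")  # → 22.6%

# Hypothetical counts: 40% of actionable vs. 20% of other samples carry a resistance alteration.
stat, p = chi2_2x2(400, 600, 1000, 4000)
print(f"chi2 = {stat:.1f}, P = {p:.1e}")
```

With group sizes in the thousands, even a twofold difference in resistance prevalence yields P values far below 10−6, which is the scale of enrichment the legend reports.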
CfDNA landscape of resistance to on-label therapies across cancer types. A, Landscape of resistance alterations in cfDNA. Numbers of patients with various resistance alterations (y-axis, left) to targeted therapies (x-axis, bottom) in 6 common cancer types (top) are plotted. Note that some patients harbored multiple distinct resistance alterations. Cancer type/genotype categories are AR-mutant prostate cancer, ALK-fusion–positive NSCLC, EGFR-mutant NSCLC, breast cancer with ESR1 or ERBB2 (HER2) mutations, colorectal cancer, BRAF-mutant melanoma, and KIT-mutant gastrointestinal stromal tumor (GIST). The “EGFRmut” category for NSCLC includes variants such as A722V, L747P, L747S, V769M, T854A, and T854S (18 mutations in total). For each indication, the “Actionable samples” group showed a highly significant enrichment for resistance alterations (P < 10−6, χ2 test). B, Numbers of samples harboring putative resistance mutations found in cfDNA, and the corresponding on-label targeted therapies to which these mutations would confer resistance, across six cancer types. “Diagnosis” indicates the patient diagnosis provided by the ordering clinician on the test requisition form. See Supplementary Table S13 for complete details of the candidate resistance alterations. C, Longitudinal monitoring of an NSCLC patient with emerging resistance (T790M) in the third draw after presumptive EGFR inhibitor therapy indicated by the L858R mutation. Colored lines track the VAF (cfDNA %) of each mutation across three consecutive blood draws. Note that the y-axis is log scale. D, Monitoring analysis showing polyclonal ESR1 mutations, which confer resistance to aromatase inhibitors, in a breast cancer patient, and possible response to therapy over time. Asterisk indicates that four additional amplifications (not in PIK3CA) were detected in the first sample.
E, Monitoring analysis showing stability of the cfDNA clonal structure (relative VAF) over consecutive draws in an NSCLC patient lacking on-label-therapy-indicating mutations.
Although some resistance mutations can occur either as primary, truncal drivers or as secondary alterations that emerge upon treatment (e.g., KRAS G12X/G13X in colorectal cancer can be a truncal driver or can emerge upon treatment with cetuximab), estimated cfDNA clonality helped distinguish resistance alterations whose emergence was likely caused by therapeutic pressure (Supplementary Fig. S13). A conservative estimate, focusing on clearly subclonal SNVs (clonality < 0.1), was that at least 18.6% (381/2,053; range, 10%–34% across cancer types) of samples with on-label targetable alterations had emerging secondary resistance alterations to those on-label therapies. Further, 24% of these resistance-harboring samples (91/381) had more than one alteration associated with resistance to the same therapy, suggesting independent evolution in distinct tumor lesions (21) or sequential treatment with distinct therapies targeting the same gene. For example, one NSCLC patient had an EML4–ALK fusion (VAF, 7.1%) and ALK SNVs reported to confer resistance to crizotinib (L1196M, 2.5%), crizotinib/alectinib (I1171T, 0.1%), and crizotinib/ceritinib/alectinib (G1202R, 5%). In another example, the treatment history of certain patients harboring EGFR L858R or EGFR exon 19 deletion driver alterations was evident from the combined presence of secondary EGFR T790M and tertiary EGFR C797S resistance alterations (24 patients had both EGFR T790M and EGFR C797S; in 21 of them the two variants were in cis, and in the other 3 they were in trans). In these cases, the cfDNA clonality of EGFR C797S was generally lower than that of EGFR T790M (Fig. 3C), consistent with tumor evolution through sequential lines of treatment with erlotinib, afatinib, or gefitinib, followed by osimertinib at progression.
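The clonality cutoffs used in this section (> 0.9 for truncal, < 0.1 for clearly subclonal) can be illustrated with the ALK example. This sketch assumes that clonality is approximated as each variant's VAF divided by the highest VAF in the sample (here the EML4–ALK fusion) and omits the copy-number adjustment the study applies; the function names are hypothetical.

```python
def relative_vaf(vafs):
    """Scale each variant's VAF by the sample's maximum VAF (a simple clonality proxy)."""
    top = max(vafs.values())
    return {name: vaf / top for name, vaf in vafs.items()}

def clonality_class(rel, truncal=0.9, subclonal=0.1):
    """Bin a relative VAF using the cutoffs quoted in the text."""
    if rel > truncal:
        return "truncal"
    if rel < subclonal:
        return "subclonal"
    return "intermediate"

# VAFs (%) for the NSCLC patient described in the text.
sample = {"EML4-ALK": 7.1, "ALK L1196M": 2.5, "ALK I1171T": 0.1, "ALK G1202R": 5.0}
for name, rel in relative_vaf(sample).items():
    print(f"{name}: {rel:.2f} ({clonality_class(rel)})")

# Cohort-level fractions quoted in the text.
print(f"{381 / 2053:.1%}")  # → 18.6%
print(f"{91 / 381:.0%}")    # → 24%
```

Under this proxy only I1171T falls below 0.1, while L1196M (0.35) and G1202R (0.70) land in the intermediate range, which shows why restricting the count to clonality < 0.1 yields a lower bound on secondary resistance.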
Novel resistance alterations were also identified in this clinical cohort, including ERBB2 T798I (analogous to EGFR T790M), which causes resistance to an ERBB2 tyrosine kinase inhibitor; MET D1228N, MET Y1230H, and MET G1163R (analogous to ALK G1202R and ROS1 G2032R), which cause resistance of MET exon 14–mutated NSCLC to a next-generation MET inhibitor; five FGFR2 mutations (V564F, N549H, K641R, E565A, and L617V) shown to drive resistance to a selective pan-FGFR inhibitor (22–26); and the recurrent EGFR ectodomain mutations V441D/G, which arise in the setting of cetuximab resistance in colorectal cancer but have not yet been functionally characterized (25). These putative resistance alterations were consistently subclonal relative to the original driver alteration, and many were missed by biopsy of a single metastatic site but confirmed by repeat biopsy or by biopsy of multiple metastases at autopsy (22, 23, 26).
To illustrate the temporal dimension of the cfDNA landscape, we identified patients who had multiple tests and a clear clonal structure in their ctDNA. These longitudinal cases illustrated emerging or polyclonal resistance after presumptive targeted therapy (Fig. 5C and D), as well as the stability of VAF and clonality estimates over time (Fig. 5E).
Discussion
Much of our understanding of cancer genomes is derived from early-stage, treatment-naïve cancers via consortium efforts such as TCGA. However, the desire to increase treatment efficacy in advanced cancers, which have likely evolved considerably from baseline, has led to a recent shift toward “real world” cancer genomics studies that focus on the realities of the clinic yet are grounded in lessons from earlier-stage cancers (27, 28). It is becoming increasingly clear that comprehensive genomic assessment across heterogeneous tumor subclones will be necessary for tailoring effective therapies for advanced cancer patients (9, 10).
We have provided the largest cohort-level snapshot of genomic alterations in advanced cancer patients by cfDNA analysis in real-life clinical practice. Our results demonstrate that patterns and frequencies of truncal driver alterations in advanced cancers reflect those found in early-stage disease, while also reflecting the increased complexity of advanced, treated cancers. We found that cfDNA alterations (SNVs and small, activating indels) in TP53, EGFR, KRAS, PIK3CA, and BRAF strongly correlated with TCGA tissue alterations (r = 0.90–0.99, Fig. 2A and B), and that correlations for amplification frequency ranks in breast cancer and for locations of intronic fusion breakpoints in NSCLC were similarly high (Fig. 2C and D). The high sensitivity of the cfDNA assay, together with the fact that advanced cancers tested at progression are more evolved and carry more mutations than earlier-stage, treatment-naïve cancers, may contribute to differences in estimated tumor mutation burden versus TCGA (10, 28). Importantly, we show that accurate estimation of tumor mutation burden from plasma cfDNA will require taking ctDNA level into account, as the two factors are correlated (Fig. 1D). Our estimates of ctDNA levels are based on the copy-number-adjusted VAF of cfDNA somatic alterations; future studies should also consider allele-specific molecule counts (i.e., allelic imbalance at germline heterozygous loci) when estimating the amount of tumor DNA in circulation.
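To make the link between VAF and ctDNA level concrete, the sketch below inverts a simple mixture model commonly used for tumor purity; it is assumed here, not taken from the study. Non-tumor cfDNA is taken to be diploid and wild type at the locus, and each tumor genome carries C total copies of which m are mutant, so VAF = f·m / (f·C + 2(1 − f)) for tumor fraction f. Solving gives f = 2·VAF / (m + VAF·(2 − C)). The function names are hypothetical, and the study's actual copy-number adjustment may differ.

```python
def tumor_fraction(vaf, tumor_cn=2.0, mutant_copies=1.0):
    """Invert VAF = f*m / (f*C + 2*(1 - f)) for the tumor fraction f.

    Assumes non-tumor cfDNA is diploid and wild type at the locus.
    """
    return 2 * vaf / (mutant_copies + vaf * (2 - tumor_cn))

def expected_vaf(f, tumor_cn=2.0, mutant_copies=1.0):
    """Forward model, useful for checking the inversion."""
    return f * mutant_copies / (f * tumor_cn + 2 * (1 - f))

# Heterozygous clonal SNV in a copy-neutral region: f is about 2 x VAF.
print(tumor_fraction(0.05))                       # → 0.1
# The same 5% VAF at a locus amplified to 4 copies (1 mutant) implies more tumor DNA.
print(round(tumor_fraction(0.05, tumor_cn=4), 3))  # → 0.111
```

Ignoring copy number at amplified loci would bias the estimated ctDNA level, which in turn shifts any tumor-mutation-burden correction that depends on it.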
Our inference of tumor mutation clonality based on copy-number-adjusted relative cfDNA VAF (Fig. 3) enabled a recapitulation of mutual exclusivity among truncal driver mutations and facilitated identification of subclonal emerging resistance alterations (Figs. 4 and 5C; Supplementary Figs. S6–S8). These results suggest that mutation clonality, as it exists in tumor tissue, can be estimated from properly normalized relative cfDNA VAFs, as has been previously hypothesized (29). High accuracy in VAF estimation is likely key to the success of this approach, and notably, VAFs measured by the cfDNA NGS assay used in this study show good agreement with digital droplet PCR (30, 31). This approach points to the possibility of analyzing the clonal structures of tumors from cfDNA sequencing data, unencumbered by the complications of tumor heterogeneity and tumor impurity introduced by single-region tissue sampling. However, the estimation of tumor mutation clonality from cfDNA is subject to several sources of inaccuracy, including absence of sequence coverage from the cfDNA panel, nonuniform shedding of cfDNA across tumor subclones, and low ctDNA levels. Future studies of underexplored biological factors, such as the variability of cfDNA shedding via tumor-cell death across patients and the uniformity of cfDNA shedding across distinct tumor sites harboring genetically distinct clones, could enable statistical modeling of tumor clonal structures using cfDNA VAFs and cfDNA molecule counts per locus or per allele.
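One of the error sources named above, low ctDNA levels, can be quantified with a simple binomial model. This is an illustration under stated assumptions, not the study's error model: each VAF is treated as an independent binomial proportion over the same number of unique molecules (2,000 here, a hypothetical value), and the coefficient of variation (CV) of the ratio of two VAFs is approximated by the delta method. It does not capture panel coverage gaps or nonuniform shedding.

```python
from math import sqrt

def vaf_se(vaf, depth):
    """Binomial standard error of a VAF estimated from `depth` unique molecules."""
    return sqrt(vaf * (1 - vaf) / depth)

def relative_vaf_cv(vaf_variant, vaf_max, depth):
    """Delta-method CV of vaf_variant / vaf_max, treating the two VAFs as independent."""
    cv_variant = vaf_se(vaf_variant, depth) / vaf_variant
    cv_max = vaf_se(vaf_max, depth) / vaf_max
    return sqrt(cv_variant ** 2 + cv_max ** 2)

depth = 2000  # hypothetical unique-molecule depth
# The same true relative VAF (0.5) at high vs. low ctDNA levels.
print(f"high ctDNA: CV = {relative_vaf_cv(0.10, 0.20, depth):.2f}")
print(f"low ctDNA:  CV = {relative_vaf_cv(0.005, 0.01, depth):.2f}")
```

At a 20% maximum VAF the relative VAF of a half-clonal variant carries roughly 8% sampling noise, but at a 1% maximum VAF it carries roughly 39%, enough to make the clonal class of such a variant uncertain.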
The most notable differences in driver alteration prevalence between the cfDNA and tissue cohorts were for EGFR and KRAS alterations in LUAD (whose relative prevalences were reversed), the higher frequency of ESR1 mutations in cfDNA breast cancer samples, and the higher frequency of TP53 mutations in cfDNA colorectal cancer samples. The higher EGFR alteration prevalence in LUAD cfDNA was likely due to a population bias resulting from clinicians ordering the cfDNA test at progression on an EGFR TKI (median time between diagnosis and plasma collection, 335 days). This is supported by EGFR T790M being one of the most common EGFR variants in the cfDNA cohort, second only to EGFR exon 19 deletion driver mutations. Screening patients with known EGFR-mutant NSCLC for resistance mutations at progression is routine practice, whereas re-profiling patients with KRAS-mutant NSCLC would generally not be done, leading to an overrepresentation of EGFR driver mutations, and a concomitant underrepresentation of KRAS mutations, in this cohort. Similarly, the higher frequency of ESR1 mutations (a documented resistance mechanism to aromatase inhibitors) likely reflects the clinical use of ctDNA assays at progression. There are several possible explanations for the higher TP53 prevalence in cfDNA colorectal cancer samples relative to TCGA: stage III/IV tumors, which predominate in the cfDNA cohort, may have higher frequencies of TP53 alterations than stage I/II tumors; more subclonal TP53 mutations may have been detected owing to the high sensitivity of the cfDNA assay; or some TP53 mutations may stem from clonal hematopoiesis of indeterminate potential, a nonmalignant clonal expansion of somatically mutated blood cells (32, 33). The most likely explanation is that the TCGA cohort (<300 samples) underestimates TP53 mutation prevalence in colorectal cancer relative to the larger 1,374-sample GENIE cohort, which reports a 68.5% mutation prevalence for this gene (34).
The prevalence of resistance alterations that are informative for FDA-approved therapies or for clinical trials of novel targeted agents is a key, clinically important finding of this cfDNA study. Nearly 1 in 4 cfDNA alteration–positive patients (22.6%) across 6 cancer indications had one or more alterations previously suggested to confer resistance to an FDA-approved on-label therapy (Fig. 5A and B), which would inform clinical decision-making. The significant enrichment for these candidate resistance alterations when cohorts were restricted to patients with therapeutically informative driver alterations suggested that they were indeed linked to prior treatment. Our estimates of the frequencies of secondary, rather than primary, resistance alterations (10%–34% of patients across the 6 cancer types) were likely conservative, as we examined only low-level subclonal SNVs (clonality < 0.1), in part because accurate assessment of clonality for CNAs remains difficult. As expected, resistance alterations were more prevalent in cfDNA than in TCGA tissue, as these alterations would generally be absent from early-stage, treatment-naïve tissue biopsies. For instance, the high frequency of cfDNA ESR1 mutations in breast cancer patients likely reflected prior treatment with aromatase inhibitors. Additionally, EGFR T790M was one of the most common EGFR mutations in the cfDNA NSCLC cohort (8% of patients) but was seen in only two patients (0.3%) in the TCGA NSCLC tissue cohort.
Although our cohort of clinically ordered cfDNA tests is uncontrolled, its large size provides a realistic cross-section of patients with advanced disease at the forefront of cancer care. Interpretation of our findings should take into consideration the selection biases related to cfDNA test ordering patterns in clinical practice and other potential limitations. Genomic alteration prevalence may be biased by preferential ordering of the cfDNA test for patients with certain demographic characteristics, such as nonsmoking females with NSCLC (thereby enriching for EGFR mutation over KRAS mutation). Plasma-based genotyping is often ordered at progression in advanced cancer and thus is biased toward higher prevalence of resistance alterations, as discussed above. Although the cfDNA panel (70 genes) was focused primarily on the therapeutically informative portion of the cancer genome, a tradeoff of its relatively small size may be somewhat reduced accuracy for estimating mutation clonality relative to an assay with a larger genomic footprint. However, it is currently impractical to perform whole-exome sequencing of cfDNA at >15,000× coverage.
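Why exome-scale cfDNA sequencing at this depth is impractical follows from simple arithmetic: sequenced bases scale with target size times coverage. The target sizes below are assumptions for illustration (about 35 Mb for a typical exome capture and a hypothetical 150 kb for a 70-gene panel; the panel's actual footprint is not given in this section), and the default calculation ignores duplicate reads and off-target losses, which only increase the requirement.

```python
def sequenced_bases(target_bp, mean_coverage, on_target_fraction=1.0):
    """Approximate bases to sequence: target size x mean depth / on-target fraction."""
    return target_bp * mean_coverage / on_target_fraction

COVERAGE = 15_000
EXOME_BP = 35e6    # assumption: typical exome capture target
PANEL_BP = 150e3   # hypothetical 70-gene panel footprint

exome = sequenced_bases(EXOME_BP, COVERAGE)
panel = sequenced_bases(PANEL_BP, COVERAGE)
print(f"exome: {exome / 1e9:.0f} Gb per sample")  # → 525 Gb
print(f"panel: {panel / 1e9:.2f} Gb per sample")  # → 2.25 Gb
print(f"ratio: {exome / panel:.0f}x")             # → 233x
```

Under these assumptions an exome at the same depth needs over 200 times more sequencing per sample than the panel, which is the tradeoff the paragraph above weighs against clonality accuracy.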
This clinical cohort provides the largest sequencing-based landscape of resistance in advanced cancer patients and builds on the primary driver alterations characterized by TCGA, GENIE, and other projects. A portion of this database has been included in the Blood Profiling Atlas in Cancer, a National Cancer Moonshot initiative (35). Improved detection of resistance mutations may facilitate enrollment in clinical trials and enable the development of more accurate biomarkers of response to therapy (22, 36, 37). Therefore, cfDNA and other minimally invasive techniques address a real, unmet need, as real-time tumor genotyping at the time of progression is essential for guiding subsequent therapeutic strategies.
Disclosure of Potential Conflicts of Interest
O.A. Zill, S.R. Fairclough, and D.R. Gandara have ownership interests (including patents) in Guardant Health. P.C. Mack is a consultant/advisory board member for Apton Biosystems, AstraZeneca, Celgene, and Guardant Health and reports receiving commercial research support from Boehringer Ingelheim. A.M. Baca is a consultant/advisory board member for and has ownership interests (including patents) at Guardant Health. D.I. Chudova has ownership interests (including patents) in Guardant Health. R.B. Lanman has ownership interests (including patents) at Guardant Health. A. Talasaz has ownership interests (including patents) at Guardant Health. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: O.A. Zill, K.C. Banks, S.R. Fairclough, D.R. Gandara, P.C. Mack, J.I. Odegaard, D.I. Chudova, R.B. Lanman, A. Talasaz
Development of methodology: O.A. Zill, K.C. Banks, S.R. Fairclough, S.A. Mortimer, J.V. Vowles, R. Mokhtari, D.R. Gandara, P.C. Mack, J.I. Odegaard, H. Eltoukhy, R.B. Lanman, A. Talasaz
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): K.C. Banks, J.V. Vowles, D.R. Gandara, J.I. Odegaard, R.J. Nagy, H. Eltoukhy, R.B. Lanman, A. Talasaz
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): O.A. Zill, K.C. Banks, S.R. Fairclough, R. Mokhtari, D.R. Gandara, P.C. Mack, J.I. Odegaard, A.M. Baca, D.I. Chudova, R.B. Lanman, A. Talasaz
Writing, review, and/or revision of the manuscript: O.A. Zill, K.C. Banks, S.R. Fairclough, S.A. Mortimer, J.V. Vowles, D.R. Gandara, P.C. Mack, J.I. Odegaard, R.J. Nagy, D.I. Chudova, R.B. Lanman, A. Talasaz
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): S.R. Fairclough, J.V. Vowles, D.R. Gandara, P.C. Mack, J.I. Odegaard, R.J. Nagy, A.M. Baca, H. Eltoukhy, D.I. Chudova
Study supervision: D.R. Gandara, J.I. Odegaard, H. Eltoukhy, R.B. Lanman
Acknowledgments
We thank the patients and the physicians who submitted samples that were deidentified and used in this analysis. We thank all our colleagues at Guardant Health for their support, inspiration, and helpful discussions. The results published here are in part based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/.
The study was funded by and conducted at Guardant Health, Inc. No additional grant support or administrative support was provided for the study.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.