Much effort has been dedicated to developing circulating tumor cells (CTC) as a noninvasive cancer biopsy, but with limited success as yet. In this study, we combine a method for isolation of highly pure CTCs using immunomagnetic enrichment/fluorescence-activated cell sorting with advanced whole genome sequencing (WGS), based on long fragment read technology, to illustrate the utility of an accurate, comprehensive, phased, and quantitative genomic analysis platform for CTCs. Whole genomes of 34 CTCs from a patient with metastatic breast cancer were analyzed as 3,072 barcoded subgenomic compartments of long DNA. WGS resulted in a read coverage of 23× per cell and an ensemble call rate of >95%. These barcoded reads enabled accurate detection of somatic mutations present in as few as 12% of CTCs. We found in CTCs a total of 2,766 somatic single-nucleotide variants and 543 indels and multi-base substitutions, 23 of which altered amino acid sequences. Another 16,961 somatic single nucleotide variant and 8,408 indels and multi-base substitutions, 77 of which were nonsynonymous, were detected with varying degrees of prevalence across the 34 CTCs. On the basis of our whole genome data of mutations found in all CTCs, we identified driver mutations and the tissue of origin of these cells, suggesting personalized combination therapies beyond the scope of most gene panels. Taken together, our results show how advanced WGS of CTCs can lead to high-resolution analyses of cancers that can reliably guide personalized therapy. Cancer Res; 77(16); 4530–41. ©2017 AACR.

Emerging evidence point to the potential role of circulating tumor cells (CTC) as biomarkers for patient stratification and selection of targeted therapy (1). Given their easy accessibility via blood draws, CTCs could be an ideal source of cancer cells for genomic analysis, versus more invasive tissue-based biopsy. In patients with advanced cancer, molecular profiling of CTCs have shown that these cells provide a better representation of tumor diversity than a single biopsy (2); this finding is most relevant in patients with multisite metastases where collection of tumor material from each location is not feasible. In addition, higher levels of CTCs in metastatic cancers can provide more cells for in-depth molecular analyses to discover common and lineage-specific genomic alterations that can help guide personalized treatment (3). In contrast to circulating tumor DNA analysis, which is mainly limited to interrogating small panels of genes (4), CTCs allow complete genomic and gene expression analysis providing opportunities for evaluation of clinically relevant mutations and gene expression–based molecular subtypes, and the identification of novel therapeutic targets.

Over the past decade, technologies for isolating CTCs have dramatically improved (5). For example, the development of an EPCAM-based immunomagnetic enrichment and fluorescence-activated cell sorting (IE/FACS) approach has enabled the isolation of highly pure CTCs that are amenable to complex molecular analyses (6–11). At the same time methods for analyzing the entire genomic content of small numbers of cells, such as long fragment read (LFR) technology, are now available (12–14). LFR is a novel sequencing technology that allows for comprehensive, phased, accurate, and quantitative analysis of genomic variations, from large structural changes to single-base variants (12, 13). In this study, these two powerful approaches (IE/FACS and LFR) are combined to generate high-accuracy, high-coverage whole genome sequencing (WGS) data from CTCs isolated from a female patient diagnosed with ER-positive/HER2-negative metastatic breast cancer. On the basis of both driver and passenger mutations, possible personalized combination therapies were selected that could potentially be effective for targeting the many different cell lineages that make up her highly heterogeneous metastatic disease.

Patient history and sample collection

Blood and tissue biopsies were collected from a 61-year-old female patient diagnosed with ER-positive/HER2-negative metastatic breast cancer being treated at the University of California San Francisco Helen Diller Family Comprehensive Cancer Center. The patient consented to having her samples analyzed and collection of these samples for genomic analysis was approved by the University of California San Francisco institutional review board (Committee on Human Research). This study was conducted following U.S. Common Rule ethical guidelines. She had a previous history in 2001 of ductal carcinoma in situ (DCIS) in her right breast. The patient underwent lumpectomy followed by radiotherapy. She then received five years of adjuvant tamoxifen treatment. In December of 2011, the patient presented with multiple new symptoms. She was found to have a mass in the left breast, as well as diffuse bone metastasis. Biopsies of the left breast and lymph nodes revealed invasive lobular carcinoma. Because of her aggressive and symptomatic presentation, she received first-line metastatic treatment with paclitaxel for 6 months (1/2012–6/2012). She then received fulvestrant therapy. Subsequently, upon disease progression, her treatment was changed to anastrozole, which she had been receiving at the time of the first blood draw. Approximately 8 months later, the second blood draw was taken. The patient passed away 11 months later due to complications from her disease (Fig. 1A).

CTC isolation and WGS library preparation

CTCs were isolated via IE/FACS as previously described in detail (7–11). Briefly, blood samples were subjected to immunomagnetic enrichment to capture and enrich for CTCs using beads coated with mAbs to EPCAM. This was followed by FACS to isolate highly pure CTCs, defined as nucleated cells that are EPCAM-positive and CD45-negative. CTCs were collected at two different time points separated by 8 months (Fig. 1A). In total, 34 CTCs and 5 CD45+ “normal” cells were selected and processed for WGS using LFR (12). Briefly, 3–5 cells were lysed in a pool and denatured genomic DNA was dispersed across 384 physically distinct compartments taking care to keep single stranded genomic DNA fragments as long as possible. A total of 3,072 and 384 individual compartments were used to analyze the 34 CTCs and 5 CD45+ cells, respectively, resulting in about 5% of a haploid genome (150 Mb of single stranded genomic DNA) per compartment. Multiple displacement amplification (MDA), fragmenting, and barcode adapter ligation were performed in each compartment, as described previously (12), followed by pooling and sequencing on Complete Genomics' nanoarray platform (Fig. 1B; ref. 15). Using this process allowed for haplotyping and error correction (12, 13). A combined 2.2 terabases (Tb), 788× total genome coverage, was generated for the 34 CTCs, resulting in 23× coverage per cell. In each compartment, long genomic fragments ranging from 10–500 kb (35 kb average) were covered with an average depth of nearly 6× (23× divided by 2 as the cells were mostly diploid in nature and another factor of 2 as a result of denaturing into single stranded DNA prior to compartmentalization). An additional 290 gigabases were sequenced for the CD45+ cells and a standard short mate-pair WGS library (15) was made from 5 micrograms of DNA isolated from the blood of the patient (STD blood sample). Sequence data were not generated for the primary tumor or lymph node metastasis due to the quality of DNA isolated from the archival specimens being insufficient for WGS libraries. As a control for testing somatic variant detection, DNA from CEPH sample NA12877 was processed with LFR using 1,920 compartments and sequenced to an average depth of approximately 30× per cell. Methods for data analysis can be found in the Supplementary Materials.

Sanger sequencing and Miseq confirmation of somatic mutations

Seventy-seven somatic mutations identified in CTCs were selected for confirmation and to determine which of these represented true founder or early passenger mutations. This list included all coding variants and several noncoding variants that could be potentially important for tumor development. Approximately 250 base pairs surrounding each mutation were amplified using an AccuPrime Taq DNA polymerase kit (Thermo Fisher Scientific) with primers containing M13 forward and reverse sequence tags with template DNA from formalin-fixed paraffin-embedded (FFPE) material of the primary and the lymph node metastasis as well as whole genome amplified (WGA) product from 5 CTCs collected at the first blood draw. As a control, genomic DNA from NA12878 was also amplified using these primers. PCR products were purified with Agencourt AMPpure XP (Beckman Coulter Life Sciences) beads following the manufacturer's protocol and Sanger sequencing with M13 primers was performed on capillary sequencers by McLab. Sanger sequencing data were analyzed using Mutation Surveyor (SoftGenetics) with manual inspection of all results. In addition, approximately 1 ng of each PCR product was pooled, ligated to adapters following the manufacturer's protocol (Illumina), and sequenced by McLab on a MiSeq instrument (Illumina) using 150 base paired end sequencing. A total of 2.59 gigabases of sequence data were generated. To remove M13 sequences 16 bases from the end of each read were removed with Trimmomatic (16). Only reads with a Q score of at least 38 were mapped to build 37 of the reference genome using Bowtie 2 (17). Mapping resulted in an average fold coverage of each amplicon of over 55,000×. Reads were viewed using IGVTools (18) and a custom script was written to sum the total number of reads with an A, C, G, T, INS, or DEL at the somatic mutation position in each amplicon. Somatic mutations were considered confirmed if the mutated base was called 2-fold higher than the third most frequently called base.

Optimized detection of copy number variants and structural variants

Sequence data from all CTCs were analyzed in aggregate to generate an “ensemble” genome with more than 95% coverage of the reference genome. Amplification of chr1q and deletions of chr13 and chr16q, commonly identified in breast cancer, were discovered within all CTCs (Fig. 2) through read coverage analysis. Tumor suppressor genes BRCA2, CDH1, and RB1 are located in these deleted regions. None of these abnormalities were found in normal cells from the same patient and their detection in almost all collected CTCs indicates that the enrichment and isolation strategy (IE/FACS) resulted in a highly pure pool of cancer cells. In addition, this result suggests that these were early somatic events in the development of this tumor. In general, read coverage was too variable and limited our ability to detect smaller copy number changes (<1 Mb). By comparing the ratio of compartments that support variant and reference calls, as opposed to reads, and comparing compartment counts between phased contigs a more accurate assessment of copy number variation can be determined. Likewise, for detection of translocations we developed a novel algorithm based on unexpected mate pair mappings identified across multiple compartments (Supplementary Methods). In both cases, improved detection is possible because compartments reflect the initial state of the genomic DNA, whereas reads reflect the postamplification state and are affected by the biases caused by MDA. Using this ratio allowed for the detection of 133 focal amplifications and deletions on the order of 1 Mb or less and allowed for the discovery of 2 genes (H3F3A and MDM4) previously shown to be amplified in cancer (Supplementary Table S1). In addition, our algorithm discovered a complex series of deletions, amplifications, and translocations between chromosome 1 and 6 (Fig. 3). As further demonstration of the sensitivity of our methods we found an allele ratio of 0.1 in all of the large deletion regions (chr13 and chr16q) discovered above through read coverage; however, the expected ratio would be zero as these regions are haploid. This allele ratio suggests that up to 20% of the cells in our CTC ensemble genome could be normal cells (Supplementary Table S1). To confirm this was the case, we determined the haplotype of approximately 60,000 heterozygous SNPs on chr13 that were called in the STD blood sample, found in dbSNP, and phased by the CD45+ sample. This allowed us to confirm that the base calls with an allele ratio of approximately 0.1 all came from the same haplotype. The most parsimonious explanation for this is that these variants all come from the parental chromosome lost in the tumor but not in the normal cells.

Accurate, phased, and quantitative analysis of single-base somatic mutations

As demonstrated above, using compartment information, as opposed to reads, is a powerful approach for generating preamplification state genomic information. We used allele fraction (prevalence), the ratio of compartments with reads supporting variant calls versus total calls to further analyze small variants. We first calculated inherited heterozygous SNPs to determine whether the expected allele fraction of 50% was found. To correct for amplifications and deletions and to only select true heterozygous SNPs, we removed chromosomes 1, 13, and 16 from the analysis and only examined SNPs called in the standard library of whole blood and found in dbSNP. This resulted in 1,456,341 inherited SNPs that could be analyzed further. Plotting the percent of total variants versus the allele fraction resulted in the expected peak near 0.5 (M = 0.44; SD = 0.09, Fig. 4A). To evaluate somatic variants, we took all variants called in any CTC, but not found in the CD45+ or standard blood cell libraries, and plotted them in a similar manner. As expected, this curve had a dramatically different shape with a maximum approaching an allele fraction of zero (Supplementary Fig. S1). This is likely the result of some somatic variants being present in a few cells and polymerase errors from single compartments incorporated early in the amplification process, the latter being the major contributor to the shape of this graph. Most of these false positive variants can be removed by applying a quality control filter based on compartment count thresholds, compartment distribution expectancy, and allele frequencies in control and healthy genomes (Supplementary Methods). We used inherited variants as a model to extract the filter parameters to eliminate false positives without introducing a high false negative rate. For example, introducing an exclusive compartment count threshold of 7 (the minimum number of compartments in which a variant has to be detected) removes 85% of putative somatic variants but only 0.3% of inherited variants (Fig. 4B). Filtering reduced the total number of potential somatic single nucleotide variants (SNV) from 1,429,299 to 19,727 (Supplementary Table S2) and identified 8,951 somatic indels and multi-base substitutions (Supplementary Table S3). We also used the same LFR methods and compartment thresholds on 31 cells from CEPH sample NA12877; a cell line generated from human B cells and expected to have a small number of somatic variants when compared with its genome sequence generated from micrograms of DNA. Our analysis discovered only 812 somatic variants (Supplementary Table S4). On the basis of these findings and a prior study using LFR to remove false positive errors by applying a compartment threshold of six or greater (13), the false positive rate is expected to be very low. Importantly, the application of this compartment count threshold still allowed for the discovery of variants present in 12% or more of cells with a detection rate of 98% for somatic mutations present in all CTCs (Supplementary Table S5).

Detecting somatic mutations present in all cells

Resistance to targeted therapies may occur due to the selection of tumor subclones that do not express the target of interest (19). For this reason, it is important to identify variants that are present in all cancer cells. In an effort to discover variants shared across all clones, we developed a method based on the somatic mutation prevalence distribution (see Materials and Methods). Applying this algorithm resulted in the discovery of 2,766 mutations found in all cells. Comparing the prevalence distribution curves of mutations in all cells and inherited heterozygous variants shows a substantial overlap with a slight shift toward lower values for these somatic mutations (Fig. 4A). This finding is expected on the basis of our measurements that these CTC samples have approximately 20% normal cell contamination. In addition, there are 16,961 somatic mutations that exist only in subsets of CTCs. These mutations have a similar chromosomal distribution as the mutations found in all cells with slightly lower coverage in GC-rich chromosomes (Fig. 2), in agreement with lower read coverage of GC-rich sequences. The majority of these mutations are present in less than 30% of CTCs (<15% allele fraction) with a subset present in approximately 50% of CTCs. This suggests differential growth rates evolved between clones after establishing the founding tumor lineage. There are 21 nonsynonymous and 8 synonymous coding mutations found in all cells, resulting in a nonsynonymous to synonymous ratio of 2.6. This ratio is greater than the theoretical expectation (∼2) based on which bases alter codons as well as that of inherited SNPs (∼1, Supplementary Table S5), which are under selective pressure against new mutations causing coding changes. This suggests that at least a few of these coding mutations were selected for and are most likely driver events. Of those mutations with varying levels of prevalence across CTCs, only 51 nonsynonymous and 20 synonymous coding mutations were discovered.

Analysis of the prevalence of somatic mutations collected at two different time points spanning eight months showed large overlap (Pearson correlation, r = 0.42; Supplementary Fig. S2) despite active cancer treatment during this period. This suggests that most of the tumor lineages survived this treatment, which is consistent with a lack of tumor response between these time points as determined by CT scans.

To validate all coding mutations as being present in CTCs, PCR primers were designed and amplification was performed on whole genome amplified (WGA) material from approximately 5 CTCs collected at the first blood draw. In addition, the same primers were used to amplify genomic DNA isolated from formalin-fixed paraffin-embedded biopsies of the primary tumor and the lymph node metastasis. All amplicons were Sanger sequenced and those amplicons from the CTC WGA material were also analyzed by next-generation sequencing (Fig. 5). High-quality data was generated for 69 of the 77 SNVs, for which primers were designed (Supplementary Table S6). Of those, 65 were found at varying levels of prevalence in the next-generation sequencing data of the CTC WGA. As expected, many of these SNVs could not be found by Sanger sequencing of the same material. Of the 4 mutations that could not be confirmed in the CTC WGA by NGS sequencing two were identified in the read data but did not meet the threshold criteria for calling them and two were obvious mismappings to pseudogenes that could easily have been removed from our somatic mutation list by applying better filtering methods. In addition, only a subset of those variants found in all CTC cells were found in the primary and metastasis Sanger sequencing data, but those that were discovered existed at a prevalence of approximately 50%, suggesting they were either driver or early passenger mutations (Fig. 5).

Revealing the tissue of origin of the tumor using the mutation signature of CTCs

If CTCs are to be used to detect and diagnose cancer from a patient's blood sample it is important to be able to identify the location of the tumor from the CTCs. Previous studies (20, 21) have shown that different cancer types have unique somatic base change spectrums and we posited that these could be used to identify the primary organ site from which the cancer arose. In addition, many cancers carry specific patterns of driver mutations and/or large structural variations that could also help ascertain the tissue of origin. Using these methods, we analyzed the genomes of our CTCs to confirm the tissue of origin. On the basis of the mutation spectrum (21), we observed that our CTC genome most closely resembled breast cancer data from the Cancer Genome Atlas (TCGA) project (http://cancergenome.nih.gov/) and was least similar to lung adenocarcinoma (Mann–Whitney test, P = 0.063, Supplementary Fig. S3; Supplementary Table S6). We also compared our CTC copy number analysis profile (chr1q amplification, chr13 deletion, and chr16q deletion) to results from TCGA low-pass WGS data of 14 cancer types. Again, breast cancer was found to be the closest match to our CTC profile (P = 0.011, t test, Supplementary Table S7). In addition, somatic mutations and copy number loss in CDH1 are characteristic of the lobular subtype of breast cancer (22). These analyses provided strong evidence that the breast was the tissue of origin of these CTCs.

Somatic alterations point towards mechanisms of tumor development

Of the 51 nonsynonymous coding somatic mutations present in 12% to 100% of CTCs, nine coding mutations were found in eight genes (CDH1, DDX3X, EBF1, FOXO3, MLL4, PDGFRA, SF3B1, and SRGAP3) previously identified as being associated with cancer (23, 24). Of these, a frameshift mutation in the tumor suppressor CDH1 was found in all CTCs as well as the primary tumor and metastasis (Table 1). The mutation in CDH1 combined with a loss of one copy of 16q, which contains CDH1, results in a complete loss of function of the E-cadherin gene product. Loss of CDH1 activity is a frequent and well-established event in lobular breast cancer (22) and is likely to be a driver event for our patient's tumor. The remaining 19 somatic mutations found in all cells affect genes lacking strong evidence for cancer involvement. However, we observed several potentially interesting genes with mutations, such as PDPK1 and MAPK8, whose products are part of the PI3K and MAPK pathways, respectively; and XRCC4 and GEN1, genes involved in DNA repair. GEN1 is an especially interesting potential driver as the mutation was found at approximately 50% prevalence in both the primary and metastatic tumors and our patient harbored an inherited nonsense mutation in this gene on the opposite allele. Another intriguing mutation found in all CTCs, the primary, and the lymph node metastasis was in the gene CDK5RAP1. This gene product is a regulator of cyclin-dependent kinase 5 (CDK5). Recent studies have shown that CDK5 expression is associated with tumor proliferation, migration, and chemotherapy resistance (25).

The CTCs also showed loss of heterozygosity coupled with a number of inactivating and/or potentially inactivating mutations and inherited variants in cadherin cell adhesion molecules (CDH1, CDH5, and CDH16). As mentioned above, we discovered a series of complex structural rearrangements between chromosomes 1 and 6, including translocations near the ARFGEF3, which could potentially result in the over expression of this gene (Fig. 3). Our patient failed to respond to five years of tamoxifen treatment and her tumor was resistant to additional estrogen receptor inhibition. High expression of the protein product of ARFGEF3 has previously been associated with resistance to estrogen receptor inhibition (26) and this is a potential explanation for why our patient became resistant to this type of therapy.

Recently, large studies using whole-genome sequencing in cancer samples have discovered recurrent noncoding somatic variants in cancer (27, 28). Using recently described analysis tools Funseq2 (29) and Oncocis (30), we analyzed all somatic variants found in noncoding regions of the CTCs, resulting in the identification of 113 high scoring variants found in the Encyclopedia of DNA Elements (ENCODE) database. Of these, 28 were found in all cells, of which 15 variants were in regions highly occupied by transcription factors and 10 were located in promoter regions (Supplementary Table S8). The top-scoring variant identified in all cells was found to negatively alter two JASPAR (31) transcription factor binding motifs for RFX5 and BRCA1 in the promoter region of RFC5 (Supplementary Methods). The protein product of RFC5 is a subunit of the RFC complex, which in association with BRCA1 recognizes and repairs DNA damage. An additional five somatic noncoding variants found in all cells were identified by SPANR (32) to have a significant impact on splicing (top 1%, Supplementary Table S9) including somatic variants in IFNAR1 (ΔPsi = 7.57, 99.8%) and PKHD1 (ΔPsi = −34.6, 0.03%). The top candidate in variants observed with varying prevalence across CTCs was MYO1C (ΔPsi = −54.91, 0.02%). In contrast, the annotation of noncoding NA12877 variants did not identify any candidates.

Somatic alterations suggest potential treatment options

Several therapy options for this patient are suggested from this retrospective comprehensive genomic analysis. Potential somatic activating mutations in PDPK1, MAPK8, and PDGFRA were identified in all CTCs in this patient (Supplementary Table S2), but are infrequently found in breast cancer (23, 24, 33). On the basis of response rates in other tumor types with these kinase mutations, it is possible that a patient with this profile might benefit from kinase inhibitor therapy. In addition, the mutation found in CDK5RAP1 could result in overexpression of CDK5 and sensitivity to inhibitors of this gene product. Several clinical trials are currently testing inhibitors of CDK5 (34). Our patient also had two inherited common SNPs in BRCA2 that have been shown to confer increased risk of breast cancer in some studies (35). The protein product of BRCA2 binds to DNA at the sites of double strand breaks and is critical for the process of DNA repair through homologous recombination. This combined with the loss of one copy of chromosome 13, on which BRCA2 resides, possible complete inactivation of GEN1 activity, a damaging mutation in XRCC4, and a somatic noncoding mutation most likely resulting in decreased expression of RFC5, a BRCA1-associated protein, suggests that our patient's tumor may have had severely reduced DNA repair activity. Although currently in clinical trials, PARP1 and DNA-PKc inhibitors have shown promising results in tumors with DNA repair defects (36). In addition, a recent clinical trial in patients with DNA repair defects demonstrated a much stronger response to immune checkpoint inhibitors targeting CTLA-4 or PD-1/PD-L1 (37), presumably due to the increased number of expressed somatic mutations. In both cases, combining these therapies with DNA-damaging agents could increase the effectiveness of the therapy (38).

Recent studies (39, 40) have demonstrated that vaccines can be developed against a subset of proteins with cancer-specific amino acid changes. We analyzed all coding somatic variants for expression in breast cancer samples using TCGA data, as well as determining the likelihood of binding and presentation by the MHC class II molecules, specifically the HLA-DRB5*01:01 allele our patient was determined to have, using the immune epitope database (http://www.iedb.org/). This resulted in the discovery of 37 mutations (Supplementary Table S10) for which a vaccine could be designed to create a personalized and specific cancer therapy.

Finally, we also confirmed the lack of somatic alterations in our patient's tumor in genes for which targeted therapies have been developed (Supplementary Table S11), and there were no other instances of loss of one or both genes that resulted in a susceptibility to a specific drug (Supplementary Tables S11 and S12). In total, WGS of 34 CTCs from this patient resulted in the discovery of 9 potential treatment options (Table 2).

Five CTCs are sufficient for selecting efficient personalized combination therapy

In the analyses above, we used massive coverage of 34 CTCs to evaluate whether tumor heterogeneity exists in CTCs. In a research setting, this could be a reasonable cost for studying a large cohort of patients with CTCs at extremely high-resolution, especially for those cancers in which tissue biopsies are not possible. In a clinical setting, however, sequencing cost may be prohibitive and in many cases 34 CTCs may not available for analysis. To further understand what could be learned from a more clinically relevant scenario, we analyzed five different batches of five CTCs, each with an overall sequence coverage of 100×. Using a compartment threshold of three or greater, all six somatic founder mutations were found in each batch, and between 21 and 28 of the 37 MHC-II binding mutations were identified (Supplementary Tables S2, S3, and S10). Furthermore, the complex structural rearrangements between chromosome 1 and 6 were readily identified as well as the amplification of chr1q and the major deletions of chr13 and chr16q (Fig. 2; Supplementary Fig. S4). Finally, comparing the spectrum of somatic base mutations between TCGA data and five cells resulted in breast still being the most likely tissue of origin (Mann–Whitney test, P = 0.066, Supplementary Fig. S3; Supplementary Table S7) and as expected these five cells shared the same spectrum as somatic SNVs from all 34 CTCs (Mann–Whitney test, P = 0.009).

Previous studies on next-generation DNA sequencing of CTCs have involved targeted sequencing of cancer-related genes (11, 41), whole exome (2, 42), low-pass WGS (0.1×; ref. 42), and whole genome sequencing of a small number of CTCs (43). In this study, we expand on these previous efforts by deeply whole genome sequencing 34 CTCs collected at two different time points and applying advanced WGS library processes and analyses to the data. Isolation of extremely rare CTCs from an overwhelming background (106-fold excess) of leukocytes can be very challenging; however, our approach involving isolation of CTCs via IE/FACS was able to achieve the necessary purity for this WGS analysis to be feasible. In addition, our results highlight the power of compartmentalizing the DNA of a cell into subgenome fractions to help remove false-positive SNVs and improve the analysis of heterogeneity. Previous studies sequencing multiple single cells (2, 14, 43, 44) have employed some of these concepts by requiring mutations to be found in more than one cell and as a result have been able to analyze to some degree low prevalence and founder variants. However, these strategies are unable to phase variants; a critical requirement for determining whether a functional protein is expressed when multiple inactivating variants are found within a gene. Furthermore, the compartment ratio strategies employed here to dramatically improve variant heterogeneity frequency determination and structural variant detection are not possible using only single-cell data.

In agreement with results from a previous whole-exome analysis by Lohr and colleagues (2) we found a large amount of heterogeneity among cells, suggesting that CTCs may reflect much of the variability within a tumor and distant metastases. Our results suggest that for cases, where traditional tissue biopsies are difficult to obtain, isolation of highly pure CTCs and robust sequencing approaches such as demonstrated here, may provide an alternative method for comprehensive genome studies to analyze tumor heterogeneity. Furthermore, using CTCs we demonstrate that somatic alterations detected in all tumor can provide information to select for optimal targets for therapeutic intervention are readily detectable. These results also highlight the importance of comprehensive whole genome analyses in a clinical setting as many of the somatic and inherited alterations we identified as being targetable would have been missed using common strategies of low pass whole genome and high-depth gene panel sequencing.

Finally, analyzing as few as five cells we found the majority of high prevalence and potentially treatment actionable somatic mutations. According to a recent pooled analysis in breast cancer (45), five cells could be expected in about half of the patients with metastatic disease (46). It should also be noted that as the sensitivity of technologies for detection and isolation of CTCs continue to improve, the percentage of patients with detectable CTCs is expected to increase. We estimate that in the next few years the analysis we performed on five cells will cost approximately $3,000. From a clinical standpoint, this cost is in line with current oncology diagnostic tests and would be a tiny fraction of the total cost of treating a cancer patient. Importantly, using comprehensive and complete genetic evidence, provided by approaches such as described in this study, to guide treatment decisions that are effective, will ultimately lower medical costs and improve patient outcomes.

R. Chin is a senior research associate at Complete Genomics, Inc. Y. Li is the development manager at BGI Research. J. Park has received speakers bureau honoraria from Genentech and Pfizer and has ownership interest (including patents) in Merrimack Pharma. B.A. Peters is the senior director at and has ownership interest (including patents) in Complete Genomics. No potential conflicts of interest were disclosed by the other authors.

Conception and design: N. Gulbahce, M.J.M. Magbanua, J. Liu, E. Park, S. Drmanac, H.S. Rugo, J. Park, J. Park, R. Drmanac, B.A. Peters

Development of methodology: M.J.M. Magbanua, R. Chin, D. Hayden, Z. Li, X. Chen, Y. Li, R. Zhang, J. Park, R. Drmanac, B.A. Peters

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): M.J.M. Magbanua, R. Chin, X. Luo, D. Hayden, R. Zhang, H.S. Rugo, J. Park, B.A. Peters

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): N. Gulbahce, M.J.M. Magbanua, R. Chin, M.R. Agarwal, X. Luo, Q. Mao, S. Ciotlos, X. Chen, R. Tearle, J. Park, R. Drmanac, B.A. Peters

Writing, review, and/or revision of the manuscript: N. Gulbahce, M.J.M. Magbanua, R. Chin, M.R. Agarwal, J. Liu, H.S. Rugo, J. Park, R. Drmanac, B.A. Peters

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): M.J.M. Magbanua, Q. Mao, S. Ciotlos, R. Zhang, K. Lee, J. Park, B.A. Peters

Study supervision: M.J.M. Magbanua, J. Park, R. Drmanac, B.A. Peters

We would like to thank the patient and her family who contributed to this study and acknowledge the ongoing contributions and support of all Complete Genomics and BGI-Shenzhen employees, in particular the many highly skilled individuals that work in the libraries, reagents, and sequencing groups that make it possible to generate high-quality whole genome data. We thank Dr. Jin Sun Lee for review of patient medical records. Read and mapping data has been submitted to the database of Genotypes and Phenotypes (dbGaP, http://www.ncbi.nlm.nih.gov/gap/) under accession identifier phs001028.v1.p1.

M.J. Magbanua, J.W. Park, and H.S. Rugo received funding for this study from the Breast Cancer Research Foundation. R. Drmanac and B.A. Peters received support from the Shenzhen Municipal Government of China Peacock Plan NO. KQTD20150330171505310.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Lee
JS
,
Magbanua
MJ
,
Park
JW
. 
Circulating tumor cells in breast cancer: applications in personalized medicine
.
Breast Cancer Res Treat
2016
;
160
:
411
24
.
2.
Lohr
JG
,
Adalsteinsson
VA
,
Cibulskis
K
,
Choudhury
AD
,
Rosenberg
M
,
Cruz-Gordillo
P
, et al
Whole-exome sequencing of circulating tumor cells provides a window into metastatic prostate cancer
.
Nat Biotechnol
2014
;
32
:
479
84
.
3.
Shaw
JA
,
Guttery
DS
,
Hills
A
,
Fernandez-Garcia
D
,
Page
K
,
Rosales
BM
, et al
Mutation analysis of cell-free DNA and single circulating tumor cells in metastatic breast cancer patients with high circulating tumor cell counts
.
Clin Cancer Res
2017
;
23
:
88
96
.
4.
Lebofsky
R
,
Decraene
C
,
Bernard
V
,
Kamal
M
,
Blin
A
,
Leroy
Q
, et al
Circulating tumor DNA as a non-invasive substitute to metastasis biopsy for tumor genotyping and personalized medicine in a prospective trial across all tumor types
.
Mol Oncol
2015
;
9
:
783
90
.
5.
Magbanua
MJ
,
Park
JW
. 
Advances in genomic characterization of circulating tumor cells
.
Cancer Metastasis Rev
2014
;
33
:
757
69
.
6.
Magbanua
MJ
,
Sosa
EV
,
Scott
JH
,
Simko
J
,
Collins
C
,
Pinkel
D
, et al
Isolation and genomic analysis of circulating tumor cells from castration resistant metastatic prostate cancer
.
BMC Cancer
2012
;
12
:
78
.
7.
Magbanua
MJ
,
Park
JW
. 
Isolation of circulating tumor cells by immunomagnetic enrichment and fluorescence-activated cell sorting (IE/FACS) for molecular profiling
.
Methods
2013
;
64
:
114
8
.
8.
Magbanua
MJ
,
Sosa
EV
,
Roy
R
,
Eisenbud
LE
,
Scott
JH
,
Olshen
A
, et al
Genomic profiling of isolated circulating tumor cells from metastatic breast cancer patients
.
Cancer Res
2013
;
73
:
30
40
.
9.
Lang
JE
,
Scott
JH
,
Wolf
DM
,
Novak
P
,
Punj
V
,
Magbanua
MJ
, et al
Expression profiling of circulating tumor cells in metastatic breast cancer
.
Breast Cancer Res Treat
2015
;
149
:
121
31
.
10.
Paszek
MJ
,
DuFort
CC
,
Rossier
O
,
Bainer
R
,
Mouw
JK
,
Godula
K
, et al
The cancer glycocalyx mechanically primes integrin-mediated growth and survival
.
Nature
2014
;
511
:
319
25
.
11.
Kelley
RK
,
Magbanua
MJ
,
Butler
TM
,
Collisson
EA
,
Hwang
J
,
Sidiropoulos
N
, et al
Circulating tumor cells in hepatocellular carcinoma: a pilot study of detection, enumeration, and next-generation sequencing in cases and controls
.
BMC Cancer
2015
;
15
:
206
.
12.
Peters
BA
,
Kermani
BG
,
Sparks
AB
,
Alferov
O
,
Hong
P
,
Alexeev
A
, et al
Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells
.
Nature
2012
;
487
:
190
5
.
13.
Peters
BA
,
Kermani
BG
,
Alferov
O
,
Agarwal
MR
,
McElwain
MA
,
Gulbahce
N
, et al
Detection and phasing of single base de novo mutations in biopsies from human in vitro fertilized embryos by advanced whole-genome sequencing
.
Genome Res
2015
;
25
:
426
34
.
14.
Hou
Y
,
Song
L
,
Zhu
P
,
Zhang
B
,
Tao
Y
,
Xu
X
, et al
Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm
.
Cell
2012
;
148
:
873
85
.
15.
Drmanac
R
,
Sparks
AB
,
Callow
MJ
,
Halpern
AL
,
Burns
NL
,
Kermani
BG
, et al
Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays
.
Science
2010
;
327
:
78
81
.
16.
Bolger
AM
,
Lohse
M
,
Usadel
B
. 
Trimmomatic: a flexible trimmer for Illumina sequence data
.
Bioinformatics
2014
;
30
:
2114
20
.
17.
Langmead
B
,
Salzberg
SL
. 
Fast gapped-read alignment with Bowtie 2
.
Nat Methods
2012
;
9
:
357
9
.
18.
Thorvaldsdottir
H
,
Robinson
JT
,
Mesirov
JP
. 
Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration
.
Brief Bioinform
2013
;
14
:
178
92
.
19.
McGranahan
N
,
Favero
F
,
de Bruin
EC
,
Birkbak
NJ
,
Szallasi
Z
,
Swanton
C
. 
Clonal status of actionable driver events and the timing of mutational processes in cancer evolution
.
Sci Transl Med
2015
;
7
:
283ra54
.
20.
Hoadley
KA
,
Yau
C
,
Wolf
DM
,
Cherniack
AD
,
Tamborero
D
,
Ng
S
, et al
Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin
.
Cell
2014
;
158
:
929
44
.
21.
Kandoth
C
,
McLellan
MD
,
Vandin
F
,
Ye
K
,
Niu
B
,
Lu
C
, et al
Mutational landscape and significance across 12 major cancer types
.
Nature
2013
;
502
:
333
9
.
22.
Ciriello
G
,
Gatza
ML
,
Beck
AH
,
Wilkerson
MD
,
Rhie
SK
,
Pastore
A
, et al
Comprehensive molecular portraits of invasive lobular breast cancer
.
Cell
2015
;
163
:
506
19
.
23.
Lawrence
MS
,
Stojanov
P
,
Mermel
CH
,
Robinson
JT
,
Garraway
LA
,
Golub
TR
, et al
Discovery and saturation analysis of cancer genes across 21 tumour types
.
Nature
2014
;
505
:
495
501
.
24.
Vogelstein
B
,
Papadopoulos
N
,
Velculescu
VE
,
Zhou
S
,
Diaz
LA
 Jr
,
Kinzler
KW
. 
Cancer genome landscapes
.
Science
2013
;
339
:
1546
58
.
25.
Pozo
K
,
Bibb
JA
. 
The emerging role of Cdk5 in cancer
.
Trends Cancer
2016
;
2
:
606
18
.
26.
Yoshimaru
T
,
Komatsu
M
,
Matsuo
T
,
Chen
YA
,
Murakami
Y
,
Mizuguchi
K
, et al
Targeting BIG3-PHB2 interaction to overcome tamoxifen resistance in breast cancer cells
.
Nat Commun
2013
;
4
:
2443
.
27.
Weinhold
N
,
Jacobsen
A
,
Schultz
N
,
Sander
C
,
Lee
W
. 
Genome-wide analysis of noncoding regulatory mutations in cancer
.
Nat Genet
2014
;
46
:
1160
5
.
28.
Melton
C
,
Reuter
JA
,
Spacek
DV
,
Snyder
M
. 
Recurrent somatic mutations in regulatory regions of human cancer genomes
.
Nat Genet
2015
;
47
:
710
6
.
29.
Fu
Y
,
Liu
Z
,
Lou
S
,
Bedford
J
,
Mu
XJ
,
Yip
KY
, et al
FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer
.
Genome Biol
2014
;
15
:
480
.
30.
Perera
D
,
Chacon
D
,
Thoms
JA
,
Poulos
RC
,
Shlien
A
,
Beck
D
, et al
OncoCis: annotation of cis-regulatory mutations in cancer
.
Genome Biol
2014
;
15
:
485
.
31.
Mathelier
A
,
Zhao
X
,
Zhang
AW
,
Parcy
F
,
Worsley-Hunt
R
,
Arenillas
DJ
, et al
JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles
.
Nucleic Acids Res
2014
;
42
:
D142
7
.
32.
Xiong
HY
,
Alipanahi
B
,
Lee
LJ
,
Bretschneider
H
,
Merico
D
,
Yuen
RK
, et al
RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease
.
Science
2015
;
347
:
1254806
.
33.
Cancer Genome Atlas N
. 
Comprehensive molecular portraits of human breast tumours
.
Nature
2012
;
490
:
61
70
.
34.
Roskoski
R
 Jr
. 
Cyclin-dependent protein kinase inhibitors including palbociclib as anticancer drugs
.
Pharmacol Res
2016
;
107
:
249
75
.
35.
Xue
WQ
,
He
YQ
,
Zhu
JH
,
Ma
JQ
,
He
J
,
Jia
WH
. 
Association of BRCA2 N372H polymorphism with cancer susceptibility: a comprehensive review and meta-analysis
.
Sci Rep
2014
;
4
:
6791
.
36.
Dietlein
F
,
Thelen
L
,
Reinhardt
HC
. 
Cancer-specific defects in DNA repair pathways as targets for personalized therapeutic approaches
.
Trends Genet
2014
;
30
:
326
39
.
37.
Le
DT
,
Uram
JN
,
Wang
H
,
Bartlett
BR
,
Kemberling
H
,
Eyring
AD
, et al
PD-1 blockade in tumors with mismatch-repair deficiency
.
N Engl J Med
2015
;
372
:
2509
20
.
38.
Higuchi
T
,
Flies
DB
,
Marjon
NA
,
Mantia-Smaldone
G
,
Ronner
L
,
Gimotty
PA
, et al
CTLA-4 blockade synergizes therapeutically with PARP inhibition in BRCA1-deficient ovarian cancer
.
Cancer Immunol Res
2015
;
3
:
1257
68
.
39.
Kreiter
S
,
Vormehr
M
,
van de Roemer
N
,
Diken
M
,
Lower
M
,
Diekmann
J
, et al
Mutant MHC class II epitopes drive therapeutic immune responses to cancer
.
Nature
2015
;
520
:
692
6
.
40.
Carreno
BM
,
Magrini
V
,
Becker-Hapak
M
,
Kaabinejadian
S
,
Hundal
J
,
Petti
AA
, et al
Cancer immunotherapy. A dendritic cell vaccine increases the breadth and diversity of melanoma neoantigen-specific T cells
.
Science
2015
;
348
:
803
8
.
41.
Heitzer
E
,
Auer
M
,
Gasch
C
,
Pichler
M
,
Ulz
P
,
Hoffmann
EM
, et al
Complex tumor genomes inferred from single circulating tumor cells by array-CGH and next-generation sequencing
.
Cancer Res
2013
;
73
:
2965
75
.
42.
Ni
X
,
Zhuo
M
,
Su
Z
,
Duan
J
,
Gao
Y
,
Wang
Z
, et al
Reproducible copy number variation patterns among single circulating tumor cells of lung cancer patients
.
Proc Natl Acad Sci U S A
2013
;
110
:
21083
8
.
43.
Jiang
R
,
Lu
YT
,
Ho
H
,
Li
B
,
Chen
JF
,
Lin
M
, et al
A comparison of isolated circulating tumor cells and tissue biopsies using whole-genome sequencing in prostate cancer
.
Oncotarget
2015
;
6
:
44781
93
.
44.
Xu
X
,
Hou
Y
,
Yin
X
,
Bao
L
,
Tang
A
,
Song
L
, et al
Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor
.
Cell
2012
;
148
:
886
95
.
45.
Rack
B
,
Schindlbeck
C
,
Juckstock
J
,
Andergassen
U
,
Hepp
P
,
Zwingers
T
, et al
Circulating tumor cells predict survival in early average-to-high risk breast cancer patients
.
J Natl Cancer Inst
2014
;
106
:pii:
dju066
.
46.
Bidard
FC
,
Peeters
DJ
,
Fehm
T
,
Nole
F
,
Gisbert-Criado
R
,
Mavroudis
D
, et al
Clinical validity of circulating tumour cells in patients with metastatic breast cancer: a pooled analysis of individual patient data
.
Lancet Oncol
2014
;
15
:
406
14
.
47.
Davis
MI
,
Hunt
JP
,
Herrgard
S
,
Ciceri
P
,
Wodicka
LM
,
Pallares
G
, et al
Comprehensive analysis of kinase inhibitor selectivity
.
Nat Biotechnol
2011
;
29
:
1046
51
.

Supplementary data