Abstract
Advances in genomic technology have enabled the faithful detection and measurement of mutations and the gene expression profile of cancer cells at the single-cell level. Recently, several single-cell sequencing methods have been developed that permit the comprehensive and precise analysis of the cancer-cell genome, transcriptome, and epigenome. The use of these methods to analyze cancer cells has led to a series of unanticipated discoveries, such as the high heterogeneity and stochastic changes in cancer-cell populations, the new driver mutations and the complicated clonal evolution mechanisms, and the novel identification of biomarkers of variant tumors. These methods and the knowledge gained from their utilization could potentially improve the early detection and monitoring of rare cancer cells, such as circulating tumor cells and disseminated tumor cells, and promote the development of personalized and highly precise cancer therapy. Here, we discuss the current methods for single cancer-cell sequencing, with a strong focus on those practically used or potentially valuable in cancer research, including single-cell isolation, whole genome and transcriptome amplification, epigenome profiling, multi-dimensional sequencing, and next-generation sequencing and analysis. We also examine the current applications, challenges, and prospects of single cancer-cell sequencing. Cancer Res; 76(6); 1305–12. ©2016 AACR.
Introduction
Cancer is a significant cause of mortality worldwide. Currently, clinical therapies do not produce complete resolution for most cancer patients due to insufficient understanding of the molecular mechanisms for carcinogenesis and tumor progression (1). Single-cell sequencing (SCS) includes a set of powerful and comprehensive techniques that can provide insight into these unknowns.
Important types of cancer cells include primary tumor cells, metastatic tumor cells, cancer stem cells (CSC), circulating tumor cells (CTC), and disseminated tumor cells (DTC; Fig. 1A). Primary tumor cells can often be described as variants of subpopulations (1). Metastatic tumor cells usually include the original tumor cell clones and differentially invasive and secondarily mutated cell clones. CSCs, which are often very rare, can undergo self-renewal, are therapy resistant, and may cause cancer recurrence; they can also be classified into different cell subpopulations (2). CTCs and DTCs play a vital role in cancer dissemination, self-renewal, and distant metastases (3). Cancer cells dispersed in an individual, or even those forming a single tumor or clone, are often heterogeneous, and this heterogeneity changes dynamically with disease progression (4). However, molecular profiling data, especially heterogeneity analyses, of these important cancer-cell populations are limited. When dominant clones are killed in cancer therapy, other minor clones may replace them and lead to cancer recurrence (5). Dynamic changes in cancer niches, which comprise various cell types, can affect the development, progression, and metastasis of cancer (2), but the identification, characterization, and isolation of the distinct cancer niche components are very difficult. Thus, a high-resolution strategy to effectively screen and systematically analyze the heterogeneity of these different cancer cells and cancer niche cells is desired and expected to lead to a wave of unanticipated discoveries.
Figure 1. Schematic overview of this review. A, an assemblage of the distinct cell types that constitute cancer niches (cancer microenvironment), primary tumor (including hematologic tumor and solid tumor), metastatic tumor, and the peripheral blood of a cancer patient. B, methods for isolation of single (cancer) cells. B1, preparation of single-cell suspension. B1-1, solid tumor single-cell suspension prepared by cutting biopsy material or surgical tumor into pieces, which are then dissociated by collagenases, a papain dissociation system, etc. B1-2, single-cell suspension of the bone marrow “cloudy” interface layer for tumors of the blood system. B1-3, single-cell suspension of the peripheral blood “cloudy” interface layer for CTCs, which are mainly from solid tumors. B2, four popular methods for single (cancer) cell isolation: FACS, microfluidics systems, micromanipulation systems, and LCM (27). C, single-cell sequencing. Amplification methods for SCS of whole genome (scWGS), whole exome (scWES), whole transcriptome (scRNA-seq), whole DNA methylation (scM-seq), and whole assay for transposase-accessible chromatin (scATAC). A method for codetection of the whole genome and whole transcriptome from a single cell is also listed (scGT-seq). A putative method for non-bisulfite-based scM-seq (possibly less damaging and with better coverage) and a method for scMT-seq (codetection of the DNA methylome and transcriptome of a single cell) are proposed (no reports seen yet). After amplification, a sequencing library for the genome-wide material or a panel of targets (not shown) is prepared and next-generation sequencing, such as on the Illumina HiSeq/MiSeq platform or the Thermo Fisher Scientific Ion Torrent sequencer, is applied. Finally, bioinformatics analysis is conducted, and important biologic information is revealed.
Because mutagenesis, gene regulation, and genetic modification occur intrinsically at the single-cell level, SCS will enable highly precise molecular characterization of carcinogenesis and will hopefully resolve the important problems described above. This will lead to the identification of new biomarkers and drug targets and, in turn, promote cancer diagnosis, monitoring, and therapy (6). In this review, we summarize recently developed SCS methods, including those that could be extremely useful but have not yet been applied to the sequencing of single cancer cells. The clinical applications as well as the current problems and limitations of these technologies for use in the study of cancer are also discussed.
Isolation of Single Cancer Cells
Analysis of a single cancer cell initially requires the isolation of the desired type of cells based on classic phenotypes or surface markers, in a way that preserves the biologic integrity of the cell. When isolating single cells from solid tumors, dispersion of the tumor tissue into a single-cell suspension is necessary. Mechanical separation and enzymatic treatments are often used (7, 8; Fig. 1B1). Isolation of CTCs from bodily fluids involves two types of methods (Fig. 1B2). The first is biomarker dependent, such as the CellSearch magnetic bead system. The second uses density-gradient centrifugation, microfiltration, or microfluidics systems. The CTC capture platform CTC-iChip, which relies on the inherent differences in morphology or size between leukocytes and CTCs (9), can be used to isolate some CTCs without known biomarkers. For DTC isolation, needle aspiration is used to collect bone marrow, followed by single-cell sorting (10).
Single-cell sorting methods, including marker/phenotype-based methods for isolating single cells from bulk cell populations [FACS, microfluidics, micromanipulation, and laser capture microdissection (LCM); Fig. 1B2] and from rare cell populations (CellSearch, DEP-Array, CellCelector, MagSweeper, and nanofilters), have been well summarized previously (10, 11). Unbiased, marker-free decomposition of a tissue or organ, such as an intact tumor, into different types of single cells (12) may omit the FACS process but still needs dissociation of the tissues into single cells, or localization of the single cells (13), and usually requires the analysis of a very large number of cells. These types of methods would allow us to clarify the architecture of a whole tumor with its variants of cancer niches, such as immune cells. Ultimately, the technology chosen for the isolation of single cells is dependent upon the scientific conditions of the study and sample type.
Single-Cell Whole Genome Amplification and Sequencing and Whole Exome Sequencing
Until recently, most cancer SCS applications utilized whole genome sequencing (WGS) or whole exome sequencing (WES) to identify specific single-nucleotide variations/mutations (SNV), copy number aberrations (CNA), and structural variants (SV). This work provided insights into the driver mutation, biomarkers, and the detailed route and mechanism of carcinogenesis and metastasis (6).
Methods for whole genome amplification
A normal human cell contains only ∼6 pg of DNA; this amount is insufficient for genomic analysis (6). Therefore, whole genome amplification (WGA) is needed to amplify the DNA several hundred-thousand-fold. Three major types of WGA methods (Fig. 1C) have been applied with SCS: multiple displacement amplification (MDA), PCR, and hybrid methods that combine displacement amplification and PCR, such as the PicoPLEX single-cell WGA kit and multiple annealing and looping-based amplification cycles (MALBAC; ref. 6).
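As a back-of-the-envelope check on the amplification factor quoted above, the following minimal sketch computes the fold amplification needed to bring the ∼6 pg of DNA in one cell up to a target mass. The 1 µg target and the function name are illustrative assumptions, not values taken from any cited protocol.

```python
PG_PER_UG = 1_000_000  # picograms per microgram


def required_fold_amplification(input_pg: float, target_ug: float) -> float:
    """Return the fold amplification needed to reach target_ug from input_pg."""
    if input_pg <= 0:
        raise ValueError("input mass must be positive")
    return target_ug * PG_PER_UG / input_pg


# ~6 pg per normal human cell (from the text); the 1 ug target is an assumed example.
fold = required_fold_amplification(input_pg=6.0, target_ug=1.0)
print(f"{fold:,.0f}-fold amplification needed")
```

Under these assumptions the result is on the order of 10^5, consistent with the range stated in the text; a different target mass scales the answer linearly.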
The MDA method is the most widely applied in WGA due to its high fidelity and simplicity. Several updated MDA kits are suitable for single-cell WGA (scWGA), such as the REPLI-g Single Cell Kit, GenomiPhi DNA Amplification Kit, and AmpliQ Genomic Amplifier Kit (14). Recently, an MDA system called the microwell displacement amplification system (MIDAS) enabled the simultaneous amplification of thousands of single cells through the use of hundreds to thousands of nanoliter wells, and the result was more robust with improved uniformity (15). A modified MDA approach, Nuc-seq, took advantage of the fact that a single cell in the G2–M stage of the cell cycle has four copies of the genome and significantly increased the genome coverage (16). For PCR-based scWGA methods, formulated kits such as the Sigma GenomePlex kit and the Silicon Biosystems Ampli1 WGA Kit are often used for CNV/SV analysis; however, these are generally used less than MDA methods because they may generate nonspecific artifacts, have higher error rates, and give incomplete coverage of the genome (17). MALBAC was recently used to amplify the DNA from single SW480 cells (a colon cancer cell line). The results showed uniform, high sequencing coverage (>90%) across the human genome, with a considerable increase in the detection of CNA and SV over MDA performed in the same laboratory (18).
Single-cell WES versus single-cell WGS
The exome constitutes approximately 1% of the whole genome and contains ∼85% of the known disease-causing variants (19). WES involves exome capture and enrichment after genome amplification, and it is significantly more efficient than WGS for sequencing of coding sequences. Single-cell WES (scWES) is even more useful for coding mutation detection because scWES normalizes the profound locus-to-locus bias introduced during scWGA and increases sequence base coverage. Single-cell WGS (scWGS) has been applied for CNA and SNV screening, and is particularly powerful for CNA and non-coding sequence detection (20). scWES has mostly been used for SNV analysis of coding sequences, but can also be used for CNA detection, although it is less efficient (21).
For scWES, the bait libraries used in exome capture can be created using established commercial kits, such as NimbleGen's SeqCap, Agilent SureSelect, Illumina TruSeq, and Nextera Exome. A comparison of these four kits with the same human tumor DNA sample (22) showed that the Illumina kits were the most efficient—they captured more bases in coding and translated regions. The Nextera Exome Enrichment Kit is a fast and efficient system that combines easy sample preparation with TruSeq's exon enrichment process. It does not necessitate mechanical DNA fragmentation and so requires a small amount of input genomic DNA. However, it has been found to introduce bias when the target regions have a high GC content (22).
Currently, massively parallel sequencing is predominantly performed on two platforms, Illumina's HiSeq/MiSeq platform and Thermo Fisher Scientific's Ion Torrent sequencer. HiSeq is widely used because it can produce the largest amount of sequencing reads at relatively low cost (23). On the other hand, Ion Torrent provides easy target-sequencing and is the gold standard for mutation detection in clinical genetic testing. The sequencing depth required depends on the goals of the specific experiment. For example, CNA or SV analysis in scWGS requires relatively light sequencing (as little as 0.1–2 fold depth), whereas SNV detection in scWGS/scWES needs deeper sequencing (typically, 30–50 fold; ref. 24). In addition, Pacific Biosciences' PacBio RS II and Oxford Nanopore's MinION technologies provide very long-read, single-molecule sequencing (termed single-molecule real-time, SMRT, sequencing on the PacBio platform), which is extremely useful in de novo and full-length sequencing and may lead to novel discovery of haplotypes, isoforms, etc., in cancer genomics. Once the sequencing is complete, the raw sequencing reads are aligned to the reference genome, followed by downstream bioinformatic analysis (20).
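To make the depth figures above concrete, the sketch below applies the standard relation depth = (number of reads × read length) / target size. The genome size of 3.1 Gb and the 100 bp read length are assumed round numbers for illustration, and the function names are hypothetical.

```python
import math

HUMAN_GENOME_BP = 3_100_000_000  # assumed approximate haploid genome size


def mean_depth(n_reads: int, read_length_bp: int,
               target_bp: int = HUMAN_GENOME_BP) -> float:
    """Average fold depth from a given number of reads of fixed length."""
    return n_reads * read_length_bp / target_bp


def reads_needed(depth: float, read_length_bp: int,
                 target_bp: int = HUMAN_GENOME_BP) -> int:
    """Number of reads required to reach a desired average fold depth."""
    return math.ceil(depth * target_bp / read_length_bp)


# Light sequencing for CNA/SV analysis versus deep sequencing for SNV calling,
# both per single cell, assuming 100 bp reads.
cna_reads = reads_needed(0.1, 100)
snv_reads = reads_needed(30, 100)
print(f"0.1x scWGS: ~{cna_reads:,} reads; 30x scWGS: ~{snv_reads:,} reads")
```

This ignores duplicates, unmapped reads, and the uneven coverage that amplification introduces, so real experiments would need more raw reads than this idealized estimate.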
Single-Cell Whole Transcriptome Amplification and scRNA-Seq
Single-cell RNA sequencing (scRNA-seq) data reveal not only the quantitative and global expression profile, but also exon splicing and allele-specific expression, given appropriate sequencing depth and analysis (25, 26). scRNA-seq analysis provides a high-resolution view of the molecular phenotype and plays an important role in revealing the functional heterogeneity or diversity of a population of cells, defining marker-based cell types, and uncovering the molecular characteristics, specific signals, and pathways underlying cancer processes. scRNA-seq data can also be used to detect mutations (27).
Generally, a human cell contains <1 pg mRNA, and more than 85% of the transcripts are present at less than 100 copies (28). Single-cell whole transcriptome amplification (scWTA) amplifies the transcripts to an abundance that allows characterization by sequencing. A number of approaches have been developed for scWTA (Fig. 1C; ref. 29). Three of the scWTA methods have been able to amplify full-length transcripts (6): Smart-Seq or Smart-Seq2 (commercial kit: SMARTer Ultra Low RNA Kit; refs. 30, 31), Phi29-transcriptome amplification (PTA), and semi-random primed PCR-based transcriptome amplification (STA; ref. 32). However, at the single-cell level, full-length sequences are rarely obtained for most transcripts. In addition, most scRNA-seq methods suffer from amplification bias (33). Islam and colleagues (34) established a unique molecular identifier (UMI) technique to directly count the original molecules in single cells with WTA. UMIs greatly improved the accuracy and digital quantification of WTA and were highly efficient. Furthermore, in order to retain the spatial and temporal information of RNAs in cells, several new RNA sequencing methods have been developed, including fluorescent in situ RNA sequencing (FISSEQ; ref. 35), Padlock-RCA (36), single molecule fluorescent in situ hybridization (smFISH; refs. 37, 38), and single-cell transcriptome in vivo analysis (TIVA; ref. 39).
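The counting principle behind UMIs can be illustrated with a minimal sketch: reads that share the same cell, gene, and UMI are treated as amplification copies of one original molecule and are counted once. The input format, the toy reads, and the function name are assumptions made for illustration; this is not the published pipeline, and it omits steps such as UMI sequencing-error correction.

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse reads sharing (cell, gene, UMI) into one original molecule.

    reads: iterable of (cell_barcode, gene, umi) tuples (assumed format).
    Returns {cell_barcode: {gene: number_of_distinct_umis}}.
    """
    umis_seen = defaultdict(set)
    for cell, gene, umi in reads:
        umis_seen[(cell, gene)].add(umi)

    counts = defaultdict(dict)
    for (cell, gene), umis in umis_seen.items():
        counts[cell][gene] = len(umis)
    return dict(counts)


# Toy data: the second read is an amplification duplicate of the first.
toy_reads = [
    ("cellA", "TP53", "AACGT"),
    ("cellA", "TP53", "AACGT"),
    ("cellA", "TP53", "GGTCA"),
    ("cellA", "MYC", "TTAGC"),
    ("cellB", "TP53", "AACGT"),
]
molecule_counts = count_molecules(toy_reads)
print(molecule_counts)
```

Counting distinct UMIs rather than reads is what removes the amplification bias: three TP53 reads in the first cell collapse to two molecules.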
The analysis of thousands or tens of thousands of individual cells promises to decode the architecture of a complicated system (12, 13). This is particularly true for cancer studies due to the highly heterogeneous nature of cancers. Early efforts to promote high-throughput RNA-seq, such as STRT-seq (40, 41), CEL-seq (42), and Quartz-seq (43), involved the pooling/multiplexing of single cells, each barcoded at an early stage of scWTA (29). In recent years, novel instruments with automatic operation, in combination with cell barcoding, UMIs, and nanoliter-size reaction volumes, have stretched the multiplexing ability to the parallel analysis of 96 or hundreds (Fluidigm C1; refs. 12, 44) to thousands of single cells at very low cost per cell. The latter is represented by the recently reported droplet-based sequencing, or Drop-seq (45, 46). Shallow RNA-seq can faithfully represent the transcriptomic character of single cells and detect cellular heterogeneity and activated signaling pathways; as few as ∼50,000 reads per cell are sufficient for unsupervised cell-type classification (44). Therefore, these highly multiplexed methodologies allow for super-high-throughput and cost-efficient analysis of many cancer cells, potentially revealing inherent dynamic architecture, heterogeneous molecular mechanisms, and biomarkers.
qRT-PCR is used to quantify a panel of genes in single cells and may have even higher sensitivity for these selected targets compared with scRNA-seq (47, 48); sequencing of a panel of PCR-selected target genes is another choice in expression profiling of single cells.
Finally, the scRNA-seq data have to be interpreted by appropriate computational and statistical methods, which has been a challenge and the focus of much attention (49).
Single-Cell Epigenomic Sequencing
Epigenetics is defined as the sum of heritable changes to the DNA that do not cause direct alteration of the primary DNA sequence. Such changes include DNA methylation, histone modification, chromatin-binding structural and regulatory proteins, spatial structures of the chromosomes and spatial interactions of components forming transcriptional complexes. Epigenomic analysis reveals different aspects of gene regulation, including gene interactions at different levels. A great number of diseases are attributed to epigenetic abnormalities (50). Epigenetic alterations are reversible and dictate gene expression and are responsible for cellular heterogeneity (51). Therefore, single-cell epigenetics will have an important role in cancer studies and clinical investigation.
Single-cell DNA methylation sequencing
DNA methylation predominantly refers to the addition of a methyl group to the cytosine (5-methylcytosine, 5mC) of a CpG dinucleotide in CpG islands (CGI) by DNA methyltransferases (52), although the roles of cytosine methylation outside of CGIs, non-CpG methylation (53), and 5-hydroxymethylcytosine (54) are also recognized. Specific tumor-suppressor genes are usually significantly hypermethylated at their promoter regions or associated CGIs during carcinogenesis (55). Meanwhile, the hypomethylation of repetitive DNA sequences across the genome and of specific oncogenes is the other side of cancer epigenomic regulation, and overall dominates the cancer genome (55).
Many DNA methylation detection methods require a large number of cells. Thus, the epigenetic profiles at present are averaged, and the cellular heterogeneity of the population escapes evaluation. Since 2013, three single-cell DNA methylation sequencing (scM-seq) methods have been reported (Fig. 1C). Single-cell reduced representation bisulfite sequencing (scRRBS) developed by Guo and colleagues (56, 57) enabled the detection of 0.5 to 1.5 million CpG sites (CpGs) within the genome of a single cell, which on average was 40% of the CpGs that bulk RRBS can detect, corresponding to 10% of all CpGs in a mouse genome. Smallwood and colleagues (58) reported a single-cell genome-wide bisulfite sequencing method (scBS), covering 3.7 million CpGs (range, 1.8–7.7 million) corresponding to 17.7% of all CpGs in a mouse genome. scBS used a post-bisulfite adaptor tagging (PBAT) protocol, avoiding damage of the two ligated adapters during the bisulfite treatment and improved PCR amplification. Farlik and colleagues (59) further extended the scBS method, named scWGBS, with direct library construction after bisulfite conversion, low-coverage sequencing (capturing 1.54 million CpGs with 4.6 million reads on average), and a supervised bioinformatics strategy using in silico merging of the data from multiple single cells to increase the overall coverage.
Overall, these approaches provide single-nucleotide resolution of CpG methylation patterns, representing an exciting start to scM-seq and showing its significance in biologic analysis. However, the CpGs, CGIs, or promoters detected consistently across multiple single cells amount to less than a few percent of the sequences of the human genome, which is too low to enable unsupervised clustering of single cells for a heterogeneous population like cancer. This is probably attributable to the nature of the bisulfite-conversion-based procedure and the stochastic sequencing of the genomic sequences, especially in scBS and scWGBS. scBS covers CpGs and CGIs better than scWGBS and scRRBS, with better consistency across multiple cells, but at the cost of deep sequencing of the whole genome. scRRBS is the most cost-efficient choice for consistent detection of CGIs across a panel of single cells. Supervised analysis, using in silico merging of multiple single cells that are known to be in the same subpopulation, or aggregating cohorts of sequence blocks with presumably similar functions, increased the coverage of CpGs, and probably also of CGIs, as well as the consistency of detection among the single cells analyzed in parallel. However, for scM-seq application in cancer, we will not know in advance any subpopulation structure of the single cells analyzed, and we primarily require an unsupervised clustering analysis. Therefore, an scM-seq method robust enough to cover sufficient CGIs or CpGs consistently across many single cells is still needed in spite of the great value of these available methods. Further development could involve a substantially revised bisulfite-sequencing method, or possibly a non-bisulfite-based method. Other extensions of scM-seq, such as detection of DNA hydroxymethylation (5hmC; ref. 54), will also be an interesting direction of development.
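The distinction drawn above between merged coverage and consistent coverage can be sketched with simple set operations: in silico merging corresponds to the union of the CpG sites detected in each cell, whereas consistent detection corresponds to their intersection. The toy coordinates and the function name below are hypothetical, and representing sites as (chromosome, position) pairs is an assumption made for illustration.

```python
def merged_and_shared_coverage(cells):
    """Compare pooled versus consistently detected CpG sites.

    cells: dict mapping cell_id -> set of covered CpG sites, where each
    site is a (chromosome, position) tuple (assumed representation).
    Returns (merged, shared): the union and intersection across cells.
    """
    site_sets = list(cells.values())
    if not site_sets:
        return set(), set()
    merged = set().union(*site_sets)          # in silico merging of cells
    shared = set.intersection(*site_sets)     # sites seen in every cell
    return merged, shared


# Toy example: each cell stochastically covers a different subset of sites.
toy_cells = {
    "cell1": {("chr1", 100), ("chr1", 250), ("chr2", 40)},
    "cell2": {("chr1", 100), ("chr2", 40), ("chr2", 900)},
    "cell3": {("chr1", 100), ("chr1", 250), ("chr3", 7)},
}
merged_sites, shared_sites = merged_and_shared_coverage(toy_cells)
print(len(merged_sites), "sites after merging;",
      len(shared_sites), "sites shared by all cells")
```

Merging requires knowing which cells belong together, which is why it suits supervised analysis; unsupervised clustering depends on the shared set, which shrinks as more cells are added.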
Single-cell chromatin structure sequencing
The dynamics of chromatin architecture regulates cellular processes, including carcinogenesis and cancer evolution. Buenrostro and colleagues (60) adapted transposase-accessible chromatin analysis for single cells on a microfluidics platform (scATAC-Seq). Here, a nucleus is incubated with Tn5 transposase, which enables adaptor incorporation and PCR amplification of the open chromatin regions for sequencing at single nucleotide–pair resolution. Aggregate analysis of 254 individual GM12878 lymphoblastoid cells recapitulated the patterns of accessible chromatin observed in bulk cells (60). Although the consistent sequence coverage is not that high—about 9.4% of promoters are represented in a single cell—scATAC-seq confidently detected the variability of a set of open chromatin peaks identified using the aggregate accessibility track, and found their dynamic association with cell type–specific trans-factors and cis-elements. The single-cell level of detection of chromosome organization in the three-dimensional space of the nucleus showed cell-to-cell variability in genome-wide chromosome conformation capture (Hi-C), while the individual chromosomes maintain domain organization at the megabase scale (61). This method filled the gaps between genomics and microscopy analyses of chromosomes, enabling the study of gene regulation of cancer based on chromatin organization. A few other tools for epigenetic analyses of single-cell chromatin have also been reported, such as the single chromatin molecule analysis at the nanoscale (SCAN) platform (62), and methods using antibody-based individual intact chromatin fiber labeling, capture and image analysis (63).
In summary, single-cell epigenetic assays are very promising tools that can assess the cellular variation of the “regulome” associated with the complicated phenotypes of human cancers. Further progress in the methodology will promote such studies.
Single-Cell Genome and Transcriptome Codetection and Sequencing
The integrated and simultaneous measurement of the genome and transcriptome, and probably epigenome of a single cell, can lead to a multidimensional analysis of heterogeneity, stratification, and phenotype regulation associated with cancer. Currently, three integrated single-cell genome and transcriptome codetection and sequencing (scGT-seq) methods have been reported (Fig. 1C).
In 2014, Han and colleagues (64) first developed a microfluidics-facilitated scGT-seq approach that used an optimized lysis buffer to separate the cytoplasmic content and the intact nucleus of a single cell. After separation, the genomic DNA and mRNA were MDA amplified separately. The power of this technique is in its controlled separation strategy that can be coupled with any kind of existing scWGS and scWTS protocol to obtain the genomic and transcriptomic information from the same cell. In the future, this method could potentially be used to analyze the correlation between DNA mutation (or methylation or chromatin) and transcription. Recently, a similar physical separation-based method, named G&T-seq (genome and transcriptome sequencing), was reported (65). After cell lysis, G&T-seq used a biotinylated oligo-dT primer to separate polyadenylated (polyA) RNA from gDNA and then (polyA) mRNA was amplified by Smart-Seq2, while the gDNA was amplified by MDA or PicoPlex before sequencing (65). In 2015, Dey and colleagues (66) reported a quasilinear amplification-based method, termed gDNA-mRNA sequencing (DR-Seq). DR-Seq is convenient in the sense that it can quantify scGT without physically separating the cytoplasm and nuclear contents before amplification and uses a one-tube reaction during cell lysis and amplification to minimize the loss of mRNA or gDNA (66), although the gDNA may not be fit for DNA methylation profiling. Further, DR-seq developed a computational strategy to determine CNVs by eliminating the interference of the mRNA reads. Undoubtedly, simultaneous analysis of genomics and/or epigenomics (CpG methylation or chromatin) and transcriptomics in single cells will lead to single-cell omics, providing new insights into the molecular understanding of cancer.
Preliminary Applications Have Led to Exciting Discoveries
Recently, a series of exciting studies utilizing SCS in some major cancers were reported, such as breast cancer (16, 47), colon cancer (18, 67), muscle-invasive bladder cancer (68), intestinal cancer (69), lung adenocarcinoma (27), kidney tumor (70), and acute myeloid leukemia (71, 72). These studies made unprecedented progress in understanding carcinogenesis, progression, metastasis, and drug resistance.
Interestingly, some genes discovered by SCS exhibited a high frequency of mutations at the single-cell level, but low prevalence in the tumor mass (16, 67). Some genes exhibited specific alternative splicing of exon isoforms in different single-cell transcriptomes (30). Allele-specific expression occurs in cancer (73), but has not been reported with SCS data, probably due to technical limitations. Such features are critical in cancer progression and can potentially be used as candidate markers for cancers.
More in-depth single-cell analysis for carcinogenesis exploration will be useful to acquire necessary knowledge of different rare and dynamic cancer cells (CTCs, DTCs, and other tumor cells) and metastatic processes. A comprehensive summary of reported analysis of CTCs, including SCS-based CTCs studies was published recently (74). Remarkably, a recent SCS analysis of CTCs revealed heterogeneous signaling pathways and heterogeneous drug-resistance mechanisms in advanced prostate cancer (75). These results could significantly improve cancer therapeutic efficacy and promote CTCs to become a noninvasive tool for cancer staging, monitoring and precise therapy. DTCs are another important cell type in carcinogenesis exploration. SCS uncovered new and subtle genetic changes in DTCs by comparing them with the primary tumor and normal tissue (76). SCS identified aneuploidy, chromosomal rearrangements, and point mutations where the occurrence influenced the evolution of the cancer (16).
Validation and characterization of the heterogeneity of cancer is a powerful application of SCS. With SCS, some cancers were found to be of biclonal origin (such as colon cancer; ref. 67) or monoclonal-to-biclonal origin (such as muscle-invasive bladder cancer; ref. 68), while others consisted of a monoclonal population of cells (such as kidney tumor; ref. 70). Additionally, the deep SCS of numerous individual cells was able to trace the lineages of a cancer (acute myeloid leukemia; ref. 71). Non-genome scale methods, such as targeted PCR or target sequencing of a panel of markers of mutation, expression, or epigenomics for a number of samples will be a logical follow up to the genome-wide scanning analyses and may be more practically useful in clinical practice, but are beyond the discussion of this current review.
Challenges and Prospects
SCS promises an unprecedented ability to lead to more efficient, precise, and successful cancer therapies. However, the technology still faces several challenges.
The major application drawback of SCS in clinics is the high cost of sequencing and the platforms appropriate for different clinical requirements such as different cancers and different purposes—screening, detection, monitoring, or personal therapy guidance. Luckily, the dramatic progress of technology and marketplace competition have driven down the cost of sequencing, and more kits are now available.
The limitations of the above-discussed SCS methods pose other key challenges. The first limitation is the low coverage, which is common when scWGA, scWTA, scM-seq, or single-cell chromatin profile amplicons are sequenced. Aside from current DNA methylation methods that suffer from low coverage, even the commonly used full-length RNA amplification kit, Smart-Seq, when applied to a large number of single cells, generally yields mostly sequence fragments that are far from full-length. More critically, with the Smart-Seq method, only ∼10% efficiency of detection can be obtained for any sequence of an expressed transcript. This is due to the damage, loss, and low efficiency during single-cell processing, and also to the locus-to-locus bias and allele bias in the amplification. Therefore, the development of more sensitive methods that increase the overall coverage and the consistency of the coverage between single cells should be the initial objective for researchers. The second limitation is the core technologies available for the comprehensive molecular analysis of a single cell. For example, up to now, the technology for simultaneously characterizing a cell's genome, transcriptome, and DNA methylome profile is in its infancy, and few technologies have the ability to robustly detect the whole proteome of a single cell (77). This ability would enable the synchronized delineation of heterogeneity at multidimensional levels and reveal complex relationships. The third challenge is temporal and spatial measurement of the molecular profile in a single cell. In situ sequencing and real-time sequencing, as well as in vivo analysis of the DNA and RNA from single cells, have been developed, but these methods need enhanced sensitivity, coverage, and accessibility, and a reduction in cost (78, 79). The ideal in vivo technology for single-cell analysis would provide real-time dynamics, rather than a snapshot profile as demonstrated with other in vitro technologies. To this end, progress has been made with just a limited number of transcripts (80). The fourth challenge involves data analysis. Most algorithms currently used for SCS analysis were originally designed for bulk cell samples. These analysis methods do not take into account the inherent properties of a single cell or any amplification- or sequencing-introduced biases, noise, incomplete coverage, or errors, which have to be considered before a true conclusion is drawn. Therefore, enhanced algorithms are needed to meet the new era of SCS (20).
In summary, the described SCS techniques are powerful new approaches that provide more precise genomic analysis of cancer. The initial application of some of these methods has brought forth exciting discoveries about carcinogenesis, and clinicians have initiated the use of these tools for cancer detection, diagnostics, and therapeutic targeting. With further technical improvements, SCS methods will provide unprecedented insights into the characteristics of various cancers, with the potential to uncover their etiology, new markers, and drug targets, and ultimately to reduce the burden of this disease on humanity.
After this paper was revised, new technologies for single-cell chromatin profiling were published: single-cell chromatin immunoprecipitation sequencing (scChIP-seq; ref. 81) and scDNase-seq (82). These epigenomic parameters (including scChIP-seq, scDNase-seq, and scATAC-seq) can also be codetected with the transcriptome in single cells.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
The authors thank Hongjin Wu and Ling Yang for their suggestions in single-cell RNA-seq and bioinformatic analysis, respectively.
Grant Support
This work was supported by the National Natural Science Foundation of China (No. 81402529), the Zhejiang Provincial Foundation for Natural Sciences (No. LZ15H220001), and Zhejiang Science and Technology Planning Project of Health and Medicine (No. 2015PYA009), and the US National Institutes of Health Grant 1P01GM099130-01 and R01DK100858.