Abstract
It has long been appreciated that tumors are diverse, varying in mutational status, composition of cellular infiltrate, and organizational architecture. For the most part, the information embedded in this diversity has gone untapped due to the limited resolution and dimensionality of assays for analyzing nucleic acid expression in cells. The advent of high-throughput, next-generation sequencing (NGS) technologies that measure nucleic acids, particularly at the single-cell level, is fueling the characterization of the many components that comprise the tumor microenvironment (TME), with a strong focus on immune composition. Understanding the immune and nonimmune components of the TME, how they interact, and how this shapes their functional properties requires the development of novel computational methods and, eventually, the application of systems-based approaches. The continued development and application of NGS technologies holds great promise for accelerating discovery in the cancer immunology field.
Introduction
The field of cancer immunology was born in the late 19th century, with the recognition by William B. Coley that acute bacterial infections were often associated with remissions in cancer patients. This led him to create “Coley's toxins,” a mixture of bacteria that was administered to patients with the goal of activating the immune system to eradicate tumors (1). Coley's toxins constituted the first cancer immunotherapy and were variably used around the world until the mid-1900s when non–immune-based treatments, such as radiation and chemotherapy, became the prevailing anticancer therapies. Consequently, interest in cancer immunotherapy and, by extension, cancer immunology, diminished.
Cancer immunology has experienced a renaissance due to the advancements in therapies that realize the potential of the immune system to fight cancer. These include blockade of inhibitory or immune-checkpoint receptors, adoptive cell therapies, and personalized cancer vaccines. Alongside these advancements, improvements in microfluidics and high-throughput sequencing technologies have increased the speed, efficiency, and resolution with which the nucleic acid content of cells can be read. This, coupled with advances in computational methods for data normalization and analysis, is enabling the deconvolution of the complex tumor microenvironment (TME), the discovery of tumor antigens, and the annotation of novel therapeutic targets. This article reviews the state of the art of next-generation sequencing (NGS) technologies, with a focus on applications of single-cell transcriptomics in cancer immunology and the challenges raised by the growing complexity of the data being generated.
Application of NGS in Cancer Immunology
NGS, which uses massively parallel sequencing of DNA fragments, introduced high-throughput and low-cost discrete measurement of nucleic acid profiles to the field of molecular biology, and has, for the most part, replaced microarrays. NGS technology has formed the foundation of several technologies, including whole-exome sequencing, RNA sequencing (RNA-seq), single-cell RNA-seq (scRNA-seq), and ATAC-seq. Specific examples of how some of these technologies have been applied in the cancer immunology field are discussed in Hu and colleagues (2). In this review, we focus primarily on single-cell RNA-seq.
scRNA-seq: Methods overview
Single cells are captured for measurement of their transcriptional landscape using either plate-based or microfluidics-based methods. Plate-based methods involve sorting of cells into separate wells (e.g., in a 96-well plate) via fluorescence-activated cell sorting, followed by RNA-seq protocols applied to each of the wells and pooling of samples following cell barcoding (different methods pool at different steps; Table 1). This approach enables freedom with respect to the RNA-seq protocol used and allows for index-sorting (quantification of protein expression in the cells sorted into individual wells) but is time-consuming, which limits the number of cells that can be processed. Initially, microfluidics-based methods used microfluidic chips to capture single cells into individual chambers (e.g., Fluidigm) followed by lysis, reverse transcription, and amplification for library generation. Microfluidics have also been used to pair single cells, within droplets, with beads carrying cell-identifying barcodes (3). Microfluidic-based capture of single cells and barcoded beads into chambers has been implemented in Seq-Well, a portable and low-cost alternative to droplet-based methods (4). Droplet-based methods offer higher throughput than microfluidic chip-based and plate-based methods, generating thousands of single-cell transcriptomes at relatively low cost, but are restricted to either 3′ or 5′ end sequencing protocols.
Table 1. Comparison of commonly used RNA-seq protocols^a
Protocol | SMART-Seq2 | Cel-Seq2, MARS-seq, STRT | 10X Chromium, Drop-seq, inDrop |
---|---|---|---|
Capture method | Plate-based | Plate-based | Droplet-based |
Transcript | Full-length | 3′ or 5′ | 3′ or 5′ |
UMI | No | Yes | Yes |
Throughput | Medium | Medium | High |
TCR/BCR annotation | Yes | Possible with additional primer amplification | Specific to method |
Pooling step | Late | Early/Late | Early |
^a Some methods are not cited due to space constraints.
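To make the role of the cell barcodes and unique molecular identifiers (UMI) listed in Table 1 concrete, the following is a minimal sketch of how reads might be collapsed into per-cell, per-gene molecule counts. The `(cell_barcode, umi, gene)` tuple layout, the function name, and the toy reads are hypothetical simplifications; real pipelines also correct sequencing errors in barcodes and UMIs, which is not shown here.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse reads into UMI counts per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples; this layout
    is a hypothetical simplification of real aligned-read records.
    """
    seen = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        # Reads sharing barcode, gene, and UMI are treated as PCR duplicates
        # of one original molecule, so each distinct UMI is counted once.
        seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Toy reads (made up for illustration only).
reads = [
    ("AAAC", "TTG", "geneA"),
    ("AAAC", "TTG", "geneA"),  # duplicate of the read above
    ("AAAC", "GCA", "geneA"),
    ("CGTA", "TTG", "geneB"),
]
counts = count_umis(reads)
print(counts)
```

Protocols without UMIs (such as SMART-Seq2 in Table 1) cannot collapse duplicates this way and instead quantify expression from read counts.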
Two methodologies (SPLiT-seq and Sci-RNA-seq) bypass the need for physical isolation of single cells by using combinatorial barcoding to perform scRNA-seq (5, 6). Although these methods have not been applied yet in the cancer immunology field, they hold great promise for accelerating discovery, given that they can be used with fixed cells. This allows for the contemporaneous processing of samples that are collected longitudinally and mitigates batch effects stemming from serial processing of samples. This feature makes these two methods attractive for the analyses of patient samples.
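The arithmetic behind combinatorial barcoding can be sketched as follows. The sketch assumes that cells are distributed uniformly at random across wells in each split-pool round; the well count, number of rounds, and cell number are hypothetical values chosen for illustration and are not taken from the cited protocols.

```python
def barcode_space(wells_per_round, rounds):
    """Number of distinct barcode combinations after split-pool barcoding."""
    return wells_per_round ** rounds


def expected_collision_fraction(n_cells, n_barcodes):
    """Expected fraction of cells that share their barcode combination with
    at least one other cell, assuming uniform random assignment."""
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


# Hypothetical design: 96 wells per round, 3 rounds, 10,000 cells.
n_barcodes = barcode_space(96, 3)
frac = expected_collision_fraction(10_000, n_barcodes)
print(n_barcodes, frac)
```

Under these assumptions, adding a round multiplies the barcode space by the number of wells, which is why a few rounds suffice to label many cells without isolating them physically.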
In addition to the single-cell RNA-seq methods described above, single-nucleus RNA-seq methods have been introduced to overcome technical challenges associated with dissociation of single cells from tissue. Single-nucleus methods profile the mRNA landscape within each nucleus separately and can be performed using both plate-based and droplet microfluidic-based technologies (7–9). Single-nucleus sequencing methods have been repeatedly shown to accurately capture heterogeneity across cells and dynamic cell states, despite profiling the RNA in the nucleus only (10, 11). To date, single-nucleus RNA-seq methods have been used primarily for study of the brain but have also been used to profile tumor cells (11). Although immune cells are difficult to profile with single-nucleus RNA-seq due to their low RNA content, future technical advances could make such methods useful given their applicability to frozen archived samples.
scRNA-seq: Deconvolving the tumor immune microenvironment
The ability to read and annotate transcriptomes at single-cell resolution, coupled with the development of computational methodologies for data analysis (see Box 1 and Fig. 1), has enabled the profiling of the different components of the TME at unprecedented depth: many cells and many transcripts. Naturally, scRNA-seq was quickly leveraged to advance our understanding of the immune component of tumors.
Figure 1. Single-cell data analysis methods. Several analysis steps are taken to generate an initial characterization of single-cell data (following or in parallel to normalization of technical noise and artifacts). A, Various linear (e.g., principal component analysis, PCA) and nonlinear (e.g., t-distributed stochastic neighbor embedding, tSNE, and diffusion maps) dimensionality reduction methods can be used for identifying the main discriminants of the data of interest and for visualization. Clustering of cells by their transcriptomes can identify sets of cells that comprise units within the system (diffusion component illustration based on ref. 37). B, Gene sets that covary across the data identify gene modules of interest with respect to heterogeneity and potential functionality of the cell subpopulations within a sample (figure based on ref. 50). C, Integration of additional data types and sources can enable broader insights into the scRNA-seq data set. Shown are two examples. Left, scoring single cells for the extent to which they express predefined gene signatures to infer function and characteristics of populations identified. Right, integration of single-cell TCR information generated in parallel to the scRNA-seq data (figure based on ref. 12).
Box 1: Single-Cell Transcriptomics Analysis: Basic Concepts
Computational tools and packages are now available to efficiently perform a variety of analyses and produce visualizations of single-cell transcriptomics data (30). Software packages [such as Seurat, ref. 31 (R package); Scanpy, ref. 32 (Python toolkit); R Bioconductor, ref. 33; and Biscuit, ref. 12] are used for initial data normalization and batch correction followed by general landscape characterization of the cell population (e.g., via visualizations, clustering, and the detection of highly variable genes).
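As an illustration of what an initial normalization step can look like, the sketch below scales each cell to a common total count and applies a log transform. It is a generic example written with NumPy and is not the API of any of the packages named above; the function name, the `target_sum` value, and the toy matrix are assumptions made for illustration.

```python
import numpy as np


def normalize_log(counts, target_sum=10_000):
    """Scale each cell (row) to `target_sum` total counts, then apply
    log(1 + x). `counts` is a cells-by-genes matrix of raw counts."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / totals * target_sum)


# Toy matrix: cells 0 and 1 have the same composition at different depths.
counts = np.array([[10, 0, 90], [1, 0, 9], [0, 0, 0]])
norm = normalize_log(counts)
print(norm)
```

After this step, the first two toy cells have identical profiles, which shows how depth normalization removes differences that reflect sequencing depth rather than biology. Batch correction is a separate step and is not shown.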
The characterization of the populations profiled by scRNA-seq includes several steps, with the goal of inferring a cell's identity with respect to different factors of interest and gaining an understanding of the extent of diversity in the given data set (34, 35). Due to the high dimensionality of scRNA-seq data, linear [e.g., principal component analysis (PCA)] and nonlinear [e.g., diffusion components and t-distributed stochastic neighbor embedding (tSNE)] projections are frequently used to reduce the dimensionality of the input data for subsequent analyses (Fig. 1A). These techniques are useful for visualization, cell clustering, and the annotation of sets of genes that covary across the data. Genes that are specific to each of the identified cell clusters can be annotated using statistical models that vary in their efficiency and the extent to which they account for technical aspects of scRNA-seq (36).
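The linear projection mentioned above can be sketched with a PCA computed from the singular value decomposition of the centered expression matrix. This is a minimal NumPy example on simulated data (two synthetic cell groups that differ in two genes); the function name and all parameter values are illustrative assumptions.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of a cells-by-genes matrix via SVD of the gene-centered data.

    Returns cell scores, gene loadings, and the fraction of variance
    explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each gene across cells
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, Vt[:n_components], explained[:n_components]


# Simulated data: 20 cells per group, 5 genes, groups differ in genes 0 and 1.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(20, 5))
group_b = rng.normal(0.0, 0.1, size=(20, 5)) + np.array([3, 3, 0, 0, 0])
scores, loadings, explained = pca(np.vstack([group_a, group_b]))
print(explained)
```

In this toy setting the first component separates the two groups, and its loadings point to the genes that drive the separation; the component scores can then be passed to clustering or to nonlinear embeddings.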
Frequently, cells profiled with scRNA-seq are not naturally organized in clusters, but rather in continuous trajectories. In such cases it is advisable to leverage additional methods for the extraction of informative genes and data visualization. Several packages that use diffusion maps and force-directed layout or similar techniques include Destiny (implemented in R Bioconductor) (37), Monocle (38), Scanpy (32), SPRING (39), and others (40, 41). An additional approach infers the future state of a cell by leveraging the relative ratio of spliced and unspliced mRNA molecules within each cell, enabling the discovery of branching events in cell differentiation from scRNA-seq data collected at a single time point (42). A technique within Scanpy incorporates both clustering and trajectory inference for visualization in a unified framework (43).
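The idea of using the ratio of unspliced to spliced mRNA to infer where a cell is heading can be illustrated for a single gene. The sketch below assumes a simple steady-state model in which the expected unspliced count is proportional to the spliced count, and treats the residual as an indication of whether the gene is being induced or repressed in each cell. This is a simplified illustration under that assumption, not the published method of ref. 42; the function name and toy values are hypothetical, and a gene with no spliced counts is not handled.

```python
import numpy as np


def velocity_one_gene(unspliced, spliced):
    """Steady-state sketch for one gene across cells.

    Fits u ~ gamma * s by least squares through the origin and returns
    gamma and the per-cell residual u - gamma * s.
    """
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    gamma = np.dot(u, s) / np.dot(s, s)  # least-squares slope through origin
    return gamma, u - gamma * s


# Toy counts: the last cell has more unspliced mRNA than the others
# relative to its spliced mRNA.
spliced = np.array([1.0, 2.0, 3.0, 4.0])
unspliced = np.array([0.5, 1.0, 1.5, 4.0])
gamma, v = velocity_one_gene(unspliced, spliced)
print(gamma, v)
```

A positive residual suggests that the gene is being upregulated in that cell, and a negative residual suggests downregulation; combining such estimates across genes is what allows a direction of change to be assigned to each cell.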
A prominent component of scRNA-seq analysis involves identifying gene modules of interest: sets of genes that covary within the given data set (Fig. 1B). Such gene modules can be annotated in multiple ways and are then utilized by the researcher for analyses, such as inferring the functionality of cell subsets (clusters) or identifying central candidates for perturbation. Gene modules can be identified via annotation of gene sets that are cell-cluster-specific, correlated with a dimension of interest (e.g., a specific PC or diffusion component) or covary across the data (as implemented in PAGODA; ref. 44).
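One simple way to find genes that covary with a gene of interest is to threshold the gene-by-gene correlation matrix, as in the sketch below. The function name, the `min_corr` threshold, the generic gene labels, and the toy expression values are all assumptions for illustration; published tools such as PAGODA use more elaborate statistics.

```python
import numpy as np


def covarying_genes(X, gene_names, seed_gene, min_corr=0.8):
    """Return genes whose Pearson correlation with `seed_gene` across cells
    is at least `min_corr`. `X` is a cells-by-genes matrix."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)  # gene-by-gene correlation matrix
    i = gene_names.index(seed_gene)
    return [g for j, g in enumerate(gene_names)
            if j != i and corr[i, j] >= min_corr]


# Toy data (made up): g1, g2, and g3 rise together; g4 does not.
genes = ["g1", "g2", "g3", "g4"]
X = np.array([
    [0, 0, 1, 5],
    [1, 1, 1, 4],
    [2, 3, 2, 5],
    [4, 4, 4, 4],
    [5, 6, 5, 5],
])
module = covarying_genes(X, genes, "g1")
print(module)
```

Seeding on a single gene is only one option; the same correlation matrix could instead be clustered to find modules without a predefined seed.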
Following initial characterization, additional analyses are used to explore the scRNA-seq landscape. For example, gene sets identified as relevant for a cell population (cluster) or trajectory of interest can be analyzed with bioinformatics techniques to identify dominant pathways and potential regulators (via, e.g., ENRICHR, ref. 45; GORILLA, ref. 46; and MSIGDB, ref. 47). Gene sets of special interest to the researcher can be tested for their relevance to the cell populations or trajectories of interest (e.g., cell-cycle–related genes, ref. 48; or gene sets associated with annotated function, ref. 49; Fig. 1C). Additionally, tailored analyses such as integration of public data sets (e.g., TCGA) or TCR/BCR information (Fig. 1C) can elucidate novel insights of the studied system.
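Scoring cells for a predefined gene signature (Fig. 1C, left) can be sketched as the mean expression of the signature genes in each cell relative to a background. The sketch below uses the mean of all genes in the cell as the background, which is a deliberate simplification; the function name, gene labels, and toy values are hypothetical.

```python
import numpy as np


def signature_score(X, gene_names, signature):
    """Per-cell signature score: mean expression of the signature genes
    minus the mean expression of all genes in that cell.

    `X` is a cells-by-genes matrix of normalized expression values.
    """
    X = np.asarray(X, dtype=float)
    idx = [gene_names.index(g) for g in signature if g in gene_names]
    if not idx:
        raise ValueError("no signature genes found in the data")
    return X[:, idx].mean(axis=1) - X.mean(axis=1)


# Toy data: cell 0 expresses the signature genes, cell 1 does not.
genes = ["g1", "g2", "g3", "g4"]
X = np.array([[4.0, 4.0, 0.0, 0.0], [0.0, 0.0, 4.0, 4.0]])
scores = signature_score(X, genes, ["g1", "g2"])
print(scores)
```

Scores computed this way can be overlaid on a low-dimensional embedding or compared between clusters to ask whether a population is enriched for the signature.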
In breast carcinoma, a large-scale scRNA-seq study of over 45,000 cells identified increased heterogeneity of gene expression in intratumoral lymphoid and myeloid cells compared with cells in normal breast tissue, likely reflecting the responses of intratumoral immune cells to the diverse environmental signals present in tumor tissue (12). Other scRNA-seq studies have uncovered previously unappreciated predictive properties of the immune component within the TME. In malignant glioma, scRNA-seq revealed that preestablished ways of distinguishing across glioma subtypes (IDH-A and IDH-O) are accounted for mainly by differences in the TME rather than the malignant cells themselves and that increased tumor grade was associated with differential expression of macrophage over microglia gene programs (13). In metastatic melanoma, Nirschl and colleagues (14) identified a homeostatic IFNγ-dependent program that is enriched in monocytes and dendritic cells and stratifies survival. Future scRNA-seq studies of the TME will continue to advance our knowledge of the immune component of different tumors and its relationship to disease state.
scRNA-seq: Understanding T-cell states in cancer
ScRNA-seq has led to important insights regarding checkpoint receptor expression in tumor-infiltrating lymphocytes (TIL) and the functional states observed in T cells in different cancers. An scRNA-seq study of human breast tumors revealed that the checkpoint receptors TIGIT and LAG-3 were present at a higher frequency on T cells than PD-1, suggesting that the former molecules may be better targets in breast tumors (15). In a melanoma mouse model, Chihara and colleagues used scRNA-seq and mass cytometry (CyTOF) to identify a coinhibitory gene module in TILs that contains novel checkpoint receptors and is cooperatively regulated by PRDM1 and c-MAF (16). Also in a melanoma mouse model, Singer and colleagues (17) showed that checkpoint receptor expression can be uncoupled from dysfunctional CD8+ T-cell phenotypes and identified distinct dysfunction and activation gene programs that separated cell populations identified with scRNA-seq. In human melanoma, scRNA-seq was used to identify a gene signature for T-cell dysfunction and to infer cell-to-cell interactions between T cells and cancer-associated fibroblasts (18). Lastly, in non–small cell lung cancer, scRNA-seq of T cells identified “exhausted” and “preexhausted” CD8+ T-cell populations and showed that a high ratio of preexhausted to exhausted cells was associated with better prognosis (19).
Analysis of TCR sequences in single cells is further shedding light on T-cell behavior in tumors. Paired scRNA-seq and TCR sequencing in breast carcinoma showed that different T-cell clones vary in their extent of activation, suggesting the presence of a continuous spectrum of T-cell activation states that is shaped by TCR usage (12). ScRNA-seq and TCR sequence analysis of peripheral blood, tumor, and normal tissue from hepatocellular carcinoma (HCC) patients showed that exhausted CD8+ T cells and regulatory T cells (Treg) are enriched and clonally expanded in HCC compared with normal tissues (19). The development of TCR sequencing protocols compatible with droplet technology will further accelerate the current understanding of the relationship of T-cell clonality to functional T-cell states across different tumor types.
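Clonal expansion can be quantified from single-cell TCR data by grouping cells that share a clonotype. The sketch below assumes that each cell has already been assigned a clonotype label (in practice this is commonly derived from the TCR sequences detected in the cell); the function name, labels, and toy data are hypothetical.

```python
from collections import Counter


def clonal_expansion(clonotypes):
    """Summarize clonal expansion from per-cell clonotype labels.

    Returns a dict of expanded clonotypes (found in more than one cell)
    with their sizes, and the fraction of cells belonging to them.
    """
    sizes = Counter(clonotypes)
    expanded = {c: n for c, n in sizes.items() if n > 1}
    frac_expanded_cells = sum(expanded.values()) / len(clonotypes)
    return expanded, frac_expanded_cells


# Toy labels, one per cell (made up for illustration).
cells = ["cloneA", "cloneA", "cloneA", "cloneB", "cloneC", "cloneC"]
expanded, frac = clonal_expansion(cells)
print(expanded, frac)
```

Computing these summaries separately for each transcriptionally defined cluster is one way to relate clonality to functional state, for example by comparing the expanded fraction between exhausted and nonexhausted populations.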
Epigenetics: Understanding the chromatin landscape of CD8+ T cells in cancer
Coupling NGS with chromatin accessibility assays enables determination of the epigenetic and regulatory landscape of cells. Methods such as DNase-seq, MNase-seq, and FAIRE-seq enable a genome-wide view of the epigenetic landscape but require laborious protocols and large cell counts (100K–1M cells), thus limiting their application in cancer immunology. The introduction of ATAC-seq (20), a method that detects open chromatin by sequencing transposase-accessible regions and enables mapping of transcription factor occupancy for small cell counts and even single cells, has opened the door to the study of TILs from the epigenetic perspective.
Gaining an understanding of the epigenetic landscape of TILs that exhibit different functional states is important for understanding the underlying mechanisms that govern transition between cellular states and the reprogramming potential of TILs. Philip and colleagues (21) used ATAC-seq to study the epigenetic landscape of CD8+ TILs and identified two distinct CD8+ T-cell states in a murine tumor model—one that can be reversed upon in vitro activation and one that cannot. Coupling such analyses with TCR sequence data will assist in determining T-cell differentiation trajectories in the TME and how these may change upon therapeutic modulation.
NGS has been applied to analyze the spatial organization of chromatin using methods such as Hi-C. Hi-C has been used to determine the chromosomal abnormalities present in tumor cells (22, 23). This method can also be applied to study long-range DNA–DNA interactions. Chen and colleagues used Hi-C to identify and validate an enhancer 140 kb downstream of PD-L1, which is active in tumor cells (24). Future studies will likely apply Hi-C to study DNA organization and gene regulation in TILs.
Protein and space: The next frontiers
Two novel technologies leverage NGS to expand the dimensions of data obtained. Cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) and RNA expression and protein sequencing (REAP-seq) integrate single-cell transcriptomics with limited-scale proteomics (25, 26). Both methods use droplet microfluidics and DNA oligo-tagged antibodies to read out protein and RNA expression profiles in a single workflow. These technologies are useful for assessing transcript-to-protein relationships but are limited to the examination of surface protein expression. They are anticipated to be integrated into the cancer immunology field in the coming years.
Technologies that allow for high-throughput transcriptomic measurements, while observing the cells' spatial location within tissue, have also been developed. Spatial transcriptomics (ST; ref. 27) allows for the measurement of transcriptomes within spatially resolved areas in tissue sections. This is achieved by positioning frozen tissue sections on a glass slide containing a grid with unique positional barcodes. ST can be paired with multiplex imaging and RNA-seq to map cell types and their niches in situ. Although ST does not measure transcriptomes at the single-cell level, paired scRNA-seq data from the same tissue can be used to computationally deconvolve the composition of each ST location (28). In pancreatic cancer, ST revealed that cancer cells and stromal cells colocalize in regions disparate from pancreatic ductal and acinar cells (28). Although this study did not map TILs, we expect ST to be used for such purposes in the future.
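The computational deconvolution of an ST location using paired scRNA-seq data can be illustrated as a mixture problem: the expression measured at a spot is modeled as a weighted sum of cell-type reference profiles, and the weights are estimated. The sketch below uses ordinary least squares followed by clipping and renormalization, which is a crude stand-in for properly constrained methods and is not the approach of ref. 28; the function name, reference profiles, and mixing proportions are made up for illustration.

```python
import numpy as np


def deconvolve_spot(spot_expr, type_profiles):
    """Estimate cell-type proportions for one spatial location.

    `type_profiles` is a genes-by-cell-types matrix of reference profiles
    (e.g., cluster averages from scRNA-seq); `spot_expr` is the expression
    vector of the spot over the same genes.
    """
    A = np.asarray(type_profiles, dtype=float)
    b = np.asarray(spot_expr, dtype=float)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)  # unconstrained least squares
    w = np.clip(w, 0.0, None)                  # crude non-negativity
    total = w.sum()
    return w / total if total > 0 else w


# Toy reference: 3 genes, 2 cell types; the spot is a 75%/25% mixture.
profiles = np.array([
    [10.0, 0.0],
    [0.0, 8.0],
    [2.0, 2.0],
])
spot = 0.75 * profiles[:, 0] + 0.25 * profiles[:, 1]
props = deconvolve_spot(spot, profiles)
print(props)
```

With noise-free toy data the mixing proportions are recovered exactly; real spots are noisy and contain cell types with similar profiles, which is why dedicated deconvolution methods are used in practice.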
NICHE-seq is a different technology for adding spatial context to transcriptomic data. This technique combines two-photon microscopy, photoactivatable fluorescence, and RNA-seq to annotate single-cell transcriptomes in discrete tissue locations (29). An attractive feature of NICHE-seq is the ability to control the timing of fluorescence photo-activation, which allows for kinetic analyses. However, a limitation of this technique is that it can only be applied to tumors that are engineered to express photoactivatable green-fluorescent protein (PA-GFP).
Perspective and Concluding Remarks
High-throughput data generation methods are transforming the cancer immunology field but also pose several challenges. First, they require the researcher to understand the data generation methods and their limitations. Second, they require the researcher to have a solid understanding of the analytical methods and what can be inferred from them. Third, as more hypothesis-generating data are created, experimental systems suitable for validating and testing predictions made from the data will become critical. Lastly, the increasing complexity of data generated from large-scale scRNA-seq efforts, such as the human cell atlas, the immune cell atlas, and the tumor cell atlas (51), together with the rapid increase in the dimensions that can be measured (e.g., protein and spatial variables), requires cross-disciplinary partnerships that can leverage advanced computational and systems biology approaches to discover and characterize connections between genes and cells within the TME. The continued development of NGS-based technologies and companion analytical methods is expected to rapidly propel our understanding of the immune composition of tumors through the lens of high-throughput data.
Disclosure of Potential Conflicts of Interest
A.C. Anderson reports receiving commercial research funding from Idera Pharmaceuticals, Potenza Therapeutics, Tizona Therapeutics, and Sanofi, and is a consultant/advisory board member for Tizona Therapeutics, Compass Therapeutics, Zumutor, Potenza Therapeutics, and Aximmune. No potential conflicts of interest were disclosed by the other author.
Authors' Contributions
Conception and design: M. Singer, A.C. Anderson
Writing, review, and/or revision of the manuscript: M. Singer, A.C. Anderson
Acknowledgments
The authors would like to thank Joshua Levin and Adam Haber for helpful discussions. Work in the author's laboratory is supported by grants from the NIH (R01 CA187975, R01 CA229400, P01 AI073748, and P01 AI039671 to A.C. Anderson).