The heterogeneous composition of solid tumors is known to impact disease progression and response to therapy. Malignant cells coexist in different regulatory states that can be accessed transcriptomically by single-cell RNA sequencing, but these methods have many caveats related to sensitivity, noise, and sample handling. We revised a statistical fluctuation analysis called stochastic profiling to combine with 10-cell RNA sequencing, which was designed for laser-capture microdissection (LCM) and extended here for immuno-LCM. When applied to a cohort of late-onset, early-stage luminal breast cancers, the integrated approach identified thousands of candidate regulatory heterogeneities. Intersecting the candidates from different tumors yielded a relatively stable set of 710 recurrent heterogeneously expressed genes (RHEG), which were significantly variable in >50% of patients. RHEGs were not strongly confounded by dissociation artifacts, cell-cycle oscillations, or driving mutations for breast cancer. Rather, RHEGs were enriched for epithelial-to-mesenchymal transition genes and, unexpectedly, the latest pan-cancer assembly of driver genes across cancer types other than breast. These findings indicate that heterogeneous transcriptional regulation conceivably provides a faster, reversible mechanism for malignant cells to evaluate the effects of potential oncogenes or tumor suppressors on cancer hallmarks.

Significance:

Profiling intratumor heterogeneity of luminal breast carcinoma cells identifies a recurrent set of genes, suggesting sporadic activation of pathways known to drive other types of cancer.

See related articles by Schaff and colleagues, p. 1853 and Sutcliffe and colleagues, p. 1868

Approximately 80% of human tumors are epithelial carcinomas. Epithelial cells and progenitors proliferate considerably during normal tissue development, maintenance, and repair (1). Proper epithelial organization is enforced by basement membrane extracellular matrix (ECM), which becomes compromised during the epithelial cell-state changes underlying carcinomagenesis (2, 3). Advanced carcinomas show considerable cell-to-cell variation in chromosomal gains–losses (4), overall mutational burden (5), and hybrid epithelial–mesenchymal traits (6). However, the state trajectory of carcinoma cells within large, rapidly progressing tumors is not stereotyped (7), complicating general interpretations of this variability. It is not known when intra–carcinoma cell heterogeneity meaningfully emerges, nor whether there might be common themes early in tumorigenesis that go on to diverge at later stages.

Thoroughly deconstructing intratumor heterogeneity requires transcriptomic approaches that can separate lineages and distinguish regulatory states with high cellular resolution (8, 9). Conventional methods for single-cell RNA sequencing (scRNA-seq) either (i) profile dozens–hundreds of cells at the maximum depth afforded by the single-cell sample, or (ii) shallowly scan (tens of) thousands of cells for molecular phenotyping (10). Regardless of the output, all leading approaches dissociate tumors into single-cell suspensions, requiring up to an hour of tissue processing and yielding a range of carcinoma proportions depending on cancer type (Table 1; refs. 11–16). The impact of these preparative steps on the transcriptomes of live cells is recognized [ref. 17 and see also the accompanying study, ref. 18], but they are considered to be an unavoidable tradeoff of the scRNA-seq approach.

Table 1.

Transcriptomic studies of intra–carcinoma cell heterogeneity from primary clinical cases.

Cancer typeMedian stage [range]Dissociation time, minutesCarcinoma yield, %Read depth per cellReference
Head and neck T3N1 [T1N0–T4aN2c] 60 38 1,340K (11) 
Colon T3N2 [T1n0–T4bN0] 30 28 ≥100K (12) 
Pancreas IIb [Ib–IV] 45 7.8 50K (13) 
 T2N1 [T1cN0–T4N2] 40 6.3 50K (14) 
Breast T2N0 [T1N0–T3N1] n.a. 62 5,800K (15) 
 T2N0 [T0N0–T2N3] 60 62 50K (16) 
Luminal breast T1N0 [T1N0–T2N1] ∼100 604K This study 
Cancer typeMedian stage [range]Dissociation time, minutesCarcinoma yield, %Read depth per cellReference
Head and neck T3N1 [T1N0–T4aN2c] 60 38 1,340K (11) 
Colon T3N2 [T1n0–T4bN0] 30 28 ≥100K (12) 
Pancreas IIb [Ib–IV] 45 7.8 50K (13) 
 T2N1 [T1cN0–T4N2] 40 6.3 50K (14) 
Breast T2N0 [T1N0–T3N1] n.a. 62 5,800K (15) 
 T2N0 [T0N0–T2N3] 60 62 50K (16) 
Luminal breast T1N0 [T1N0–T2N1] ∼100 604K This study 

Abbreviation: n.a., not available.

Considering the drawbacks, we (19) and others (20, 21) have developed approaches orthogonal to scRNA-seq, which combine high-sensitivity transcriptomics with laser-capture microdissection (LCM). Using LCM, histologically distinguishable cell types can be isolated with single-cell resolution from cryosections of snap-frozen tissue or tumor. Unfortunately, all LCM-based sequencing methods have substantially decreased sensitivity and technical reproducibility with fewer than 10 cells of microdissected material (20–22). To gain single-cell information, we reasoned that our 10-cell RNA sequencing (10cRNA-seq) method could productively identify carcinoma cell-regulatory heterogeneities if it were implemented as a type of fluctuation analysis called stochastic profiling (23). Previously, stochastic profiling was applied to 10-cell averaged gene-expression profiles collected repeatedly within a clonal cell line and measured by qPCR or microarray, leading to predicted single-cell heterogeneities that were subsequently validated (23–25). The distinct statistical properties of RNA sequencing (RNA-seq) data required design of a customized analytical pipeline for stochastic profiling by 10cRNA-seq.

Here, we combine 10cRNA-seq with an abundance-dependent overdispersion statistic that enables stochastic profiling of tumor cells in situ. Without any sample dissociation, we deeply profile carcinoma cell-to-cell heterogeneity in a cohort of five closely matched, late-onset and early-stage luminal breast cancers (Table 1; Supplementary Table S1), obtaining data on 21,255 genes from 1,400 carcinoma cells. The LCM component of 10cRNA-seq proved critical to obtain carcinoma heterogeneity profiles from cases with extensive immune infiltration. Ten-cell pooling minimized the contribution of periodic transcripts that covary with cell-cycle phases. Stochastic profiling inferred 710 candidate transcripts that were recurrently heterogeneous in ≥50% of tumors. Among these RHEGs, 85% were retained when the same analysis was applied to three additional luminal breast cancers (15), suggesting bounds for reliable variability between similar tumors. The shared set of candidates was largely devoid of detachment-induced artifacts (26) and, surprisingly, breast cancer driver genes (27). Recurrent heterogeneities were instead enriched in the collagen and matricellular constituents of a pan-cancer epithelial-to-mesenchymal transition (EMT) signature (28). The heterogeneities uncoupled canonical EMT marker genes that are usually linked at the population level. Most intriguing was an enrichment for dozens of noncycling driver genes for cancers other than breast (29). Our results raise the possibility that early-stage luminal breast cancers experience a much broader landscape of oncogenes and tumor suppressors through transcriptional heterogeneity than indicated by the genomic lesions characteristic of the subtype.

Tissue procurement and processing

Human sample acquisition and experimental procedures were carried out in compliance with regulations and protocols approved by the Institutional Review Board for Health Sciences Research (IRB-HSR) at the University of Virginia in accordance with the U.S. Common Rule. In accordance with IRB Protocol #19272, breast cancer samples were collected as ultrasound-guided core needle biopsies during diagnostic visits from participants who provided informed written consent. Each core biopsy was cut into two pieces, freshly cryoembedded in NEG-50 medium (Richard-Allan Scientific) in a dry ice–isopentane bath, and stored at −80°C wrapped in aluminum foil. Cryosectioning and slide storage was performed exactly as described previously (19).

Whole-exome sequencing

Genomic DNA was prepared from cryosectioned material by dissolving 8-μm sections in Buffer AL and purifying with the DNeasy Blood & Tissue Kit (Qiagen). Whole-exome sequencing at 100× coverage was performed as a contract service with Genewiz. Raw BCL files were converted to fastq files with bcl2fastq v.2.19 and adapter trimmed with Trimmomatic v.0.38. Trimmed reads were mapped to the human reference genome (Build 37), and somatic variants were called using the Dragen Bio-IT Platform (Illumina) in somatic mode and a panel of normals to remove technical artifacts. Variants were removed if they were marked as common variants in dbSNP (Build 151) or if they had a non_cancer_AC > 5 in the gnomAD exome database v2.1.1. The filtered VCF was annotated with Ensembl Variant Effect Predictor (VEP) v95 for the Ensembl transcripts overlapping with the filtered variants. The set of consequence terms was defined by sequence ontology, and the term with the most severe impact was selected for each variant in downstream analysis. Summary of per-gene mutational annotations is available in Supplementary File S1, and summary of the per-mutation impact predictions and variant allele frequencies is available in Supplementary File S2.

Rapid histology–immunofluorescence and laser-capture microdissection

For samples with low immune infiltration (UVABC1, UVABC2, and UVABC4), slides were stained and dehydrated as described previously (23). For samples with high immune infiltration (UVABC3 and UVABC5), slides were fixed immediately in 75% ethanol for 30 seconds, rehydrated quickly with PBS, and stained with a mixture of Alexa Fluor 488–conjugated KRT8 antibody (Abcam #ab192467, 1:20 dilution), YOPRO3 (Invitrogen #Y3607, 1:1,000 dilution), and 1 U/ml RNAsin-Plus (Promega) in PBS for one minute. Slides were rinsed twice with PBS before dehydrating with 70% ethanol for 15 seconds, 95% ethanol for 15 seconds, and 100% ethanol for 1 minute and finally clearing with xylene for 2 minutes.

Slides were microdissected immediately on an Arcturus XT LCM instrument (Applied Biosystems) using Capsure HS caps (Arcturus). Cells were either visualized by brightfield microscopy (UVABC1, UVABC2, and UVABC4) or with a dual FITC/TRITC filter (UVABC3, UVABC5). The smallest spot size and typical instrument settings (∼50 mW power and ∼2 milliseconds duration) yielded ∼25 μm spot diameters capturing one to three breast carcinoma cells per laser shot.

For each biopsy, adjacent clusters of 10 cells were captured as 10-cell samples throughout multiple cryosections to access different regions of the tumor. In addition, a matched number of cells was captured nearby the 10-cell samples on a single LCM cap, extracted as a pool, and diluted into 10-cell equivalents that serve as measurement controls (pool-and-split controls). For each tumor, 10cRNA-seq datasets include 10-cell samples and matched pool-and-split controls captured across multiple days.

RNA extraction and library preparation

RNA extraction and amplification of microdissected samples was performed as described previously (19). Briefly, RNA was eluted from the LCM caps by digesting with proteinase K, and oligo-dT primed cDNA was synthesized. Residual RNA was degraded by RNAse H (NEB) digestion, and cDNA was poly(A) tailed with terminal transferase (Roche). Poly(A)-cDNA was amplified using an AL1 primer (ATTGGATCCAGGCCGCTCTGGACAAAATATGAATTCTTTTTTTTTTTTTTTTTTTTTTTT) and a blend of Taq polymerase (NEB) and Phusion (NEB) for 30 cycles.

Poly(A) PCR-amplified samples were first assessed by quantitative PCR for exogenous ERCC spike in standards and endogenous genes (GAPDH and RPL8 as loading controls and the epithelial marker KRT8) as described previously (19). New primers for this study were KRT8 (Fwd: GCCGTGGTTGTGAAGAAGAT, Rev: CCCCAGGTAGTAAACTCCCC) and RPL8 (Fwd: CCCAGCTCAACATTGGCAAT, Rev: ACGGGTCTTCTTGGTCTCAG). Samples were retained if the geometric mean of quantification cycles for the GAPDH–RPL8 loading controls was within three times the interquartile range of the median calculated across all 10-cell samples of the biopsy. Samples beyond that range were excluded because of over- or under-capture during LCM. For samples with increased immune infiltration, we additionally excluded samples with a detectable quantification cycle for the T cell marker CD3D (Fwd: TGCTTTGCTGGACATGAGACT, Rev: CAGGTTCACTTGTTCCGAGC).

Libraries for sequencing were reamplified, purified, and tagmented as described previously (19). Briefly, each poly(A) PCR cDNA sample was reamplified for a number of cycles where the amplification remained in the exponential phase (typically, 10–20 cycles). Reamplified cDNA was then twice purified with Ampure Agencourt XP SPRI beads. After bead purification, samples were quantified on a CFX96 real-time PCR instrument (Bio-Rad) using a Qubit BR Assay Kit (Thermo Fisher Scientific). Samples were diluted to 0.2 ng/μL and tagmented with the Nextera XT DNA Library Preparation Kit (Illumina).

RNA-seq

Libraries from 10-cell samples were multiplexed at an equimolar ratio, and 1.3 pmol/L of the multiplexed pool was sequenced on a NextSeq 500 instrument with NextSeq 500/550 Mid/High Output v1/v2/v2.5 kits (Illumina) to obtain 75-bp paired-end reads. From the sequencing reads, adapters were trimmed using fastq-mcf in the EAutils package (version ea-utils.1.1.2–779), and with the following options: -q 10 -t 0.01 -k 0 (quality threshold 10, 0.01% occurrence frequency, no nucleotide skew causing cycle removal). Quality checks were performed using FastQC (version 0.11.8) and MultiQC (version 1.7). Data were aligned to the human transcriptome (GRCh38.84) along with reference sequences for ERCC spike-ins using RSEM (version 1.3.0) and Bowtie 2 (version 2.3.4.3). RSEM options for the 10cRNA-seq data also included the following: –single-cell-prior–paired-end. RSEM read counts were converted to transcripts per million (TPM) by dividing each value by the total read count for each sample and multiplying by 106. Mitochondrial genes and ERCC spike-ins were not counted towards the total read count during TPM normalization.

Molecular subtype assignments

Microarray-based PAM50 centroids and associated code were obtained from the UNC Microarray Database (https://genome.unc.edu/pubsup/breastGEO/PAM50.zip). To adapt the signature for RNA-seq data, RSEM aligned TPM data for The Cancer Genome Atlas (TCGA) breast tumors was obtained from the UCSC Genome Browser (30). Using a balanced number of estrogen receptor (ER)–negative and ER-positive tumors, median RNA-seq values for PAM50 genes were calculated and subtracted from the entire cohort for standardization. Standardized values were used to predict PAM50 subtypes by using downloaded centroids and code from the UNC Microarray Database (31). Successful model training was visualized by a principal component plot showing both training (microarray) and test (TCGA RNA-seq) data clustering by molecular subtype. The same median correction method was attempted for 10cRNA-seq data from UVABC tumors, but model calibration was unsuccessful due to a lack of ER-negative samples and some large differences in overall abundance of PAM50 genes between bulk and 10-cell data. As a substitute, 10cRNA-seq samples were scored for transcriptional modules associated with ESR1 (464 genes), ERBB2 (27 genes), and AURKA (229 genes) using the “subtype.cluster” function within the package “genefu” (version 2.16.0) in R. On the basis of these module scores, samples were subtyped as Luminal A (ESR1+, ERBB2, AURKA), Luminal B (ESR1+, ERBB2, AURKA+), HER2 (ERBB2+), and Basal (ESR1, ERBB2).

UMAP projections

All UMAP projections were generated using the R package “umapr” (version 0.0.0.9001) with the following parameters: neighbors = 4, distance metric = “correlation,” minimum distance = 0. For UVABC embedding, RSEM counts for UVABC tumors were converted to TPM values and projected onto a UMAP using all endogenous transcripts. The same UMAP projection was re-colored by subtype classifications and batch number. Batch effects were excluded by visualizing TPM estimates of exogenous ERCC spike-in expression for all UVABC tumors on a separate UMAP. scRNA-seq data of breast tumors (15) were obtained as RSEM-aligned TPM values from the Gene Expression Omnibus (GSE75688). Epithelial carcinoma cells were separated from infiltrating cell types in the scRNA-seq data by selecting cells that expressed KRT8 and EPCAM at TPM > 1. 10cRNA-seq UVABC samples and filtered scRNA-seq data (78 cells) were merged by transcript and projected on the same UMAP.

To account for tumor-specific differences in overall transcript abundance, we centered the expression of every transcript by the 25th quartile of its expression in 10-cell samples from each tumor. Quartile-centered samples for each tumor were then merged by transcript and projected onto a UMAP. From the merged quartile-centered data, expression of 710 RHEGs was extracted and projected onto a separate UMAP.

Overdispersion-based stochastic profiling

10cRNA-seq analysis consisted of an identical algorithm applied separately to 28 10-cell samples and 20 pool-and-split controls from each UVABC tumor. RSEM values were rounded to integer counts, and transcripts with zero counts throughout were removed. Abundance-dependent expected expression and error models were generated separately for 10-cell samples and pool-and-split controls using the “knn.error.models” function in the package “scde” (version 1.99.4) with k nearest neighbors set to one fourth of the sample size for both sets. Only transcripts that had a minimum transcript count of five (min.count.threshold) in at least five samples (min.non.failed) were considered for model generation. From the abundance-adjusted error models, adjusted-variance estimates of overdispersion were calculated using the “pagoda.varnorm” function of the same package using the generated error models as input. The variance was further adjusted to account for read depth using the “pagoda.subtract.aspect” function. Transcripts with adjusted variances in 10-cell samples above the 5th percentile of adjusted variances in pool-and-split controls (the empirical type I error rate) were considered candidate heterogeneously expressed transcripts. Finally, transcripts with adjusted variances in the top 5th percentile of pool-and-split controls reflecting high measurement variability were filtered out of candidate lists. Overdispersion-based stochastic profiling was applied identically to each of the five UVABC cases to obtain candidate gene lists. Subsampled evaluations drew from the 28 10-cell observations without replacement. The computational assembly of 10-cell pools from scRNA-seq data (15) used the empirical type I error rate of UVABC4, which had the minimum adjusted variance of the cohort. Final candidate gene lists for every tumor are supplied along with RHEGs in Supplementary File S3.

Cohort subsampling for recurrent heterogeneously expressed gene estimation

The five UVABC tumors were exhaustively downsampled as groups of n = 1 (five total possibilities), 2 (10 total), 3 (10 total), 4 (five total), or 5 (one total) and intersected with the operational recurrent heterogeneously expressed gene (RHEG) definition of transcript heterogeneities that recur in ≥50% of the cases considered: one for n = 1 or 2, two for n = 3 or 4, and three for n = 5.

Copy-number variation inference

To predict copy-number variations (CNV) from 10cRNA-seq, we used inferCNV, which corrects the input expression data for average gene expression based on normal reference cells and applies a moving average with a sliding window of 101 genes within each chromosome. Normal breast tissue gene-expression data were obtained from GTEx (32) as a reference dataset for CNV. The reference GTEx data, UVABC 10cRNA-seq data, and a reference genome position file (GRCh38.86) were input to the “CreateInfercnvObject” function in the package “InferCNV” (version 1.0.3) in R (https://github.com/broadinstitute/inferCNV). The inferCNV object was then analyzed with the “infercnv::run” function with dynamic threshold denoising to infer CNVs as described previously (33).

Inference of bulk differential expression

For a normal reference, scRNA-seq data from three reduction mammoplasties (34) were downloaded and aligned by using RSEM identically as described above for 10cRNA-seq data. To generate pseudo-bulk normal profiles, gene counts were summed across all individual cells from the same individual. For a bulk comparison with UVABC luminal breast tumors, gene counts were summed across pool-and-split controls measured by 10cRNA-seq and grouped by experimental day to generate biological replicates comprised of different regions of the same tumor. DESeq2 (version 1.24.0) was used to identify differentially expressed transcripts between pseudo-bulk normal data and tumor data using model design ∼condition separately for each UVABC tumor. Genes with an adjusted P value less than 0.05 were considered significantly up- or downregulated between the two groups and are summarized in Supplementary File S4.

Periodicity of cell-cycle RHEGs versus non-RHEGs

We obtained 361 oscillating transcripts from Cyclebase 3.0 (35), reconciled aliases with official gene names, and intersected with the RHEGs from 10cRNA-seq, obtaining 24 shared transcripts. Next, using microarray data from synchronized HeLa cells (36), we identified probesets for 21 of the 24 shared transcripts along with those of the top 10 cycling transcripts according to Cyclebase 3.0 (35). The centered probeset data and HeLa pseudotime estimates are available through Cyclebase 3.0 for three complete cell cycles, but only the first two cycles show strong synchronization (35). We calculated the time interval above the mean value and compared it as a ratio to each of the adjacent time intervals below the mean value. For the ratio, the larger pseudotime interval (above or below the mean) was placed in the numerator. We calculated the skewness of the ratio distributions (indicating time-interval asymmetry above vs. below the mean) for the 21 RHEGs with identifiable probesets versus the top-10 cycling transcripts and estimated uncertainty by bootstrapping.

Monte Carlo simulations of three-state stochastic profiling

Monte Carlo simulations of stochastic profiling under the assumption of two regulatory states were performed in MATLAB with available software (22). The two-state model assumes a binomial distribution for the cellular regulatory-state dichotomy and log-normal distribution of measured transcripts. To build a three-state model reflecting cell-cycle-regulated variation, we used a multinomial distribution to reflect cell-cycle fractions, two lognormal regulatory states for G1 and G2–M phases, and a uniform “S-phase” interval between the G1 and G2–M regulatory states. Two- or three-state distributions were compared against a null distribution of lognormal variation using the Kolmogorov–Smirnov (K–S) test. The simulations for a parameter set were run 50 times to measure the median P values and the associated nonparametric confidence intervals. Stochastic sampling was deemed effective when the median P value for F ≠ 0 (multiple states) was less than 0.05 and the median P value for F = 0 (one state) was greater than 0.05. MATLAB code for the modified three-state distributions is available in Supplementary File S5.

Gene signature overlaps with RHEGs

All overlaps were viewed using the “venn” function in the R package gplots (version 3.0.1.1), and intersections were obtained using the “intersect” function. Significance of overlap was calculated using the hypergeometric test in R through the “phyper” function and a total gene count of 20,000. For assessing detachment-induced artifacts in the RHEG gene set, we defined a detachment-induced gene signature as transcripts appearing at least twice among upregulated genes among four luminal breast cancer lines grown in ultralow attachment suspension (26). Breast driver genes were obtained from a larger gene list of cancer drivers (27) filtered for tissue of origin. We similarly assessed overlap with transcripts altered by CTCF knockout in luminal breast cancer cells (37), GATA3 target genes (38), EMT signature gene sets (28, 39), and an aggregated pan-cancer driver gene set (27, 40). For the last comparison, nine RHEGs were considered proximal to driver genes and treated as equivalent between the two sets. All mismatched gene aliases were corrected before overlap testing.

Statistical analysis

Sample sizes for stochastic profiling were determined by Monte Carlo simulation (22). Differences in genes detected per sample between 10cRNA-seq and scRNA-seq were assessed by the K–S test using the “ks.test” function in R. Significance of overlaps between candidate genes identified in different UVABC tumors were assessed by Monte Carlo simulations that drew the total number transcripts per tumor randomly from a common pool of 14,824 genes (total transcripts eligible for overdispersion analysis in all tumors). Observed overlaps were compared with 1,000 Monte Carlo simulations to estimate a P value, which was adjusted for multiple comparisons by using the Šidák correction. All overlaps between the RHEG set and other gene sets were assessed for significance by hypergeometric test using the “phyper” function in R and a background of 20,000 genes. K–S tests for Monte Carlo simulations for stochastic profiling were assessed using the “kstest” function in MATLAB. Hierarchical clustering was performed using “pheatmap” in R using Euclidean distance and “ward.D2” linkage.

Data availability

10cRNA-seq from this study are available through the NCBI Gene Expression Omnibus (GSE147356).

Carcinoma-focused 10-cell profiling of early-stage luminal breast cancer

Women were selected for enrollment if they required ultrasound-guided biopsy for a suspected malignancy after screening mammography (BI-RADS 4C and higher). Just before diagnostic biopsy, we obtained informed written consent to collect an additional ultrasound-guided core sample, which was cryoembedded (19) within one minute of acquisition (Fig. 1A; Table 1). After clinical diagnosis of hormone-positive, HER2-negative breast cancer, we selected five cases that were as closely matched as possible (Supplementary Table S1). The median tumor biopsy was stage 1, grade 3, and aged 63 years—the late-onset group for breast cancer. To define the mutational spectrum of each case, we performed whole-exome sequencing on DNA purified from the cryosectioned biopsy. The analysis identified 139 to 263 point mutations in 118 to 230 genes, with UVABC2 showing the highest burden and UVABC5 the lowest (Supplementary Files S1 and S2). Each case harbored mutations in multiple driver genes for luminal breast cancer (41). Albeit very focused, the cohort was both diverse and representative in terms of tumor genetics for the luminal subtype.

Figure 1.

Focused transcriptional profiling of breast carcinoma cells without dissociation in early-stage tumor biopsies. A, Biopsy tissue was immediately cryoembedded and later cryosectioned for LCM. Tumor cells were visualized with either a rapid nuclear stain or KRT8 immunostain (Supplementary Fig. S1) before 10cRNA-seq (19). B, Tumor histology of the UVABC cohort visualized by hematoxylin & eosin staining. Two cases (marigold inset) showed increased tumor-infiltrating lymphocytes requiring KRT8 immunostain-guided LCM. C, Selective capture of epithelial carcinoma cells assessed by marker transcripts for luminal cells (KRT8, ESR1), basal cells (KRT14, KRT5), immune cells (CD3D, CD19, FCGR1A), and stromal content (FAP). Scale bar, 80 μm (B) or 10 μm (B, inset).

Figure 1.

Focused transcriptional profiling of breast carcinoma cells without dissociation in early-stage tumor biopsies. A, Biopsy tissue was immediately cryoembedded and later cryosectioned for LCM. Tumor cells were visualized with either a rapid nuclear stain or KRT8 immunostain (Supplementary Fig. S1) before 10cRNA-seq (19). B, Tumor histology of the UVABC cohort visualized by hematoxylin & eosin staining. Two cases (marigold inset) showed increased tumor-infiltrating lymphocytes requiring KRT8 immunostain-guided LCM. C, Selective capture of epithelial carcinoma cells assessed by marker transcripts for luminal cells (KRT8, ESR1), basal cells (KRT14, KRT5), immune cells (CD3D, CD19, FCGR1A), and stromal content (FAP). Scale bar, 80 μm (B) or 10 μm (B, inset).

Close modal

Despite histologic similarities in gross tumor characteristics, we noted elevated lymphocyte infiltration in two cases (UVABC3 and UVABC5) that rendered them problematic to microdissect by nuclear histology alone (Fig. 1B; Supplementary Table S1). Therefore, we devised an immuno-LCM procedure that combines an Alexa Fluor 488–conjugated, high-affinity monoclonal antibody against KRT8 with the red-orange nucleic acid stain YOPRO3 (see Materials and Methods). A 1-minute incubation with the antibody-dye cocktail was sufficient to resolve KRT8-positive carcinoma cells from KRT8-negative stromal cells and the YOPRO3-negative autofluorescence of tissue collagen (Supplementary Fig. S1A). We could not find any evidence that the antibody or dye interfered with the critical early steps of 10cRNA-seq (Supplementary Fig. S1B; ref. 19).

For each case, we collected 28 random pools of 10 carcinoma cells located throughout cryosections of the core biopsy. Pools were assembled as local 10-cell groups that reflect both clonal and microenvironmental heterogeneity. We recorded the spatial position of all cells microdissected in the 10-cell samples to leave open the possibility of retroactively linking transcriptomic changes to histological or topological features of the tumor. Samples were deeply sequenced at 6.04 ± 0.75 million reads per 10-cell pool to ensure saturation of gene detection and provide maximum sensitivity for identifying noncarcinoma contaminants. Across all cases, we found that the luminal markers KRT8 and ESR1 predominated by transcripts per million, whereas markers for myoepithelial cells (KRT14, KRT5), T cells (CD3D), B cells (CD19), and macrophages (FGCR1A) were rarely detected (Fig. 1C). Even though desmoplasia was marked for all but one case (Supplementary Table S1), fibroblast-like markers were detectable in only ∼14% of 10-cell samples [using log2FAP > 5 TPM ≈ 0.8 copies per cell (19) as a stringent threshold]. Overall, the observations confirmed the fidelity of (immuno-)LCM for isolating spatially resolved, carcinoma-specific transcriptomic profiles with minimal disruption of tumor architecture.

10cRNA-seq transcriptomes retain the intertumor and intratumor heterogeneity of luminal breast carcinoma cells

As a first assessment, the 10cRNA-seq data were visualized by uniform manifold approximation and projection (UMAP; ref. 42). Consistent with past descriptions of carcinoma heterogeneity by scRNA-seq (11, 15, 16), we found that 10cRNA-seq data clustered tightly by patient (Fig. 2A). Batch effects were not evident within the patient clusters (Supplementary Fig. S2A–S2E). Furthermore, we did not observe any clustering in a separate UMAP visualization using only the data from ERCC spike-ins added to every sample at the time of RNA extraction (Supplementary Fig. S2F). The observed separation of 10cRNA-seq transcriptomes (Fig. 2A) thus reflects bona fide intertumor differences between cases.

Figure 2.

Ten-cell transcriptomes of luminal breast carcinomas are heterogeneous among and within tumors. A, UMAP embedding of 10cRNA-seq samples from the UVABC cohort colored by tumor. B, A three-signature classification system for identifying molecular subtypes of breast cancer in 10cRNA-seq data. Module scores were used to classify the subtype of each sample as indicated. C, Molecular subtype classifications of the 10cRNA-seq samples projected as in A. D, Genes detected by 10cRNA-seq in the UVABC cohort compared with three luminal tumors profiled by scRNA-seq (sc-01, sc-02, sc-03; ref. 15). P < 10–15 by K–S test. E, UMAP embedding of tumors profiled by 10cRNA-seq and scRNA-seq.

Figure 2.

Ten-cell transcriptomes of luminal breast carcinomas are heterogeneous among and within tumors. A, UMAP embedding of 10cRNA-seq samples from the UVABC cohort colored by tumor. B, A three-signature classification system for identifying molecular subtypes of breast cancer in 10cRNA-seq data. Module scores were used to classify the subtype of each sample as indicated. C, Molecular subtype classifications of the 10cRNA-seq samples projected as in A. D, Genes detected by 10cRNA-seq in the UVABC cohort compared with three luminal tumors profiled by scRNA-seq (sc-01, sc-02, sc-03; ref. 15). P < 10–15 by K–S test. E, UMAP embedding of tumors profiled by 10cRNA-seq and scRNA-seq.

Close modal

Next, we sought to classify individual 10cRNA-seq samples into intrinsic breast cancer subtypes. We adapted the microarray-based PAM50 classification of subtypes (31) to RNA-seq (30), but the 50-gene signature was not robust enough for 10cRNA-seq observations (Supplementary Fig. S3A and S3B). A similar distinction between bulk and 10-cell projections was noted in the accompanying study (43). As a substitute, we used expression-signature modules (44) associated with ESR1, ERBB2, and AURKA as proxies for the hormone, HER2, and proliferative status of each 10-cell sample (see Materials and Methods). Within patients, there was considerable variability in module scores (Fig. 2B), corroborating an earlier scRNA-seq study of multiple breast cancer subtypes (15). Although most 10-cell samples were classified as luminal A or luminal B subtype, all cases but UVABC4 contained observations scoring more strongly to other subtypes (Fig. 2C). Three cases (UVABC1–3) harbored instances of all four subtypes, analogous to scRNA-seq observations in glioblastoma (33). The clustering of mixed classifications implied that other patient-specific gene programs were dominant in the UMAP organization. Variations in subtype class were repeatedly documented in nearby samples microdissected from the same histologic section (Supplementary Fig. S4A–S4D). Even for early-stage breast tumors, the 10cRNA-seq data suggested that local variations in regulatory state are pervasive.

Toward a more-direct comparison of 10-cell data with single-cell measurements of gene expression, we extracted 78 KRT8+EPCAM+ carcinoma cells from three cases of luminal breast cancer profiled by scRNA-seq (see Materials and Methods; ref. 15). As previously reported by us and others (19, 20, 45, 46), there were significantly more transcripts detected in local 10-cell pools compared with singly isolated cells (10,066 ± 1,416 genes vs. 5,957 ± 1,824 genes; P < 10–15 by K–S test; Fig. 2D). Overall, 3,319 transcripts found in 10cRNA-seq pools were entirely undetected by scRNA-seq. Notwithstanding the differences in gene coverage, when scRNA-seq and 10cRNA-seq samples were projected on a shared UMAP, the separation between methods was comparable to that among patients (see Materials and Methods; Fig. 2E). Together, the data argue that 10-cell pooling does not dilute out the cell-to-cell and tumor-to-tumor heterogeneities recognized by scRNA-seq.

Stochastic profiling by 10cRNA-seq identifies candidate regulatory heterogeneities

To go beyond qualitative descriptions of molecular subtype and intertumor differences, we sought to adapt the theory of stochastic profiling (22, 23) to 10cRNA-seq (19). Stochastic profiling detects single-cell regulatory heterogeneities by statistically analyzing measurement fluctuations among small pools of cells collected randomly within a cell type (23). Samples with different proportions of cells in high versus low regulatory states from a ∼10-cell pool will create skewed distributions, which reflect characteristics of the underlying single-cell population (25). Compared with earlier microarray-based studies, stochastic-profiling fluctuation analysis using RNA-seq brings new opportunities and challenges. One advantage is that RNA-seq is not biased by the position or quality of hybridization probes (as discussed in ref. 19). However, transcript estimates are sensitive to read depth, and low-abundance genes are susceptible to noise from counting statistics. Rare transcripts are a key hurdle, because sample-to-sample fluctuations will be partially obscured by the detection limit, which combines instances that are “absent” with those that are “not counted” (Fig. 3A). We surmounted these hurdles by extracting a module for estimating “abundance-dependent dispersion” from the SCDE package (47, 48) and redeploying it as a separate inference tool for stochastic profiling. The module relates the squared coefficient of variation (CV) among replicates of each gene in a study (here, a patient) to the abundance magnitude of that gene, building a model for the within-study variance expected of a transcript at a given abundance (Fig. 3B). The variance of each transcript is then normalized by the expected variance for that transcript's abundance, yielding an adjusted variance (Varadj) that acts as an overdispersion score. For high-abundance genes, overdispersed transcripts such as MYL12B show multiple modes or heavier tails than expected (compare with GABARAP; Fig. 3C and D). Low-abundance genes with overdispersion such as TXNRD3 are skewed by multiple instances of moderate-to-high TPM (compare with DSCR3; Fig. 3E and F). The dispersion module incorporates discrete random variables (negative binomial and Poisson) to model aligned reads and dropouts. The Varadj overdispersion score thus provides a principled metric for stochastic-profiling analysis of 10cRNA-seq data.

Figure 3.

Stochastic profiling by 10cRNA-seq through abundance-dependent overdispersion statistics. A, Illustration of theoretical left-censored transcript, which is heterogeneously expressed at one copy per cell (filled, +) and measured with 50% efficiency. Low-frequency mixtures will be equivalently not detected (n.d.). B, Transcript dispersion is inversely related to abundance. Example transcripts with similar abundance but different dispersion (red) are annotated by the subpanel in which they appear. The blue trace indicates a smoothing cubic spline fit of the summarized 10cRNA-seq fluctuations per transcript (gray) that relates the coefficient of variation (CV) of each gene to its average abundance. C and D, 10cRNA-seq sampling fluctuations for high-abundance transcripts with expected dispersion (GABARAP; C) and overdispersion (MYL12B; D). E and F, 10cRNA-seq sampling fluctuations for low-abundance transcripts with expected dispersion (DSCR3; E) and overdispersion (TXNRD3; F). G, Distribution of adjusted variance scores for each gene measured transcriptomically in separate 10-cell samples (purple) compared with pool-and-split controls estimating technical variation (black dashed). The arrow indicates the upper 5th percentile of adjusted variance for the pool-and-split controls, which is used as the 10-cell type I error rate for identifying candidate genes. H, Examples of low-abundance (RPTOR) and high-abundance (MYL6) transcripts deemed to be significantly overdispersed given their relative abundance in 10-cell samples (purple) and their technical variation in pool-and-split controls (black dashed). For CF, continuous traces (marigold) indicate the idealized dispersion expected given the observed transcript abundance, and the adjusted variance (Varadj) measure of dispersion is reported. Gray bars represent drop-out events, which are not detected (n.d.) and modeled by a separate posterior (47, 48). For BH, data from UVABC4 were used as representative examples.

Figure 3.

Stochastic profiling by 10cRNA-seq through abundance-dependent overdispersion statistics. A, Illustration of theoretical left-censored transcript, which is heterogeneously expressed at one copy per cell (filled, +) and measured with 50% efficiency. Low-frequency mixtures will be equivalently not detected (n.d.). B, Transcript dispersion is inversely related to abundance. Example transcripts with similar abundance but different dispersion (red) are annotated by the subpanel in which they appear. The blue trace indicates a smoothing cubic spline fit of the summarized 10cRNA-seq fluctuations per transcript (gray) that relates the coefficient of variation (CV) of each gene to its average abundance. C and D, 10cRNA-seq sampling fluctuations for high-abundance transcripts with expected dispersion (GABARAP; C) and overdispersion (MYL12B; D). E and F, 10cRNA-seq sampling fluctuations for low-abundance transcripts with expected dispersion (DSCR3; E) and overdispersion (TXNRD3; F). G, Distribution of adjusted variance scores for each gene measured transcriptomically in separate 10-cell samples (purple) compared with pool-and-split controls estimating technical variation (black dashed). The arrow indicates the upper 5th percentile of adjusted variance for the pool-and-split controls, which is used as the 10-cell type I error rate for identifying candidate genes. H, Examples of low-abundance (RPTOR) and high-abundance (MYL6) transcripts deemed to be significantly overdispersed given their relative abundance in 10-cell samples (purple) and their technical variation in pool-and-split controls (black dashed). For CF, continuous traces (marigold) indicate the idealized dispersion expected given the observed transcript abundance, and the adjusted variance (Varadj) measure of dispersion is reported. Gray bars represent drop-out events, which are not detected (n.d.) and modeled by a separate posterior (47, 48). For BH, data from UVABC4 were used as representative examples.

Close modal

In conventional scRNA-seq, each cell is considered as an N-of-1 observation that convolves biological variability and technical noise. In contrast, stochastic profiling quantifies technical noise explicitly by pool-and-split analysis of 10-cell equivalents from hundreds of carcinoma cells microdissected in the same vicinity as the samples. We sequenced 20 pool-and-split controls in parallel with the 28 10-cell samples, analyzing the controls separately to construct a null distribution for transcript overdispersion in each tumor. The upper 5th percentile in the null model was used as an empirical type I error rate that defined the critical overdispersion cutoff for 10-cell samples—transcripts above this threshold in the 10-cell samples (but not in the null) were considered candidate regulatory heterogeneities (Fig. 3G and H; Supplementary Fig. S5A–S5E).

Stochastic profiling identifies recurrent transcriptional regulatory heterogeneities

By abundance-dependent dispersion, stochastic profiling identified 9,206 candidate heterogeneities in the UVABC cohort, 161 transcripts of which were undetected by scRNA-seq (Supplementary File S3; ref. 15). The extent of 10-cell sampling was more than sufficient to define the prevalent regulatory heterogeneities in each tumor—when datasets were subsampled, the recall of UVABC candidates saturated when 20 or greater 10-cell samples were used (Supplementary Fig. S6A–S6E). We considered the contribution of heritable differences among tumor subclones in two ways, inferring CNVs from either 10cRNA-seq observations or bulk whole-exome sequencing (Supplementary Figs. S7A–S7F and S8A and S8B). There was good agreement between the two approaches, including the inferred loss of 8p (all) and 11q (UVABC1), along with gains in 1q (UVABC1,2,4) and 8q (UVABC2–4). Neither inference indicated that candidate heterogeneities were biased toward amplified or deleted loci. However, UVABC1—the case with the most-extensive chromosomal instability—was suggestive (Padj = 0.07), supporting the study decision to avoid advanced tumors. For early-stage cancer, stochastic profiling by 10cRNA-seq nominates transcriptional regulatory heterogeneities that not appreciably biased by undersampling or confounded by subclonal genomic variation.

We next asked whether candidate heterogeneities showed any bias toward transcript upregulation or downregulation. To estimate bulk tumor transcriptomes for the UVABC cohort, we aggregated 10cRNA-seq pool-and-split controls from different days and spatially separated cryosections (see Materials and Methods). Normal reference samples were compiled bioinformatically using plate-based scRNA-seq of breast epithelia from three reduction mammoplasties (34). Differential expression analysis identified 5,912 to 7,876 transcripts with bulk abundance differences compared with the normal reference (Supplementary File S4). There were more downregulated transcripts by this comparison, but stochastic-profiling candidates were clearly enriched among those that were upregulated (Supplementary Table S2). The bias is expected given that a regulatory burst of transcription will affect cellular mRNA abundances more rapidly than transcriptional silencing. Importantly, 58% to 67% of candidates showed no detectable difference in average abundance, illustrating that most regulatory heterogeneity escapes bulk differential expression, as further corroborated by the accompanying study (43).

Overall, 3,627 candidates by stochastic profiling were exclusive to one of five patients, laying bare the extraordinary challenge of interpreting malignant cell-state heterogeneity beyond well-known markers of differentiation (9). Encouragingly, when the candidates were intersected, we noted a significant enrichment of transcripts that appeared repeatedly in three or more breast cancer cases (Fig. 4A). Such transcripts were classified as RHEG, for which there were 710 in total. Generalizing the RHEG definition to candidates appearing in >50% of the cases considered, we examined the stability of RHEG numbers by subsampling the UVABC cases (see Materials and Methods). As the quantity of patients increased, RHEGs stabilized in the range of 500 to 1,500 for this highly circumscribed cohort (Fig. 4B; Supplementary Table S1). We examined the generality of RHEGs by building simulated datasets of 10 carcinoma cells from scRNA-seq of three luminal breast cancers described earlier (Supplementary Fig. S9A; ref. 15). Among the 610 RHEGs that were detectable by scRNA-seq, 85% were retained as RHEGs when the three additional cases of luminal breast cancer were considered (Supplementary Fig. S9B). RHEGs thus provide a robust conceptual framework for prioritizing cell-state regulatory heterogeneities identified in vivo (18, 43).

Figure 4.

Intersecting candidate regulatory heterogeneities across tumors yields a set of RHEGs. A, Gene overlaps between and among cases in the UVABC cohort. Transcripts belonging to each tumor are available in Supplementary File S1. Enriched categories were assessed statistically by Monte Carlo simulation (see Materials and Methods). B, RHEG size as a function of the number of UVABC tumors included. RHEGs were defined as transcripts that appeared in >50% of the tumors included. Data are shown as the median ± range of 5Cn combinations of n tumors in the UVABC cohort. C and D, UMAP embedding of patient-standardized 10cRNA-seq transcriptomes (C) or RHEGs (D). Patient-enriched clusters are highlighted with dashed ovals.

Figure 4.

Intersecting candidate regulatory heterogeneities across tumors yields a set of RHEGs. A, Gene overlaps between and among cases in the UVABC cohort. Transcripts belonging to each tumor are available in Supplementary File S1. Enriched categories were assessed statistically by Monte Carlo simulation (see Materials and Methods). B, RHEG size as a function of the number of UVABC tumors included. RHEGs were defined as transcripts that appeared in >50% of the tumors included. Data are shown as the median ± range of 5Cn combinations of n tumors in the UVABC cohort. C and D, UMAP embedding of patient-standardized 10cRNA-seq transcriptomes (C) or RHEGs (D). Patient-enriched clusters are highlighted with dashed ovals.

Close modal

To evaluate RHEGs as an organizing principle, we revisited the UMAP visualization of UVABC cases from the standpoint of regulatory heterogeneity (Fig. 2A). Because abundance-dependent dispersion evaluates fluctuations local to each tumor (Fig. 3), it was first necessary to standardize the 10cRNA-seq transcriptomes separately and regenerate the UMAP (see Materials and Methods). Tumor-specific standardization intermingled the 10cRNA-seq observations considerably, but two clusters remained enriched in samples from UVABC1 and UVABC2 (Fig. 4C). When the same approach was applied using RHEGs exclusively, we observed a projection that was different from when the whole 10cRNA-seq transcriptome was used (Fig. 4D). For the same UMAP parameters, samples were more distributed than clustered, with the UVABC1-enriched cluster disappearing and a new UVABC4-enriched cluster appearing. This analysis suggested that RHEGs could be used as a lens to refocus transcriptome-wide heterogeneity on the most-robust variations.

RHEGs are not dominated by cell-cycle covariates

In scRNA-seq, the most-overarching contributor to heterogeneity is the phase of the cell cycle (11, 49). However, it was not obvious whether such single-cell variations would also overrepresent in RHEGs derived from 10-cell pools. Using a panel of 863 transcripts associated with replicating cells (49), we identified 62 among the 710 RHEGs, a significant overlap (P < 10–6 by hypergeometric test) but one comprising <10% of the list overall. Most of the overlapping genes were expressed acutely during one cell-cycle transition (e.g., G1–S, G2–M), which is akin to the two-state expression models foundational for the theory of stochastic profiling (22, 23, 25). When the cell-cycle search was restricted to 361 oscillating transcripts (35), the RHEG intersection reduced to 24 genes (P < 0.01 by hypergeometric test). Moreover, when we compared the periodicity of the overlapping genes to the most-symmetrically cycling transcripts, we found that RHEGs were significantly more imbalanced in their oscillations, implying bias toward one phase of the cell cycle (see Materials and Methods; Supplementary Fig. S10A and S10B; Supplementary File S6). The lack of oscillators directly proportional to the cell cycle is consistent with Monte Carlo simulations of stochastic profiling that model a three-state distribution corresponding to G1, S, and G2–M populations (Supplementary Fig. S10C–S10F; Supplementary File S5). Stochastic-profiling theory thus bolstered our conclusion that genes tracking with cell-cycle oscillations are filtered out when transcriptomes are measured as 10-cell pools.

RHEGs are largely devoid of detachment artifacts and influence from breast cancer driver genes

We also considered other trivial explanations for genes in the RHEG set. Although tumors were not dissociated, it was possible that detachment-like regulatory variation was induced locally and rapidly from the tissue damage of the biopsy procedure. Using a 547-gene signature of suspension culture in luminal breast cancer cells (26), we intersected with the RHEG set and found that only 14 were shared (Fig. 5A). Thus, our clinical-procurement and sample-handling procedures had avoided detachment-like damage responses in the breast carcinoma cells.

Figure 5.

RHEGs have little in common with detachment signatures or mutational drivers of breast cancer. A–C, Venn diagrams intersecting the UVABC RHEG set with a cell-detachment signature (A; ref. 26), a set of robust drivers for breast cancer (B, left; ref. 27), a list of transcripts altered by CTCF knockout in a luminal breast cancer cell line (B, right; ref. 37), and a list of GATA3 target genes (C; ref. 38). Statistical significance of overlaps was assessed by the hypergeometric test.

Figure 5.

RHEGs have little in common with detachment signatures or mutational drivers of breast cancer. A–C, Venn diagrams intersecting the UVABC RHEG set with a cell-detachment signature (A; ref. 26), a set of robust drivers for breast cancer (B, left; ref. 27), a list of transcripts altered by CTCF knockout in a luminal breast cancer cell line (B, right; ref. 37), and a list of GATA3 target genes (C; ref. 38). Statistical significance of overlaps was assessed by the hypergeometric test.

Close modal

Metascape (50) analysis revealed a significant overlap between the RHEG set and genes associated with luminal breast cancer (q < 10–4; Supplementary File S7). Therefore, we looked at known breast cancer drivers with the premise that mutations may arise subclonally in a breast carcinoma and disrupt abundance of the encoded transcript. Among 29 robust driver genes for breast cancer (27), only one was shared with the RHEG set: CTCF (Fig. 5B, left). As an insulator protein, CTCF abundance changes could cause secondary transcriptional alterations; however, we did not observe any enrichment for conserved CTCF-sensitive genes (37) in the RHEG set (Fig. 5B, right). The same was true when we separated candidate heterogeneities for UVABC2, which has an F416L mutation in CTCF at an allele fraction of 22% (Supplementary Fig. S11A; Supplementary File S2). Although not classified as a RHEG, we also investigated transcriptional targets for the most-prevalent transcription factor mutated in luminal breast cancer, GATA3 (27). We again found no RHEG enrichment among transcripts altered by mutant GATA3 in luminal breast cancer cells (Fig. 5C; ref. 38). However, when candidates were separated for UVABC4, which has a D48Y mutation in GATA3 at an allele fraction of 23%, there was a detectable overlap (Supplementary Fig. S11B; Supplementary File S2). Stochastic profiling can thus detect the consequences of fractionally mutated transcription factors in primary cancers—the lack of association with RHEGs supported that RHEGs were more than a reflection of known sources of cell-to-cell heterogeneity in cancer.

RHEGs are enriched for EMT signatures and cofluctuate with canonical EMT markers

scRNA-seq of dissociated tumors has identified (partial-)EMT states in some carcinomas (11, 16) but not others (12). For breast cancer, changes along the EMT spectrum are mostly described in hormone-negative cell lines, but more-recent work reports EMT-like activation patterns in 65% to 85% of primary luminal breast cancers (51). We intersected a pan-cancer EMT signature (39) with the RHEG set and found significant overlap in multiple collagens, matricellular proteins, and other transcripts in the signature (Fig. 6A). The data suggest that ECM dysregulation in these tumors is jointly mediated by the carcinoma cells together with cancer-associated fibroblasts. RHEG enrichment was also found with an independently derived EMT signature (Supplementary Fig. S12A; ref. 28), reinforcing the result. Metascape (50) analysis additionally supported a modest enrichment in target genes for the EMT effector, SNAI1 (q < 0.05; Supplementary File S7). Formally, none of the canonical EMT regulators [ZEB1 (mesenchymal, M), ZEB2 (M)] or markers [CDH1 (epithelial, E), VIM (M), FN1 (M)] were RHEGs, even though VIM was a candidate heterogeneity in two cases (UVABC3, UVABC5) and FN1 a candidate in one case (UVABC3). We clustered these canonical EMT transcripts with the EMT RHEGs and found clear separation of E- and M-associated transcripts among 10-cell pools along with several notable subclusters by gene and by sample (Fig. 6B). The organization by patient was unexpected; for example, UVABC2 showed the most evidence for the E state, even though it was one of the most-advanced stage tumors of the cohort (Supplementary Table S1). Reciprocally, 10-cell profiles of the high-grade UVABC3 tumor were no more scattershot in EMT transcripts than UVABC4 (grade 2) or UVABC5 (grade 1; Fig. 6B). Among samples with M characteristics, ZEB2 appeared prominently among samples abundant for some transcripts (FN1, COL6A1, SPARC, VIM) but not others (COL5A2, TAGLN). M-state fragmentation was also observed in UVABC1, which was predominated by samples positive for VIM, SPARC, and SERPING1 but negative for CTSK, TAGLN, and mesenchymal collagens (Fig. 6B; Supplementary Fig. S12B). The RHEG set thereby provided a transcriptomic resource for posing questions about EMT regulatory patterns in early-stage luminal breast cancers.

Figure 6.

RHEGs contain EMT markers and driver genes for cancers other than breast. A, Venn diagram intersecting the UVABC RHEG set with a pan-cancer EMT signature (39). Shared genes are listed. B, Hierarchical clustering of the shared genes in A along with epithelial (E; CDH1) and mesenchymal (M; VIM, FN1, ZEB1, ZEB2) markers that were reliably detected by 10cRNA-seq. Stromal character is indicated by the relative abundance of the fibroblast marker FAP compared with the luminal markers ESR1, EPCAM, GATA3, and KRT8. C, Venn diagram intersecting the UVABC RHEG set with a pan-cancer set of driver genes (27, 40). The intersection was updated to include proximal RHEGs as described in Supplementary Table S3. Shared genes (“RHEG drivers”) are listed. D, Hierarchical clustering of RHEG drivers. Arrows with solid lines between co-clustering drivers indicate possibly direct transcription factor–target gene relationships as described in the text. The arrow with a dashed line indicates a ligand–receptor interaction. E, Leading upstream candidate regulators of the RHEG drivers. The full list of candidates is available in the complete Metascape analysis (Supplementary File S9). For A and C, statistical significance of overlaps was assessed by the hypergeometric test.

Figure 6.

RHEGs contain EMT markers and driver genes for cancers other than breast. A, Venn diagram intersecting the UVABC RHEG set with a pan-cancer EMT signature (39). Shared genes are listed. B, Hierarchical clustering of the shared genes in A along with epithelial (E; CDH1) and mesenchymal (M; VIM, FN1, ZEB1, ZEB2) markers that were reliably detected by 10cRNA-seq. Stromal character is indicated by the relative abundance of the fibroblast marker FAP compared with the luminal markers ESR1, EPCAM, GATA3, and KRT8. C, Venn diagram intersecting the UVABC RHEG set with a pan-cancer set of driver genes (27, 40). The intersection was updated to include proximal RHEGs as described in Supplementary Table S3. Shared genes (“RHEG drivers”) are listed. D, Hierarchical clustering of RHEG drivers. Arrows with solid lines between co-clustering drivers indicate possibly direct transcription factor–target gene relationships as described in the text. The arrow with a dashed line indicates a ligand–receptor interaction. E, Leading upstream candidate regulators of the RHEG drivers. The full list of candidates is available in the complete Metascape analysis (Supplementary File S9). For A and C, statistical significance of overlaps was assessed by the hypergeometric test.

Close modal

RHEGs are enriched for pan-cancer driver genes and suggest transcription factor–target relationships in single cells

The dearth of breast cancer driver genes among RHEGs (Fig. 5B) prompted us to look at cancer drivers more broadly. We merged 299 robust drivers for any cancer type (27) with the latest pan-cancer analysis reporting 803 drivers from 2,658 tumors (40) and intersected with the RHEG set. There were multiple instances where RHEGs resided in the same complex, pathway, or gene subfamily as a cancer driver (Supplementary Table S3). We included these proximal RHEGs and altogether found 46 genes as “RHEG drivers” shared between the two datasets (P = 0.001 by hypergeometric test; Fig. 6C; Supplementary File S8). Even with the expanded driver set, we found no enrichment for mutated breast cancer driver genes (P = 0.6 by hypergeometric test). In the UVABC cohort, RHEG drivers may be leveraged noncanonically through transcriptional regulation rather than mutation.

Finally, we clustered the RHEG drivers to ask whether there were interpretable covariations spanning multiple patients (Fig. 6D). Associations among 10-cell samples were tightest for UVABC2 and UVABC4, in line with their separation on the earlier UMAP (Fig. 4D). Coclustering RHEG drivers suggested direct modes of action between transcription factors and target genes (Fig. 6D, solid arrows). For example, knockdown of NFATC4 blocks induction of the neighboring RHEG driver, TNFSF10 (52), and there is literature that the reprogramming factor KLF4 is required for full induction of CDKN1A (53). Although no functional studies are available for CDKN2D, another coclustering cyclin-dependent kinase inhibitor (Fig. 6D, solid arrows), the CDKN2D locus is occupied by KLF4 (54) and may warrant further study. Likewise, the MAD1L1 locus is reportedly among the top 250 binding events in the genome for TP73 (55)—a RHEG driver absent from all luminal breast cancer cells profiled by scRNA-seq (15). Some of the debate involving MAD1L1 as a TP53 target gene (reviewed in ref. 56) could be explained by compensation from TP73 (57). The finding is consistent with the TP53 mutations identified in UVABC1–3 and bioinformatic prediction of the TP53 family as a candidate upstream regulator of RHEG drivers overall (Fig. 6E; Supplementary Files S1 and S2). Likewise, given the hormone-positive luminal subtype, we were encouraged to see ESR1 and the ESR1 effector EGR1 (58) as leading candidates for upstream regulation (Fig. 6E; Supplementary File S9). RHEG drivers are variably expressed within luminal breast cancers, and our data suggest that some are variably active.

This work combines 10cRNA-seq (19) and stochastic profiling (23) for disruption-free isolation of cancer cell-regulatory heterogeneities in a clinically practicable way. We targeted LCM isolation to breast carcinoma cells here by using nuclear cytology or epithelial-targeted antibodies; the approach is also compatible with genetically encoded fluorophores (18, 43). For 10cRNA-seq, artifactual cell stress (17) is avoided by LCM (18), and dominant cycling transcripts are mitigated by 10-cell averaging that blurs out sample-to-sample differences in cell-cycle phase. But in many respects, 10cRNA-seq of malignant cells shares similarities with scRNA-seq: Cases are very different from one another, and samples vary substantially within cases. What differs is overall gene coverage per sample (10cRNA-seq > scRNA-seq), as well as the analytical approach needed to discern single-cell differences. Abundance-dependent overdispersion can identify candidates from 10cRNA-seq, much like the nonparametric distribution tests first deployed for microarray data (22). As with microarrays, we anticipate future developments toward parameterizing the underlying single-cell distributions, which combine to yield 10cRNA-seq observations (25). The 10cRNA-seq–based subtype classifications predicted local differences not obvious from histology, and tools for spatial analysis of biomolecules are rapidly advancing (59). It will be especially intriguing when spatial transcriptomics (60) reaches the resolution and sensitivity of 10 cells.

A concern when initiating this study was that CNVs within and among primary tumors would dominate the heterogeneities identified by stochastic profiling. Fortunately, gene candidates were uncoupled from chromosomal gains–losses measured in bulk by whole-exome sequencing and estimated at the 10-cell level by inferCNV. Restricting to early-stage cancers was wise, because tumors with more extensive and variegated genomic aberrations would surely have such variation reflected in transcript abundance. Our analyses suggest that UVABC1—with a 10q loss and partial 11q gain typical for the luminal subtype (41)—was on the cusp of skewing toward genomic heterogeneity. As with recurrent CNVs, recurrent early-stage heterogeneities in transcriptional regulation may be characteristic of some cancer subtypes.

RHEGs open the possibility of making more specific claims about intratumor heterogeneity beyond cell stress, cell cycle, cell proteostasis, and cell type (9). A recent scRNA-seq study documented 12 such recurrently heterogeneous programs in a large panel of cell lines (61). Although genes in these programs overlap significantly with RHEGs (P < 10–8 by hypergeometric test), they only comprise ∼9% of the RHEG set overall. Partial EMTs in carcinomas have been documented by scRNA-seq (11, 15), which we verify here in earlier-stage tumors without any predissociation (Table 1). Although there are many ways to elicit EMT-like states, a leading explanation for the UVABC cohort is tissue stiffness given their marked desmoplasia. Notably, one RHEG is the hemidesmosomal integrin ITGB4, which acts as a critical sensor for matrix stiffness in breast epithelia (3). ITGB4 was undetected in every luminal breast cancer cell analyzed by conventional scRNA-seq (15).

An important finding from this work is the overlap between RHEGs and pan-cancer drivers. Unlike genomic alterations, transcriptional changes are reversible, rendering them less risky from the standpoint of early tumor evolution. Induced or repressed genes that productively elicit cancer hallmarks have the subsequent opportunity to be “hard-coded” by mutation, deletion, or amplification. Among RHEG drivers, secreted ligands and receptors are not overly prevalent, but a pair with some coordination is the p53 target gene GDF15 and its cognate receptor RET (Fig. 6D, dashed arrow; ref. 62). In an accompanying study on another carcinoma type (18), we suggest that such receptor–ligand pairs could engage as locally varying autocrine–paracrine circuits within a tumor that shape the immune microenvironment. For instance, among the 136 candidates shared by the two cases requiring immuno-LCM because of extensive lymphocytic infiltration, we identified an inhibitory ligand for NK cells (ADGRG1), a major histocompatibility class II receptor (HLA-DRA), a macrophage stimulatory ligand (MST1), and the palmitoyltransferase for PD-L1 (ZDHHC9). Natural variation in such carcinoma transcripts could one day be mapped to associating changes in the type and extent of immune-cell recruitment (63).

Single-cell cancer biology must trade off coverage, throughput, and handling artifacts to retain the conceptual allure of measuring one cell. The approaches described and implemented here for late-onset, early-stage breast cancer are also a compromise, but one that triangulates differently by using cell pools to reduce handling and improve coverage. We see great potential in using stochastic profiling by 10cRNA-seq to deconstruct the very earliest stages of tumor initiation and premalignancy in engineered systems (43) and in precancerous in situ lesions of the breast where the need for treatment is actively debated (64).

S. Singh reports grants from NIH during the conduct of the study. M.D. Sutcliffe reports grants from NIH during the conduct of the study. J.A. Harvey is a shareholder of Volpara Ltd, which offers software to measure breast density. K.A. Janes reports grants from NIH and David & Lucile Packard Foundation during the conduct of the study, as well as grants from NIH and National Science Foundation outside the submitted work. No disclosures were reported by the other authors.

S. Singh: Conceptualization, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. M.D. Sutcliffe: Software, formal analysis, investigation, methodology. K. Repich: Investigation, project administration. K.A. Atkins: Investigation. J.A. Harvey: Conceptualization, supervision, project administration. K.A. Janes: Conceptualization, resources, data curation, supervision, funding acquisition, investigation, visualization, writing–original draft, project administration, writing–review and editing.

The authors thank Emily Farber and Suna Onengut-Gumuscu at the University of Virginia (UVA) Center for Public Health Genomics for RNA-seq library preparation and sequencing; Craig Rumpel and Angela Miller at the UVA Biorepository and Tissue Research Facility for LCM maintenance and histology services subsidized by UVA Cancer Center support grant no. P30-CA044579; UVA Research Computing for high-performance computing access and consulting; Henry Pritchard for assistance with NCBI Gene Expression Omnibus deposition; and Dylan Schaff, Francine Garrett-Bakelman, Hui Zong, and Cheryl Borgman for critical evaluation of the manuscript. This work was supported by the NIH #R01-CA194470 (to K.A. Janes), the David & Lucile Packard Foundation #2009-34710 (to K.A. Janes), a Wagner Fellowship (to S. Singh), and the UVA Medical Scientist Training Program (to S. Singh).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Tomasetti
C
,
Vogelstein
B
. 
Cancer etiology. Variation in cancer risk among tissues can be explained by the number of stem cell divisions
.
Science
2015
;
347
:
78
81
.
2.
Nguyen-Ngoc
KV
,
Cheung
KJ
,
Brenot
A
,
Shamir
ER
,
Gray
RS
,
Hines
WC
, et al
ECM microenvironment regulates collective migration and local dissemination in normal and malignant mammary epithelium
.
Proc Natl Acad Sci U S A
2012
;
109
:
E2595
604
.
3.
Chaudhuri
O
,
Koshy
ST
,
Branco da Cunha
C
,
Shin
JW
,
Verbeke
CS
,
Allison
KH
, et al
Extracellular matrix stiffness and composition jointly regulate the induction of malignant phenotypes in mammary epithelium
.
Nat Mater
2014
;
13
:
970
8
.
4.
Navin
N
,
Kendall
J
,
Troge
J
,
Andrews
P
,
Rodgers
L
,
McIndoo
J
, et al
Tumour evolution inferred by single-cell sequencing
.
Nature
2011
;
472
:
90
4
.
5.
Gerlinger
M
,
Rowan
AJ
,
Horswell
S
,
Math
M
,
Larkin
J
,
Endesfelder
D
, et al
Intratumor heterogeneity and branched evolution revealed by multiregion sequencing
.
N Engl J Med
2012
;
366
:
883
92
.
6.
Nieto
MA
,
Huang
RY
,
Jackson
RA
,
Thiery
JP
. 
EMT: 2016
.
Cell
2016
;
166
:
21
45
.
7.
Davis
A
,
Gao
R
,
Navin
N
. 
Tumor evolution: Linear, branching, neutral or punctuated?
Biochim Biophys Acta Rev Cancer
2017
;
1867
:
151
61
.
8.
Janes
KA
. 
Single-cell states versus single-cell atlases - two classes of heterogeneity that differ in meaning and method
.
Curr Opin Biotechnol
2016
;
39
:
120
5
.
9.
Suva
ML
,
Tirosh
I
. 
Single-cell RNA sequencing in cancer: lessons learned and emerging challenges
.
Mol Cell
2019
;
75
:
7
12
.
10.
Svensson
V
,
Natarajan
KN
,
Ly
LH
,
Miragaia
RJ
,
Labalette
C
,
Macaulay
IC
, et al
Power analysis of single-cell RNA-sequencing experiments
.
Nat Methods
2017
;
14
:
381
7
.
11.
Puram
SV
,
Tirosh
I
,
Parikh
AS
,
Patel
AP
,
Yizhak
K
,
Gillespie
S
, et al
Single-cell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer
.
Cell
2017
;
171
:
1611
24
.
12.
Li
H
,
Courtois
ET
,
Sengupta
D
,
Tan
Y
,
Chen
KH
,
Goh
JJL
, et al
Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors
.
Nat Genet
2017
;
49
:
708
18
.
13.
Elyada
E
,
Bolisetty
M
,
Laise
P
,
Flynn
WF
,
Courtois
ET
,
Burkhart
RA
, et al
Cross-species single-cell analysis of pancreatic ductal adenocarcinoma reveals antigen-presenting cancer-associated fibroblasts
.
Cancer Discov
2019
;
9
:
1102
23
.
14.
Peng
J
,
Sun
BF
,
Chen
CY
,
Zhou
JY
,
Chen
YS
,
Chen
H
, et al
Single-cell RNA-seq highlights intra-tumoral heterogeneity and malignant progression in pancreatic ductal adenocarcinoma
.
Cell Res
2019
;
29
:
725
38
.
15.
Chung
W
,
Eum
HH
,
Lee
HO
,
Lee
KM
,
Lee
HB
,
Kim
KT
, et al
Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer
.
Nat Commun
2017
;
8
:
15081
.
16.
Yin
J
,
Li
Z
,
Yan
C
,
Fang
E
,
Wang
T
,
Zhou
H
, et al
Comprehensive analysis of immune evasion in breast cancer by single-cell RNA-seq
.
BioRxiv 368605 [Preprint].
2018
.
Available from
: https://www.biorxiv.org/content/10.1101/368605v2.
17.
van den Brink
SC
,
Sage
F
,
Vertesy
A
,
Spanjaard
B
,
Peterson-Maduro
J
,
Baron
CS
, et al
Single-cell sequencing reveals dissociation-induced gene expression in tissue subpopulations
.
Nat Methods
2017
;
14
:
935
6
.
18.
Schaff
DL
,
Singh
S
,
Kim
KB
,
Sutcliffe
MD
,
Park
KS
,
Janes
KA
. 
Fragmentation of small-cell lung cancer regulatory states in heterotypic microenvironments
.
Cancer Res
2021
;
81
:
1853
67
.
19.
Singh
S
,
Wang
L
,
Schaff
DL
,
Sutcliffe
MD
,
Koeppel
AF
,
Kim
J
, et al
In situ 10-cell RNA sequencing in tissue and tumor biopsy samples
.
Sci Rep
2019
;
9
:
4836
.
20.
Nichterwitz
S
,
Chen
G
,
Aguila Benitez
J
,
Yilmaz
M
,
Storvall
H
,
Cao
M
, et al
Laser capture microscopy coupled with Smart-seq2 for precise spatial transcriptomic profiling
.
Nat Commun
2016
;
7
:
12139
.
21.
Chen
J
,
Suo
S
,
Tam
PP
,
Han
JJ
,
Peng
G
,
Jing
N
. 
Spatial transcriptomic analysis of cryosectioned tissue samples with Geo-seq
.
Nat Protoc
2017
;
12
:
566
80
.
22.
Wang
L
,
Janes
KA
. 
Stochastic profiling of transcriptional regulatory heterogeneities in tissues, tumors and cultured cells
.
Nat Protoc
2013
;
8
:
282
301
.
23.
Janes
KA
,
Wang
CC
,
Holmberg
KJ
,
Cabral
K
,
Brugge
JS
. 
Identifying single-cell molecular programs by stochastic profiling
.
Nat Methods
2010
;
7
:
311
7
.
24.
Wang
L
,
Brugge
JS
,
Janes
KA
. 
Intersection of FOXO- and RUNX1-mediated gene expression programs in single breast epithelial cells during morphogenesis and tumor progression
.
Proc Natl Acad Sci U S A
2011
;
108
:
E803
12
.
25.
Bajikar
SS
,
Fuchs
C
,
Roller
A
,
Theis
FJ
,
Janes
KA
. 
Parameterizing cell-to-cell regulatory heterogeneities via stochastic transcriptional profiles
.
Proc Natl Acad Sci U S A
2014
;
111
:
E626
35
.
26.
Tasdemir
N
,
Ding
K
,
Savariau
L
,
Levine
KM
,
Du
T
,
Elangovan
A
, et al
Proteomic and transcriptomic profiling identifies mediators of anchorage-independent growth and roles of inhibitor of differentiation proteins in invasive lobular carcinoma
.
Sci Rep
2020
;
10
:
11487
.
27.
Bailey
MH
,
Tokheim
C
,
Porta-Pardo
E
,
Sengupta
S
,
Bertrand
D
,
Weerasinghe
A
, et al
Comprehensive characterization of cancer driver genes and mutations
.
Cell
2018
;
173
:
371
85
.
28.
Mak
MP
,
Tong
P
,
Diao
L
,
Cardnell
RJ
,
Gibbons
DL
,
William
WN
, et al
A patient-derived, pan-cancer EMT signature identifies global molecular alterations and immune target enrichment following epithelial-to-mesenchymal transition
.
Clin Cancer Res
2016
;
22
:
609
20
.
29.
ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium
. 
Pan-cancer analysis of whole genomes
.
Nature
2020
;
578
:
82
93
.
30.
Ciriello
G
,
Gatza
ML
,
Beck
AH
,
Wilkerson
MD
,
Rhie
SK
,
Pastore
A
, et al
Comprehensive molecular portraits of invasive lobular breast cancer
.
Cell
2015
;
163
:
506
19
.
31.
Parker
JS
,
Mullins
M
,
Cheang
MC
,
Leung
S
,
Voduc
D
,
Vickery
T
, et al
Supervised risk predictor of breast cancer based on intrinsic subtypes
.
J Clin Oncol
2009
;
27
:
1160
7
.
32.
Consortium
GT
. 
Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans
.
Science
2015
;
348
:
648
60
.
33.
Patel
AP
,
Tirosh
I
,
Trombetta
JJ
,
Shalek
AK
,
Gillespie
SM
,
Wakimoto
H
, et al
Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma
.
Science
2014
;
344
:
1396
401
.
34.
Nguyen
QH
,
Pervolarakis
N
,
Blake
K
,
Ma
D
,
Davis
RT
,
James
N
, et al
Profiling human breast epithelial cells using single cell RNA sequencing identifies cell diversity
.
Nat Commun
2018
;
9
:
2028
.
35.
Santos
A
,
Wernersson
R
,
Jensen
LJ
. 
Cyclebase 3.0: a multi-organism database on cell-cycle regulation and phenotypes
.
Nucleic Acids Res
2015
;
43
:
D1140
4
.
36.
Whitfield
ML
,
Sherlock
G
,
Saldanha
AJ
,
Murray
JI
,
Ball
CA
,
Alexander
KE
, et al
Identification of genes periodically expressed in the human cell cycle and their expression in tumors
.
Mol Biol Cell
2002
;
13
:
1977
2000
.
37.
Aitken
SJ
,
Ibarra-Soria
X
,
Kentepozidou
E
,
Flicek
P
,
Feig
C
,
Marioni
JC
, et al
CTCF maintains regulatory homeostasis of cancer pathways
.
Genome Biol
2018
;
19
:
106
.
38.
Takaku
M
,
Grimm
SA
,
Roberts
JD
,
Chrysovergis
K
,
Bennett
BD
,
Myers
P
, et al
GATA3 zinc finger 2 mutations reprogram the breast cancer transcriptional network
.
Nat Commun
2018
;
9
:
1059
.
39.
Tan
TZ
,
Miow
QH
,
Miki
Y
,
Noda
T
,
Mori
S
,
Huang
RY
, et al
Epithelial-mesenchymal transition spectrum quantification and its efficacy in deciphering survival and drug responses of cancer patients
.
EMBO Mol Med
2014
;
6
:
1279
93
.
40.
Rheinbay
E
,
Nielsen
MM
,
Abascal
F
,
Wala
JA
,
Shapira
O
,
Tiao
G
, et al
Analyses of non-coding somatic drivers in 2,658 cancer whole genomes
.
Nature
2020
;
578
:
102
11
.
41.
Cancer Genome Atlas Network
. 
Comprehensive molecular portraits of human breast tumours
.
Nature
2012
;
490
:
61
70
.
42.
Becht
E
,
McInnes
L
,
Healy
J
,
Dutertre
CA
,
Kwok
IWH
,
Ng
LG
, et al
Dimensionality reduction for visualizing single-cell data using UMAP
.
Nat Biotechnol
2019
;
37
:
38
44
.
43.
Sutcliffe
MD
,
Galvao
RP
,
Wang
L
,
Kim
J
,
Singh
S
,
Zong
H
, et al
Premalignant oligodendrocyte precursor cells stall in a heterogeneous state of replication stress prior to gliomagenesis
.
Cancer Res
2021
;
81
:
1868
82
.
44.
Gendoo
DM
,
Ratanasirigulchai
N
,
Schroeder
MS
,
Pare
L
,
Parker
JS
,
Prat
A
, et al
genefu: Computation of gene expression-based signatures in breast cancer
.
R package version 2.18.1 [software]
. 
2020
.
45.
Picelli
S
,
Bjorklund
AK
,
Faridani
OR
,
Sagasser
S
,
Winberg
G
,
Sandberg
R
. 
Smart-seq2 for sensitive full-length transcriptome profiling in single cells
.
Nat Methods
2013
;
10
:
1096
8
.
46.
Marinov
GK
,
Williams
BA
,
McCue
K
,
Schroth
GP
,
Gertz
J
,
Myers
RM
, et al
From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing
.
Genome Res
2014
;
24
:
496
510
.
47.
Kharchenko
PV
,
Silberstein
L
,
Scadden
DT
. 
Bayesian approach to single-cell differential expression analysis
.
Nat Methods
2014
;
11
:
740
2
.
48.
Fan
J
,
Salathia
N
,
Liu
R
,
Kaeser
GE
,
Yung
YC
,
Herman
JL
, et al
Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis
.
Nat Methods
2016
;
13
:
241
4
.
49.
Buettner
F
,
Natarajan
KN
,
Casale
FP
,
Proserpio
V
,
Scialdone
A
,
Theis
FJ
, et al
Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells
.
Nat Biotechnol
2015
;
33
:
155
60
.
50.
Zhou
Y
,
Zhou
B
,
Pache
L
,
Chang
M
,
Khodabakhshi
AH
,
Tanaseichuk
O
, et al
Metascape provides a biologist-oriented resource for the analysis of systems-level datasets
.
Nat Commun
2019
;
10
:
1523
.
51.
Savci-Heijink
CD
,
Halfwerk
H
,
Hooijer
GKJ
,
Koster
J
,
Horlings
HM
,
Meijer
SL
, et al
Epithelial-to-mesenchymal transition status of primary breast carcinomas and its correlation with metastatic behavior
.
Breast Cancer Res Treat
2019
;
174
:
649
59
.
52.
Wang
Q
,
Zhou
Y
,
Weiss
HL
,
Chow
CW
,
Evers
BM
. 
NFATc1 regulation of TRAIL expression in human intestinal cells
.
PLoS One
2011
;
6
:
e19882
.
53.
Zhang
W
,
Geiman
DE
,
Shields
JM
,
Dang
DT
,
Mahatan
CS
,
Kaestner
KH
, et al
The gut-enriched Kruppel-like factor (Kruppel-like factor 4) mediates the transactivating effect of p53 on the p21WAF1/Cip1 promoter
.
J Biol Chem
2000
;
275
:
18391
8
.
54.
Aksoy
I
,
Giudice
V
,
Delahaye
E
,
Wianny
F
,
Aubry
M
,
Mure
M
, et al
Klf4 and Klf5 differentially inhibit mesoderm and endoderm differentiation in embryonic stem cells
.
Nat Commun
2014
;
5
:
3719
.
55.
Santos Guasch
GL
,
Beeler
JS
,
Marshall
CB
,
Shaver
TM
,
Sheng
Q
,
Johnson
KN
, et al
p73 is required for ovarian follicle development and regulates a gene network involved in cell-to-cell adhesion
.
iScience
2018
;
8
:
236
49
.
56.
Fischer
M
. 
Census and evaluation of p53 target genes
.
Oncogene
2017
;
36
:
3943
56
.
57.
Hoadley
KA
,
Yau
C
,
Wolf
DM
,
Cherniack
AD
,
Tamborero
D
,
Ng
S
, et al
Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin
.
Cell
2014
;
158
:
929
44
.
58.
Lu
S
,
Becker
KA
,
Hagen
MJ
,
Yan
H
,
Roberts
AL
,
Mathews
LA
, et al
Transcriptional responses to estrogen and progesterone in mammary gland identify networks regulating p53 activity
.
Endocrinology
2008
;
149
:
4809
20
.
59.
Gut
G
,
Herrmann
MD
,
Pelkmans
L
. 
Multiplexed protein maps link subcellular organization to cellular states
.
Science
2018
;
361
:
eaar7042
.
60.
Stahl
PL
,
Salmen
F
,
Vickovic
S
,
Lundmark
A
,
Navarro
JF
,
Magnusson
J
, et al
Visualization and analysis of gene expression in tissue sections by spatial transcriptomics
.
Science
2016
;
353
:
78
82
.
61.
Kinker
GS
,
Greenwald
AC
,
Tal
R
,
Orlova
Z
,
Cuoco
MS
,
McFarland
JM
, et al
Pan-cancer single-cell RNA-seq identifies recurring programs of cellular heterogeneity
.
Nat Genet
2020
;
52
:
1208
18
.
62.
Mullican
SE
,
Lin-Schmidt
X
,
Chin
CN
,
Chavez
JA
,
Furman
JL
,
Armstrong
AA
, et al
GFRAL is the receptor for GDF15 and the ligand promotes weight loss in mice and nonhuman primates
.
Nat Med
2017
;
23
:
1150
7
.
63.
Tsujikawa
T
,
Kumar
S
,
Borkar
RN
,
Azimi
V
,
Thibault
G
,
Chang
YH
, et al
Quantitative multiplex immunohistochemistry reveals myeloid-inflamed tumor-immune complexity associated with poor prognosis
.
Cell Rep
2017
;
19
:
203
17
.
64.
van Seijen
M
,
Lips
EH
,
Thompson
AM
,
Nik-Zainal
S
,
Futreal
A
,
Hwang
ES
, et al
Ductal carcinoma in situ: to treat or not to treat, that is the question
.
Br J Cancer
2019
;
121
:
285
92
.

Supplementary data