Little is known of the geospatial architecture of individual cell populations in lung adenocarcinoma (LUAD) evolution. Here, we perform single-cell RNA sequencing of 186,916 cells from five early-stage LUADs and 14 multiregion normal lung tissues of defined spatial proximities from the tumors. We show that cellular lineages, states, and transcriptomic features geospatially evolve across normal regions to LUADs. LUADs also exhibit pronounced intratumor cell heterogeneity within single sites and transcriptional lineage-plasticity programs. T regulatory cell phenotypes are increased in normal tissues with proximity to LUAD, in contrast to diminished signatures and fractions of cytotoxic CD8+ T cells, antigen-presenting macrophages, and inflammatory dendritic cells. We further find that the LUAD ligand–receptor interactome harbors increased expression of epithelial CD24, which mediates protumor phenotypes. These data provide a spatial atlas of LUAD evolution, and a resource for identification of targets for its treatment.
The geospatial ecosystem of the peripheral lung and early-stage LUAD is not known. Our multiregion single-cell sequencing analyses unravel cell populations, states, and phenotypes in the spatial and ecologic evolution of LUAD from the lung that comprise high-potential targets for early interception.
This article is highlighted in the In This Issue feature, p. 2355
Lung adenocarcinoma (LUAD) is the most common histologic subtype of lung cancer and accounts for most cancer-related deaths (1, 2). Over the past decade, annual low-dose CT screening was endorsed in an effort to reduce lung cancer mortality (3). Since then, an increasing number of early-stage LUAD diagnoses have warranted the need for novel personalized early treatment strategies. This in turn heavily rests on improved understanding of molecular and cellular processes underlying early LUAD development.
Previous studies identified molecular alterations in histologically normal–appearing epithelial fields that are close to solid tumors including those of the lung and that are less prevalent or absent in relatively more distant (from the tumor) regions—suggesting geospatial heterogeneity in the uninvolved lung that is pertinent to development of a nearby tumor (4). Although these studies provided valuable insights into the spatial development of cancer from a particular niche in the lung, they have been guided mainly by bulk profiling approaches (4, 5). It is now appreciated that editing of the immune microenvironment toward protumor phenotypes including escape of immune surveillance portends the underlying biology, development, and progression of LUAD (5). Yet, the interplay between individual immune cell populations and other cell subsets in geospatial development of early-stage LUAD is not known. Technologies that profile tissues at single-cell resolution have permitted delineating the molecular and cellular complexity of tumor ecosystems. Single-cell sequencing technologies were used to chart the immune microenvironment in metastasis and therapeutic response of advanced lung cancers to targeted therapies (6–9). Yet, the complex spatial evolution of heterogeneous cellular populations and their interactions, and as an early-stage LUAD develops from the peripheral lung, remains largely unresolved.
Here, we sought to discern the spatial atlas of the peripheral lung and early-stage LUAD at single-cell resolution to better understand the topological architecture of LUAD evolution. We performed deep single-cell RNA sequencing (scRNA-seq) analysis of 19 spatial regions, including enriched epithelial populations, from 5 early-stage LUADs and 14 multiregion normal-appearing lung tissues with differential and defined spatial proximities from the tumors. Our study unravels tumor evolutionary trajectories as well as geospatial evolution in cell populations and their expression signatures that portray how early-stage LUAD develops from the lung ecosystem.
Single-Cell Spatial Landscape of Early-Stage LUAD
To begin to chart a comprehensive single-cell atlas of early-stage LUAD and the peripheral lung, we performed scRNA-seq on all cells from an early-stage LUAD (P1; Supplementary Table S1) as well as matched tumor-adjacent and relatively more distant normal lung tissues (Fig. 1A). Unsupervised clustering of 15,132 quality control–passed cells revealed cell clusters representing five major cellular lineages, namely, epithelial, endothelial, myeloid, lymphoid, and stromal cell subsets (Fig. 1B; Supplementary Fig. S1). Epithelial (EPCAM+) cell fractions were 3.7%, 5.4%, and 3.5% for tumor, tumor-adjacent, and tumor-distant normal samples, respectively, at an average of 4.2% and in line with previous studies (ref. 8; Fig. 1B and C).
To increase the throughput and to better capture patterns of cellular heterogeneity based on distance from LUADs, in particular within the epithelial lineage, we performed separate scRNA-seq analysis of epithelial (EPCAM+) and nonepithelial (EPCAM−) cells enriched from early-stage LUADs of four additional patients with early-stage LUAD (P2–P5; Methods; Supplementary Fig. S1; Supplementary Table S1), each with three matching normal lung tissues of defined spatial proximities to LUADs: tumor-adjacent, tumor-intermediate, and tumor-distant (19 samples and 35 scRNA-seq libraries from P1–P5; Fig. 1A). The spatial locations of multiregion normal tissues were carefully defined with respect to the tumor edge (see Methods), such that the samples span a spatial continuum and thus enable interrogation of geospatial relationships among early-stage LUAD and the peripheral lung tissues. A total of 186,916 cells, uniformly derived from all patients and sequencing batches, were retained for subsequent analyses with a median of 1,844 detected genes per cell (Supplementary Fig. S2A–S2C; Supplementary Table S1). Cells clustered into the above-mentioned five distinct lineages (Supplementary Fig. S2D), and clustering was deemed robust based on high ratio of overlapping cell memberships with independent clustering methods such as reciprocal principal component analysis (PCA) and k-means clustering approach (Supplementary Fig. S3A–S3C; Supplementary Methods).
We were able to profile samples markedly enriched with epithelial cells (37.6%, n = 70,030 epithelial cells, including 3,514 proliferating cells) in comparison with the unbiased approach in P1 (4.2%; Fig. 1D). Cells were adequately derived from all spatial samples, and their lineage compositions varied spatially across the LUAD and normal tissues (Fig. 1E). Proliferating epithelial cells were highly enriched in LUADs (40%) compared with normal tissues (P = 0.07; Supplementary Fig. S4A). We also noted higher transcriptome complexity (number of genes) in EPCAM+ fractions compared with EPCAM− fractions from P2–P5 (Supplementary Fig. S4A, right; Supplementary Table S1). This was also significantly evident among cells of P1 that were not subjected to enrichment by EPCAM-based cell sorting (Supplementary Fig. S4B–S4D; P < 0.01).
Next, we analyzed hierarchical relationships among major cell lineages and found that cells from all patient LUADs were transcriptomically distinct from their matched spatial normal counterparts (Fig. 1F; see Methods). Overall, cells from adjacent normal samples clustered more closely with those of the LUADs than with more distant normal tissues (e.g., epithelial and myeloid lineages; Fig. 1F, top dendrograms). This average pattern of transcriptomic similarity between tumor and adjacent normal samples was evident in specific patients (e.g., P2; Fig. 1F, bottom dendrograms). We then further classified lymphoid and myeloid lineages into major cell types (Fig. 1G; Supplementary Fig. S4E; Methods). Analysis of spatial cell composition revealed distinct topological gradients with greater tumor proximity, which were consistently evident across patients, as well as increased fractions of B cells and decreased abundance levels of natural killer (NK) cells specifically in the LUADs (Fig. 1H and I; Supplementary Table S2). These observations highlight geospatial transcriptomic heterogeneity in tumor microenvironment (TME) landscape of early-stage LUADs.
Spatial Diversity of Lung Epithelial Cells and Intratumoral Heterogeneity in LUAD
We next interrogated spatial epithelial features of the LUADs and multiregion normal tissues. The 70,030 epithelial cells derived from all samples clustered into 10 distinct epithelial cell types with a high degree of robustness (Fig. 2A; Supplementary Fig. S5A–S5C). These clusters represented alveolar type I (AT1; C2, AGER+), AT2 (C3; SFTPC+), basal (C4; KRT15+), bronchioalveolar (C5; SFTPC+/SCGB1A1+), ciliated (C6; PIFO+), and club/secretory (C7; BPIFB1+) cells (Fig. 2A and B; Supplementary Table S3). We also identified the recently described and rare ionocytes (C8; FOXI1+/CFTR+; refs. 10, 11), bipotent alveolar progenitors (C1; ref. 12), and unique cell states such as proliferating basal cells (C10; TOP2A+; Fig. 2A and B; Supplementary Table S3).
We noted a malignant-enriched cluster (C9) with cells of mixed lineages (8) from all patients, mostly their LUADs (Fig. 2A and B; Supplementary Fig. S6A and S6B). Interestingly, few cells from the normal tissues were also found in the C9 cluster (Fig. 2B; Supplementary Fig. S6A). To distinguish bona fide malignant cells from nonmalignant subsets, we used a strategy that infers copy-number variations (CNV) from scRNA-seq data in every epithelial cell, and we generated a CNV score to quantify their level of aneuploidy (ref. 13; see Supplementary Methods). Cells in C9 exhibited overall increased CNV scores as well as higher amplitudes of CNVs (Fig. 2A, right), thereby supporting the overall malignant assignment of this cluster. Although cells from the LUADs predominantly resided in C9, a fraction (29%) was transcriptomically and genotypically (e.g., reduced copy-number profiles) similar to basal cells (17%; Supplementary Fig. S6C; Supplementary Table S4). Interestingly, basal cells showed an indication of enrichment in all LUADs compared with their normal samples (P = 0.09; Supplementary Fig. S6B and S6D; Supplementary Table S5). We also noted pronounced steady-state enrichment or depletion of epithelial subsets with spatial proximity to the tumors (Fig. 2C). Relative to cells from tumor-intermediate or tumor-distant normal sites, cells from tumor-adjacent normal tissues were, overall, more transcriptomically similar to (clustered closely with) those from the LUADs (Fig. 2D).
Alveolar differentiation hierarchies have been shown to partake in lung tumor development in vivo (12, 14, 15). In our cohort, alveolar cells with definitive lineage features (e.g., AT1, AT2, and alveolar progenitors) were depleted in LUAD tissues (Fig. 2C), which prompted us to leverage the relatively large number of alveolar cells sequenced to dissect potential alveolar differentiation trajectories. Pseudotemporal ordering of alveolar cells revealed a developmental hierarchy that was initiated by AT2 cells and that followed a main trajectory of differentiation into AT1 cells (Supplementary Fig. S7A and S7B) in close agreement with previous studies in mice (12, 14, 15). Given the reported role of NOTCH in AT2-to-AT1 differentiation and alveolar repair (16), we interrogated a NOTCH signaling score that we found to be increased along the alveolar differentiation trajectory (Supplementary Fig. S7C and S7D).
To further investigate malignant programs, we performed subclustering of cells from malignant-enriched cluster C9 (n = 10,667) while overlaying CNV scores, which separated likely malignant cells from subsets derived from normal tissues (Fig. 2E; Supplementary Fig. S8A–S8E; Supplementary Table S6). We noted low CNV scores (Supplementary Fig. S8A; Fig. 2E, right) in cells of the malignant-enriched cluster in the LUAD of P2. We thus interrogated the presence of KRAS codon 12 driver mutations that reside within genomic brackets captured in our 5′ single-cell sequencing design (see Supplementary Methods). Interestingly, among all P2 epithelial subsets, 29% of C9 cells (160 of 547) harbored the KRASG12D mutation (Fig. 2E, right, and 2F) and were exclusively derived from the tumor sample. Compared with KRAS wild-type cells from the same LUAD, cells harboring KRASG12D mutation exhibited distinctively high expression of tumor markers (e.g., CEACAM5; Supplementary Fig. S8B), increased expression of MUC5AC (Fig. 2F) and LCN2, as well as reduced expression of NKX2-1 (Supplementary Fig. S8B; Supplementary Table S7), altogether suggestive of mucinous differentiation (17, 18) and in line with the histologic (mucinous) pattern of this tumor (Supplementary Table S1). These findings underscore spatial heterogeneity patterns in KRAS driver mutation and cellular lineage that are likely unique to the ecosystem of KRAS-mutant LUAD. Additional clustering of tumor-derived C9 cells by patient underscored transcriptomic features that were shared between two or more LUADs (e.g., increased CEACAM5 or CEACAM6; Supplementary Fig. S8B–S8E). We also noted patient/LUAD-specific transcriptomic features such as enrichment of club and secretory (P2) or AT2 (P5) canonical markers, thus potentially signifying distinct cells of origin among the LUADs (Supplementary Fig. S8B and S8E).
Unlike P2, C9 cells in P3 and P5 were almost exclusively derived from the LUAD tissues (Fig. 2E). In P3 LUAD, we identified large-scale chromosomal alterations (Fig. 2G left; Supplementary Fig. S9A; Supplementary Table S6), based on which unsupervised clustering analysis revealed four clusters with differential CNV profiles. Among them, three clusters (C2, C3, and C4) exhibited pronounced CNVs that were indicative of malignant cell features (Fig. 2G, left). We found an additional CNV event (i.e., gain of 1p) unique to cells of cluster C4 but not C2 or C3, possibly signifying a late event in the evolutionary trajectory of P3 LUAD. We also observed a branched trajectory that started with cells of C2 and C3 and comprised few “normal cells” with club/secretory lineage, and that later branched into cells of CNV cluster C4 (Fig. 2H, top)—suggesting that C4 evolved from C2 and C3 and that P3 LUAD perhaps originated from club/secretory cells. This is consistent with increased expression of club/secretory canonical markers (e.g., TFF3, HP, and MUC4) in P3 tumor clones with also high CNV scores (C0 and C2; Supplementary Fig. S8C). P5 LUAD comprised seven distinct CNV clusters (Fig. 2G, right; Supplementary Fig. S9B). C1 harbored cells with increased CNV events, including events with higher amplitude, whereas C4 clustered the closest to C3, which mostly comprised nonmalignant cells (Fig. 2G, right). Pseudotime analysis revealed a C4-to-C1 path which, in contrast to P3, was unbranched, suggesting that C4 and C1 in P5 likely comprised malignant cells from early and late developmental states, respectively (Fig. 2H, bottom). Overall, hierarchical clustering analysis was consistent with phylogenetic reconstruction of tumor clonal architecture using inferred large-scale CNVs (r = 0.7 in P3; Supplementary Fig. S10). Our single-cell interrogation of a large number of epithelial cells from multiregion tissues identified diverse epithelial identities, malignant trajectories, as well as high-resolution intratumor cell heterogeneity.
Lymphoid Reprogramming toward a Protumor Microenvironment
We further characterized lymphoid spatial dynamics (Fig. 1H and I) and states (n = 53,882 cells; see Methods). Following clustering, we found 10 transcriptomically distinct lymphoid cell types/states that were uniformly derived from all sequencing batches and patients (Fig. 3A and B; Supplementary Fig. S11A and S11B; Supplementary Table S8). Lymphoid clusters were overall spatially modulated by tumor proximity (Fig. 3C). Relative to normal tissues, LUADs were heavily enriched with plasma (SDC1+/MZB1+), B (CD19+/CD22+), and T regulatory (Treg; FOXP3+) cells (Fig. 3C; Supplementary Fig. S11C). With increasing tumor proximity, we noted a gradual decrease in NK cells (GNLY+), innate lymphoid cells (ILC), both GZMAhi and GNLYhi CD4+ cytotoxic T lymphocytes (CTL; CD40LG+, BATF+), and GNLYhi CD8+ CTLs, all of which were, overall, depleted in the LUADs (Fig. 3C; Supplementary Fig. S11C).
We further performed subclustering analysis of CD8+ T cells, which identified three robust clusters: naïve, GZMKhi, and GNLYhi subpopulations with differentially expressed cell-state signatures (Fig. 3D; Supplementary Fig. S12; Supplementary Tables S9 and S10). Consistently, naïve CD8+ T cells showed high naïve and low cytotoxic T-cell scores and were composed of cells across all samples in the LUAD space (Fig. 3D; Supplementary Fig. S13A). In contrast, GNLYhi CD8+ CTLs exhibited high levels of cytotoxicity genes (TBX21, KLF3, FCGR3A, KLRG1, and KLRB1) and scores and were depleted in the LUAD samples (Fig. 3C–E). We observed a significant spatial pattern of reduced cytotoxic activity in P3 and P4 (Supplementary Fig. S13B left and right, respectively; P < 0.05). Overall, CD8+ CTLs showed significant and spatially modulated reduction in cytotoxicity signature score (P < 0.001) and decreased expression of major cytotoxic genes such as NKG7 and GNLY (P < 0.001; Fig. 3E), with these features all being lowest in the LUADs (P < 0.001).
Spatial analysis of CD4+ T-cell states (Supplementary Tables S10 and S11) showed that LUAD tissues were specifically enriched with FOXP3+ Tregs (Fig. 3F and G; Supplementary Table S12), and the Treg signature scores were significantly and spatially increased with proximity to all LUADs (Supplementary Fig. S13C, left; P < 0.01) and in each of P3 and P5 (Supplementary Fig. S13C, middle and right, respectively; P < 0.05). The Tregs also expressed high levels of protumor immune checkpoints including TIGIT, CTLA4, LAG3, or PDCD1 (Fig. 3G). The fraction of Tregs coexpressing both CTLA4 and TIGIT immune checkpoints was progressively higher along the spectrum of distant normal sites to more adjacent-to-tumor regions up to the LUADs (Fig. 3G, bottom). In contrast, we noted a reduction of cytotoxic CD4+ CTLs characterized by high expression of GZMA (P < 0.05), or coexpression of GZMA and GZMH, with increasing proximity to all LUADs (Supplementary Fig. S13D–S13F).
We further examined the spatial enrichment of LUADs with plasma and B cells (Fig. 3C; Supplementary Table S12). We found spatial changes in plasma cell isotype switching, such as increased IGHA1/2 and decreased IGHG1/3, with increasing proximity to P3 and P5 LUADs (Supplementary Fig. S13G–S13I; Supplementary Table S13). Smoker patients (P2 and P3) harbored strikingly high plasma cell fractions relative to nonsmokers (P1 and P5; Supplementary Fig. S13J). We confirmed the increased plasma fractions in smoker LUADs following analysis of The Cancer Genome Atlas (TCGA) cohort (P < 0.001; Supplementary Fig. S13K). We also identified three distinct subsets of B-cell states (Supplementary Table S14), including a LUAD-enriched subcluster (C0) with high expression levels of RAC2+ and ACTG+ (Supplementary Fig. S14A–S14C), known to play key roles in synapse formation in B cells (19). The B-cell signature (C0) was progressively increased across atypical adenomatous hyperplasias (AAH), the preneoplastic precursors of LUAD, and invasive LUADs compared with matched normal lung tissues (Supplementary Fig. S14D). The B-cell signature was also associated with prolonged overall survival (OS) and progression-free interval (PFI) in patients with treatment-naïve LUAD from TCGA (P = 0.0001, OS; P = 0.02, PFI; Supplementary Fig. S15A) and in-house (MDACC; P = 0.2, OS; P = 0.03, PFI; Supplementary Fig. S15B) cohorts. Our analyses identify spatial properties in lymphoid cell states that may underlie protumor immune remodeling in early-stage LUAD.
Depletion of Antigen-Presenting Macrophages and Inflammatory Dendritic Cells (DC) in Early-Stage LUAD
Spatial myeloid patterns (Fig. 1H and I) in LUAD space prompted us to further investigate myeloid subsets and cellular states (Fig. 4). In total, 45,803 myeloid cells from all sequencing batches and patients analyzed in this study (Supplementary Fig. S16A and S16B) clustered into 13 distinct subsets: classic monocytes (S100A8+, S100A9+), nonclassic monocytes (CDKN1C+), mast cells (MS4A2+), neutrophils (IL1A+), M2-like macrophages C1 (TREM2+), M2-like macrophages C5 (CD163+), alveolar macrophages (MARCO+), classic DC1 (cDC1; CLEC9A+), cDC2 (CLEC10A+), plasmacytoid DC (pDC; PLD4+), other DCs (CCL22+), and proliferating myeloid cells (TOP2A+; Fig. 4A and B; Supplementary Tables S15 and S16). Myeloid clusters were derived from tumor and all normal spatial samples with varying proportions (Supplementary Fig. S16C). M2-like macrophages C5, monocytes (classic and nonclassic), and mast cells were gradually depleted with increasing tumor proximity, whereas M2-like macrophages C1, proliferating myeloid subsets, and cDC2 cells were steadily enriched in the tumors (Fig. 4A and C; Supplementary Fig. S16D).
We next performed subclustering analysis of monocytes and macrophages (n = 27,664 cells), which identified five distinct subclusters and confirmed the unique enrichment of M2-like macrophages C1 in the LUAD tissues (Fig. 4D; Supplementary Fig. S17). Furthermore, we found that C1 M2-like macrophages showed significantly diminished (P < 0.001) antigen presentation scores compared with C5 M2-like macrophages which were mainly enriched in normal samples (Fig. 4D; Supplementary Fig. S18A; Supplementary Table S17). We found markedly reduced expression levels of antigen presentation signature genes with increasing spatial proximity to the LUADs (Fig. 4E and F; Supplementary Fig. S18B). The spatial pattern of antigen presentation depletion was evident across M2-like macrophages combined from both clusters C1 and C5 and was statistically significant in four of the five patients with LUAD (Fig. 4G; P < 0.001).
We examined gene and signature score differences between different subsets of DCs (n = 8,694). Spatial patterns were evident in cDC2 and pDC subsets (Fig. 4H). We observed differential expression of an inflammatory gene signature between the three cDC2 subclusters with C1, cells of which exhibited the lowest inflammatory scores, heavily enriched in the LUADs (Fig. 4I and J; Supplementary Table S18). Unsupervised subclustering of cDC2 cells identified three distinct cDC2 subsets characterized by differential expression of an inflammatory signature (highly enriched in C2) and MHC class II genes (enriched in C0/C1) previously shown to discriminate inflammatory from noninflammatory cDCs (ref. 20; Supplementary Fig. S18C). Reduced expression of proinflammatory genes and increased levels of anti-inflammatory features were evident among cDC2 cells and along the continuum of normal-to-LUAD space (Fig. 4K). cDC2 subcluster with the highest inflammatory score (C2) was further characterized by an inflammatory signature score that was spatially and significantly attenuated with increasing tumor proximity (P < 0.001; Fig. 4L), and the fraction of C2 among the cDC2 subset was overall underrepresented in the LUADs (P < 0.05; Fig. 4M). Notably, the inflammatory signature score significantly and progressively decreased along the continuum from normal lung tissues, to matched premalignant AAHs and invasive LUADs (P < 0.01; Fig. 4N), in sharp contrast to noninflammatory DC expression components (Supplementary Fig. S18D). We also studied pDC subsets and found spatial enrichment of FOS, FOSB, and JUN, genes involved in regulating DC immunogenicity (21), with increasing proximity to the tumors (Supplementary Fig. S18E and S18F). We found that a relatively higher score of noninflammatory to inflammatory expression in cDC2 was associated with prolonged OS and PFI in the TCGA LUAD cohort (P = 0.009 and P = 0.04, respectively) with similar trends observed using an in-house set (MDACC; P = 0.2 for OS; Supplementary Fig. S19A and S19B). Altogether, these data underscore spatial immune remodeling in early-stage LUAD that comprises loss of antigen presentation by macrophage subsets and inflammatory phenotypes by DCs.
Stromal cells have been shown to affect diverse aspects of tumor immune microenvironment and cancer progression (11, 22). Intrigued by the enrichment of stromal subsets in the LUADs (Fig. 1H), we further interrogated multiple stromal populations in our data set (Supplementary Fig. S20A–S20C), including tumor enrichment of vascular smooth muscle cells, adventitial fibroblasts C5, and endothelial cell (EC) venule clusters (Supplementary Fig. S20D and S20E). We pinpointed significantly differentially expressed gene sets in the EC venule subpopulation and altered stromal signatures (Supplementary Fig. S20F) that support the observed immune-related changes in the LUAD space, and that are in line with previous observations (11). These comprised tumor-specific activation of extracellular matrix reorganization, syndecan-2 pathway, and neutrophil degranulation, as well as decreased JAK–STAT signaling and reduced antigen-processing and cross-presentation (Supplementary Fig. S20G and S20H).
Ligand–Receptor Cell–Cell Cross-talk in Early-Stage LUAD
Cross-talk between tumor cells and elements in the TME is implicated in tumor progression largely in part by mediating immunosuppressive phenotypes (23). We utilized iTALK (24) to leverage signals from our scRNA-seq data set and visualize ligand–receptor (L–R)-mediated intercellular cross-talk (Fig. 5A; Supplementary Table S19). Computational analysis and annotation were carried out using iTALK's built-in database focusing on immune checkpoint receptor pairs (n = 55) and cytokine–receptor pairs (n = 327; Fig. 5A). Overall, we found reduced overlap of L–R interactions between the tumor and distant normal tissues compared with that between the tumor and more proximal (adjacent, intermediate) regions (Fig. 5B). We identified altered cellular interactions that were significantly and differentially modulated in LUADs versus their respective spatial normal tissues (Fig. 5C–F; Supplementary Fig. S21A). Among cytokine–receptor pairs, LUADs showed increased communication between CX3CL1+ tumor epithelial cells and DCs or macrophages expressing increased levels of its cognate receptor CX3CR1 (Supplementary Fig. S21B; Supplementary Table S20). CX3CR1 was increasingly expressed on macrophages and DCs but decreased on CD8 T cells from LUADs (Supplementary Fig. S21C), in line with previous reports demonstrating protumor features for CX3CR1+ macrophages (25).
Notably, we identified increased interactions between immune checkpoint proteins CD24 or LGALS9 (Galectin-9) on tumor epithelial cells, and SIGLEC10 on macrophages or HAVCR2 (TIM3) on DCs, respectively, and which were shared across multiple patients (Fig. 5C–F; Supplementary Table S20). These interactions were differentially enriched in tumors versus normal tissues at different distances from the LUADs (Fig. 5G and H; Supplementary Fig. S6A). CD24 and LGALS9 expression levels were overall increased in epithelial cells from LUAD tissues, particularly in cells of the malignant-enriched cluster (Fig. 5G and H; Supplementary Fig. S22A). This pattern of enrichment of CD24 in cells of the malignant-enriched cluster was also observed in the majority of patients analyzed (Supplementary Fig. S22B). Notably, CD24 expression was also markedly evident in B cells (Supplementary Fig. S22A and S22B) in line with previous reports (26). These findings demonstrate that the early-stage LUAD ecosystem harbors cell–cell communication that confers increased protumor inflammatory and immunosuppressive states.
Our findings above prompted us to validate CD24 expression patterns using external cohorts. We found progressively and markedly increased expression of the antigen across normal lung tissues, AAHs, and LUADs (Fig. 6A; P < 0.05). CD24 positively correlated with expression of the epithelial marker EPCAM as well as with levels of protumor and immune-suppressive features (TIGIT, CTLA4, FOXP3, and CCL19), in contrast to negatively correlating with antitumor immune markers (GZMB, GZMH, and PRF1; Fig. 6B; Supplementary Fig. S23A). These observations were, for the most part, further validated in LUADs and matched normal tissues from the TCGA cohort (ref. 27; Fig. 6C and D; Supplementary Fig. S23B). Using an in-house cohort of early-stage LUADs analyzed by targeted immune profiling (see Supplementary Methods), we found that relatively higher CD24 was associated with shortened OS and PFI (Fig. 6E; P = 0.07). Consistent with the above, CD24 expression positively correlated with that of EPCAM (r = 0.31) and negatively correlated with both PRF1 (r = −0.37) and immune cytotoxicity score (ref. 28; r = −0.33;Fig. 6F). Further, analysis of an independent in-house tissue microarray of treatment-naïve LUADs revealed that CD24 IHC protein expression was prevalent in tumor cells (Fig. 6G, left) and, in line with the above, was also associated with reduced OS (P = 0.04) and PFI (P = 0.007; Fig. 6G, right). We further found in a syngeneic mouse model that loss of Cd24a by CRISPR-mediated knockout or its inhibition using neutralizing antibodies significantly reduced the growth of mouse LUAD cells in vivo (Supplementary Methods; Supplementary Fig. S24; Fig. 6H). These findings show that CD24, which we found to be at the core of an enriched cell–cell interactome in early-stage LUAD, is associated with a protumor immune contexture and poor prognosis as well as promotes LUAD growth in vivo.
Molecular changes have been documented in the local niche of LUAD including loss of heterozygosity in 3p and 9p, point mutations, and tumor suppressor methylation (29, 30). Earlier work also underscored transcriptome profiles, somatic driver variants, as well as genome-wide allelic imbalance events that are shared between lung cancer and adjacent normal-appearing airway cells but that are absent in distant normal cells, thereby pointing to putative drivers of lung oncogenesis (4, 5). These earlier studies have focused on understanding geospatial profiles by bulk profiling methods, thereby inadvertently obscuring the individual contributions of epithelial and TME cues to lung cancer pathogenesis. Our knowledge of the geospatial architecture of individual cell populations in early-stage LUAD evolution remains poorly understood. By single-cell interrogation of a unique multiregion sampling design with epithelial cell enrichment, we here characterized spatial and ecological maps comprising various epithelial and nonepithelial subsets that underlie emergence of early-stage LUAD from its local niche.
Multiregional or spatial analyses have been used to interrogate intratumor heterogeneity (ITH) in solid tumors, including LUADs, to understand evolutionary trajectories and therapy response (31). Our analyses showed that ITH is evident at both the tumor epithelial cell and intrasite levels, i.e., within the same tumor region. We applied an integrative approach to dissect ITH of likely malignant cells and characterized cell clusters with differential transcriptomic profiles, evolutionary trajectories, CNV burdens, and/or driver mutations. We also found “normal” cells in the LUAD tissues themselves that are close in the inferred trajectory paths to specific malignant-enriched subsets, possibly representing tumor cells of origin. Normal basal cells (expressed KRT5, KRT15, KRT17, and P63) were consistently evident in the LUAD samples, possibly corroborating previously reported basal differentiation hierarchies in the normal lung (32, 33). It is noteworthy that we found, in the normal-appearing samples, cells with features of malignant-enriched subsets and heterogeneous CNV profiles. Whether these cells comprise early LUAD precursors, mutagenic clones that do not progress to malignancy, or putative molecular field cancerization remains to be investigated (34).
LUADs exhibit remarkable interpatient heterogeneity in histologic differentiation patterns, driver alterations, and inferred tumor cells of origin (35). In our cohort, we identified malignant gene-expression features that varied between the patients and that pointed to likely distinct tumor cells of origin and/or LUAD histopathologic patterns. We pinpointed G12D mutations in KRAS to a unique subset of the malignant-enriched cluster within P2 LUAD. These cells had overall low CNVs, perhaps reminiscent of findings in LUADs driven by strong driver genes in vivo (36). These cells exhibited increased expression of genes associated with KRAS-mutant cancer such as LCN2 (a marker of inflammation; ref. 17) and reduced expression of the lineage-specific oncogene NKX2–1 (18), in line with expression patterns in mucinous KRAS-mutant LUADs (e.g., P2). Compared with other C9 (malignant-enriched cluster) cells from the same LUAD (P2), KRAS-mutant cells showed reduced expression of airway lineage-specific genes (e.g., SCGB3A1 and SFTPB) in accordance with recent reports (8). Our findings also allude to the possibility that tumor cell lineage plasticity may occur at an early-stage of KRAS-mutant LUAD carcinogenesis—a supposition that warrants exploration of a larger and more diverse repertoire of KRAS-mutant cells. Of note, we could not characterize the full spectrum of ITH, clonality, and interpatient heterogeneity given the limited number of patients profiled and the use of scRNA-seq, which is not ideal for examining tumor clonality or genomic alterations. Nonetheless, our in-depth analysis of a relatively large number of epithelial cells unveiled different characteristics (airway lineages, malignant programs, potential tumor cells of origin, and cellular ITH) of the epithelial architecture of early-stage LUAD. These insights could be further extended to profile a larger and more diverse array of LUADs using anticipated advances enabling simultaneous scRNA-seq and scDNA-seq of the same cell.
Earlier studies have shown that immunosuppressive Tregs are crucial for immune evasion in lung cancer (37, 38). We found Tregs coexpressing both TIGIT and CTLA4 immune checkpoints that were progressively enriched with increasing geospatial proximity to the LUADs—suggesting a value in combinatorial targeting of multiple checkpoints for immunotherapy of early-stage LUAD (39). scRNA-seq analysis of our limited patient cohort also pinpointed important attributes to other lymphoid populations. We identified B-cell signatures that are spatially enriched in the LUADs, progressively increased along the course of normal to preneoplasia and invasive LUAD, and associated with prolonged survival. These data suggest important yet unexplored roles for B-cell phenotypes in immune evolution of LUAD (40). We also noted strikingly high plasma cell fractions among LUADs from smokers, suggesting that plasma cells may play important roles in the pathogenesis of smoking-associated LUAD and its microenvironment (41). Our analysis also pointed to mechanisms by which the myeloid immune microenvironment permits LUAD pathogenesis. Tumor-specific M2-like macrophages displayed diminished antigen presentation, whereby expression levels of MHC genes as well as genes involved in peptide transport and loading (TAP1, TAP2, TAPBP; ref. 42) were markedly reduced. We also found DC subclusters and signatures that were recently reported in healthy donors and in cancer (20, 43). These included a shift from an inflammatory to a noninflammatory DC state across normal lung to premalignant AAH up to LUAD and that was associated with prolonged survival in patients—thus pointing to immune cues that could perhaps be harnessed to manipulate the immunogenicity of tumors (43). We also identified an inflammatory tumor-depleted cDC cluster with increased expression of scavengers such as CD36. Dendritic CD36 permits acquisition and presentation of cell-surface antigens (44), and the precise effects of its stark absence in a LUAD-specific cDC subset on the tumor immune microenvironment warrant further investigation. Nevertheless, these findings suggest that unique cDC subsets play critical roles in LUAD pathogenesis and thus could be potential targets for immune-based interception.
Deciphering ligand–receptor-mediated interactions can elucidate cell-to-cell communication implicated during carcinogenesis, tumor immune coevolution, and immune reprogramming, and thus may help identify potential immune therapeutic targets (23). In this study, we applied the iTALK tool (24) developed by our group and performed a deep analysis of cellular interaction networks. We identified significant immune checkpoint receptor (e.g., CD24, Galectin-9, or TIM3) and cytokine receptor (e.g., CX3CR1) interactions whose enrichment or depletion in the LUAD space signified a highly protumorigenic milieu. These findings are in accordance with the CD24–SIGLEC10 interaction and subsequent “do not eat me” signal recently highlighted in breast cancer (26). In addition, our findings on CD24 including prominent expression in tumor epithelium, association with protumor immune phenotypes and reduced survival, and functional role in LUAD in vivo, suggest that CD24 may be a viable target for treatment of early-stage LUAD. Overall, our interactome analyses implicate potential targets as culprits in LUAD pathogenesis.
In summary, our results provide a spatial atlas of early-stage LUAD and its nearby and distant lung ecosystem. This atlas comprises high cellular heterogeneity as well as spatial dynamics in cell populations, cell states, and their transcriptomic features that underlie LUAD evolution from the peripheral lung ecosystem. Our extensive transcriptomic data set of lung epithelial and immune cells, other populations such as stromal and endothelial subsets, as well as of tumor-pertinent cell–cell interactions, constitutes a valuable resource to functionally interrogate LUAD trajectories at high resolution and generate strategies for its early treatment. Also, our study's multiregion sampling design in conjunction with single-cell analysis could help address specific questions in early malignant and immune biology of other solid tumors.
Additional description of methods can be found in the Supplementary Data file.
Multiregion Sampling of Surgically Resected LUADs and Normal Lung Tissues
Study subjects were evaluated at MD Anderson Cancer Center and underwent standard-of-care surgical resection of early-stage LUAD (I–IIIA). All samples in the study were obtained under waiver of consent from banked or residual tissues approved by MD Anderson institutional review board protocols. Residual surgical specimens were then used for derivation of multiregion samples for single-cell analysis (Supplementary Table S1). Immediately following surgery, resected tissues were processed by an experienced pathologist assistant (P.M. Brennan). One side of the specimen was documented and measured, followed by tumor margin identification. On the basis of the placement of the tumor within the specimen, incisions were made at defined collection sites in one direction along the length of the specimen and spanning the entire lobe: tumor-adjacent and tumor-distant normal parenchyma at 0.5 cm from the tumor edge and from the periphery of the overall specimen/lobe, respectively. An additional tumor-intermediate normal tissue was selected for P2–P5 that ranged between 3 and 5 cm from the edge of the tumor. Sample collection was initiated at normal lung tissues that are farthest from the tumor moving inward toward the tumor to minimize cross-contamination during collection.
Tumor and spatial normal parenchyma tissues (n = 19 samples) were immediately transported on ice for mincing and enzymatic digestion (Supplementary Methods). Cells were sorted (by FACS) for viable singlet cells (and also EPCAM ± fractions from P2–P5), followed by processing for scRNA-seq library construction using 10X Genomics and sequencing using Novaseq6000 platforms. Single-cell analyses were performed using available computational framework. Raw scRNA-seq data were preprocessed, demultiplexed, aligned to human GRCh38 reference, and feature-barcodes generated using CellRanger (10X Genomics, version 3.0.2). Details of quality control including quality check, data filtering, as well as identification and removal of cellular debris, doublets, and multiplets are found in Supplementary Methods. Following quality filtering, a total of 186,916 cells were retained for downstream analysis. Raw unique molecular identifier counts were log normalized and used for PCA using Seurat (45). Batch effects were statistically assessed using k-BET (46) and corrected by Harmony (ref. 47; Supplementary Methods), followed by unsupervised clustering analysis using Seurat v3 (45). Uniform manifold approximation and projection (UMAP; ref. 48) was used for visualization of clusters. Clustering robustness was examined for major clusters and subclusters (see Supplementary Methods), and all subclustering analyses were performed separately for each compartment. Differentially expressed genes (DEG) for each cell cluster were identified using the FindAllMarkers function in Seurat R package. We defined cell types and cluster functional states by integrating the enrichment of canonical marker genes, top-ranked DEGs in each cell cluster, and the global cluster distribution.
To study hierarchical relationships among cell types, pairwise Spearman correlations were calculated from average expression levels (Seurat function AverageExpression), based on which Euclidean distances were calculated. Monocle 2 (version 2.10.1; ref. 49) was applied to construct trajectories. Likely malignant cells were distinguished from nonmalignant subsets based on information integrated from multiple sources including cluster distribution of the cells, gene expression, as well as presence of mutations (KRAS in P2; see Supplementary Methods) and CNVs. CNVs were inferred using inferCNV tool (13) with NK cells as the control. To evaluate the robustness of CNV inference, T and B cells were also used as controls. To identify significant ligand–receptor pairs among major cell lineages, the top 30% of most highly expressed genes were included in the analysis. Significant cellular interactions were identified using iTALK as previously described (24). For ligand–receptor annotation, the iTALK built-in ligand–receptor database was used (https://github.com/Coolgenome/iTALK).
All statistical analyses were performed using R package v3.6.0. Pseudo-bulk gene-expression values for defined cell clusters were calculated by taking mean expression of each gene across all cells in a specific cluster. Pearson correlation analysis was used to identify genes significantly correlated with CD24 expression. Log-rank tests and Kaplan–Meier plots were used for survival analysis. All statistical significance testing was two-sided, and results were considered statistically significant at P < 0.05. The Benjamini–Hochberg method was applied to control the false discovery rate (FDR) and to calculate adjusted P values (q-values).
Animal experiments were conducted according to Institutional Animal Care and Use Committee–approved protocols of MD Anderson Cancer Center. Mice were maintained in a pathogen-free animal facility and were continually monitored to ensure adequate body condition scores, less than 2 cm3 tumor volumes, and less than 50% ulceration.
All sequencing data generated in this study are deposited in the European Genome–phenome Archive (EGAS00001005021).
D.Y. Duose reports other support from Chrysalis Biomedical Advisors outside the submitted work. J. Zhang reports grants from Merck, grants and personal fees from Johnson and Johnson, personal fees from BMS, AstraZeneca, Roche, Geneplus, OrigMed, and Innovent during the conduct of the study. B. Sepesi reports personal fees from Bristol-Myers Squibb and AstraZeneca outside the submitted work. T. Cascone reports personal fees and other support from MedImmune/AstraZeneca, Bristol-Myers Squibb, EMD Serono, and other support from Boehringer-Ingelheim outside the submitted work. L.A. Byers reports grants and personal fees from AstraZeneca, GenMab, Sierra Oncology, grants from ToleroPharmaceuticals, and personal fees from Bristol-Myers Squibb, Alethia, Merck, Pfizer, PharmaMar, AbbVie, Jazz Pharmaceuticals, Genentech, Debiopharm Group outside the submitted work. D.L. Gibbons reports grants from Takeda, Janssen R&D, grants and personal fees from Ribon Therapeutics, Atellas, grants from AstraZeneca, personal fees from Sanofi, Lilly, and Menarini outside the submitted work. E.J. Ostrin reports grants from American Lung Association and personal fees from AstraZeneca outside the submitted work. J.V. Heymach reports grants and other support from AstraZeneca, other support from Boehringer-Ingelheim, Bristol-Myers Squibb, Merck, Catalyst, Genentech, Guardant Health, Foundation Medicine, Hengrui Therapeutics, Eli Lilly, Novartis, EMD Serono, Sanofi, BrightPath Therapeutics, Takeda, Janssen, Kairos Venture Investments, and Leads Biolabs, grants from GlaxoSmithKline, grants, personal fees, and other support from Spectrum outside the submitted work. S.M. Dubinett reports other support from EarlyDiagnostics, Johnson & Johnson Lung Cancer Initiative, LungLife AI, Inc., and T-Cure Bioscience, Inc. outside the submitted work. J. Fujimoto reports other support from Sakura Finetek Japan and ReasonWhy Inc. outside the submitted work. I.I. Wistuba reports grants and personal fees from Bayer, AstraZeneca, Pfizer, Merck, Guardant Health, and HTG Molecular, personal fees from Roche, Bristol-Myers Squibb, GlaxoSmithKline, Oncocyte, MSD, grants from Genentech, DepArray, Adaptive, Adaptimmune, EMD Serono, Pfizer, Takeda, Amgen, Karus, Johnson & Johnson, Iovance, 4D, Novartis, and Akoya outside the submitted work. C.S. Stevenson reports other support from Johnson and Johnson during the conduct of the study; other support from Johnson and Johnson outside the submitted work. A. Spira reports personal fees from JNJ during the conduct of the study; personal fees from JNJ outside the submitted work. H. Kadara reports grants from Johnson and Johnson during the conduct of the study; grants from Johnson and Johnson outside the submitted work. No disclosures were reported by the other authors.
A. Sinjab: Data curation, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. G. Han: Data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. W. Treekitkarnmongkol: Data curation, formal analysis, validation, and methodology. K. Hara: Data curation, formal analysis, validation, and methodology. P.M. Brennan: Resources, data curation, and methodology. M. Dang: Resources, software, visualization, and methodology. D. Hao: Resources, software, visualization, and methodology. R. Wang: Resources, software, visualization, and methodology. E. Dai: Resources, software, visualization, and methodology. H. Dejima: Data curation, formal analysis, and investigation. J. Zhang: Data curation, formal analysis, and investigation. E. Bogatenkova: Resources, investigation, and methodology. B. Sanchez-Espiridion: Resources, investigation, and project administration. K. Chang: Formal analysis and investigation. D.R. Little: Formal analysis, investigation, and methodology. S. Bazzi: Formal analysis and investigation. L.M. Tran: Visualization, writing–review and editing. K. Krysan: Visualization, writing–review and editing. C. Behrens: Resources and data curation. D.Y. Duose: Data curation and methodology. E.R. Parra: Data curation, writing–review and editing. M. Raso: Data curation and formal analysis. L.M. Solis: Data curation and formal analysis. J. Fukuoka: Resources, data curation, and investigation. J. Zhang: Investigation, writing–review and editing. B. Sepesi: Resources, investigation, writing–review and editing. T. Cascone: Resources, investigation, writing–review and editing. L.A. Byers: Resources, investigation, writing–review and editing. D.L. Gibbons: Resources, investigation, writing–review and editing. J. Chen: Investigation, methodology, writing–review and editing. S.J. Moghaddam: Investigation, visualization, writing–review and editing. E.J. Ostrin: Investigation, visualization, writing–review and editing. D. Rosen: Resources, investigation, writing–review and editing. J.V. Heymach: Resources, investigation, writing–review and editing. P.A. Scheet: Investigation, visualization, writing–review and editing. S.M. Dubinett: Resources, investigation, writing–review and editing. J. Fujimoto: Resources, data curation, investigation, methodology, writing–review and editing. I.I. Wistuba: Resources, data curation, investigation, methodology, writing–review and editing. C.S. Stevenson: Resources, supervision, investigation, project administration, writing–review and editing. A. Spira: Resources, supervision, validation, investigation, project administration, writing–review and editing. L. Wang: Conceptualization, resources, data curation, software, formal analysis, supervision, funding acquisition, investigation, methodology, writing–original draft, project administration, writing–review and editing. H. Kadara: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, validation, investigation, methodology, writing–original draft, project administration, writing–review and editing.
This study was supported in part by research funding from Johnson and Johnson, NCI grants R01CA205608 (to H. Kadara) and 1U2CCA233238 (to H. Kadara, A. Spira, and S.M. Dubinett), University of Texas SPORE in Lung Cancer P50CA070907, NCI P50 core grant CA016672, NIH grant 1S10OD024977, Cancer Prevention and Research Institute of Texas grants RP150079 (to H. Kadara) and RP160668 (to I.I. Wistuba), and start-up research funds provided to L. Wang by UT MD Anderson Cancer Center.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.