Tumor-infiltrating B and plasma cells (TIB) are prevalent in lung adenocarcinoma (LUAD); however, they are poorly characterized. We performed paired single-cell RNA and B-cell receptor (BCR) sequencing of 16 early-stage LUADs and 47 matching multiregion normal tissues. By integrative analysis of ∼50,000 TIBs, we define 12 TIB subsets in the LUAD and adjacent normal ecosystems and demonstrate extensive remodeling of TIBs in LUADs. Memory B cells and plasma cells (PC) were highly enriched in tumor tissues with more differentiated states and increased frequencies of somatic hypermutation. Smokers exhibited markedly elevated PCs and PCs with distinct differentiation trajectories. BCR clonotype diversity increased but clonality decreased in LUADs, smokers, and with increasing pathologic stage. TIBs were mostly localized within CXCL13+ lymphoid aggregates, and immune cell sources of CXCL13 production evolved with LUAD progression and included elevated fractions of CD4 regulatory T cells. This study provides a spatial landscape of TIBs in early-stage LUAD.
While TIBs are highly enriched in LUADs, they are poorly characterized. This study provides a much-needed understanding of the transcriptional, clonotypic states and phenotypes of TIBs, unraveling their potential roles in the immunopathology of early-stage LUADs and constituting a road map for the development of TIB-targeted immunotherapies for the treatment of this morbid malignancy.
Lung adenocarcinoma (LUAD) remains the most frequently diagnosed histologic subtype of lung cancer and accounts for most cancer deaths related to smoking (1). Improved clinical screening strategies have led to increased diagnosis of LUADs at earlier pathologic stages (1). Surgery is the standard treatment for localized LUADs, yet the overall 5-year survival remains below 50%. According to the statistics from the American Cancer Society, the 5-year survival rate drops to about 7% for patients who developed distant metastases. Therefore, strategies to treat LUAD in its earliest stages are urgently needed. Such advances are severely limited by our lagging knowledge of the earliest changes in the tumor microenvironment (TME) during LUAD pathogenesis that could thus lead to ideal targets for interception.
T cells have been, for the most part, the center of attention in efforts to understand the immunobiology of lung cancer (2, 3). Other cell types, in particular, tumor-infiltrating B cells and plasma cells (TIB), have been mostly overlooked, and their roles in the pathogenesis of solid tumors such as LUAD remain poorly understood. TIBs have been detected in different solid tumor types (4). Recent studies from our group and others demonstrated that TIBs can strongly affect patient responses to anticancer chemo- and immunotherapies as well as clinical outcomes in various cancers including melanoma and lung cancer (5, 6). TIBs were also detected in recent single-cell studies, but their phenotypes and states were not extensively characterized (6, 7). It is possible that these important studies lacked the power to perform in-depth profiling of TIBs due to an inadequate number of sequenced B cells. We recently profiled epithelial and TME cells from five patients with early-stage LUADs and found that the fractions of TIBs, including both B cells and plasma cells (PC), were immensely increased in LUADs relative to their matched normal lung tissues (6), thereby suggesting potential roles for TIBs in LUAD pathogenesis.
To better understand the landscape of TIBs in early LUAD pathogenesis, we performed paired single-cell RNA- and BCR-sequencing (scRNA/BCR-seq) of 16 LUADs and 47 matched multiregion normal lung tissues from patients with early-stage disease. We studied BCRs from ∼73K cells and transcriptomes of ∼50K cells—the largest single-cell dataset on TIBs to date. We interrogated the spatial atlas of TIBs at unprecedented resolution exploring their transcriptional states, differentiation and maturation status, geospatial characteristics, clonotypic properties, as well as cellular interactions and colocalization patterns with other TME cell populations. We also correlated TIB characteristics with major clinicopathologic features, including smoking, driver gene mutation, and tumor stage, and assessed their clinical significance such as patient survival and response to immunotherapy. This study provides a much-needed and detailed understanding of the prevailing nature and functional phenotypes of TIBs in early LUAD development.
Single-Cell Profiling of TIBs in LUADs and Matched Multiregion Normal Lung Tissues
We performed integrative scRNA-seq and scBCR-seq on cells obtained from 16 early-stage LUADs and 47 matched multiregion normal lung tissues with varying spatial proximities from the tumors (i.e., adjacent, 0.5 cm from tumor edge; intermediate, 3–5 cm from tumor edge; distant, periphery of the lobe; Fig. 1A; Supplementary Table S1). Fractions of B-lineage subsets among EPCAM-negative (TME) cells were markedly enriched in LUADs compared with their matched normal lung tissues, especially in smokers (Fig. 1B; P = 1.5 × 10−8). To validate this finding and further examine the spatial distribution of TIBs, we performed multiplexed fluorescence (mIF) with a panel of eight markers on 20 available tissues from five of the 16 patients (see Methods). Lymphoid aggregates (LA) were detected in tumor samples from all patients including the tumor compartment and stroma (Fig. 1C), as well as the nontumoral stromal compartment from two patients. Consistently, the total intensities of B cells were significantly higher in tumors compared with their matched normal tissues (Fig. 1D; P = 6.0 × 10−4).
We then sought to define the various transcriptional states and BCR isotypes of TIBs. After rigorous filtering (see Methods), a total of 49,062 B-lineage cells with scRNA-seq data were retained for subsequent analyses. Unsupervised clustering revealed 12 different cell clusters (Fig. 1E and F; Supplementary Table S2) including one each of naïve B, plasmablast (PB), and stressed PC (SPC) cluster; two switched memory B-cell clusters; and seven PC clusters. Naïve B cells highly expressed MS4A1 (CD20), IGHD, IL4R, and FCER2 (CD23) without detectable expression of CD27 and CD38 (Fig. 1F). The two switched memory B-cell clusters differed in their expression of CD27 and CD24. Both naïve and memory B cells showed high expression of MHC class II genes as expected. PCs exhibited distinct profiles such as high expression of CD38, SDC1 (CD138), TNFRSF17 (BCMA), MZB1, JCHAIN, and immunoglobulin (Ig) isotype genes with no/low expression of MHC class II genes (Fig. 1F; Supplementary Fig. S1A and S1B). Seven PC clusters were characterized by differential expression of main classes and subclasses of Ig isotype genes (Fig. 1E; Supplementary Table S2). Cells of the PB cluster were CD38+CD138+IgD−CD27+, and they showed relatively higher expression of MHC class II genes when compared with PCs and the highest expression of MKI67 as well as other cell cycle–related genes (Fig. 1F). We also identified a rare IgD+ PC cluster that was previously reported to be associated with exposure to common bacteria and airborne antigens (8). For cells with single-cell BCR-seq (scBCR-seq data; n = 72,949), we defined Ig heavy-chain (IgH) isotypes based on the sequences of Ig constant regions. Overall, the antibody isotypes detected by scBCR-seq agreed well with that inferred from scRNA-seq (Fig. 1E, top right). Specifically, the naïve B-cell cluster was dominated by IgM and IgD isotypes, memory B-cell clusters showed various isotypes, and PC clusters showed a high enrichment of their corresponding IgH isotypes (Fig. 1E and G). Like PCs, the PB cluster was dominated by IgA1/2 and IgG1/2 isotypes (Fig. 1G).
Markedly Increased TIBs in LUADs and Highly Abundant PCs Particularly in Smokers
We then examined the abundance and cellular compositions of B-lineage cells in LUADs and matched normal lung tissues (Fig. 1H; Supplementary Fig. S1C). The relative fractions of naïve B cells among total B-lineage cells were significantly decreased with increasing proximity to the tumors. In contrast, we noted markedly elevated fractions of memory B cells and PCs in LUADs (Fig. 1I), with the latter especially prominent in smokers relative to nonsmokers (P = 0.07). There was a consistent trend toward increased fractions of IgA+ and IgG+ PCs in LUADs of smokers, albeit not reaching statistical significance (Supplementary Fig. S1D). We next integrated TIB characteristics with genomic data including tumor mutation burden (TMB) and driver mutations identified using whole-exome sequencing (9). No significant differences in TIB compositions were observed between LUADs with high and low TMB. Of note, EGFR-mutant LUADs had significantly lower fractions of PCs and higher fractions of naïve and memory B cells (among total TIBs) compared with KRAS-mutant or KRAS/EGFR double wild-type LUADs (Fig. 1J and K).
Next, we sought to validate our observations in independent patient cohorts, first by analyzing a large public scRNA-seq dataset from Kim and colleagues, which comprised primary LUADs, lymph node, and brain metastases, together with normal lung and lymph node tissues obtained from 44 LUAD patients (10). In line with our cohort, fractions of memory B cells and PCs were significantly increased in primary LUADs compared with normal lung tissues (Fig. 2A; Supplementary Fig. S2A). We next analyzed another scRNA-seq cohort composed of 30 early-stage LUADs from Leader and colleagues (11). Very much in line with our cohort, fractions of PCs were significantly increased in smokers versus nonsmokers (Fig. 2B). We also performed immune deconvolution analysis of bulk RNA-seq data from The Cancer Genome Atlas (TCGA) LUAD cohort (12) using CIBERSORTx (13). Corroboratively, LUADs displayed lower fractions of naïve B cells (Supplementary Fig. S2B), whereas the fractions of PCs were markedly increased in ever-smoker versus nonsmoker LUADs (Fig. 2C, left). Of note, PC fractions were markedly increased in smoker but not nonsmoker LUADs, and the magnitude of change was greater in current or reformed smokers who quit smoking ≤15 years than those reformed >15 years when compared with their matched normal lung tissues (Fig. 2C, right). These observations indicate the potential roles of PCs in the immunopathology of LUAD, particularly in smokers.
High PC Signature in LUADs Predicts Better Survival and Response to Immunotherapy
To assess the clinical significance of TIBs, we first constructed cell type–specific gene signatures for major TIB subsets using our scRNA-seq data in the manner we previously described (6) and confirmed its reliability in deconvolution analysis (Supplementary Fig. S2C and S2D; Supplementary Table S3; Methods). We next analyzed the bulk expression data from the TCGA LUAD cohort and found that PC signature scores were gradually decreased with advanced pathologic stages (Fig. 2D). Both the Cox proportional hazards model and the Kaplan–Meier model revealed that relatively high PC signature scores were significantly associated with better overall survival (OS; Fig. 2E and F). Among the 22 immune cell subsets evaluated, we found that PC signature was distinctively and significantly associated with better OS in the TCGA LUAD cohort (Supplementary Fig. S3A).
We then assessed the correlation of the PC signature with patient response to immunotherapy by analyzing public datasets from four reported clinical trials (14–16). Analysis of the Prat cohort (14) with 35 patients and the combined cohort from Prat and colleagues (14) and Hwang and colleagues (15) with 56 patients with non–small cell lung cancer treated with anti–PD-1 therapy showed that a relatively higher PC signature in pretreatment tumors was associated with better response to anti–PD-1 therapy, a durable clinical benefit (DCB), and significantly better progression-free survival (PFS; Fig. 2G; Supplementary Fig. S3B–S3D). Likewise, analysis of the large LUAD cohorts (n = 316 patients) from POPLAR (17) and OAK (18), two trials using atezolizumab for PD-L1 blockade, showed very consistent results. Relatively higher expression of the PC signature at baseline was significantly associated with better response to anti–PD-L1 therapy (Supplementary Fig. S3E) and improved OS (Fig. 2H and I), as reported by the original study (16). We also observed significantly higher expression of the PC signature in patients with DCB when compared with those with no durable benefit (Supplementary Fig. S3F). Moreover, we found that the relative abundance of IgA+ cells but not IgG+ cells, as measured by the expression of IGHA/IGHG genes relative to PC signature genes, was significantly lower in responders versus nonresponders (Supplementary Fig. S3G).
Enrichment of Highly Differentiated PCs in LUADs and Long-Lived PCs in Smokers
We next aimed to perform an in-depth analysis of the differentiation states of PCs. First, we reconstructed the developmental trajectory of PCs using Monocle 3 (19), and pseudotime analysis revealed various states of PC differentiation, including long-lived PCs, short-lived PCs, and PBs (Fig. 3A). Long-lived PCs were characterized by high expression of STAT3, IKZF3 (Aiolos), and CD37 (Fig. 3B), consistent with their functions in supporting various aspects of PC longevity (20–22). The expression levels of IgA/IgG transcripts were also higher (Fig. 3B; Supplementary Fig. S4A), consistent with the established antibody production in long-lived PCs. In addition, and in support of their antibody-secreting functions (23), most long-lived PCs lost expression of membrane Ig variable (IgV) genes and BCRs (Fig. 3C, left). The cellular differentiation states inferred by CytoTRACE (24) were in line with the Monocle 3 results (Fig. 3A), and we observed gradually decreased CytoTRACE scores—that is, increased differentiation—as the pseudotime progressed (Fig. 3C, middle). Intriguingly, fully differentiated, long-lived PCs were prominently enriched in LUADs of smokers versus nonsmokers (Fig. 3C, right; P < 0.0001).
When investigating PCs by antibody subclass, IgM/IgD+ PCs were the least differentiated, followed by IgG3+, IgG1+, IgA1+, IgG2+, and finally IgA2+ PCs, which had the lowest scores (i.e., most differentiated; Fig. 3D), consistent with their exact order during isotype switching. We also noted increased fractions of IgA1/IgA2+ PCs in LUADs (Supplementary Fig. S4B; P < 2 × 10−16), and PCs from LUADs were significantly more differentiated relative to those from normal lung tissues (Fig. 3E). Consistent with CytoTRACE-based differentiation inference, we noted that the number of genes expressed in LUAD PCs (but not other TME cells) was significantly lower relative to those of normal lung tissues (Supplementary Fig. S4C; P < 2 × 10−16). Notably, we corroborated our findings in two independent scRNA-seq datasets from Lambrechts and colleagues (25) and Kim and colleagues (10), showing the enrichment of more differentiated PCs in LUADs (Supplementary Fig. S4D).
Memory B Cells in LUADs Are More Frequently Class-Switched and at Late Germinal Center Stage
Memory B cells are generated during the germinal center (GC) reaction in parallel to PC differentiation (Fig. 3F, top), and their differentiation is tightly controlled at the transcriptional level. Relative to matched normal lung tissues, we observed a dramatically increased expression of IgA/IgG genes in memory B cells from tumor tissues (Fig. 3F, bottom; Supplementary Fig. S5A), indicating that memory B cells in LUADs are mainly class-switched and at late GC stage. To further investigate the maturation status and transcriptional heterogeneity of memory B cells in LUADs, we defined three developmental stages—early, intermediate, and late GC stages—based on their expression of IgA/IgG genes (Supplementary Fig. S5B), followed by CytoTRACE cell differentiation analysis. Consistently, compared with memory B cells at early GC stage, cells at late GC stage were likely more differentiated, indicated by significantly lower CytoTRACE scores (Fig. 3G). Additionally, memory B cells early along the trajectory were mostly CD27− and unswitched, had the lowest BCR signature scores, and were mostly derived from normal lung tissues, whereas late-trajectory cells were CD27+ and switched, had the highest BCR signature scores, and were nearly exclusive to tumor tissues (Fig. 3H and I).
Analysis of expression dynamics of the top 3,000 variable genes along the trajectory revealed distinct biological processes associated with memory B-cell differentiation (Fig. 3J; Supplementary Table S4). For example, early (relative to inferred pseudotime) upregulated genes were enriched for apoptosis, TNFα signaling via NFκB, and cell-cycle pathways, whereas late upregulated genes included many IgV genes and were enriched for BCR signaling. We also found upregulated genes (e.g., XBP1, MZB1, and JCHAIN) and downregulated genes (e.g., CXCR5, CCR7, CD69 and BACH2) in late- relative to early-stage memory B cells, consistent with previously reported roles of these genes in B-cell differentiation (refs. 26–29; Fig. 3K; Supplementary Fig. S5C; Supplementary Table S5). In addition, the upregulated differentially expressed genes in tumor tissues (vs. matched normal lung), late-stage (vs. early-stage), and switched (vs. unswitched) memory B cells were largely overlapping (Fig. 3K; Supplementary Table S5), implying that memory B cells in LUADs were likely resulting at a late stage along the differentiation trajectory. In agreement with this notion, IgA1+ and IgG1+ cells were the most abundant class-switched subsets indicated by scBCR-seq (Fig. 3L, top), and this observation was confirmed in the large TCGA LUAD cohort based on isotype-specific gene expression (Fig. 3L, bottom). In addition, we observed gradually decreased fractions of IgM+/IgD+ cells and increased fractions of IgA+ cells with increasing proximity to LUADs and, consequently, significantly increased ratios of switched to unswitched isotypes in tumor tissues (Fig. 3M). Our observations suggest that B cells and PCs with various differentiation states exhibit distinct transcriptional and phenotypic profiles as well as geospatial heterogeneity in the lung (Fig. 3I and N).
Higher BCR Clonotype Diversity and Lower Clonality in LUADs, in Smokers, and with Increasing Pathologic Stage
To better understand the Ig repertoires in LUADs and neighboring tissues, we analyzed the paired scBCR-seq data generated on the same cDNA libraries and identified 72,949 cells with productive VDJ rearrangements. Relative to normal lung tissues, we observed markedly increased numbers of unique BCR clonotypes—that is, higher diversity in their BCR repertoires—in tumor tissues (Fig. 4A; Supplementary Fig. S6A). We detected increased numbers of expanded BCR clonotypes in tumor tissues but at relatively lower clonal frequencies, that is, lower unevenness in clonal size distributions of the BCR repertoires than in normal tissues in which a small number of expanded clones dominated the repertoires (Fig. 4B, left). Likewise, we also observed lower BCR clonal unevenness in smokers versus nonsmokers (Fig. 4B, middle) and with increasing pathologic stage (Fig. 4B, right).
We next aimed to interrogate whether the geospatial distribution of BCR clonotypes in tumor and normal lung tissues could inform the trafficking of B cells. We measured how the BCR repertoire in a sample from a given geospatial location (e.g., LUAD) overlaps with that from another location (e.g., distant normal lung). On average, 38.9% of BCR clonotypes from normal lung tissues were shared with that from LUADs, contrary to BCR clonotypes from LUADs, which were largely unique with a much lower proportion of shared clones (6.6%) with their neighboring tissues (Fig. 4C). Notably, when splitting BCR clonotypes detected in LUADs based on whether or not they were unique to tumors, we observed that BCR clonotypes that were shared with normal tissues were highly expanded, which is in contrast to tumor-unique clonotypes that were less expanded and largely composed of small clones and singletons, particularly in smokers (Fig. 4D).
Distinct Profiles of Ig Class Switching, VDJ Gene Usage, and Expanded PCs in Tumors
We next explored the overlap in BCR repertoires between clonotypes of an upstream isotype with its downstream isotype following the genomic coordinates (5′ to 3′) of Ig subclass (i.e., IgG3, IgG1, IgA1, IgG2, IgG4, and IgA2) coding genes (Fig. 4E). Relative to normal lung tissues, an upstream antibody isotype had increased percentages of shared clonotypes with its direct downstream isotype in LUADs, implying more frequent Ig class switching. We also examined V(D)J gene usage and noted the LUAD enrichment of IGHV genes and IGHV-J pairs that are infrequently used in cells from normal lung tissues (Supplementary Fig. S6B and S6C), suggesting that B cells in LUADs may have undergone unique V(D)J rearrangements. By integrating the scRNA-seq and scBCR-seq data, we found that cells of most expanded clones were PCs (Fig. 4F). The top 15 most expanded clones were nearly exclusively mapped to clusters of PCs (Fig. 4G; Supplementary Fig. S6D). Lastly, compared with randomly grouped B cells and size-controlled clones, B cells of the same clone were more likely to share a common transcriptional state, antibody isotype, or spatial location (Supplementary Fig. S6E).
Geospatial Characteristics of BCR Somatic Hypermutation in LUADs
The affinity maturation of B cells in GCs is a result of iterative rounds of clonal expansion coupled with somatic hypermutation (SHM), followed by affinity-based selection of antigen-specific B cells (ref. 30; Fig. 4H, left). Typically, high SHM frequency indicates strong antigen-specific affinity (31). We quantified SHM levels using scBCR-seq data (see Methods), based on which we classified cells into three groups: none (no SHM), low-SHM (mutation frequency ≤3%), and high-SHM (>3%). As expected, we observed no SHM in BCRs of naïve B cells, a low or medium level of SHM in BCRs of memory B cells, and a high level of SHM in BCRs of PCs (Fig. 4H, right, and 4I; Supplementary Fig. S7A). Memory B cells at the late GC stage displayed higher SHM frequencies relative to those at early or intermediate GC stage (Supplementary Fig. S7B–S7E). The clonotype size was relatively larger in cells with high SHM frequency (Fig. 4J). The rare subset of IgD+ PC exhibited the highest SHM frequency (Fig. 4I; Supplementary Fig. S7A), consistent with the hypothesis that IgD+ B cells develop in super antigen-driven GC reactions (32). Notably, we observed gradually increased SHM frequencies with increasing proximity to tumors and in both switched and unswitched cells (Fig. 4K and L; Supplementary Fig. S7F). Further analysis of expanded clones (n >10 cells) revealed higher SHM frequencies in BCRs of tumor-resident clones relative to normal-resident or shared clones (Supplementary Fig. S7G). In addition, we noted lower BCR SHM frequencies in PCs of EGFR-mutant LUADs (Supplementary Fig. S7H and S7I). Consistently, quantification of tissue enrichment of all cells with scBCR-seq data based on the ratio of observed to expected cell number showed that the no and low-SHM cells were enriched in EGFR-mutant LUADs (Supplementary Fig. S7J).
The B-cell Chemotactic CXCL13–CXCR5 Axis in Early LUAD Development
To understand the recruitment of TIBs in LUADs, we examined the expression of known attractants of B cells including CXCL13, which is essential for B-cell recruitment and formation of tertiary lymphoid structures (TLS; refs. 33, 34; Fig. 5A) and other chemokines and their receptors including CCR7–CCL19/CCL21, CXCR4–CXCL12, CXCR3–CXCL9/10/11, and CCR6–CCL20. Among them, the CXCL13–CXCR5 chemokine axis appeared to be predominant in early-stage LUADs. Both CXCL13+ T cells and CXCR5+ B cells among TME cells were highly enriched in LUADs (Fig. 5B), and we noted significantly increased relative proportions of CXCL13+ cells among total T cells in tumors compared with normal lung tissues in this cohort (Supplementary Fig. S8A), which was corroborated in three public scRNA-seq cohorts (Supplementary Fig. S8B–S8D; refs. 10, 11, 35). The expression of CXCL13 was also higher in tumors than in normal lung tissues of the TCGA LUAD cohort (Supplementary Fig. S8E).
Our mIF data suggested that LAs accounted for the majority of TIBs detected in LUADs (Fig. 1C). However, due to the lack of CXCL13 in the mIF panel, we were not able to determine the spatial relationship between CXCL13+ T cells and TIBs in LUADs. We then characterized CXCL13 protein (BCA1 antibody) expression along with lineage markers that define T-cell (CD3), B-cell (CD20), and tumor compartments (PanCK) using the NanoString GeoMx digital spatial profiler (DSP) platform in four LUADs and paired normal-appearing lung tissues (see Methods). For each case, a total of 13 regions of interest (ROI) were selected including four ROIs for CXCL13-positive LAs (LA/BCA+), three ROIs for CXCL13-negative LAs (LA/BCA−), three ROIs that covered tumor and tumor stroma without LAs (T/LA−/BCA−), and three ROIs that covered normal lung tissues (Supplementary Fig. S9A–S9F). Each of these selected ROIs was then segmented into three areas of illumination (AOI): B cell–, T cell–, or tumor cell–enriched based on the expression of the morphology markers (Fig. 5C). No or very few B cells were observed in normal lung or T/LA−/BCA− ROIs. Relative to the LA/BCA− ROIs, we observed prominently increased B-cell abundance in the LA/BCA1+ ROIs (Fig. 5D). To obtain a more comprehensive picture of the spatial distribution of LAs, we further performed spatial transcriptomics (ST) on four samples from two patients in our scRNA-seq cohort using the Visium platform (10X Genomics). Consistent with the mIF and DSP data, the majority of LAs identified were within the tumor compartment and B cells in LUADs were largely colocalized with CXCR5+ T follicular helper (Tfh) cells and CXCL13+ T cells within LAs (Supplementary Fig. S9G).
While T cells were overall the main source of CXCL13 production, our single-cell analysis suggested that the majority of CXCL13+ cells were CD4 Tfh and CD8 exhausted T cells (Tex), followed by CD4 regulatory T cells (Treg) and proliferative Tregs, whereas only a very small subset of CD8 effector T (Teff) cells and CD8 resident memory T (Trm) cells expressed CXCL13 (Fig. 5E). We also detected CXCL13 expression in stressed CD4/CD8 T cells and in a small subset of LAMP3+ dendritic cells (DC), macrophages, and fibroblasts. Analysis of these 10 CXCL13-expressing subsets (with >100 CXCL13+ cells and the percentage of CXCL13 expression >0.5%) revealed a strong correlation between CXCL13+ CD4 Tfh, Tregs, CD8 Tex, fibroblasts, and macrophages on the one hand and CXCR5+ memory B cells on the other (Fig. 5E).
We then examined whether the cellular sources of CXCL13 production in LUADs evolve as the disease develops and progresses. Consistently across this cohort and three additional public scRNA-seq datasets, we observed significantly increased relative proportions of CXCL13+ cells among Tregs in LUADs versus normal lung tissues (Fig. 5F; Supplementary Fig. S8D). Increased expression of CXCL13 was also found in the TCGA LUAD cohort, whereby it was significantly higher in tumors, regardless of their pathologic stage, relative to normal lung tissues (Supplementary Fig. S8E). CXCL13 expression was highest in early-stage LUADs, and its expression gradually decreased with advancing pathologic stage. We also interrogated our precancer bulk RNA-seq cohort (36) and noted that CXCL13 levels were progressively increased from normal lung to atypical adenomatous hyperplasia (AAH), the earliest precursor lesion of LUAD (37), and up to invasive LUAD (Fig. 5G, left). Consistently, B-cell and PC signature scores were also gradually increased along this pathologic spectrum (Fig. 5G, middle and right).
Expression of Immunomodulatory Genes and TIB Phenotypic Heterogeneity
We next analyzed the expression of immune biomarkers using the GeoMx DSP data we generated on 43 protein markers (Supplementary Table S6). We determined the differences in immune biomarker expression in the B-cell compartment between tumor-associated LAs with and without CXCL13 expression (LA/BCA+ and LA/BCA−, respectively). Relative to LA/BCA− ROIs, we found that the B-cell compartment of LA/BCA+ ROIs showed significantly increased expression of CD27 and CD40 as well as decreased expression of CTLA-4, LAG3, and PD-L1 (Supplementary Fig. S10A). In addition, increased IgA+ PCs in LUADs prompted us to interrogate how cytokines produced by various cell populations within TME regulate B-cell class-switch recombination (CSR) and generation of PCs. We then quantified expression levels of cytokines that induce IgG (IFNγ and TNFα) and IgA (TGFβ, IL6, and IL10) CSR (Fig. 5H; Supplementary Fig. S10B) in immune and stromal cell subsets (30, 38–41). We noted the limitation of scRNA-seq in detecting the transcripts of interleukin (likely due to the “dropout” events) and therefore focused our analysis on TGFB1, which is known to be critical in promoting IgA isotype switching in human B cells (30, 38–41). Although TGFB1 was broadly expressed by various cell types (Supplementary Fig. S10B), it is noteworthy that, relative to matched normal lung tissues, TGFB1 was significantly higher in cancer-associated fibroblasts (CAF) in LUADs in our cohort (Fig. 5I) and the dataset from Kim and colleagues (Supplementary Fig. S10C; ref. 10). In addition, we performed spatial mapping of TGFB1-expressing cells using the Visium data and observed strong TGFB1 expression, particularly in spatial regions enriched with fibroblasts, as indicated by high expression of canonical markers of fibroblasts (Fig. 5J). These results suggest that TGFB1 upregulation in CAFs is likely associated with increased fractions of IgA+ PCs in LUADs.
Memory B Cells and IgA+ PCs Strongly Co-occur with CD4 Tregs and CD8 Tex, and They Negatively Correlate with Cytotoxic T Cells
Lastly, to better understand how the various TIB subsets may influence the functional phenotypes of the TME, we first measured the expression of immunomodulatory genes (Supplementary Table S7) in TIB subsets. TGFB1, GPR183 (EBI2), CD24, and LTB were highly enriched in memory B cells, whereas CCR10, VSIR (VISTA), TNFRSF18 (GITR), and LGALS3 (Galectin 3) were expressed by IgA+ PCs (Supplementary Fig. S10D), thereby pointing toward their immunosuppressive potential. We further examined cell-state coassociation relationships with 28 other immune and stromal cell subsets defined using our scRNA-seq data and found that memory B cells and IgA+ PCs strongly co-occurred with CD4+ Tfh, Treg, and CD8+ Tex cells (Fig. 5K and L). IgA+ PCs also strongly co-occurred with cDC2, fibroblasts, and macrophages. In contrast, IgA+ PCs showed significant negative correlations with cytotoxic T lymphocytes (CTL), including CD4+ CTLs, CD8 Teff cells, and CD8 Trm cells (Fig. 5K). With immune deconvolution analysis of the bulk RNA-seq data, we also observed positive correlations between the signature scores of PCs and Treg/Tex in the TCGA LUAD cohort (Fig. 5M). These results suggest distinct coassociation relationships and cross-talk between TIBs and various TME cells in LUADs.
Single-cell sequencing has been applied to interrogate the highly complex immune microenvironment of LUADs (10, 11, 25, 35), and we and others have demonstrated that the local TME consists of immune cells with diverse phenotypes and complex cross-talk (6, 42). However, analysis of the immune biology of LUADs and other solid tumors has been largely T cell–focused, and the characteristics of TIBs are relatively understudied. Most previous single-cell studies classified TIBs into two main cell types, B cells and PCs, with each one treated as a homogeneous subset (10, 25, 42, 43). This study leverages our unique scRNA-seq and scBCR-seq data generated on spatially defined tumor and matched normal lung tissues collected from surgically resected LUADs and presents a spatially resolved single-cell atlas of TIBs in early-stage LUAD (Fig. 6). Our study by deep profiling of the various transcriptional and differentiation states, BCR clonotypes, antibody isotypes, SHM frequencies, spatial features of B cells and PCs, and co-occurrence patterns with other TME cells using multiple independent approaches highlights the heterogeneous nature of TIBs in LUADs and their cross-talk with the TME, and it presents a unique attempt to comprehensively characterize TIBs in solid tumors.
We demonstrate extensive remodeling of TIBs in the TME of LUADs, characterized by markedly increased fractions of memory B cells and PCs with more differentiated states. Our analyses suggest that various microenvironmental factors may have contributed to the altered TIB landscapes in LUADs. First, we showed significantly increased TGFB1 expression in CAFs, which is in line with the highly enriched IgA+ PCs in LUADs given the critical role of TGFB1 in promoting IgA isotype switching in human B cells (30, 38–41). In addition, we noted massive changes in TIBs in smoker LUADs, such as enrichment of the fully differentiated, long-lived, IgA-producing PCs, and decreased B-cell clonality, pointing toward a link to cigarette smoke exposure (44). We also showed that the majority of TIBs detected in LUADs were within LAs as well as underscored prominent differences in B-cell abundance between CXCL13+ and CXCL13− LAs and in immune biomarker expression such as upregulated CD27 and CD40 as well as downregulated CTLA-4 and LAG3 in the B-cell compartment of CXCL13+ LAs. Our findings suggest that the presence of CXCL13+ T cells in the TME may have contributed not only to B-cell recruitment but also to the phenotypic heterogeneity of TIBs in the LUAD TME. The progressively increased CXCL13 expression along the evolution continuum from AAH to invasive LUAD supports a multifaceted role of CXCL13 in early-stage LUAD. In addition to Tfh cells, we found multiple other cellular sources of CXCL13 production in the TME such as CD8 Tex and CD4 Tregs. Our observation of significantly increased relative proportions of CXCL13+ cells among Tregs in LUADs versus normal lung tissues across multiple scRNA-seq cohorts intriguingly points to the evolutionary dynamics of CXCL13 production by various TME cell subsets in LUAD.
The generation of a cancerization “field” has been considered to present an initiating phase in tumor development from a particular niche, and field carcinogenesis has been described in various malignancies, including those of the lung (45). Previous studies from our and other groups have identified cellular and molecular alterations in histologic normal-appearing regions that are adjacent to solid tumors and that are less prevalent/absent in relatively more distant normal regions (6, 45, 46), but previous analyses of the field of cancerization were, for the most part, focused on the epithelial compartment. The multiregional sampling strategy of this study is unique and allowed us to interrogate the geospatial distribution of TIBs in LUAD tissues as well as in adjacent (to tumor) and relatively more distant normal lung tissues. By characterizing the landscape of TIBs within the LUAD field, we found significant geospatial changes such as gradually decreased fractions of naïve B cells and elevated fractions of class-switched memory B cells and PCs with increasing tumor proximity (Fig. 5). Notably, we found increased BCR clonal diversity and frequencies of SHM across normal tissues with closer proximity to the LUADs, together with significant changes in cell differentiation states. These findings allow us to surmise that TIBs and, thus, the mounting of B-cell-mediated immune responses, may be actively associated with the development of LUAD from a particular niche in the lung as opposed to somewhere else in the mutagenized (by smoking) lung. Of note, more than one third of the BCR clonotypes detected in normal tissues were shared with those in the LUAD tissue from the same patient. Although their antigen specificity is to be determined, our results underscore a phenomenon of B-cell trafficking from distant to more proximal (to the tumor) regions within the field, a supposition that warrants further investigation, for instance, in higher spatial resolution and in a longitudinal setting.
Notably, our findings of the prominent increase in the fractions of PCs in smokers with more long-lived PCs are in agreement with the observation of the decreased fractions of PCs and lower SHM rates in LUADs harboring EGFR mutations (which are seen commonly in nonsmokers), and our results indicate a superior role of PCs in smoking-related LUADs. In addition, our analyses pointed toward highly abundant IgA+ PCs in LUADs whereby IgA+ PCs strongly co-occurred with activated and proliferative Tregs, CD8+ Tex, CAFs, and macrophages and negatively correlated with CTLs, which are consistent with multiple lines of evidence supporting the immunosuppressive and protumorigenic role of IgA+ PCs (30), but further functional studies are needed to determine their exact roles in LUAD pathogenesis.
In terms of their clinical significance, our analysis clearly points to the PC (not B-cell) subset, as the PC signature derived from this study notably showed the strongest correlation with patient survival across multiple LUAD cohorts and significantly associated with patient response to anti–PD-1/PD-L1–based immunotherapy. This observation is in accordance with a recent study demonstrating that intratumoral PCs predict outcomes in patients with NSCLC treated with atezolizumab (16). Our findings also lend support for the PC signature as a potential biomarker for the prediction of OS benefit and the efficacy of PD-1/PD-L1 blockade in patients with LUAD.
While single-cell sequencing provides a robust approach that enables whole-transcriptome profiling of TIBs and integration with the BCR repertoire, our study has limitations. First, the interleukin transcripts are poorly captured in scRNA-seq likely due to a high dropout rate, and, consequently, IL10, which is important to characterize the reported regulatory B cells (Breg), is not informative, and the fraction of IL10+ Bregs was very low (0.87%). Another reason for a low Breg fraction could be due to the dynamic and context-dependent nature of the Breg state. A recent expert view on the matter suggests that Bregs reflect a phenotypic state rather than a hard-wired lineage (47). It is considered that IL10, the most commonly reported Breg effector molecule, can be immunostimulatory in some contexts. Second, the scarcity of remaining tissue following scRNA-seq, genomic analysis, and imaging hindered our ability to directly measure cytokines, soluble factors, and ligands secreted by TIBs, which could better characterize the functional phenotypes of TIBs. Lastly, due to the lack of markers for Tfh and follicular DCs in our mIF or DSP panels, we were not able to further determine whether an LA is a TLS or its maturation status. Nevertheless, this study provides fundamental findings on yet uncharted phenotypic heterogeneity and roles of TIBs in the development and immunopathology of early-stage LUAD, and it constitutes a valuable resource to leverage targets for the development of desperately needed novel immunotherapeutic strategies for the early treatment of this morbid malignancy.
Multiregional Sampling of Surgically Resected LUADs and Matched Normal Lung Tissues
Sixteen patients with early-stage LUADs (I–IIIA) were evaluated at the MD Anderson Cancer Center and underwent standard-of-care surgical resection (Supplementary Table S1). All samples in the study were obtained under a waiver of consent from banked or residual tissues approved by MD Anderson institutional review board protocols. Tumor (LUAD) and matched multiregional normal lung samples were collected as we previously described (6).
Single-Cell Library Construction and Sequencing
Freshly frozen tumor and normal samples (n = 63) were immediately minced and enzymatically digested, and red blood cells were lysed using optimized protocols as we previously described (6). Briefly, for patient 1, cells were then filtered, counted, and stained with SYTOX Blue viability dye (S34857, Life Technologies), followed by fluorescence-activated single-cell sorting to eliminate doublets and dead cells and collect viable singlet cells. Cells from patients 2 through 16 (P2–P16) were stained with anti–EPCAM-PE [347198, BD Biosciences; 1:50 dilution in ice-cold phosphate-buffered saline (PBS) containing 2% FBS] for 30 minutes with gentle rotation at 4°C. EPCAM-stained cells were then washed, filtered using 40-μm tip filters, stained with SYTOX Blue, and processed on a FACSAria II instrument. Doublets and dead cells were eliminated, and viable (SYTOX-negative) EPCAM-negative singlets were collected in PBS containing 2% FBS. Cells were washed again to eliminate ambient RNA, and a sample was taken for counting by Trypan Blue exclusion (T8154, Sigma-Aldrich) before loading on 10X Chromium microfluidic chips. Gene expression libraries were generated according to the manufacturer's instructions using Chromium Next GEM Single Cell 5′ Library & Gel Bead Kits (v1/v3 Chemistry, 10X Genomics). Library quality was assessed using high-sensitivity capillary electrophoresis on a Fragment Analyzer (TapeStation: High Sensitivity 5000 from Agilent Technologies). Library pools were denatured and diluted as recommended for sequencing on the Illumina NovaSeq 6000 platforms to achieve a depth of at least 50,000 read pairs per cell for single-cell GEX libraries and at least 10,000 reads pairs per cell for single-cell V(D)J libraries. The detailed methods of sample processing were described in our recent study (6).
Single-Cell Data Processing and Analysis
Raw scRNA-seq and scBCR-seq data were preprocessed using CellRanger (v3.0.2, 10X Genomics), followed by quality filtering, doublet removal, data normalization, assessment, and correction of batch effects. Briefly, cells expressing fewer than 500 genes or more than 6,500 genes were removed. Likely dying or apoptotic cells where >15% of transcripts were derived from the mitochondrial genome were also excluded. Additional possible doublets were further identified and removed. Seurat (version 4.0; ref. 48) was applied for data normalization, unsupervised cell clustering, and identification of differentially expressed genes among cell clusters. More details of data processing are described in Supplementary Methods. The cell types and transcriptional states of B cells and PCs were defined based on cluster distribution, cluster-specific genes (Supplementary Table S2), and unique expression of canonical B-cell/PC-related markers as well as transcriptional factors (Supplementary Methods). Monocle (19) and CytoTRACE (24) were applied to reconstruct the differentiation trajectories and infer cell differentiation states. We constructed cell type–specific gene expression signatures (Supplementary Table S3) for major B-cell subtypes (i.e., naïve B cells, memory B cells, PCs) using our scRNA-seq data in the manner we previously described (6).
scBCR-seq data were first preprocessed by Cell Ranger v3.0.2 for V(D)J sequence assembly and BCR reconstruction using the GRCh38 assembly in Ensembl (refdata-cellranger-vdj-GRCh38-alts-ensembl-2.0.0) as the reference. The Change-O repertoire clonal assignment toolkit (49) was used to define clones, and BCR clonal abundances and diversity were evaluated by estimateAbundance and alphaDiversity functions in Alakazam (v1.1.0), respectively (49). The inverse of the Simpson index was used to quantify the clonal diversity. The diversity index (D) for each group is the mean value of overall resampling realizations, and the confidence intervals of clonal diversity are derived from the standard deviation of the resampling realizations. We quantitatively assessed the similarity between BCR repertoires among groups based on the nucleotide sequences of the CDR3 regions. To identify SHM, BCR germline sequences were reconstructed using CreateGermlines.py, and the number of somatic mutations for each sequence was calculated using observedMutations (Shazam v1.1.0; ref. 49). Paired scBCR-seq data were integrated with scRNA-seq based on their matched unique cell barcodes.
mIF, IHC Staining, and NanoString Geomx DSP
mIF staining was performed using 4-μm-thick formalin-fixed, paraffin-embedded (FFPE) sample sections from five of the 16 patients with antibodies against a validated panel of eight markers: PanCK, CD20, CD3, CD4, CD8, CD56, Foxp3, and Ki-67. In each sample, five ROIs (931 × 698 μm size) were selected at a resolution of 20× for image analysis. The primary antibody (Anti-BCA1, clone EPR23400-92, Abcam) was used for CXCL13 IHC. For GeoMx DSP, 5-μm sections were stained with the semiautomated GeoMx DSP standard protein protocol. For visualization markers, we used PanCK (AE1/AE3) for epithelial cells, SYTO 13 for nuclear stain, CD3e for T cells, and CD20 for B cells. For each case, a total of 13 ROIs were selected, and selected ROIs were segmented into tumor, T cell–enriched, and B-cell–enriched compartments. More details of sample processing, selection of ROIs, segmentation, and data analysis are described in Supplementary Methods.
ST Data Generation
We further performed ST on four samples from two patients (P10 and P14) in our scRNA-seq cohort using the Visium spatial technology from 10X Genomics. First, the FFPE tissue block was placed in a microtome and cut to expose the tissue. After the tissue was exposed, three to four tissue sections of 10-μm thickness were collected for RNA extraction, which was performed using the Qiagen RNeasy FFPE Kit. To assess the RNA quality of the tissue, the purified RNA was immediately processed to calculate the percentage of total RNA fragments >200 nucleotides (DV200) using the Agilent RNA 6000 Pico Kit. Based on DV200 evaluation, blocks with DV200 >50% were selected for proceeding with sectioning. The tissue blocks were then sectioned by a microtome to generate appropriately sized sections for Visium slides. Sections were collected in a 42°C water bath and were placed within the frames of capture areas on Visium Spatial Gene Expression Slide (PN-1000188, 10X Genomics) with only one section placed within each capture area (6.5 × 6.5 mm). The tissues were then deparaffinized, stained, and decross-linked, followed by probe hybridization, ligation, release, and extension. The Visium spatial gene expression FFPE libraries were constructed using the Visium Human Transcriptome Probe Kit (PN-1000363) and the Visium FFPE Reagent Kit (PN-1000361) following the manufacturer's guidance. Constructed libraries were sequenced on the Illumina NovaSeq 6000 platforms to achieve a depth of at least 75,000 mean read pairs per spot and at least 2,000 median genes per spot. More details of ST data analysis are described in the Supplementary Methods.
Additional Statistical Analyses
In addition to the bioinformatics approaches and statistical analyses described above, all other statistical analyses were performed using statistical software R v3.6.0. To assess phenotypic relationships and examine co-occurrence patterns (both positive and negative) between TIB cell subsets and various other immune and stromal cell populations in the TME, Spearman correlation analysis was applied. To compare the fractions of different B and PCs and their subsets across tumor and spatially defined normal tissues, the Mann–Whitney test and paired t test were applied as indicated. All statistical significance testing in this study was two-sided, and results were considered statistically significant at P values or FDR q-values < 0.05. Defaulted “P < 2 × 10−16” reported in R v3.6.0 was used when the P value was too small to illustrate.
All scRNA-seq and scBCR-seq data generated by this study have been deposited at the European Genome–phenome Archive (EGA) under accession number EGAS00001005021. Access to this shared dataset is controlled by the institutional Data Access Committee in compliance with the NIH policy for Data Management and Sharing. Further information about EGA can be found at https://ega-archive.org. The data can also be accessed through the online Single Cell Research Portal (https://singlecell.mdanderson.org/BcellLC), an interactive Web-based tool we have developed for visualizing our single-cell data. All codes used for analysis and cell annotation are available from https://github.com/dapenghao/B_LUAD. B-cell gene signatures derived from this study are provided in Supplementary Tables S2 and S3. Previously published datasets reanalyzed in this study are described in Supplementary Methods. Further information and requests should be directed to and will be fulfilled by the corresponding authors.
A. Sinjab reports grants from an NIH postdoctoral fellowship (T32CA217789) to MD Anderson Cancer Center during the conduct of the study. J.A. Wargo reports grants from the Mark Foundation outside the submitted work; speakers bureau/advisory boards for Imedex, Dava, Omniprex, Illumina, Bristol Myers Squibb, Roche/Genentech, GSK, Novartis, AstraZeneca, PeerView, Micronoma, Ella Therapeutics, and Gilead; and stock options from Micronoma. T. Cascone reports personal fees and other support from MedImmune/AstraZeneca and EMD Serono, grants, personal fees, and other support from Bristol Myers Squibb, personal fees from Merck, Genentech, Arrowhead Pharmaceuticals, the Society for Immunotherapy of Cancer, Roche, Medscape, and PeerView, and other support from Boehringer Ingelheim outside the submitted work. B. Sepesi reports personal fees from AstraZeneca, Medscape, and PeerView outside the submitted work. S.M. Dubinett reports grants from Janssen, Novartis, and Johnson & Johnson, and nonfinancial support from LungLife AI and Early Diagnostics outside the submitted work. I.I. Wistuba reports grants and personal fees from Genentech/Roche, Bayer, Bristol Myers Squibb, AstraZeneca, Pfizer, HTG Molecular, Merck, Guardant Health, Novartis, and Amgen, personal fees from GSK, Flame, Sanofi, Daiichi Sankyo, Oncocyte, Janssen, MSD, and Platform Health, and grants from Adaptimmune, Adaptive, 4D, EMD Serono, Takeda, Karus, Iovance, Johnson & Johnson, and Akoya outside the submitted work. C.S. Stevenson reports other support from Johnson & Johnson during the conduct of the study, as well as other support from Johnson & Johnson outside the submitted work. A. Spira reports personal fees from Johnson & Johnson during the conduct of the study, as well as personal fees from Johnson & Johnson outside the submitted work. H. Kadara reports grants from Johnson & Johnson during the conduct of the study. No disclosures were reported by the other authors.
D. Hao: Data curation, formal analysis, validation, investigation, visualization, methodology, writing–original draft. G. Han: Data curation, formal analysis, methodology, writing–original draft. A. Sinjab: Data curation, validation, writing–original draft. L.I. Gomez-Bolanos: Formal analysis, validation, visualization, methodology. R. Lazcano: Validation, methodology. A. Serrano: Formal analysis, validation, visualization, methodology. S.D. Hernandez: Validation, methodology. E. Dai: Formal analysis, methodology. X. Cao: Formal analysis, validation, visualization, methodology. J. Hu: Software, methodology. M. Dang: Formal analysis, methodology. R. Wang: Formal analysis, methodology. Y. Chu: Formal analysis, methodology. X. Song: Software, methodology. J. Zhang: Software, methodology. E.R. Parra: Formal analysis, validation, visualization, methodology. J.A. Wargo: Interpretation of results. S.G. Swisher: Data curation. T. Cascone: Data curation. B. Sepesi: Data curation. A.P. Futreal: Resources, interpretation of results. M. Li: Software, methodology, interpretation of results. S.M. Dubinett: Resources. J. Fujimoto: Investigation. L.M. Solis Soto: Formal analysis, validation, visualization, methodology, writing–original draft, interpretation of results. I.I. Wistuba: Resources. C.S. Stevenson: Resources, data curation. A. Spira: Resources, data curation. S. Shalapour: Writing–review and editing, interpretation of results. H. Kadara: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, project administration, writing–review and editing, interpretation of results. L. Wang: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, visualization, methodology, writing–original draft, project administration, writing–review and editing, interpretation of results.
This study was supported in part by the start-up research fund and the University Cancer Foundation via the Institutional Research Grant Program at The University of Texas MD Anderson Cancer Center (to L. Wang). This study was also supported by NCI grants U01CA264583 (to H. Kadara and L. Wang), R01CA205608 (to H. Kadara), and 1U2CCA233238 (to A. Spira, S.M. Dubinett, and H. Kadara), Cancer Prevention & Research Institute of Texas grants RP220101 (to H. Kadara and L. Wang) and RP160668 (to I.I. Wistuba), University of Texas SPORE in Lung Cancer P50CA070907, and NCI P50 core grant CA016672, as well as research funding from Johnson & Johnson (to H. Kadara). L. Wang and H. Kadara are Andrew Sabin Family Foundation fellows. A. Sinjab is supported by a T32CA217789 MD Anderson Cancer Center Translational Genomics and Precision Medicine in Cancer training program.
The publication costs of this article were defrayed in part by the payment of publication fees. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Note: Supplementary data for this article are available at Cancer Discovery Online (http://cancerdiscovery.aacrjournals.org/).