Cancer evolution determines molecular and morphologic intratumor heterogeneity and challenges the design of effective treatments. In lung adenocarcinoma, disease progression and prognosis are associated with the appearance of morphologically diverse tumor regions, termed histologic patterns. However, the link between molecular and histologic features remains elusive. Here, we generated multiomics and spatially resolved molecular profiles of histologic patterns from primary lung adenocarcinoma, which we integrated with molecular data from >2,000 patients. The transition from indolent to aggressive patterns was not driven by genetic alterations but by epigenetic and transcriptional reprogramming reshaping cancer cell identity. A signature quantifying this transition was an independent predictor of patient prognosis in multiple human cohorts. Within individual tumors, highly multiplexed protein spatial profiling revealed coexistence of immune desert, inflamed, and excluded regions, which matched histologic pattern composition. Our results provide a detailed molecular map of lung adenocarcinoma intratumor spatial heterogeneity, tracing nongenetic routes of cancer evolution.
Lung adenocarcinomas are classified based on histologic pattern prevalence. However, individual tumors exhibit multiple patterns with unknown molecular features. We characterized nongenetic mechanisms underlying intratumor patterns and molecular markers predicting patient prognosis. Intratumor patterns determined diverse immune microenvironments, warranting their study in the context of current immunotherapies.
This article is highlighted in the In This Issue feature, p. 1307
Cancer cells evolve by acquiring novel alterations and adapting to changing conditions. Genetic, epigenetic, and transcriptional changes determine extensive heterogeneity among patients and within individual tumors, influencing disease prognosis and therapeutic options. Lung adenocarcinoma is the most common subtype of lung cancer, and it encompasses molecularly and phenotypically diverse diseases (1, 2), either associated with or unrelated to tobacco exposure (3, 4). Lung adenocarcinoma genetic diversity has been documented both across patients (1, 5), where it can determine treatment choices (6–10), and within individual tumors (11, 12), where it sustains disease evolution and treatment resistance (13, 14). Lung adenocarcinoma inter- and intrapatient molecular diversity is, however, not exclusively genetic. Transcriptional and epigenetic heterogeneity has been reported both among and within patients (15–19). Moreover, molecular diversity can translate in diverse tumor microenvironments, for example in association with variable tumor mutational burden (TMB; refs. 20–23) or presence of specific oncogenic alterations (24).
In the clinic, histopathologic analyses have revealed heterogeneous tumor tissue morphologies referred to as histologic patterns. The most frequent patterns are classified as lepidic, papillary, acinar, and solid (Fig. 1A), and 80% of lung adenocarcinoma tumors concurrently exhibit at least two of these patterns. According to the latest WHO classification (2), histopathologic lung adenocarcinoma classification is based on pattern prevalence, which is a major prognostic indicator (25, 26). Indeed, tumors with a prevalent lepidic pattern are typically considered less aggressive and associated with the early phases of the disease, whereas solid-prevalent tumors are indicative of poor prognosis. These prognostic associations define a potential progression of patterns from lepidic to papillary, acinar, and at last solid (Fig. 1A). Whether this progression can also be observed at the molecular level is, however, unknown. Indeed, reconciling molecular and histologic heterogeneity is hampered by the difficulty of assaying these features in the same tumor material. Recent digital pathology and spatial genomics technologies coupled with advanced computational approaches provided new strategies to address this challenge. For example, the combination of multiregion molecular profiles and histologic data was recently used to draw a link between lung adenocarcinoma mutational heterogeneity and microenvironment composition, with relevant implications for the adoption of current immunotherapies in this disease (21, 22). However, in spite of their prognostic relevance, comprehensive molecular profiles of morphologically diverse regions in the same patient are missing. Indeed, the molecular features of lung adenocarcinoma histologic patterns and the molding of their tumor microenvironment are largely unknown.
Here, we performed histopathology-guided multiregion sampling from primary human lung adenocarcinoma to dissect tumor regions corresponding to unique histologic patterns. Multiomics and spatially resolved molecular profiles of these regions allowed us to define tumor-intrinsic and tumor-extrinsic processes that determined lung adenocarcinoma pattern progression. We validated and evaluated the prognostic significance of our results in more than 2,000 lung adenocarcinoma samples from independent patient cohorts. Importantly, none of these processes could be traced back to specific genetic variants, but rather to epigenetic and transcriptional reprogramming. Overall, we identified oncogenic processes and spatial features that support nongenetic evolution as a driver of lung adenocarcinoma heterogeneity and progression.
Molecular Interpatient Heterogeneity of Histologic Patterns
We first examined molecular features of 206 lung adenocarcinoma samples from The Cancer Genome Atlas (TCGA) cohort (5), which had been annotated based on their most prevalent pattern: lepidic (n = 8), papillary (n = 47), acinar (n = 86), and solid (n = 65; Supplementary Table S1). Samples with the same most prevalent pattern had a similar representation of tumor stages (Supplementary Fig. S1A), but solid-prevalent tumors exhibited significantly higher TMB(Fig. 1B), consistent with recent observations in an independent cohort (27), and number of copy-number alterations at coding genes (Supplementary Fig. S1B). A notable exception to this trend was a lepidic-annotated tumor sample (TCGA-44-7670). However, after reviewing the virtual slides provided for this data set, we found highly different histologic patterns in the tumor region submitted for molecular profiling (01A-TS1) and the one submitted for pathology review (01Z-DX), suggesting that intratumor heterogeneity could explain this inconsistency (Fig. 1C). Predicted neoantigens increased proportionally with the TMB (Supplementary Fig. S1C), potentially predicting diverse immunogenicity among the histologic patterns. No recurrent genetic lesion (mutation, copy-number alteration, or gene fusion) was found enriched in a specific pattern, except for a few PIK3CA mutations mostly occurring in lepidic samples (three of eight patients, adj. P = 0.0004) and a trend for a higher fraction of TP53 mutations in solid samples (adj. P = 0.096; Supplementary Fig. S1D; Supplementary Table S2). The association between solid-prevalent tumors and high TMB and TP53 mutations was confirmed in an independent data set of patients of East Asian ancestry (EAS) with lung adenocarcinoma (ref. 28; Supplementary Fig. S1E and S1F) and in a recently analyzed clinical cohort (27). Conversely, no association was found in these cohorts for PIK3CA mutations. Additional analyses to test candidate weak drivers (29) or alterations converging on the same pathway (30) did not return significant hits that could be confirmed across data sets (Supplementary Table S3). Overall, our results suggest limited associations between histologic patterns and lung adenocarcinoma genetic features.
In contrast, TCGA samples with different prevalent patterns exhibited highly diverse transcriptional and epigenetic profiles (Supplementary Table S3), with at least 2-fold more differentially expressed genes or methylated probes than expected by chance (Fig. 1D and E). Interestingly, the most differentially expressed genes (n = 1,337, adj. P < 0.001) and methylated DNA loci (n = 1,753, adj. P < 0.001) among the four histologic subtypes did not highlight features unique to each group but rather progressive changes from lepidic- to solid-prevalent samples. To quantify this trend, we computed gene expression and DNA methylation fold changes between each pair of histologic subtypes, always comparing a more aggressive to a less aggressive subtype (Supplementary Fig. S1G). In this way, progressive changes from lepidic to papillary, acinar, and solid cases would result in all fold changes having the same sign: all positive for increasing expression/methylation or all negative for decreasing expression/methylation with pattern progression (e.g., see the top differentially expressed genes RAP1GAP and ANLN; Fig. 1F). Indeed, concordant positive or negative gene expression (Fig. 1G, top) or DNA methylation (Fig. 1G, bottom) fold changes were observed in the majority of the cases, suggesting that histologic patterns do not represent four independent molecular phenotypes, but rather a transition from lepidic to solid, driven by epigenetic and transcriptional reprogramming.
Differentially expressed genes and methylated gene promoters were enriched for similar functional categories (Supplementary Table S4). Indeed, genes overexpressed in lepidic compared with solid samples were enriched for cell differentiation, development, and morphogenesis terms, whereas genes overexpressed in solid compared with lepidic cases were highly enriched for cell proliferation and markers of immune infiltration (Fig. 1H). Transcriptional differences were confirmed in the EAS data set (Supplementary Fig. S1H) and in an additional lung adenocarcinoma cohort (ref. 31; Supplementary Fig. S1I). Similarly, promoter probes that increased DNA methylation with pattern progression were enriched for genes involved in cell differentiation and morphogenesis, whereas probes that lost methylation with pattern progression were enriched for immune cell markers (Fig. 1I), further suggesting that aggressive patterns are associated with changes in the tumor microenvironment. To corroborate this finding, we estimated the presence of distinct nontumor cell populations from transcriptional data (32). Lepidic samples were enriched for lung alveolar and epithelial markers, supporting a similar cell identity between lepidic cancer cells and normal lung tissue, whereas both lymphoid and myeloid immune cell types were invariably enriched in acinar- and solid-prevalent samples, in both the TCGA (Fig. 1J) and EAS (Supplementary Fig. S1J) cohorts. Overall, these results indicated that lung adenocarcinoma pattern progression is associated with a progressive reprogramming of both tumor cells and their microenvironment. However, molecular profiles analyzed so far were generated from single tumor samples annotated by predominant pattern; hence, it remained unclear whether similar features and plasticity could be observed within individual tumors.
Molecular Intratumor Heterogeneity of Histologic Patterns
To determine the molecular features of histologic pattern progression within individual tumors, we selected a cohort of 10 early-stage lung adenocarcinoma primary patient samples that each exhibited at least two distinct patterns (CHUV cohort; Supplementary Table S5) and performed histopathology-guided multiregion sampling. For each patient, we reviewed and dissected tumor regions from formalin-fixed paraffin-embedded (FFPE) tissue slides such that each region was composed by a unique pattern (Fig. 2A). In total, we collected 29 tumor regions and 10 normal tissue samples. These samples were processed by whole-exome sequencing, RNA sequencing (RNA-seq), and DNA methylation EPIC array (see Methods). Lung adenocarcinoma driver mutations were predominantly clonal, i.e., observed in all regions, and not associated with a specific pattern (Fig. 2B). In most cases, we confirmed a trend for higher TMB in more advanced patterns (Supplementary Fig. S2A). After accounting for patient-specific features, differentially expressed genes and methylated probes clustered together samples annotated for the same pattern (Fig. 2C; Supplementary Fig. S2B; Supplementary Table S6). Transcriptional differences among patterns in our cohort were consistent with those observed in the TCGA and EAS cohorts (Supplementary Fig. S2C and S2D; Supplementary Table S7). Indeed, genes overexpressed in lepidic samples were enriched for tissue development and morphogenesis (Fig. 2C, blue cluster), whereas solid but especially acinar samples exhibited overexpression of immune infiltration markers, in particular of B cells (Fig. 2C, orange cluster). Genes overexpressed in solid samples were more specifically enriched for markers of cell proliferation and overexpression of matrix metallopeptidase (MMP) genes (Fig. 2C, red cluster). Both lepidic- and solid-associated genes were enriched for extracellular matrix (ECM) components and regulators (Fig. 2C), albeit exerting opposite functions (33). Indeed, ECM genes upregulated in solid samples were mostly enriched for ECM degradation (e.g., MMP genes) and collagen proteins (e.g., COL1A1 and COL1A2), whose activation is known to alter cell adhesion and promote invasion (34). Vice versa, ECM genes overexpressed in lepidic samples included several proteins mediating cell adhesion (ref. 35; e.g., TNXB, FBLN5, and MFAP4), and putative tumor suppressors (e.g., DLC1, ref. 36; and FOXF1, ref. 37). Importantly, expression of these genes was associated with pattern progression within individual tumors (Fig. 2D). Similarly, immune infiltration predicted from gene expression (32) increased from lepidic to solid pattern (Fig. 2E) within 8 of 10 patients, and intratumor patterns showed a different enrichment for markers of normal lung tissue (enriched in lepidic) and immune cell markers (enriched in acinar and solid; Supplementary Fig. S2E). Altogether, transcriptional and epigenetic differences observed among patients classified by predominant pattern paralleled expression and methylation changes observed within individual tumors. Importantly, these differences pointed to both tumor-intrinsic (differentiation, migration, proliferation) and tumor-extrinsic (immune infiltration) processes as key determinants of lung adnocarcinoma histologic patterns.
Cancer Cell Plasticity Underlies Pattern Progression
To explore tumor-intrinsic features of pattern progression, independent of the extent of immune infiltration, we analyzed single-cell RNA-seq data for three lung adenocarcinoma samples (38). Differential expression analysis between tumor and nontumor cells allowed us to extract 2,410 genes that were highly expressed only in tumor cells (cancer-specific genes, Fig. 3A; see Methods). First, we selected cancer-specific genes that were significantly differentially expressed between lepidic and solid tumor regions in our cohort (adj P < 0.1 and absolute fold change > 2) to determine lepidic (n = 36) and solid (n = 21) cancer cell markers (Fig. 3B). These genes confirmed the enrichment for cell proliferation (solid) and differentiation (lepidic) terms (Fig. 3C; Supplementary Table S8). Next, using these genes as cancer cell markers of lepidic and solid patterns, we derived a transcriptional score for each single cancer cell to quantify their lepidic-like or solid-like transcriptional state (see Methods). Single-cell transcriptional scores from these patient samples showed a transition of states consistent with plastic reprogramming (Fig. 3D): cells from sample S1 exhibited predominantly lepidic features, sample S2 instead harbored tumor cells that lost lepidic markers and exhibited variable expression of solid markers, and lastly sample S3 comprised cells spanning the whole transition from lepidic to solid (Fig. 3D). To explore the origin of these transcriptional changes, we algorithmically predicted which master transcriptional regulators (TR) were most likely to modulate differentially expressed genes between lepidic and solid samples (39). Results in the TCGA and our cohorts were extremely concordant (Fig. 3E) and identified, among solid master TRs, cell-cycle regulators such as E2F transcription factors, minichromosome maintenance (MCM) complex components, which regulate DNA replication and elongation, and the Forkhead Box M1 (FOXM1) transcription factor, which is a key regulator of cell proliferation and overexpressed in several cancer types (40). Among lepidic master TRs, we found genes associated with tumor-suppressive functions, such as the circadian repressor CRY2, which degrades the MYC oncogene (41), and the zinc-finger transcription factor ZBTB4 (42), as well as transcription factors involved in cell differentiation and development, such as CASZ1 (43, 44), and the YAP repressor WWC1 (45, 46). In both the TCGA and our cohorts, lepidic master TRs and lepidic cancer cell markers exhibited on average higher promoter DNA methylation in solid samples (Supplementary Fig. S3A), suggesting that downregulation of lepidic TRs and markers is at least in part driven by epigenetic silencing. Interestingly, data from a high-throughput CRISPR knockout screening (47) revealed that, in lung adenocarcinoma cell lines, loss of TRs enriched in the solid pattern was largely deleterious and many, though not all, were classified as essential genes, due to their role on cell proliferation (Fig. 3F). Conversely, in the same cells, knockout of TRs enriched in the lepidic pattern led to moderate effects on cell viability and sometimes even improved cell fitness (Fig. 3F), consistent with a putative tumor-suppressive function.
Next, we combined cancer-specific lepidic and solid markers to generate a unique mRNA signature and quantify lepidic-to-solid transition (L2S signature). L2S signature scores in TCGA and CHUV samples were consistent with patient and intratumor classifications and pattern progression (Supplementary Fig. S3B and S3C) and, indeed, normal lung tissues had the lowest scores, followed by lepidic, papillary, acinar, and finally solid samples, which on average had the highest scores. Interestingly, L2S scores correctly predicted the pattern of the misannotated TCGA sample (TCGA-44-7670; Fig. 1C) and, unlike the classification based on predominant pattern, it stratified TCGA samples in classes with significantly different prognosis (Fig. 3G; Supplementary Fig. S3D). This signature gave us the possibility of estimating pattern progression and assessing its prognostic value in a much larger ensemble of lung adenocarcinoma tumors, where transcriptional profiles were available, but histopathology annotations were not. In total, we analyzed and scored >2,000 lung adenocarcinoma human samples, from 10 patient cohorts (5, 23, 28, 31, 48–51). Multivariate Cox regression confirmed that tumor stage and L2S scores were orthogonal and independent prognostic factors in all except one of the tested cohorts (i.e., cohorts comprising more than 100 patients; Fig. 3H; Supplementary Fig. S3E; Supplementary Table S1). Furthermore, across all cohorts, L2S scores were strongly associated with the predicted activity of lepidic and solid TRs (Supplementary Fig. S3F) and microenvironment composition (Fig. 3I). Intriguingly, the highest correlation between L2S scores and immune cell markers was with markers of T-cell exhaustion, suggesting that mechanisms of immune evasion occur in tumor samples with solid pattern features.
The Tumor Microenvironment of Lung Adenocarcinoma Histologic Patterns
The reproducible association between our L2S signature and immune infiltration across independent lung adenocarcinoma patient cohorts (Fig. 3I) prompted us to investigate the spatial composition of the tumor immune microenvironment in correspondence of different patterns. First, we analyzed FFPE tumor tissue slides from our patient cohort and from three additional patients with solid patterns by multicolor immunofluorescence to detect proliferating cells (Ki-67+), B cells (CD20+), CD4+ and CD8+ T cells, and macrophages (CD68+). We distinguished lung adenocarcinoma patterns and tumor cells by hematoxylin and eosin (H&E) staining (Fig. 4A) and TTF1 staining (Fig. 4B), respectively, and quantified fluorescent signal intensities (Fig. 4C) by designing a spatial grid quantification approach (GridQuant) that averaged fluorescence signals within pixels of variable size (Fig. 4D; see Methods). These analyses revealed striking differences in the extent and geographic organization of immune cell infiltration across lung adenocarcinoma patterns. Solid regions exhibited significantly stronger Ki-67 intensity than the other patterns, whereas immune cell markers increased intensity with pattern progression but were highest in acinar regions (Fig. 4E). Interestingly, in several tumors we observed the formation of tertiary lymphoid structures (TLS; Fig. 4F), sometimes characterized by a Ki-67–positive core of proliferating B cells (Fig. 4F, right) resembling germinal centers. TLS formation has been associated with improved prognosis and response to immunotherapies (52, 53); hence, we assessed their distribution across patterns in our samples. We automatically identified all TLS in our slides and found that these were absent in normal lung tissue and lepidic cancer regions but prevalently observed within acinar regions and, less frequently, in papillary and solid regions (Fig. 4G). Altogether, these results suggested that immune infiltration increased with pattern progression but was maximal in acinar and not in solid patterns.
Next, we investigated the spatial organization of the tumor microenvironment in different patterns by assessing colocalization of tumor and nontumor cells. TTF1+ and TTF1− signals were positively correlated in normal lung and lepidic regions, likely due to the presence of cell-depleted lung alveolar structures, lacked correlation in papillary and acinar regions, but were highly anticorrelated in solid regions (Fig. 4H), consistent with low intermixing of cancer and noncancer cells. Similarly, colocalization of immune cell markers and Ki-67, which here could be used to mark tumor cells (Supplementary Fig. S4A), was lowest at solid regions independent of the pixel size (Supplementary Fig. S4B). Consistent with these trends, we noticed that lymphoid cells and macrophages localize at the boundary of solid regions within individual tumor slides (Fig. 4I). To quantify these observations, we used GridQuant to extract average signal intensities at different distances from the periphery of each solid tumor region toward its core (Fig. 4J). In all cases, the density of immune cells was higher at the periphery than at the core of the tumor region (Fig. 4K and L), indicating that the spatial distribution of immune cells in solid patterns was consistent with an immune-excluded phenotype.
To corroborate this evidence and explore in more detail the molecular profiles and immune microenvironment of the core and periphery of lung adenocarcinoma, we performed digital spatial profiling (DSP; NanoString GeoMX) in five tissue slides from five patients. Briefly, in each slide, we selected and analyzed with a panel of 58 antibodies 12 regions of interest (ROI; n = 60 in total), located at either the core or periphery of different histologic patterns (Supplementary Table S9; Supplementary Fig. S5A). ROI localization was not associated with immune infiltration, measured by either ratio of CD45-positive cells or protein expression, except for solid ROIs (Fig. 5A). Indeed, out of six solid ROIs from patient 8 (Fig. 5B), two were localized at the core of the tumor (R9 and R8) and had the lowest levels of immune infiltration, and four were selected at the tumor periphery (R6, R7, R10, R12) and all exhibited high immune infiltration (Fig. 5A). Solid core ROIs expressed high levels of cancer cell–specific markers (PanCK and EPCAM), Ki-67, and the interleukin 7 receptor (IL7R or CD127; Fig. 5C). Although IL7R can be expressed by both tumor and immune cells, the low content of immune cells in these ROIs suggested that IL7R was here expressed by cancer cells. Importantly, cancer cell expression of IL7R has been associated with poor prognosis in non–small cell lung cancer (54). Conversely, periphery ROIs exhibited high expression of immune cell markers, including the immunosuppressive regulatory T-cell (Treg) marker TIM3 and the immune checkpoint VISTA (Fig. 5C). The highly different extent of immune infiltration at the core and periphery of the solid tumor region challenged the possibility of comparing features specific to either cancer cells or immune cells. To overcome this challenge, we profiled 36 additional ROIs from three solid tumor regions, and, in each ROI, we separately analyzed immune cells (CD45+) and cancer cells (PanCK+; Fig. 5D; Supplementary Fig. S5B). By selectively retaining the signal coming from either one or the other cell population (see an example of the masking strategy on Fig. 5D), we first compared cancer cells at the core and periphery of solid tumor regions. Here, we found that cancer cells at the periphery actually exhibited significantly higher levels of the proliferation marker Ki-67 than cancer cells at the core of the tumor, consistent with an invasive margin, as well as Pan-AKT and p53 (Fig. 5E; Supplementary Table S9). Next, although immune infiltration was low at the core of solid ROIs, specific comparison of CD45+ cells at the core and periphery showed that immune cells infiltrated at the core of solid regions were significantly enriched for markers of immunosuppressive effector Tregs, such as FOXP3, CD25, and TIM3, and immune checkpoints, such as CTLA4, VISTA, and ICOS (Fig. 5F; Supplementary Fig. S5C). Overall, whereas all solid tumors exhibited features of immune exclusion, the residual immune infiltration was consistent with an immunosuppressive microenvironment associated in particular with the presence of effector Tregs.
Cancer heterogeneity across and within patients is apparent at both the molecular and histologic levels. However, how and whether genomic features determine cell morphology and spatial organization is largely unexplored. Here, we reconciled molecular and histologic heterogeneity in lung adenocarcinoma by introducing an approach based on histopathology-guided multiregion sampling. Our results showed evidence of nongenetic mechanisms of tumor evolution as determinants of histologic heterogeneity and disease progression. Indeed, progression from lepidic to solid histology was associated with plastic reprogramming of differentiation cell markers, increased cell proliferation, and a transition from an immune desert (lepidic) to an inflamed (papillary and especially acinar) and eventually excluded and suppressive (solid) microenvironment (Fig. 6). Importantly, the transition of both cancer cell–intrinsic features and microenvironment composition was evident within individual tumors and matched intratumor pattern heterogeneity.
The concomitant evidence of plastic reprogramming of cancer cells and changing microenvironment prompts questions on the origin of such changes. Does tumor cell dedifferentiation activate specific immune shaping and responses? Or are dynamic changes of the tumor immune microenvironment triggering cancer cell plasticity? To address these questions, models of lung adenocarcinoma that mimic transcriptional, epigenetic, and morphologic features of the human disease are required. Mouse models of non–small cell lung cancer recapitulate some of the histologic patterns observed in the human disease (55, 56), but the molecular features of these patterns remain to be investigated. Interestingly, recent evidence has shown that lung adenocarcinoma progression in a genetically engineered mouse model (GEMM) is accompanied by plastic reprogramming driving cell dedifferentiation (57, 58). A detailed comparison of the cell state transitions that we observed in our human cohort with those in the lung adenocarcinoma GEMM will be important to investigate the possibility of genetically and therapeutically manipulating specific TRs driving lung adenocarcinoma progression. Moreover, cross-talks between the tumor microenvironment and changes in cancer cell epigenetic features have been observed in the context of neoantigen presentation, cytokine production, and epigenetic regulation of PD-L1 expression (59, 60). However, it is challenging to establish the relative timing and causative interactions between cancer cell reprogramming and immune surveillance. Toward this goal, detailed single-cell spatial characterization of tumor molecular profiles from primary patient samples or longitudinal analysis of tumor spatial features could inform and complement functional assays in experimental models.
Intriguingly, even by selecting intratumor regions characterized by a unique pattern, we did not find evidence of markers discriminating the four patterns into separate and independent classes. Instead, our results suggested a transition between two extreme states, lepidic and solid, with papillary and acinar as possible intermediate states. To quantify this transition, we proposed a transcriptional signature (called L2S signature) derived from the comparison of pure lepidic and pure solid tumor regions. Importantly, L2S scores were independent predictors of patients' overall survival and immune infiltration in multiple independent lung adenocarcinoma cohorts and highlighted sample misannotations due to intratumor heterogeneity. With the expanding use of molecular profiling technologies for diagnostic purposes, signatures such as the one we identified could provide a complement to histopathology. In particular, intratumor pattern heterogeneity is prevalent in early-stage lung adenocarcinoma. With the recent success and possible increased adoption of screenings to detect the disease (61, 62), cases diagnosed at an early stage are expected to increase. It will be critical to discriminate those more likely to progress or relapse after treatment, surgery and/or radiotherapy, and better select those needing adjuvant treatment and type of therapy.
In advanced/metastatic adenocarcinoma, immunotherapy as a single agent or in combination with other drugs is now the treatment of choice in an important portion of cases (63). Furthermore, immune-checkpoint inhibitors (ICI) in the neoadjuvant setting are currently of great interest in non–small cell lung cancer, as shown by recent results with the PD-1 inhibitor nivolumab, which led to major pathologic response in 45% of the patients (64). Additional clinical trials on early-stage tumors are ongoing, and results are expected this year (65, 66). As new data will become available, it will be interesting to test the association between lung adenocarcinoma histologic pattern composition or signature and response to ICI in both adjuvant and neoadjuvant setting.
The emergence of histologic heterogeneity with disease progression is not a feature exclusively observed in lung adenocarcinoma. Evidence of morphologic changes has been reported in several tumor types such as breast cancer (67) and hepatocellular carcinoma (68). In these tumor types, integrating histopathology-guided multiregion sampling with molecular profiling could prove to be an effective strategy to study the evolution of the disease. In particular, recently developed spatial genomics technologies need to be coupled with computational approaches able to exploit spatial information to provide novel insight on interactions between tumor and nontumor cells and on how such interactions shape cancer cell identity. Overall, reconciling molecular and phenotypic heterogeneity is a critical first step to understand and integrate genetic and nongenetic mechanisms of cancer evolution.
Details of all statistical analyses and tests performed and referred to in the main text and Methods are outlined in Supplementary Table S2. Standard statistical tests (χ2, Fisher, Kruskal–Wallis, t, Wilcoxon rank-sum, correlation coefficients) were performed using the appropriate functions from R “stats” package. Dunn and Tukey post-hoc tests were performed with R packages “FSA” (v 0.8.22; https://github.com/droglenc/FSA) and “multcomp” (v1.4-13; 69), respectively. Multiple hypothesis corrections were made using the Benjamini–Hochberg procedure. All analyses were performed implementing custom scripts in bash and R (v3.4 and 3.5) languages.
TCGA Data Set
Molecular and clinical data for the TCGA lung adenocarcinoma cohort (TCGA-LUAD) were downloaded from the Genomic Data Commons (GDC; ref. 70) and GDAC FireHose (https://gdac.broadinstitute.org/) repositories. The data set included somatic point mutations (whole-exome sequencing, MAF file version: mc3 v0.2.8; ref. 71), copy-number changes (Illumina SNP6 array, segmentation files, and gene-level copy number generated with GISTIC; ref. 72), gene expression profiled by RNA-seq (Illumina HiSeq, HTSeq raw counts, and FPKM-normalized values), DNA methylation data (Illumina Infinium HM450k array, beta values), H&E-stained images, and clinical data. Neoantigen counts were retrieved from https://gdc.cancer.gov/about-data/publications/panimmune (ref. 73; metric: “numberOfBindingExpressedPMHC”). Only primary and normal samples were considered; in case multiple samples for the same patient were available, the sample with the latest plate number was retained as recommended by the GDAC guidelines. All data were generated and processed by the TCGA research network (74). Consistent predominant histologic pattern annotation for a subset of tumors (N = 206) was obtained by merging clinical tables from GDC, GDAC, and Supplementary Table S1 from TCGA-LUAD 2014 manuscript (74). Patients ambiguously annotated to different patterns in different clinical tables were excluded from this pattern-annotated subset.
CHUV Data Set
We retrieved from the database of the Pathology Institute of the Lausanne University Hospital resected stage I and II lung adenocarcinoma. Thirteen adenocarcinomas were selected (Supplementary Table S5) fulfilling following criteria: (i) presence of at least two of the main histologic patterns of lung adenocarcinoma (lepidic, acinar, papillary, and solid); (ii) sufficient material available with appropriate surface and delineation of each pattern allowing microscopic dissection and material harvesting for molecular and imaging analyses. For every patient, an FFPE block of normal lung tissue distant from the tumor was selected. Two to five tumor regions/FFPE blocks were also selected for every patient each with a specific dissectible pattern. For three patients (1, 3 and 8), two nonadjacent regions of the same pattern were also selected for intratumor comparisons. Five- to 20-μm thick unstained slides were obtained from every selected block. Slides were dried and deparaffinized using Xylol and Ethanol (100% and 70%) and stained with Toluidine Blue 0.024% isopropanol 30%. Macroscopic and microscopic dissection was performed, and unwanted tissue (nonpattern-specific and nontumor tissue) was removed from the slides. Four-micrometer-thick slides were also taken before, after, and in between the dissected slides and stained with H&E to improve morphologic control. Slides taken at the same time were subsequently used for molecular and imaging analyses. Local Ethical Committee approval was obtained to perform all mentioned analyses, under authorization No. 2017-00334.
Other Data Sets
All additional data sets were generated and processed as described in the corresponding publications and are summarized in Supplementary Table S1. Chen/EAS (28) data set including somatic point mutations, copy-number changes, gene expression profiled by RNA-seq, H&E-stained images, and clinical data were downloaded from OncoSG (75, 76). Ding (31) gene-expression data set was downloaded from the UCSC Xena browser (77). TRACERx (12) raw RNA-seq data for each tumor region were downloaded from the European Genome-Phenome Archive (ID: EGAS00001003458) and processed as described below in “RNA-seq and Data Processing”; lymph node metastases were excluded; access was provided by the authors upon request. Micke (48), Yokota (49), Beg (50), Shedden (51), and Pintilie (78) gene-expression data sets were downloaded from the Gene Expression Omnibus (accession numbers, respectively: GSE37745, GSE31210, GSE72094, GSE68465, and GSE50081).
Whole-Exome Sequencing and Data Processing
Whole-exome sequencing was performed on 27 tumor regions and 9 adjacent normal lung specimens by Genewiz with a protocol optimized for FFPE samples. Briefly, genomic DNA samples were fragmented into 200–500 base pair fragments; libraries were prepared using the Agilent SureSelect Exome library preparation kit and sequenced on Illumina HiSeq in High Output mode with a 2 × 150 bases paired-end sequencing configuration. On average, the total number of reads per sample was around 98 million, with a mean quality score of 38.42 and 92% of bases with quality ≥ 30. Sequencing reads were checked for quality using FastQC v0.11.7, trimmed with TrimGalore v0.4.5 to remove Illumina universal adapter contamination (parameters: -q 15 –phred33 –illumina –length 20 –paired –retain_unpaired -r1 24 -r2 24) and aligned to human genome [hg38 build, downloaded from GATK (79) resource bundle] using bwa-mem (80) v0.7.17. Duplicates were removed with Picard tool MarkDuplicates (v2.18.4). Reads were processed with GATK v184.108.40.206 Best Practices workflow (81) using the tools AddOrReplaceReadGroups, BaseRecalibrator, ApplyBQSR, and AnalyzeCovariates (see https://gatk.broadinstitute.org/hc/en-us/articles/360035535912 for details) and tagging known variant sites with the VCF files dbsnp_146.hg38.vcf, Mills_and_1000G_gold_standard.indels.hg38.vcf, af-only-gnomad.hg38.vcf, Homo_sapiens_assembly38.known_indels.vcf (downloaded from GATK resource bundle). The BAM files thus processed were indexed with samtools v1.6 and used for somatic variant calling.
Variant Calling and Filtering
Somatic point mutations and short insertions–deletions for each tumor region were called with GATK v220.127.116.11 tool Mutect2 using the matched normal lung sample from the same patient (the only exceptions were tumor regions of patient 8, for which variants were called in tumor-only mode) and a panel of normal samples. Additional parameters used were “–af-of-alleles-not-in-resource ‘0.0000025’” and “–disable-read-filter MateOnSameContigOrNoMappedMateReadFilter.” Variants were filtered using FilterMutectCalls, CollectSequencingArtifactMetrics and FilterByOrientationBias (–artifact-modes “G/T” and “C/T”); the latter was designed specifically to filter out transitions likely resulting from FFPE-related deamination of cytosines. Resulting VCF files were converted into MAF files using vcf2maf (https://github.com/mskcc/vcf2maf) v1.6.16 with default parameters. Variants were then tagged with OncoKB (82) for oncogenicity and retained only if they satisfied all of the following conditions: (i) GnomAD population frequency < 0.01, (ii) not being tagged by Mutect2 filters panel_of_normals, artifact_in_normal, germline_risk, str_contraction, multiallelic and clustered_events, (iii) variant allele frequency in the tumor sample is at least twice greater than the one in the normal sample, (iv) tumor depth is greater than 6 and Mutect2 filter value is “PASS” or, alternatively, the same variant is shared by more than one tumor region of the same patient. The rationale behind these choices consists in not filtering out variants whose evidence is supported by independent regions (point iv), provided that they are not germline (points ii and iii). Known oncogenic variants were manually verified and recovered using IGV genome browser (83) and inspecting aligned RNA reads.
RNA-seq and Data Processing
RNA-seq was performed on 29 tumor regions by Genewiz with a protocol optimized for FFPE samples, which involved ribosomal RNA depletion and 2 × 150 bases paired-end sequencing with Illumina HiSeq. On average, the total number of reads per sample was around 65 million, with a mean quality score of 38 and 91% of bases with quality ≥ 30. Sequencing reads were checked for quality using FastQC v0.11.7, trimmed with TrimGalore v0.4.5 to remove Illumina universal adapter contamination (parameters: -q 15 –phred33 –illumina –length 20 –paired –retain_unpaired -r1 24 -r2 24) and processed with RSEM (84) v1.3.0, performing the following steps: (i) alignment with STAR (85) v2.5.4b to human genome (hg38) using GTF annotation file “gencode.v27.primary_assembly.annotation.gtf” (downloaded from GATK resource bundle) and default RSEM parameters; (ii) removal of reads mapping to tRNA and rRNA regions (retrieved using the UCSC table browser); (iii) estimation of isoform-level and gene-level expression as transcripts per million (TPM) and RSEM-expected counts using rsem-calculate-expression. A batch effect detected with principal component analysis was corrected in the TPM expression matrix using the ComBat function implemented in the R package sva (86) v3.30.1. Conversions between Ensembl IDs and gene symbols were performed using BioMart (87).
DNA Methylation Array and Data Processing
Pattern-specific tumor samples were extracted as described in the section “CHUV data set” and collected in deparafinization solution from the GeneRead DNA FFPE Kit (cat. #180134, Qiagen); DNA was extracted according to the manufacturer's instructions. DNA concentration was assessed using the Qubit High Sensitivity Assay, and DNA quality was monitored using the Infinium HD FFPE QC Assay (cat. #WG-321-1001, Illumina), according to the manufacturer's instructions. Samples for which at least 1 μg was available and with good overall quality (Delta Cq from the Infinium HD FFPE QC Assay lower than 5) were selected for bisulfite conversion. Bisulfite conversion was performed on 1 μg of DNA using the EZ DNA Methylation Kit (cat. # D5001, Zymo Research), using the alternative protocol for CT conversion (optimized for the Illumina Infinium Methylation Assay). FFPE restoration was then performed using the Infinium HD FFPE DNA restore kit (cat. #WG-321-1002; Illumina), and samples were processed using the Infinium MethylationEPIC 850k Kit (iGE3, University of Geneva). Raw signal intensities were processed, quantile-normalized, and converted into beta values using the R package minfi (88) v1.28.4. Probes were annotated using the hg38 EPIC manifest file generated by Zhou and colleagues (ref. 89; version of September 2018) and filtered according to the corresponding masking column. Probes mapping to sex chromosomes and having a detection P value above 0.01 were also removed, thus obtaining a final set of 730,257 probes. A batch effect detected with principal component analysis was corrected using the ComBat function implemented in the R package sva (86) v3.30.1. Probes were annotated to FANTOM5 (90) gene promoters, and promoter methylation for each gene was computed as the average beta value of all probes mapping to the gene main promoter (p1 or p).
Copy-Number Alterations and Fusion Calling
Genetic Alteration Differences among Patterns
The fraction of genome altered for each sample was computed as the fraction of genes having a gene-level GISTIC value equal to 2 or −2. Differences among patterns in terms of tumor mutational burden and fraction of genome altered were tested with Kruskal–Wallis test followed by post hoc Dunn test. The potential confounding role of purity in the association between mutational burden and patterns was assessed by performing a main-effects ANCOVA with the log-scaled number of mutations as dependent variable, purity (CPE values from ref. 92) as continuous covariate, and histologic pattern as categorical independent variable. In the TCGA pattern-annotated data set, two lists of driver genetic alterations were inspected for differences among patterns: (i) a published binary genomic alteration matrix (93), which included point mutations, copy-number changes, and gene fusions (results shown in figures); (ii) a list of drivers, which included “weak drivers” (as described in ref. 29) obtained by running FunSeq2 (94) algorithm as web service and retaining coding and noncoding variants with score ≥1.5. Driver alterations occurring in at least three samples were then tested for differences among the four patterns with a χ2 test. Residuals were inspected in order to determine which pattern was enriched for a given genetic event. P values were adjusted for multiple hypotheses with Benjamini–Hochberg procedure. FunSeq2 was also applied to the EAS data set, and the same downstream analysis was performed. For pathway-level analysis, driver alterations called by FunSeq2 were used and the following steps were performed: (i) relevant pathways in cancer and their gene components annotated as “oncogenes,” “tumor suppressor genes,” or “unknown” were retrieved from ref. 30; (ii) a pathway was called “altered” in TCGA pattern-annotated data set if at least one gene was altered in FunSeq2 calls or the gene harbored GISTIC-based deep deletion (in case of “tumor suppressor genes” or “unknown”) or amplification (in case of “oncogenes” or “unknown”); (iii) pathway alterations were then tested for differences among the four patterns with a χ2 test, residuals were inspected in order to determine which pattern was enriched for a given altered pathway, and P values were adjusted for multiple hypotheses with the Benjamini–Hochberg procedure.
Differential Expression Analyses
Differential expression analyses among patterns were performed on RNA-seq read counts (HTSeq counts for TCGA and RSEM expected counts for Chen and CHUV data sets) using R packages limma v3.38.3 (95) and edgeR (96) v3.24.3 with a standard published pipeline (97). Only genes expressed (counts per million > 1) in at least 3 (CHUV) or 50% (TCGA and Chen) of samples of any pattern were tested. P values were adjusted using the Benjamini–Hochberg FDR-controlling procedure. Pairwise pattern comparisons were performed with limma function decideTests. In the CHUV data set, the patient corresponding to each tumor region was inserted as covariate in the limma model. For purely graphical purposes, in the heat maps the patient-specific batch effect was removed with limma function “removeBatchEffect.” For TCGA, a null model was constructed by randomly permuting the assignment of patterns to samples and reperforming differential expression analysis on 100 random permutations. Differential expression analysis on the Ding data set was performed on microarray gene-expression profiles between samples with ≥50% (solid-like) and <50% (lepidic-like) of solid pattern prevalence.
Differential Methylation Analyses
Differential methylation analyses were performed on M-values, which were derived from beta values, on all probes and with the same pipeline as for expression data. Pairwise pattern comparisons were performed with limma function decideTests. In the CHUV data set, the patient corresponding to each tumor region was inserted as covariate in the limma model. For purely graphical purposes, in the heat maps the patient-specific batch effect was removed with limma function “removeBatchEffect.” For TCGA, a null model was constructed by randomly permuting the assignment of patterns to samples and reperforming differential methylation analysis on 100 random permutations.
Gene Ontology Analyses
Gene ontology analyses were performed on differentially expressed genes and genes targeted by differentially methylated promoters using MSigDB (98) and Gene Ontology gene sets (ref. 99; Biological Process and Molecular Function categories), using an FDR cutoff of 0.01 and retrieving a maximum of 100 categories.
Extraction of Lung Adenocarcinoma Tumor and Nontumor Markers
Genes expressed preferentially in lung adenocarcinoma tumor cells were extracted from a previously published lung cancer single-cell RNA-seq data set (38), which included three lung adenocarcinoma patients (two from the “discovery cohort” and one from the “validation cohort”), in the following way: (i) For the two “discovery cohort” patients, genes × cell expression matrix and cell IDs for each cell type including cancer cells were downloaded from ArrayExpress (ID: E-MTAB-6149) and SCope repositories, and the expression matrix was filtered such that only single cells coming from tumor regions belonging to the two patients with lung adenocarcinoma were retained; (ii) for the tumor regions of the patient with lung adenocarcinoma in the “validation cohort,” only raw FASTQ files were available (ArrayExpress ID: E-MTAB-6653), and thus they were aligned, filtered, and processed as described in the Lambrecht and colleagues study (38) using CellRanger v3.0.2 and Seurat v3.0.0; clusters were detected with Seurat function “FindClusters” (resolution = 0.5) and 12 of 16 of them could be assigned to the main immune/stromal cell types using the expression of the markers reported in Supplementary Fig. S1 of (38); of the remaining four clusters, two showed increased expression of several keratin genes and they were thus assigned to cancer cells; (iii) for each of the three patients separately, markers for cells classified as “cancer” were extracted using the function “FindMarkers” (with min.pct = 0) of Seurat (100); (iv) markers were further filtered to have adjusted P < 0.05 and average log2(fold change) > 0.25; (v) in order to better accommodate for interpatient heterogeneity in tumor cell expression, the union of the three lists of patient-specific cancer markers was extracted as the final list. This procedure yielded a list of 2,410 cancer-specific genes. Markers of lung alveolar and epithelial cells were extracted with FindMarkers applied to the lung adenocarcinoma samples of the “discovery cohort” data set using more stringent thresholds (adjusted P < 0.0001 and average log2(fold change) > 10) in order to increase specificity.
Quantification of Immune Cell Infiltration
Methylation-based immune cell infiltration fractions for TCGA-LUAD samples were downloaded from a previous study (92). Bulk RNA-seq deconvolution was performed with consensusTME (32) implemented in the corresponding R package (v 0.0.1.9000, parameters: cancer = “LUAD,” statMethod = “ssgsea”). Three cell types were added to the available ones (T cells exhausted, lung epithelial cells, and lung alveolar cells), and their score in each sample was quantified in the same way as done by consensusTME, namely, by computing single sample gene set enrichment analysis [ssgsea, implemented in GSVA (101) R package v1.30.0 with ssgsea.norm = T] using markers of each of these cell types. Markers for T-cell exhaustion used were PD-1, PD-L1, LAG3, TIGIT, PD-L2, B7-H3 (CD276), HAVCR2 (TIM3), CD244, CTLA4, CD160 (ref. 102; https://www.rndsystems.com/product-highlights/adoptive-cell-transfer-monitor-t-cell-exhaustion); markers of lung epithelial and alveolar cells were extracted as described above in “Extraction of Lung Adenocarcinoma Tumor and Nontumor Markers.”
Construction and Scoring of Lepidic-to-Solid Signature
A differential expression analysis restricted to the 2,410 lung adenocarcinoma cancer-specific genes was performed between lepidic and solid samples of the CHUV data set. Differentially expressed genes were extracted by filtering for adjusted P < 0.1 and absolute log2(fold change) > 1, thus obtaining 36 cancer-specific lepidic markers and 21 cancer-specific solid markers. For bulk RNA-seq data sets, sample-wise enrichment scores for these markers were computed using the singscore (103) R package (v1.0.0). Singscore outputs a unified score for the complete signature (“TotalScore”) as well as scores for the upregulated (lepidic) and downregulated (solid) genes separately. The sign of scores relative to the lepidic marker set was changed for graphical reasons (such that lepidic-like samples would harbor higher lepidic marker signature scores). For the single-cell RNA-seq data set, a different strategy was adopted in order to account for the high dropout rate of single-cell profiles: (i) cancer cells from the three patients with lung adenocarcinoma extracted as reported above in “Extraction of Lung Adenocarcinoma Tumor and Nontumor Markers” were further filtered to retain only cells with more than 2,000 genes expressed; (ii) a differential expression analysis was performed between the 10 most lepidic-like TCGA patients (i.e., harboring the lowest “TotalScore” computed as described above) and the 10 most solid-like TCGA patients (with highest “TotalScore”), restricting the set of genes tested to the list of 2,410 cancer-related genes; (iii) differentially expressed genes were again extracted by filtering for adjusted P < 0.1 and absolute log2(fold change) > 1, in this case obtaining 279 cancer-specific lepidic markers and 215 cancer-specific solid markers, due to the higher statistical power achieved with increased sample size (most of the lepidic and solid markers obtained from the CHUV data set as reported above were among these augmented lepidic and solid marker lists, and none was found in the wrong list); (iv) this augmented signature was used to score single cells obtained at point (i) for lepidic-like or solid-like features using AUCell (ref. 104; with default parameters), a tool specifically designed for scRNA-seq data sets.
Purity Correction of DNA Methylation Profiles
Tumor purity was estimated from DNA methylation profiles in the CHUV data set using the Lump algorithm (92). Briefly, we retrieved DNA methylation of purified leukocytes profiled with Illumina 850k EPIC array (105), and probes with a beta value below 0.1 in all leukocyte samples were intersected with probes having a beta value above 0.85 in the top seven TCGA purest samples as estimated by Lump, which yielded a set of 24 probes methylated in tumor cells and unmethylated in immune cells. Finally, purity in heterogeneous samples was computed as mean beta value of these 24 probes. Purity estimates were very correlated with ConsensusTME's immune scores (Spearman r = 0.87), and different thresholds yielded consistent results. Next, in order to estimate tumor cell–specific methylation, beta values observed for a mixture of tumor and immune cells were modeled as a linear combination of tumor and immune cells beta values, i.e., for probe i: βobserved[i] = pβtumor[i] + (1 − p) βimmune[i], where p is the purity computed as described. βtumor was estimated in the following way: (i) variance of beta values across all leukocyte samples was computed for each probe, and probes showing a variance greater than 0.01 were discarded; (ii) βimmune for the remaining probes was computed as the mean beta value in leukocyte samples; (iii) βtumor was thus computed with the equation above, and probes whose purity-corrected values were not within [0,1] were discarded. The rationale behind step (i) was to select probes for which a reliable estimate of leukocyte methylation could be obtained. The conservative choices at steps (i) and (iii) resulted in a decrease of the number of probes for which tumor cell–specific methylation could be computed (571,349 for the CHUV and 91,228 for the TCGA data sets), with the advantage, however, of a higher reliability for the purity correction. Differential methylation analyses (as described above) were performed on purity-corrected methylation profiles to assess methylation differences shown in Supplementary Fig. S3A.
Master Transcriptional Regulator Activity Analysis
Virtual Inference of Protein-activity by Enriched Regulon analysis (ref. 39; implemented in the “viper” R package, v1.16.0) was used to estimate TR activity, following guidelines contained in the user manual. Briefly, “msviper” function takes as input (i) a coexpression regulatory network estimated with ARACNe methodology (106), for which an lung adenocarcinoma-specific regulatory network was downloaded as a Bioconductor package (DOI: 10.18129/B9.bioc.aracne.networks); (ii) a gene-expression signature generated with viper function “rowTtest” between two biological conditions; and (iii) a null model obtained with viper function “ttestNull” and 1,000 random permutations. The output is a list of TRs driving the biological conditions of interest and their corresponding enrichment P value, FDR, and normalized enrichment score. VIPER analyses were thus performed on lepidic and solid samples of CHUV and TCGA data sets separately, restricting the gene-expression matrix in input to contain only cancer-specific genes (see “Extraction of Lung Adenocarcinoma Tumor and Nontumor Markers” above). Lastly, TR activity was estimated in each single sample of all data sets using the “viper” function again restricted to cancer-specific genes.
Survival analyses were performed using the R package survival v 2.44-1.1. Cox regression models included sex, age, and stage (numeric) as covariates. Signature scores were recalibrated such that hazard ratios represented the effect of a 10% increase of the signature value from its theoretical minimum (−1) to its theoretical maximum (+1).
Analyses of Genetic Dependency Screenings
CRISPR-based gene dependency scores and gene expression of cell lines were downloaded from the DepMap portal (https://depmap.org/portal/download/, CRISPR/Avana: “Achilles_gene_effect.csv,” expression: “CCLE_expression.csv”) along with cell line annotations. Only cell lines with lineage_sub_subtype = NSCLC_adenocarcinoma were investigated. Essential genes were also retrieved from the DepMap portal (union of “common_essentials.csv” and “Achilles_common_essentials.csv”). TRs that were differentially active in lepidic versus solid in the CHUV and TCGA cohorts were tested for dependency.
H&E, TTF1, and Multicolor Immunofluorescence Staining
H&E staining was performed with standard protocol to retrieve the histologic patterns on all samples that were used for molecular or imaging assays. TTF1 staining was performed using mouse anti-TTF1 antibody, clone 8G7G3/1 (Invitrogen 18-0221) on Roche-Ventana Benchmark Ultra using the following protocol: (i) retrieval in Ventana Cell Conditioning 1 solution at 95°C for 64 minutes; (ii) incubation for 32 minutes at 37°C, with the 8G7G3/1 being diluted 1/15; (iii) application of Ventana ultraView Universal DAB Detection Kit followed by Ventana Hematoxylin as a nuclear counterstain. The multicolor immunofluorescence assay was performed using the Ventana Discovery ULTRA automate (Roche Diagnostics). All steps were performed automatically with Ventana solutions except if mentioned. Dewaxed and rehydrated paraffin sections were pretreated with heat using the CC1 solution for 40 minutes at 95°C. Primary antibodies were applied and revealed sequentially either with a rabbit Immpress HRP (ready to use; Vector Laboratories) or a mouse Immpress HRP (ready to use; Vector Laboratories) followed by incubation with a fluorescent tyramide. A heat denaturation step was performed after every revelation. The primary antibody sequence was: rabbit anti-CD8 (clone: Sp57, fluorophore: R6G), rabbit anti-CD4 (clone: Sp35, fluorophore: DCC), mouse anti-CD68 (clone: KP-1, fluorophore: R610), rabbit anti-Ki-67 (clone: 30-9, fluorophore: Cy5) and mouse anti-CD20 (clone: L26, fluorophore: FAM). All sections (H&E, TTF1, and multicolor immunofluorescence) were mounted with FluoromountG (Bioconcept) and scanned at 20× magnification using an Olympus VS120 whole slide scanner equipped with specific filters.
Spatially Resolved Cell Quantification Framework (GridQuant)
H&E, TTF1, and multicolor immunofluorescence-stained images were visualized using QuPath (107) v0.2.0 software. Histologic patterns were assessed and drawn with QuPath on H&E images and annotations were then transferred with minor adjustments to the nearby adjacent TTF1-stained and immunofluorescence-stained images. Regions that were too small were discarded. Next, quantifications and downstream statistical analyses were performed by developing a gridding framework named GridQuant, which involved the following steps for each image: (i) automated cell detections on QuPath with available algorithms; (ii) acquisition of the coordinates of histologic pattern boundaries drawn on the image; (iii) setup of a pixel grid spanning the entire image, with tunable grid spacing (pixel size); (iv) for each cell type, summarization of cell detections at the pixel level as counts of the number of cells whose centroid fell within each pixel of the grid, thus obtaining one matrix for each cell type and for each pixel size considered; (v) various downstream spatially resolved statistical analyses across cell types and histologic patterns (described in the following paragraphs). For TTF1-stained images, cell detection algorithm used on QuPath was “positive cell detection” and cell types investigated were TTF1-positive and TTF1-negative. For multicolor immunofluorescence-stained images, “Watershed cell detection” algorithm was used to detect macrophages (CD63+ cells), CD4 T cells (CD4+ cells), CD8 T cells (CD8+ cells), B cells (CD20+ cells), and proliferating cells (Ki-67+ cells). In order to accommodate for potential variability in signal intensities among cell types and slides, the parameters of these cell detection algorithms were tuned with visual inspection in each slide and for each cell type. Steps (i) and (ii) of GridQuant were implemented in Groovy programming language; steps (iii), (iv), and (v) were implemented in R v3.5.
GridQuant: Cell Density across Patterns
Cell densities for each pixel were computed as the pixel counts divided by the pixel area. Distributions of densities across patterns were derived aggregating all pixels annotated to each pattern across all slides.
GridQuant: Cell Type Colocalizations across Patterns
For each pattern-annotated region, the colocalization between cell types X and Y was computed as the Spearman correlation coefficient between X and Y densities across all pixels falling within the region boundary. Pixels having total cell densities (summing densities of all available cell types) below the fifth percentile or below 200 N/mm2 were discarded as likely representing empty regions corresponding to alveoli.
GridQuant: Solid Pattern Boundary Analysis
Regions annotated to the solid pattern were partitioned into three internal subregions according to their distance from the boundary delimiting solid-annotated region and other patterns, including the normal lung (boundaries between the solid-annotated region and uncertain transitioning patterns or the external cut delimiting the end of the slide were removed). The first subregion was constituted by the pixels displaced at a distance between 0 and 0.5 mm, the second at a distance between 0.5 and 1 mm, and the third was composed of all the remaining internal pixels (core). Moreover, an external region with pixels at a distance between 0 and 0.5 mm outside the boundary was also considered (Fig. 4J). The mean and distribution of densities of each cell type were then computed across all pixels falling within each of these four regions.
In Situ Detection and Quantification of Tertiary Lymphoid Structures (TLS-Finder)
TLSs were modeled as clusters of B cells detected with immunofluorescence staining. An automatic detection pipeline (named TLS-finder) was developed to quantify their presence across histologic patterns. TLS-finder involved the following steps: (i) GridQuant framework was used to generate high-resolution (pixel size = 20 mm) matrices of B-cell counts; (ii) each matrix was imported in Fiji (108) image analysis software as text image; (iii) B-cell count matrices were binarized using Fiji “Convert to mask” function; (iv) TLSs were modeled as connected components of the binarized B-cell matrices, which were detected and labeled using Fiji plugin “Find connected regions,” with a minimum number of pixels to call a connected component set to 25 (corresponding to a minimum TLS size of 10,000 mm2); (v) Labeled TLSs were then processed with R scripts for downstream statistical analyses. The density of TLSs across patterns was computed as the number of distinct TLSs divided by the area of each pattern-annotated region (Fig. 4G).
Digital Spatial Profiling
Two runs of highly multiplexed and spatially resolved proteomic profiling of tumor and immune cells were performed with the GeoMx Digital Spatial Profiler (NanoString) as previously described (52) on five (batch1) and three (batch2) FFPE tissue sections. In batch1, immunofluorescence assays were performed using antibodies against CD3, CD20, CD45, and DAPI, and the multicolor images were used to guide the selection of 12 ROIs for each slide; for each ROI, digital counts from barcodes corresponding to protein probes (52 immune and tumor-related proteins; Supplementary Table S9) were obtained using nCounter (NanoString). In each ROI, automated cell detection was performed on the immunofluorescence images to count nucleated cells (DAPI-positive) and, among them, CD45-positive cells. Immune ratio was computed as the ratio between the number of CD45-positive cells and DAPI-positive cells. In batch2, Pan-cytokeratin (PanCK), CD45, CD3, and DAPI antibodies were used; from each ROI, two areas of interest (AOI) were extracted with image segmentation, one containing pixels positive for PanCK (tumor compartment) and one for CD45 (immune compartment); digital counts were then obtained for each AOI separately (73 immune and tumor-related proteins; Supplementary Table S9); in PanCK-positive and CD45-positive AOIs, only tumor-related and immune-related proteins, respectively, were tested in downstream analyses. Levels of three housekeeping proteins (GAPDH, histone H3, and S6) and three negative controls (Ms IgG1, Ms IgG2a, and Rb IgG) were also measured. Digital counts for each protein were normalized with internal spike-in controls (ERCC) and signal-to-noise ratio (SNR), i.e., the ratio between the ERCC-normalized counts of the protein and the geometric mean of the negative controls assayed in the ROI/AOI considered. In all downstream analyses, only proteins having SNR>2 in at least three ROIs/AOIs were tested. ROIs/AOIs were annotated according to their histologic pattern and location with respect to the pattern boundary (center/core or periphery) using adjacent H&E-stained tissue sections. Some ROIs were also taken from TLS but were not used for the analyses presented in this study. Differences between locations in solid pattern regions were tested with t test (batch1, one solid pattern region and six ROIs) and with two-way ANOVA controlling for sample and testing separately AOIs from immune and tumor compartments (batch2, three samples with one solid pattern region in each of them, 26 total AOIs for each tissue compartment).
Data and Code Availability
All data sets analyzed and the corresponding accession numbers are reported in Supplementary Table S1. Data generated in this study have been deposited in two Zenodo repositories, one containing raw images for H&E, TTF1, and immunofluorescence staining and DSP data (DOI: 10.5281/zenodo.3941450) and one containing processed molecular data (somatic mutations, gene-expression tables, and DNA methylation beta values; DOI: 10.5281/zenodo.4443496). Source code for GridQuant and TLS-finder pipelines is available in two public GitHub repositories (https://github.com/CSOgroup/GridQuant and https://github.com/CSOgroup/TLS-finder).
D. Tavernari reports grants from Fondation Emma Muschamp during the conduct of the study. E. Dheilly reports working for Ichnos Sciences, a pharma company, for the last 11 months. The transition from academia to the company occurred after the contribution to this work. M. Mina reports working for SOPHiA GENETICS, a biomedical company, for the last nine months. The transition from academia to the company occurred after the contribution to this work. S. Peters reports personal fees from AbbVie, Amgen, AstraZeneca, Bayer, Beigene, Biocartis, Boehringer Ingelheim, BMS, Clovis, Daiichi Sankyo, Debiopharm, Eli Lilly, Roche/Genentech, Foundation Medicine, Illumina, Incyte, Janssen, Medscape, MSD, Merck Seono, Merrimack, Novartis, Pharmamar, Phosphopatin Therapeutics, Pfizer, Regeneron, Sanofi, Seattle Genetics, and Takeda outside the submitted work. No other disclosures were reported.
D. Tavernari: Conceptualization, data curation, software, formal analysis, visualization, methodology, writing–original draft, writing–review and editing. N. Riggi: Data curation, funding acquisition, and investigation. E. Oricchio: Data curation, supervision, and investigation. I. Letovanec: Conceptualization, data curation, supervision, funding acquisition, and investigation. G. Ciriello: Conceptualization, formal analysis, supervision, funding acquisition, visualization, writing–original draft, project administration, writing–review and editing. E. Battistello: Data curation and investigation. E. Dheilly: Data curation and investigation. A.S. Petruzzella: Data curation and investigation. M. Mina: Data curation and formal analysis. J. Sordet-Dessimoz: Resources. S. Peters: Resources. T. Krueger: Resources. D. Gfeller: Data curation and formal analysis.
We would like to thank Leila Akkari and Luca Magnani for critical reading of the manuscript and insightful discussions. This work was supported by the Gabriella Giorgi-Cavaglieri foundation (G. Ciriello), University of Lausanne Interdisciplinary Research Grant (G. Ciriello, I. Letovanec, and N. Riggi), the Swiss National Science Foundation (SNSF; grant 310030_169519; G. Ciriello and D. Tavernari), and Emma Muschamp foundation (D. Tavernari).