Purpose:

The presence of a high degree of tumor-infiltrating lymphocytes (TIL) has been proven to be associated with outcome in patients with non–small cell lung cancer (NSCLC). However, recent evidence indicates that tissue architecture is also prognostic of disease-specific survival and recurrence. We show a set of descriptors (spatial TIL, SpaTIL) that capture density, and spatial colocalization of TILs and tumor cells across digital images that can predict likelihood of recurrence in early-stage NSCLC.

Experimental Design:

The association between recurrence in early-stage NSCLC and SpaTIL features was explored on 301 patients across four different cohorts. Cohort D1 (n = 70) was used to identify the most prognostic SpaTIL features and to train a classifier to predict the likelihood of recurrence. The classifier performance was evaluated in cohorts D2 (n = 119), D3 (n = 112), and D4 (n = 112). Two pathologists graded each sample of D1 and D2; intraobserver agreement and association between manual grading and likelihood of recurrence were analyzed.

Results:

SpaTIL was associated with likelihood of recurrence in all test sets (log-rank P < 0.02). A multivariate Cox proportional hazards analysis revealed an HR of 3.08 (95% confidence interval, 2.1–4.5, P = 7.3 × 10−5). In contrast, agreement among expert pathologists using tumor grade was moderate (Kappa = 0.5), and the manual TIL grading was only prognostic for one reader in D2 (P = 8.0 × 10−3).

Conclusions:

A set of features related to density and spatial architecture of TILs was found to be associated with a likelihood of recurrence of early-stage NSCLC. This information could potentially be used for helping in treatment planning and management of early-stage NSCLC.

See related commentary by Peled et al., p. 1449

Translational Relevance

The presence of a high degree of tumor-infiltrating lymphocytes (TIL) has been found to be associated with better prognosis in non–small cell lung cancer (NSCLC). There is a moderate agreement between pathologists when assessing the degree of TIL presence from histopathologic tissue specimens. In this study, we present a new set of computer-extracted quantitative features (spatial TIL, SpaTIL) related to the spatial architecture of TILs, the colocalization of TILs and cancer nuclei, and the density variation of TIL clusters from hematoxylin and eosin images. We demonstrate that SpaTIL can predict the likelihood of recurrence in early-stage NSCLC. Compared with clinicopathologic features, these features were independently prognostic of disease recurrence. Following additional, independent, multisite validation, this SpaTIL test could be used in a manner similar to genomic-based companion diagnostic tests to identify those patients who have a higher chance of recurrence and who might thus potentially receive added benefit from adjuvant chemotherapy following surgical resection.

Early-stage (stages I and II) non–small cell lung cancer (NSCLC; refs. 1, 2) is typically treated with complete surgical excision. However, even after resecting the entire tumor mass, 30% to 55% of patients develop disease recurrence within first 5 years following surgery (3). The ability to identify patients who are at a higher risk of recurrence could help selecting those patients who may gain maximum benefit with further treatment including adjuvant chemotherapy following standard-of-care surgery.

NSCLC histopathology is characterized by a complex interplay of tumor cells, immune cells (lymphocytes, plasma cells, macrophages, and granulocytes), fibroblasts, and pericytes/endothelial cells. Recent evidence suggests that the interaction of tumor cells with immune cells has a high association with likelihood of disease progression and influences tumor development, invasion, metastasis, and patient outcome (1, 2). Several independent studies (2, 4–6) meanwhile have also shown an association between patient survival and treatment response with an increased density of tumor-infiltrating lymphocytes (TIL) in diverse solid tumor types. In addition, there is substantial evidence supporting the fact that increased TIL density is associated with better chemotherapeutic response (7–10).

Studies have also found substantial interreader variability in estimating TIL density using hematoxylin and eosin (H&E)–stained slides. This has limited the routine use of TIL density as a metric in the clinic as a prognostic marker for NSCLC. Brambilla and colleagues (6), for instance, determined that interpathologist agreement was at best moderate (Kappa = 0.59) in quantifying TILs on tissue slides. Although attempts have been made to establish guidelines for standardizing TIL grading in breast cancer (11), these efforts have been lagging in lung cancer.

Over the last few years, there has been increasing interest in developing automated cell segmentation and detection algorithms for identifying and quantifying the extent of TILs from routine H&E pathology slide images (12–14). These approaches can be broadly categorized into (i) involving extraction of visual features (12), (ii) employing morphologic operations (15, 16), and (iii) using deep learning–based models (17). However, these approaches have primarily focused on either counting the individual TILs (13, 14) or estimating the TIL grade, typically as being low, moderate, or high (12, 15, 16).

Interestingly, there has also been a recent surge in looking at spatial patterns of TILs and investigating their relationship with disease outcome. Multiplexed quantitative immunofluorescence (QIF) and IHC-based methods have been employed for objectively identifying TIL subtypes and correlate the spatial arrangement and density of these TIL subtypes with overall survival (OS) in NSCLC (4, 18). For instance, Schalper and colleagues (4) found out that increased levels of CD3 and CD8 TILs were significantly associated with improved 5-year OS. Similarly, Barua and colleagues (18) showed that spatial interplay between tumor and regulatory T cells was associated with OS in NSCLC. Furthermore, Liu and colleagues (19) demonstrated that the presence of CD8+ and FOXP3+ TILs was correlated with the response of platinum-based neoadjuvant chemotherapy in advanced NSCLC.

Interestingly, computer-extracted features of spatial patterns and morphologic attributes of TILs from routine H&E slides also appeared to be prognostic. Basavanhally and colleagues (15) explored the use of graph network algorithms to spatially characterize the arrangement of machine-identified TILs in HER2+ breast cancer H&E images to predict TIL grade (i.e., high or low). Yu and colleagues (20) and Luo and colleagues (21) extracted quantitative morphologic features of nuclei and the surrounding cytoplasm from H&E tissue images of early-stage NSCLC patients (e.g., area, shape, intensity, texture, density) for predicting survival. Saltz and colleagues (22) used a deep learning model to identify patches of TILs in images, which were subsequently clustered using different similarity metrics. From such patch clusters, different indices were computed (Ball and Hall, Banfield and Raftery, C, determinant ratio, etc.) which was then found to be correlated with patient survival across different tumor types.

Given the recent evidence that the colocalization of immune and cancer nuclei is prognostic of disease outcome (20–23), this work aims to develop and evaluate new computer-extracted spatial TIL (SpaTIL) features relating to (i) the spatial architecture of TIL clusters, (ii) colocalization of clusters of both TILs and cancer nuclei, and (iii) variation in density of TIL clusters across the tissue slide image. Specifically, our goal was to evaluate the association between disease recurrence and the SpaTIL features on patients with stage I and II of NSCLC. In addition, we also sought to compare the SpaTIL features in terms of their ability to predict recurrence in these patients with NSCLC against the manually estimated degree of TILs by two thoracic pathologists.

Datasets

Tissue microarrays (TMA) obtained from H&E-stained slides collated from three independent and well-characterized early-stage NSCLC cohorts were included in this study, representing a total of 301 patients. The three cohorts are represented by D1 (n = 70), D2 (n = 119), and D3 (n = 112). A fourth dataset, named D4 (n = 112), was also included, containing tissue cores corresponding to the same patients in D3 but extracted from different regions of the tumor. The corresponding clinicopathologic and outcome information from patients in D1 to D4 was obtained from retrospective chart review from the institutions at which the datasets were collated after obtaining the respective Institutional Review Board approval.

Patients from D1 and D2 comprised formalin-fixed paraffin-embedded tumor sections from previously reported retrospective collections of NSCLC patients (4). Note that 0.6-mm cores from each tumor were then arrayed to make up the TMA. A further 116 patients provided two cores from the same tumor, to make up TMA datasets D3 and D4. The standard TMA preparation protocol is described in ref. (24). D1 and D2 were scanned and digitized using an Aperio Scanscope CS whole-slide imager at 20x magnification. D3 and D4 were scanned and digitized at 20x using a Ventana iScan HT Scanner (serial #: BI15N7205). Finally, a 1500 × 1500 pixel image at 20× magnification was extracted and used to represent a unique tumor sample derived from each patient.

D1 was employed for feature discovery and model training. This dataset included samples from 350 patients and was collected independently at Sotiria General Hospital and Patras University General Hospital between 1991 and 2001. D2 and D3 were used for independently validating the trained classifier. D2 comprised samples from 202 patients and was collected at Yale Pathology between 1988 and 2003. D3 and D4 (tissue cores corresponding to same patients in D3 but from a different portion of the tumor) comprised tissue images from 189 patients and was collected at the Cleveland Clinic between 2004 and 2014. D3 and D4 were used to quantitatively assess the ability of the approach to deal with intratumoral heterogeneity. Supplementary Fig. S1 illustrates the inclusion and exclusion criteria for patient selection for this study.

Automatic characterization of TILs

Identification of lymphocytes.

The first step in the process was to identify the spatial location of the TILs on digitized H&E images. A watershed-based algorithm (25) was used for automatically detecting the nuclei. This method applies a set of mathematical operations (fast radial symmetry transform and regional minima) at different scales to identify candidate locations for nuclei. This method was chosen based on its documented advantages, including simplicity, speed, and the ease with which parameters can be adjusted and fine-tuned.

Once nuclei were detected, the approach described in ref. (26) was used to distinguish lymphocytes from nonlymphocytes. This approach takes advantage of the fact that TILs tend to be smaller compared with cancerous nuclei, and they also tend to be more rounded and with a darker, more homogeneous staining. Once all the candidate nuclei were identified via the watershed approach described above, a machine classifier using 7 image-derived features related to texture, shape, and color attributes of the segmented nuclei was used to classify the individual nuclei as corresponding either to TILs or to non-TILs (Fig. 1B and E).

Figure 1.

Representative TMA tissue spots of recurrent (top row) and nonrecurrent (bottom row) early-stage NSCLC cases. The first column (A and D) shows the original H&E-stained images. Identification of TILs (yellow) and non-TILs (cyan) is presented in the second column (B and E). The third column (C and F) illustrates the qualitative representation of one of the SpaTIL features overlaid on the original images, specifically, the variation in the density of lymphocyte clusters. The color bars represent the density measurement (H stands for highly dense clusters, whereas L stands for low-density or sparse clusters). Nonrecurrence cases are characterized by the presence of more high-density clusters, whereas recurrence cases were comprised of a larger number of low-density clusters.

Figure 1.

Representative TMA tissue spots of recurrent (top row) and nonrecurrent (bottom row) early-stage NSCLC cases. The first column (A and D) shows the original H&E-stained images. Identification of TILs (yellow) and non-TILs (cyan) is presented in the second column (B and E). The third column (C and F) illustrates the qualitative representation of one of the SpaTIL features overlaid on the original images, specifically, the variation in the density of lymphocyte clusters. The color bars represent the density measurement (H stands for highly dense clusters, whereas L stands for low-density or sparse clusters). Nonrecurrence cases are characterized by the presence of more high-density clusters, whereas recurrence cases were comprised of a larger number of low-density clusters.

Close modal

Quantitative evaluation of spatial arrangement of TILs.

Spatial TIL graph construction

A graph is a mathematical construct comprising of a finite sets of objects (nodes) that capture global and local relationships via pairwise connections (edges) between the nodes. Graphs have been previously used to quantitatively characterize nuclear architecture in histopathologic images by representing the nuclei as nodes and subsequently quantifying neighborhood relationships (e.g., proximity) and spatial arrangement between the nodes (15, 27, 28).

In order to evaluate a spatial network of TILs and to extract the corresponding SpaTIL features, we first identified sets of clusters of proximal TILs and non-TILs, respectively. We represented the centroids of each of the individual TILs and non-TILs as nodes of a graph. Using the approach described in refs. (27, 28), each node was connected to others based on the weighted Euclidean norm where a weighting function favors the connectivity between proximal nodes. Utilizing this process resulted in multiple disconnected subgraphs or clusters of TILs being generated. This process was also repeated separately for all the non-TILs.

SpaTIL features

We extracted two separate sets of SpaTIL features. The first set comprised of 20 features related to spatial arrangement of TILs, extracted from the TIL cluster graphs. These features included first-order statistics (e.g., mean, mode, median) of the following representative descriptors: number of lymphocytes within the clusters, ratio between the area of the TIL clusters and area of the TMA spot, and ratio between the numbers of TILs within the cluster and the cluster area. The second set included 65 features describing the relationship between TIL and non-TIL clusters extracted for each image. These included the ratio between the density (ratio between the number of nuclei within the cluster and the cluster area) of a nonlymphocyte cluster and the density of its closest lymphocyte cluster, the intersecting areas of the lymphocyte and nonlymphocyte clusters, a value indicating if the nearest neighbor of a lymphocyte cluster is either a lymphocyte or a nonlymphocyte cluster. The 85 features are listed in the Supplementary Material (see Supplementary Table S1).

Feature selection.

The minimum redundancy maximum relevance feature selection method (29) was employed to identify the SpaTIL features that most correlated with recurrence in the discovery set D1 while also eliminating features which were grossly similar to each other to prevent redundancy.

Comparative strategies

Interreader variability in TIL estimation by human readers.

Two expert thoracic pathologists experienced in grading TILs were asked to determine the infiltration grade for each of the images in D1 and D2 via visual evaluation of digitized H&E-stained images. An in-house custom web application was used by the readers to assign an infiltration score to each image. Infiltration options were defined based on findings reported in refs. (4, 6) as (0) no infiltration (virtual absence of TILs), (1) low (1%–33%), (2) moderate (34%–66%), and (3) high (67%–100%).

The agreement among pathologists during the TIL-grading task was measured via two quantitative indices: Spearman correlation coefficient (30) and Cohen Kappa coefficient (31). The Kappa index is a widely used measure to determine the agreement among a set of experts making categorical judgments, considering agreement may occur by chance.

Computer-based estimation of TIL density.

We also extracted TIL density-based (DenTIL) features and compared the prognostic performance of these DenTIL features against the SpaTIL features. The DenTIL features included ratio between the number of TILs and the TMA spot area, ratio between the total regions of the TMA spot covered by TILS to the total area of the corresponding TMA spot, ratio between the number of TILs and the number of non-TILs within a TMA spot, and a grouping value indicating how close the TILs are to each other (computed as the sum of the inverse distances between TILs).

Statistical analysis

Classification.

A quadratic discriminant analysis (QDA) classifier was trained using the top SpaTIL features (QS) identified from D1 to separate the patients into two classes: recurrence and nonrecurrence. We chose this classifier owing to the fact that it has no hyper parameters to tune and is able to learn quadratic boundaries. Therefore, it has been shown to be more flexible compared with linear classifiers. In addition, another QDA classifier was trained using DenTIL features (QD) on the training set D1. Following parameter optimization, the classifier was locked down using D1.

The performance of the locked down QDA classifiers QS and QD in distinguishing between early-stage NSCLC patients who did and did not have recurrence was evaluated on the independent validation sets D2, D3, and D4. Classifiers QS and QD assigned a probability of recurrence to each image in the test sets. Classifier performance was evaluated via the concordance statistic or C-index. The recurrence and nonrecurrence labels predicted by QS and QD were compared with the ground truth labels (true patient outcomes) to determine classifier accuracy and C-index. The C-index obtained for QS on D3 and D4 was quantitatively compared to evaluate the effect of spatial tissue sampling on the classifier performance.

Statistical and survival analysis.

Recurrence-free survival (RFS) was defined as the time interval between the date of diagnosis and the date of death or recurrence (whichever happened first). Patients who were still alive without recurrence at the last reported date were labeled as censored. Kaplan–Meier survival analysis was used to examine the difference of RFS between different patient groups categorized by the classifier output, and the difference of RFS in each group was assessed by the log-rank test. Multivariate Cox regression analysis was employed to examine the predictive ability of the QS and QD classifiers when controlling for the effects of clinical and pathologic parameters including gender, age, T stage, and N stage. P values were two-sided, and all values under 0.05 were considered to be statistically significant.

A Kaplan–Meier analysis was also carried out for the DenTIL approach and for the TIL density estimation by the human readers. In the case of manual estimation of TIL density, patients were split into two groups: High-TIL and Low-TIL. Because pathologists graded each case from 0 to 3, case patients with TIL categories of 0 to 2 were considered as being part of the low-TIL group and those with scores of 3 were grouped in as part of the high-TIL tumor category. This strategy was based on the approaches described in refs. (4, 6).

Clinicopathologic features of the patient cohorts

The median follow-up for patients was 40.91 months, 45.33 months, and 70.92 months for D1, D2, and D3, respectively. By the end of the study/follow-up, 34 of 70 patients (48.6%) in D1, 54 of 119 (45.4%) in D2, and 34 of 112 (30.4%) in D3 had developed recurrence. Table 1 presents a summary of clinical and pathologic features for the whole dataset (D1, D2, D3, and D4). Supplementary Table S3 presents the clinical and pathologic features of cohorts D1, D2, D3, and D4 individually.

Table 1.

Summary of clinical and pathologic features for the patients in the whole dataset

VariableSubvariablesN (%)
Number of patients  301 
Average age  64.3 ± 10.5 
Sex Male 176 (58.5) 
 Female 125 (41.5) 
N–pathologic 205 (68.1) 
 96 (31.9) 
T–pathologic 135 (44.9) 
 166 (55.1) 
Stage I/IA/IB 221 (73.4) 
 II/IIA/IIB 80 (26.6) 
Recurrence Nonrecurrence 179 (59.5) 
 Recurrence 122 (40.5) 
Tumor types Adenocarcinoma 135 (44.9) 
 Squamous cell carcinoma 89 (29.6) 
 Others 77 (25.6) 
VariableSubvariablesN (%)
Number of patients  301 
Average age  64.3 ± 10.5 
Sex Male 176 (58.5) 
 Female 125 (41.5) 
N–pathologic 205 (68.1) 
 96 (31.9) 
T–pathologic 135 (44.9) 
 166 (55.1) 
Stage I/IA/IB 221 (73.4) 
 II/IIA/IIB 80 (26.6) 
Recurrence Nonrecurrence 179 (59.5) 
 Recurrence 122 (40.5) 
Tumor types Adenocarcinoma 135 (44.9) 
 Squamous cell carcinoma 89 (29.6) 
 Others 77 (25.6) 

The images are all pretreatment tissue slides. The analysis is performed during the diagnosis phase (no treatment at all). Obviously, the treatment may influence the survival variable because of the inherent biological variability, i.e., each individual response is different, but once the patient is categorized, the treatment is the same for that particular category. The kind of treatment is shown in Supplementary Table S5.

Experiment 1: Prognostic ability of SpaTIL in early-stage NSCLC

Supplementary Table S2 and Supplementary Fig. S3 illustrate the top SpaTIL features identified via feature selection and their corresponding boxplots, respectively.

Figures 2C, 3C, and 4A and D illustrate the ROC curves and corresponding C-indices for the SpaTIL classifier (Qs) for predicting recurrence in NSCLC on D1, D2, D3, and D4. C-indices of the binary classifier for the 4 datasets were 0.70, 0.73, 0.70, and 0.71, respectively.

Figure 2.

Prognostic prediction results for human readers, QD, and QS for D1. A, Bar plot illustrating the Kappa index and correlation coefficient computed between readers 1 and 2. B, Kaplan–Meier curves for readers 1 and 2. C, ROC curve and corresponding confidence interval (CI) for QS. D, Kaplan–Meier plot for QS classifier using RFS as endpoint. E, Kaplan–Meier plot for QD classifier. The number of cases in each category is indicated in the charts.

Figure 2.

Prognostic prediction results for human readers, QD, and QS for D1. A, Bar plot illustrating the Kappa index and correlation coefficient computed between readers 1 and 2. B, Kaplan–Meier curves for readers 1 and 2. C, ROC curve and corresponding confidence interval (CI) for QS. D, Kaplan–Meier plot for QS classifier using RFS as endpoint. E, Kaplan–Meier plot for QD classifier. The number of cases in each category is indicated in the charts.

Close modal
Figure 3.

Prognostic prediction results for human readers, QD, and QS for D2. A, Bar plot illustrating the Kappa index and correlation coefficient computed between readers 1 and 2. B, Kaplan–Meier curves for readers 1 and 2. C, ROC curve and corresponding confidence interval (CI) for QS. D, Kaplan–Meier plot for QS classifier using RFS as endpoint. Nonrecurr., nonrecurrence; Pred., predicted. E, Kaplan–Meier plot for QD classifier. The number of cases in each category is indicated in the charts. Nonrecurr., nonrecurrence; Pred., predicted.

Figure 3.

Prognostic prediction results for human readers, QD, and QS for D2. A, Bar plot illustrating the Kappa index and correlation coefficient computed between readers 1 and 2. B, Kaplan–Meier curves for readers 1 and 2. C, ROC curve and corresponding confidence interval (CI) for QS. D, Kaplan–Meier plot for QS classifier using RFS as endpoint. Nonrecurr., nonrecurrence; Pred., predicted. E, Kaplan–Meier plot for QD classifier. The number of cases in each category is indicated in the charts. Nonrecurr., nonrecurrence; Pred., predicted.

Close modal
Figure 4.

Prognostic prediction results for QD and QS for D3 (top row) and D4 (bottom row). First column (A and D) shows the ROC curve and corresponding confidence interval (CI) for QS, second column (B and E) presents the Kaplan–Meier plots for QS classifier using RFS as endpoint, and third column (C and F) illustrates the Kaplan–Meier plot for QD classifier. The number of cases in each category is indicated in the charts. Nonrecurr., nonrecurrence; Pred., predicted.

Figure 4.

Prognostic prediction results for QD and QS for D3 (top row) and D4 (bottom row). First column (A and D) shows the ROC curve and corresponding confidence interval (CI) for QS, second column (B and E) presents the Kaplan–Meier plots for QS classifier using RFS as endpoint, and third column (C and F) illustrates the Kaplan–Meier plot for QD classifier. The number of cases in each category is indicated in the charts. Nonrecurr., nonrecurrence; Pred., predicted.

Close modal

Figures 2D, 3D, and 4B and E illustrate the Kaplan–Meier plots corresponding to the SpaTIL features for D1, D2, D3, and D4, respectively. Qs was found to be prognostic for D2 (P = 5.0 × 10−4, HR, 2.80; 95% confidence interval, 1.64–4.80), D3 (P value = 1.4 × 10−3; HR, 4.45; 95% confidence interval, 1.76–11.25), and D4 (P value = 0.02; HR, 4.45; 95% confidence interval, 1.26–8.46).

Significance of clinical and pathologic variables with patients' survival time in the test sets was evaluated via the log-rank test (Supplementary Fig. S2). A multivariate analysis, controlling the effect of major pathologic and clinical variables, was performed for the three test cohorts (D2, D3, and D4). Table 2 presents the results of the analysis for the three cohorts together, whereas Supplementary Table S4 shows the results for each cohort individually. Patients identified by the Qs classifier as having poor prognosis had statistically significantly worse disease-specific survival. The HR was 3.08 (95% confidence interval, 2.1–4.5, P = 7.3 × 10−5), meaning that patients with recurrence were approximately 3 times more likely to develop disease recurrence and die from the disease.

Table 2.

Multivariable survival analysis performed on the test sets (D2, D3, and D4) including SpaTIL

VariableHR (95% confidence interval)P value
Gender 
 Male vs. female 1.18 (0.85–1.65) 0.32 
T-stage 
 T1 vs. T2 0.98 (0.68–1.40) 0.90 
N-stage 
 N0 vs. N1 0.91 (0.59–1.41) 0.68 
Stage 
 Stage I vs. stage II 1.01 (0.65–1.58) 0.96 
Tumor subtypes 
 ADCs vs. SCC vs. others 0.92 (0.72–1.19) 0.53 
SpaTIL 
 Recurrence vs. nonrecurrence 3.08 (2.10–4.51) 7.3 × 10−5 
VariableHR (95% confidence interval)P value
Gender 
 Male vs. female 1.18 (0.85–1.65) 0.32 
T-stage 
 T1 vs. T2 0.98 (0.68–1.40) 0.90 
N-stage 
 N0 vs. N1 0.91 (0.59–1.41) 0.68 
Stage 
 Stage I vs. stage II 1.01 (0.65–1.58) 0.96 
Tumor subtypes 
 ADCs vs. SCC vs. others 0.92 (0.72–1.19) 0.53 
SpaTIL 
 Recurrence vs. nonrecurrence 3.08 (2.10–4.51) 7.3 × 10−5 

NOTE: Two-sided P < 0.05 (in bold) was considered as statistically significant.

Abbreviations: ADC, adenocarcinoma; SCC; squamous cell carcinoma.

Experiment 2: Comparison of human and machine-based assessment of TIL density for predicting recurrence in early-stage NSCLC

Figures 2A and 3A show the computed Spearman correlation and Kappa index values for the two pathologists for D1 and D2, respectively. The overall computed Kappa (considering both analyzed datasets) was 0.50. When computed independently for each dataset, Kappa indices were 0.38 and 0.57 for D1 and D2, respectively. On the other hand, the correlation coefficients were 0.61 for D1 and 0.79 for D2.

Figures 2B and 3B illustrate the Kaplan–Meier plots for both pathologists on D1 and D2, respectively. For reader 1, no statistically significant correlation between TIL estimation and outcome was observed for D1 (P = 0.14) nor D2 (P = 0.26). Conversely, for reader 2, a significant statistical correlation was observed for set D1 (P = 0.01) but not for set D2 (P = 0.07).

Figures 2E, 3E, and 4C and F illustrate the Kaplan–Meier plots corresponding to the QD classifier on D1, D2, D3, and D4, respectively. The QD classifier was found to have a statistically significant correlation between the classifier and patient outcome for D2 (P = 3.4 × 10−4); the predictions, however, were not statistically significant for D3 (P = 0.36) and D4 (P = 0.36), respectively.

Different studies reported the importance of immune response in different cancers (32). Such studies have demonstrated a high correlation between TILs density and both disease outcome and treatment response, particularly for lung, breast, ovarian, pancreatic, colorectal, and skin cancer, to name a few (1, 2, 6, 11–13, 22, 32, 33). Pathologists have aimed to quantify the number of immune cells and their relationships within a tumor. For example, Galon and colleagues (34) classified tumors with an Immunoscore that relates the density and location of immune cells within the tumor. They showed this score yields a prognostic value that can complement or even replace the standard tumor–node–metastasis classification in colorectal cancer. The authors have subsequently made a dedicated effort to promote the Immunoscore in routine clinical settings (see http://www.immunoscore.org/). Likewise, Salgado and colleagues (11), as part of the International Immuno-oncology Biomarker Working Group on Breast Cancer (see https://www.tilsinbreastcancer.org/), have performed the largest effort to date to establish a TIL-based quantification protocol. The authors constructed a set of guides that standardize the methodology for visual assessment of TILs on H&E slide sections. In lung cancer, however, there is a conspicuous lack of a standardized guideline for TIL scoring/use. Although different works have demonstrated the prognostic value of TILs in lung cancer, in each of these studies (4, 19), the degree of TILs was estimated by visual observation, a time-consuming, subjective, and often error-prone task. Moreover, several studies have shown a limited reproducibility in visual estimation of TILs (6, 12).

Computer-based approaches for automatic estimation of TILs (12–15) have helped to mitigate the subjectivity and low reproducibility associated with human TIL grading. However, different recent works have shown that, besides TIL density, the spatial location of immune cells is useful in predicting patient prognosis (32). Such studies appear to suggest that the spatial organization of lymphocytic infiltration in the context of nearby cancer cells is an important prognostic hallmark of certain types of tumors. This suggests that the study of the immune response with respect to patient outcome should take into account not only the quantity of immune cells, but also the spatial arrangement of the cancerous and surrounding immune cells (20, 21, 23, 35). Previous related studies have found a strong association between the spatial location of nuclei and surrounding cytoplasmic features with OS (20, 21) and RFS (4, 23) in patients with early-stage NSCLC.

In this work, we presented a set of features based on the spatial architecture of TILs (SpaTIL), devised to capture the TIL local density and variation in the density, architecture, and colocalization of TIL and cancerous cells.

On two TMA datasets comprising of 119 and 112 patients, the SpaTIL classifier yielded confidence intervals of 0.73 and 0.70, respectively. A Kaplan–Meier analysis along utilizing the log-rank test showed a strong association between the predictions of the SpaTIL classifier and recurrence for D2 (P = 5.0 × 10−4), D3 (P = 1.0 × 10−3), and D4 (P = 0.01). Likewise, a multivariate Cox proportional survival analysis revealed an HR of 3.08 (95% confidence interval: 2.1–4.5, P = 7.3 × 10−5).

Notably, the cohorts were obtained from different institutions with local (and presumably variable) tissue processing and preservation protocols. In addition, slides were stained in different institutions and digitalized with two different instruments. The latter appears to reflect the robustness of the SpaTIL classifier, its relative resilience to image, and color variance on account of major preanalytical variables and with samples from multiple different sites.

We also compared the prognostic performance of TIL estimation carried out by two human readers. A Kaplan–Meier analysis was conducted for each pathologist; results showed that no significant statistical correlation was found between Pathologist 1 and prognosis for any dataset (P > 0.05), whereas there was a significant statistical correlation between TIL grade estimation of Pathologist 2 and patient outcome for D1 (P = 6.0 × 10−3). In addition, the agreement among expert pathologists for D1 and D2 was found to be moderate (K = 0.50) and comparable with the values previously reported for NSCLC (K = 0.59; ref. 6) and breast pathology (K = 0.72; ref. 36).This moderate agreement might be due to the fact that TIL grading in lung pathology lacks a standardized scoring system; hence, each pathologist might preferentially focus on different areas of the tissue during examination (e.g., epithelium or stroma) or consider different cell populations within the “TIL” infiltration (e.g., mononuclear cells beyond lymphocytes such as plasmocytes and myeloid cells). Finally, different pathologists may have variable expertise evaluating immune cell infiltrates or natural individual variation in their perception of colors, shapes, and relative amounts/proportions. In contrast, the results obtained by using SpaTIL features are objectively measured and suggest that the spatial arrangement of TILs and tumor cells was strongly associated with recurrence in early-stage NSCLC (P < 0.05).

The two papers most closely related to the work presented here are that of Khan and colleagues (37) and Saltz and colleagues (22). Khan and colleagues (37) measured the degree of TIL infiltration for each core of a TMA as the ratio of lymphocytes to all cells. The infiltration score was found to be associated with poor prognosis in breast cancer, specifically in HER2-amplified/positive cases (P = 0.02). Saltz and colleagues (22) used a deep learning model to identify patches from whole slide images (WSI) containing TILs. From these patch clusters, cluster indices were calculated, namely Ball and Hall (38), Banfield and Raftery (39), C (40), determinant ratio (41), among others. Five associations between these cluster indices and patient survival were found to be significant for different tumor types: Ball–Hall for breast invasive carcinoma (P = 7 × 10−3), C-index for lung adenocarcinoma (P = 3.0 × 10−3), Banfield–Raftery for prostate adenocarcinoma (P = 0.01), Determinant Ratio for prostate adenocarcinoma (P = 0.01), and Banfield–Raftery for skin cutaneous melanoma (P = 1.0 × 10−3). Although the study by Saltz and colleagues (22) clearly suggests the prognostic relevance of spatial arrangement of TILs, the SpaTIL features capture not just the spatial arrangement of TILs, but also the spatial interplay and colocalization of TILs and cancer nuclei. Although studies involving IHC and QIF images (4, 18, 19) have shown the importance of looking at the spatial architecture and interplay of different TIL families, to the best of our knowledge, the SpaTIL approach represents the first attempt to capture the spatial architecture and arrangement of TILs and non-TILs from routine H&E images alone.

In addition, the approaches of both Khan and colleagues (37) and Saltz and colleagues (22) are actually more similar to the DenTIL classifier which we employed to compare against the SpaTIL features, one that invoked the ratio of lymphocytes to all cells in a TMA core and also included other TIL-based density features. Although the results of the DenTIL classifier were found to be statistically significant in predicting recurrence for D2, the results were not significant for either D3 or D4, hence showing the lack of reproducibility. The corresponding HR and confidence interval values for the DenTIL classifier were also found to be lower compared with the SpaTIL features across the validation sets D2, D3, and D4.

Our work did however have a few limitations. First, unlike the study of Saltz and colleagues (22), our approach was evaluated on TMAs and not using WSIs. The SpaTIL features were however evaluated and compared from tissue punches from different parts of the same tumor (D3 and D4). The SpaTIL features were found to prognostic of recurrence in both D3 and D4 (P ≤ 0.01), and the corresponding C-index and HRs were also found to be comparable (confidence interval = 0.70, HR: 4.45 for D3; confidence interval = 0.70, HR: 3.26 D4). These findings appear to be in concordance with previous studies involving TIL-based biomarkers that have suggested that results using TMAs are concordant with findings from WSIs (4, 11, 37). However, additional studies comparing the results in TMAs versus WSI will help establish the minimum amount of tumor tissue required for optimally calculating SpaTIL features that are prognostic of recurrence.

Future work will include additional analysis among larger patient cohorts and eventually using prospectively collected samples. Although our cohorts were not characterized using alternative prognostic molecular signatures for NSCLC (4, 18, 19), we anticipate the SpaTIL to be complementary with such tests. Additional studies will be required to determine the relative contributions, feasibility, and cost-effectiveness of each approach; and the possible role of such metrics to predict sensitivity/resistance to novel anticancer immunostimulatory therapies.

In summary, we presented a novel computational image-analysis–based approach that exploits a set of density and spatial topological features related to the arrangement of clusters of TILs and nonlymphocytes within the tumor area using standard H&E histology preparations. We identified a metric that was consistently associated with recurrence/prognosis in early-stage NSCLC. This work represents the first preliminary step in the development of an image-based companion diagnostic test to help identify early-stage lung cancer patients at increased risk for recurrence and hence potential candidates for adjuvant therapy (e.g., chemotherapy or immunotherapy) following standard-of-care surgery.

D.L. Rimm is a consultant/advisory board member for PAIGE. K.A. Schalper reports receiving commercial research grants from Moderna Therapeutics, Pierre Fabre, Surface Oncology, Takeda, Tesaro, and Vasculox, speakers bureau honoraria from Merck and Takeda, and is a consultant/advisory board member for Celgene, Moderna Therapeutics, and Shattuck Labs. A. Madabhushi holds ownership interest (including patents) in and is a consultant/advisory board member for Inspirata Inc. No potential conflicts of interest were disclosed by the other authors.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Conception and design: G. Corredor, X. Wang, K. Syrigos, E. Romero, V. Velcheti, A. Madabhushi

Development of methodology: G. Corredor, X. Wang, Y. Zhou, C. Lu, E. Romero, V. Velcheti, A. Madabhushi

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): X. Wang, K. Syrigos, D.L. Rimm, M. Yang, K.A. Schalper, V. Velcheti, A. Madabhushi

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): G. Corredor, X. Wang, Y. Zhou, C. Lu, P. Fu, M. Yang, E. Romero, V. Velcheti, A. Madabhushi

Writing, review, and/or revision of the manuscript: G. Corredor, X. Wang, P. Fu, K. Syrigos, D.L. Rimm, E. Romero, K.A. Schalper, V. Velcheti, A. Madabhushi

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): X. Wang, D.L. Rimm, V. Velcheti, A. Madabhushi

Study supervision: E. Romero, A. Madabhushi

This study was supported by the NCI of the NIH under award numbers 1U24CA199374-01, R01CA202752-01A1, R01CA208236-01A1, R01 CA216579-01A1, and R01 CA220581-01A1; the National Center for Research Resources under award number 1 C06 RR12463-01; the Department of Defense (DOD) Prostate Cancer Idea Development Award; the DOD Lung Cancer Idea Development Award; the DOD Peer Reviewed Cancer Research Program (W81XWH-16-1-0329); the Ohio Third Frontier Technology Validation Fund; the Wallace H. Coulter Foundation Program in the Department of Biomedical Engineering; and the Clinical and Translational Science Award Program (CTSA) at Case Western Reserve University.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Geng
Y
,
Shao
Y
,
He
W
,
Hu
W
,
Xu
Y
,
Chen
J
, et al
Prognostic role of tumor-infiltrating lymphocytes in lung cancer: a meta-analysis
.
Cell Physiol Biochem
2015
;
37
:
1560
71
.
2.
Bremnes
RM
,
Donnem
T
,
Busund
LT
. 
Importance of tumor infiltrating lymphocytes in non-small cell lung cancer?
Ann Transl Med
2016
;
4
:
142
.
3.
Uramoto
H
,
Tanaka
F
. 
Recurrence after surgery in patients with NSCLC
.
Transl Lung Cancer Res
2014
;
3
:
242
9
.
4.
Schalper
KA
,
Brown
J
,
Carvajal-Hausdorf
D
,
McLaughlin
J
,
Velcheti
V
,
Syrigos
KN
, et al
Objective measurement and clinical significance of TILs in non–small cell lung cancer
.
J Natl Cancer Inst
2015
;
107
. pii:
dju435
.
5.
Donnem
T
,
Hald
SM
,
Paulsen
EE
,
Richardsen
E
,
Al-Saad
S
,
Kilvaer
TK
, et al
Stromal CD8+ T-cell density - A promising supplement to TNM staging in non-small cell lung cancer
.
Clin Cancer Res
2015
;
21
:
2635
43
.
6.
Brambilla
E
,
Teuff
GL
,
Marguet
S
,
Lantuejoul
S
,
Dunant
A
,
Graziano
S
, et al
Prognostic effect of tumor lymphocytic infiltration in resectable non–small-cell lung cancer
.
J Clin Oncol
2016
;
34
:
1223
30
.
7.
Brown
JR
,
Wimberly
H
,
Lannin
DR
,
Nixon
C
,
Rimm
DL
,
Bossuyt
V
. 
Multiplexed quantitative analysis of CD3, CD8, and CD20 predicts response to neoadjuvant chemotherapy in breast cancer
.
Clin Cancer Res
2014
;
20
:
5995
6005
.
8.
Wang
K
,
Xu
J
,
Zhang
T
,
Xue
D
. 
Tumor-infiltrating lymphocytes in breast cancer predict the response to chemotherapy and survival outcome: a meta-analysis
.
Oncotarget
2016
;
7
:
44288
98
.
9.
Kochi
M
,
Iwamoto
T
,
Niikura
N
,
Bianchini
G
,
Masuda
S
,
Mizoo
T
, et al
Tumour-infiltrating lymphocytes (TILs)-related genomic signature predicts chemotherapy response in breast cancer
.
Breast Cancer Res Treat
2017
;
167
:
39
47
.
10.
Kashiwagi
S
,
Asano
Y
,
Goto
W
,
Takada
K
,
Takahashi
K
,
Noda
S
, et al
Use of tumor-infiltrating lymphocytes (TILs) to predict the treatment response to eribulin chemotherapy in breast cancer
.
PLoS One
2017
;
12
:
e0170634
.
11.
Salgado
R
,
Denkert
C
,
Demaria
S
,
Sirtaine
N
,
Klauschen
F
,
Pruneri
G
, et al
The evaluation of tumor-infiltrating lymphocytes (TILs) in breast cancer: recommendations by an International TILs Working Group 2014
.
Ann Oncol
2015
;
26
:
259
71
.
12.
Allard
MA
,
Bachet
JB
,
Beauchet
A
,
Julie
C
,
Malafosse
R
,
Penna
C
, et al
Linear quantification of lymphoid infiltration of the tumor margin: a reproducible method, developed with colorectal cancer tissues, for assessing a highly variable prognostic factor
.
Diagn Pathol
2012
;
7
:
156
.
13.
Robins
HS
,
Ericson
NG
,
Guenthoer
J
,
O'Briant
KC
,
Tewari
M
,
Drescher
CW
, et al
Digital quantification of tumor infiltrating lymphocytes
.
Sci Transl Med
2013
;
5
:
214ra169
.
14.
Eriksen
AC
,
Andersen
JB
,
Kristensson
M
,
dePont Christensen
R
,
Hansen
TF
,
Kjær-Frifeldt
S
, et al
Computer-assisted stereology and automated image analysis for quantification of tumor infiltrating lymphocytes in colon cancer
.
Diagn Pathol
2017
;
12
:
65
.
15.
Basavanhally
AN
,
Ganesan
S
,
Agner
S
,
Monaco
JP
,
Feldman
MD
,
Tomaszewski
JE
, et al
Computerized image-based detection and grading of lymphocytic infiltration in HER2+ breast cancer histopathology
.
IEEE Trans Biomed Eng
2010
;
57
:
642
53
.
16.
Fatakdawala
H
,
Xu
J
,
Basavanhally
A
,
Bhanot
G
,
Ganesan
S
,
Feldman
M
, et al
Expectation-maximization-driven geodesic active contour with overlap resolution (EMaGACOR): application to lymphocyte segmentation on breast cancer histopathology
.
IEEE Trans Biomed Eng
2010
;
57
:
1676
89
.
17.
Janowczyk
A
,
Madabhushi
A
. 
Deep learning for digital pathology image analysis: a comprehensive tutorial with selected use cases
.
J Pathol Inform
2016
;
7
:
29
.
18.
Barua
S
,
Fang
P
,
Sharma
A
,
Fujimoto
J
,
Wistuba
I
,
Rao
AUK
, et al
Spatial interaction of tumor cells and regulatory T cells correlates with survival in non-small cell lung cancer
.
Lung Cancer
2018
;
117
:
73
9
.
19.
Liu
H
,
Zhang
T
,
Ye
J
,
Li
H
,
Huang
J
,
Li
X
, et al
Tumor-infiltrating lymphocytes predict response to chemotherapy in patients with advance non-small cell lung cancer
.
Cancer Immunol Immunother
2012
;
61
:
1849
56
.
20.
Yu
KH
,
Zhang
C
,
Berry
GJ
,
Altman
RB
,
C
,
Rubin
DL
, et al
Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features
.
Nat Commun
2016
;
7
:
12474
.
21.
Luo
X
,
Zang
X
,
Yang
L
,
Huang
J
,
Liang
F
,
Rodriguez-Canales
J
, et al
Comprehensive computational pathological image analysis predicts lung cancer prognosis
.
J Thorac Oncol
2017
;
12
:
501
9
.
22.
Saltz
J
,
Gupta
R
,
Hou
L
,
Kurc
T
,
Singh
P
,
Nguyen
V
, et al
Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images
.
Cell Rep
2018
;
23
:
181
93
.
e7
.
23.
Wang
X
,
Janowczyk
A
,
Zhou
Y
,
Thawani
R
,
Fu
P
,
Schalper
K
, et al
Prediction of recurrence in early stage non-small cell lung cancer using computer extracted nuclear features from digital H&E images
.
Sci Rep
2017
;
7
:
13543
.
24.
Camp
RL
,
Chung
GG
,
Rimm
DL
. 
Automated subcellular localization and quantification of protein expression in tissue microarrays
.
Nat Med
2002
;
8
:
1323
7
.
25.
Veta
M
,
van Diest
PJ
,
Kornegoor
R
,
Huisman
A
,
Viergever
MA
,
Pluim
JPW
. 
Automatic nuclei segmentation in H&E stained breast cancer histopathology images
.
PLoS One
2013
;
8
:
e70221
.
26.
Corredor
G
,
Wang
X
,
Lu
C
,
Velcheti
V
,
Romero
E
,
Madabhushi
A
. 
A watershed and feature-based approach for automated detection of lymphocytes on lung cancer images
. In:
Angelini
ED
,
Landman
BA
.
Medical imaging 2018: image processing. Proceedings of SPIE Medical Imaging; 2018 Feb 10–15; Houston, TX
.
Bellingham (WA)
:
Society of Photo-Optical Instrumentation Engineers (SPIE)
; 
2018
.
27.
Ali
S
,
Lewis
J
,
Madabhushi
A
. 
Spatially aware cell cluster(spACC1) graphs: predicting outcome in oropharyngeal pl6+ tumors
. In:
Mori
K
,
Sakuma
I
,
Sato
Y
,
Barillot
C
,
Navab
N
, editors.
Medical image computing and computer assisted intervention — MICCAI 2013. Proceedings of the 21st International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI); 2013 Sep 22–26; Nagoya, Japan
.
Berlin (Germany)
:
Springer
.
p.
412
9
.
28.
Ali
S
,
Veltri
R
,
Epstein
JA
,
Christudass
C
,
Madabhushi
A
. 
Cell cluster graph for prediction of biochemical recurrence in prostate cancer patients from tissue microarrays
. In:
Gurcan
MN
,
Madabhushi
A
, editors.
Medical imaging 2013: digital pathology. Proceedings of SPIE Medical Imaging; 2013 Feb 9–14; Lake Buena Vista, FL
.
Bellingham (WA)
:
Society of Photo-Optical Instrumentation Engineers (SPIE)
; 
2013
. p.
86760H
.
29.
Peng
H
,
Long
F
,
Ding
C
. 
Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy
.
IEEE Trans Pattern Anal Mach Intell
2005
;
27
:
1226
38
.
30.
Myers
L
,
Sirois
MJ
. 
Spearman correlation coefficients, differences between
.
Encyclopedia of statistical sciences
.
Hoboken (NJ)
:
Wiley
; 
2006
.
31.
Carletta
J
. 
Assessing agreement on classification tasks: the Kappa statistic
.
Comput Linguist
1996
;
22
:
249
54
.
32.
Yuan
Y
. 
Modelling the spatial heterogeneity and molecular correlates of lymphocytic infiltration in triple-negative breast cancer
.
J R Soc Interface
2015
;
12
.
33.
Savas
P
,
Salgado
R
,
Denkert
C
,
Sotiriou
C
,
Darcy
PK
,
Smyth
MJ
, et al
Clinical relevance of host immunity in breast cancer: from TILs to the clinic
.
Nat Rev Clin Oncol
2016
;
13
:
228
41
.
34.
Galon
J
,
Costes
A
,
Sanchez-Cabo
F
,
Kirilovsky
A
,
Mlecnik
B
,
Lagorce-Pagès
C
, et al
Type, density, and location of immune cells within human colorectal tumors predict clinical outcome
.
Science
2006
;
313
:
1960
4
.
35.
Nawaz
S
,
Yuan
Y
. 
Computational pathology: exploring the spatial dimension of tumor ecology
.
Cancer Lett
2016
;
380
:
296
303
.
36.
Buisseret
L
,
Desmedt
C
,
Garaud
S
,
Fornili
M
,
Wang
X
,
Van den Eyden
G
, et al
Reliability of tumor-infiltrating lymphocyte and tertiary lymphoid structure assessment in human breast cancer
.
Mod Pathol
2017
;
30
:
1204
.
37.
Khan
AM
,
Yuan
Y
. 
Biopsy variability of lymphocytic infiltration in breast cancer subtypes and the ImmunoSkew score
.
Sci Reps
2016
;
6
:
36231
.
38.
Ball
GH
,
Hall
DJ
.
Isodata, a novel method of data analysis and pattern classification
.
Menlo Park (CA)
:
Stanford Research Institute
; 
1965
.
39.
Banfield
JD
,
Raftery
AE
. 
Model-based Gaussian and non-Gaussian clustering
.
Biometrics
1993
;
49
:
803
21
.
40.
Hubert
L
,
Schultz
J
. 
Quadratic assignment as a general data analysis strategy
.
Br J Math Stat Psychol
1976
;
29
:
190
241
.
41.
Scott
AJ
,
Symons
MJ
. 
Clustering methods based on likelihood ratio criteria
.
Biometrics
1971
;
27
:
387
97
.