Purpose:

Using standard-of-care CT images obtained from patients with a diagnosis of non–small cell lung cancer (NSCLC), we defined radiomics signatures predicting the sensitivity of tumors to nivolumab, docetaxel, and gefitinib.

Experimental Design:

Data were collected prospectively and analyzed retrospectively across multicenter clinical trials [nivolumab, n = 92, CheckMate017 (NCT01642004), CheckMate063 (NCT01721759); docetaxel, n = 50, CheckMate017; gefitinib, n = 46, (NCT00588445)]. Patients were randomized to training or validation cohorts using either a 4:1 ratio (nivolumab: 72T:20V) or a 2:1 ratio (docetaxel: 32T:18V; gefitinib: 31T:15V) to ensure an adequate sample size in the validation set. Radiomics signatures were derived from quantitative analysis of early tumor changes from baseline to first on-treatment assessment. For each patient, 1,160 radiomics features were extracted from the largest measurable lung lesion. Tumors were classified as treatment sensitive or insensitive; reference standard was median progression-free survival (NCT01642004, NCT01721759) or surgery (NCT00588445). Machine learning was implemented to select up to four features to develop a radiomics signature in the training datasets and applied to each patient in the validation datasets to classify treatment sensitivity.

Results:

The radiomics signatures predicted treatment sensitivity in the validation dataset of each study group with AUC (95% confidence interval): nivolumab, 0.77 (0.55–1.00); docetaxel, 0.67 (0.37–0.96); and gefitinib, 0.82 (0.53–0.97). Using serial radiographic measurements, the magnitude of exponential increase in signature features deciphering tumor volume, invasion of tumor boundaries, or tumor spatial heterogeneity was associated with shorter overall survival.

Conclusions:

Radiomics signatures predicted tumor sensitivity to treatment in patients with NSCLC, offering an approach that could enhance clinical decision-making to continue systemic therapies and forecast overall survival.

Translational Relevance

New patterns of response and progression have been observed in patients treated with immunotherapy, such as pseudoprogression and hyperprogression, prompting the need for alternative metrics for response assessment and therapeutic decision-making. Radiomic signatures, derived from quantitative, artificial intelligence-based analysis of standard-of-care CT images, offer the potential to enhance clinical decision-making as on-treatment markers of efficacy. In patients with non–small cell lung cancer treated with a wide spectrum of systemic cancer therapies (nivolumab, docetaxel, or gefitinib), radiomic signatures detected early changes from baseline to first on-treatment tumor assessment that were associated with sensitivity to treatment. Using serial radiographic measurements, we observed that an exponential increase in signature features deciphering tumor volume, invasion of tumor boundaries, or tumor spatial heterogeneity was associated with treatment insensitivity and shorter overall survival. This indicates radiomic signatures offer an approach that could guide clinical decision-making to continue or modify systemic therapies.

Selecting patients for targeted therapies or immunotherapy is crucial to match individuals to the treatment most likely to benefit them. In patients with a diagnosis of non–small cell lung cancer (NSCLC), personalization of therapy currently relies on pretreatment biomarkers acquired in a tumor biopsy taken at baseline. Tumor biopsies are used to perform genomic analyses to find therapeutically actionable mutations (e.g., EGFR and anaplastic lymphoma kinase, ALK), and to measure the expression of proteins that might help predict sensitivity to immunotherapy (e.g., programmed cell death 1 ligand, PD-L1). These measures are typically limited to a single biopsy sample, are difficult to perform repeatedly, and thus cannot capture the spatial and temporal heterogeneity of disease.

Progress in artificial intelligence (AI) has transformed the field of radiology, especially radiomics. Radiomics depends on the quantitative transformation of images into comprehensive datasets that enable high-throughput data mining and automated analysis of patterns present in images. These quantitative imaging biomarkers, defined a priori using mathematical formulas, could guide treatment decisions. Radiomics features are calculated by algorithmic analysis of tumor images and have been linked to characteristics of NSCLC. AI can be trained to recognize clinically relevant patterns on CT images and perform a “digital biopsy” of the imaging phenotype of the entire tumor volume. Quantitative imaging features have been associated with therapeutically actionable mutations, such as EGFR mutation status (1–15). Early volumetric assessment of variation in imaging phenotype on CT scan (16–23) has been shown to predict the biologic activity of targeted therapies such as anti-EGFR agents (17, 23). Finally, the imaging phenotype of NSCLC has been linked to patients' outcomes (23, 24–30) by predicting metastasis risk, recurrence risk, gross residual disease, and survival.

Currently, the main strategy for response assessment is based on a radiologist's evaluation of changes in tumor size and of the appearance of new tumor lesions. In the case of “targeted therapies,” shrinkage of target lesions is considered a hallmark of dependency on the targeted pathway and consequently of treatment sensitivity. With low variation due to lesion sampling, acquisition protocol, and observer effect, tumor shrinkage was a robust tool in the chemotherapy era to assess anticancer treatment efficacy, and is widely employed using RECIST 1.1 (31–33). In immune oncology, the utility of RECIST is more limited due to atypical patterns of response and progression on medical imaging. Despite the use of new imaging response assessment criteria, an unmet clinical need for improved assessment remains, which justifies the search for alternative approaches.

The investigation of pretreatment biomarkers to identify patients who might benefit from immunotherapy has gained traction within the research community. The unique challenge is the unconventional patterns of response and progression, including hyperprogression and pseudoprogression. Given the clear implication of oncologic progression, we hypothesized, as an ancillary question, that extending our imaging evaluation to include pretreatment imaging features as well as serial radiographic measurements might provide an untapped source of complementary prognostic information.

Our focus was on advanced/metastatic NSCLC because it is a unique candidate for implementing a radiomics approach. First, there is a strong clinical need because lung cancer is the second most common cancer and a leading cause of cancer death for men and women. Second, the segmentation of lung lesions can be easily implemented in clinical routine because the healthy air-filled lung parenchyma is the most hypodense human tissue. Third, from a biological perspective, the paradigm of response was developed for cytotoxic chemotherapies, which generates a need to explore changes in imaging phenotype induced by other types of systemic treatments such as targeted therapy and immunotherapy. Finally, radiographic response evaluation is a standard of care, and the response rate is known to be sufficient to make the results meaningful (34).

We aimed to explore whether AI techniques may be of clinical utility to oncologists as on-treatment markers of efficacy to help decide which patients should continue treatment. To this end, we evaluated the performance of treatment-specific radiomics signatures measured at baseline and at the first response assessment in patients with NSCLC receiving treatment with one of three cancer therapies: an immunotherapy blocking a negative regulator of T-cell activation and response (nivolumab, a monoclonal IgG4 antibody targeting PD-1), a chemotherapy targeting microtubules (docetaxel), and a molecular targeted therapy interrupting EGFR signaling (gefitinib).

Our primary objective was to train and validate three on-treatment signatures to detect NSCLC tumors sensitive to nivolumab, docetaxel, or gefitinib using a quantitative analysis of early tumor changes from baseline to first on-treatment tumor assessment on serial CT scans. Our secondary objective was to test whether features were generalizable across treatment arms.

### Selection of eligible trials

The following inclusion criteria were used for trial eligibility: completed study; primary tumor type with a large proportion of measurable, quantifiable disease (NSCLC); ≥40 patients accrued; FDA-approved drug; and CT images centrally collected and archived. We selected completed trials evaluating three different types of drug classes (an immunotherapy, a cytotoxic chemotherapy, and a molecularly targeted agent) so that imaging metrics could be studied across a range of different therapies.

### Participants

We retrospectively analyzed three NSCLC clinical trials using three treatments (nivolumab, docetaxel, or gefitinib) and 188 patients. Patients treated with nivolumab (n = 92) had enrolled in the NCT01642004 and NCT01721759 multicenter phase II–III trials. Patients who received docetaxel (n = 50) had been treated in the NCT01642004 trial. Patients prescribed gefitinib (n = 46) received the agent on NCT00588445, a single arm phase II trial in NSCLC that sought to correlate the radiographic response induced by gefitinib with mutations in the protein-tyrosine kinase domain of the EGFR gene. Patient characteristics are summarized in Table 1 and in Supplementary SI.1 and SII.1.

Table 1.

Patients' characteristics and performance of the signatures to detect tumor sensitivity to treatment.

| Treatment arm | Nivolumab | Docetaxel | Gefitinib |
| --- | --- | --- | --- |
| Tumor characteristics | | | |
| Tumor type | Squamous cell carcinoma | Squamous cell carcinoma | Bronchioloalveolar |
| Lesion segmented | Lung lesion | Lung lesion | Primary lung cancer |
| Original clinical trial characteristics | | | |
| Biomarker | PD-L1 | PD-L1 | EGFR mutational status |
| Treatment regimen | | | |
| Immunotherapy | Nivolumab | None | None |
| Chemotherapy | None | Docetaxel | None |
| Anti-EGFR mAbs | None | None | Gefitinib |
| Primary endpoint | | | |
| Outcome | OS, PFS | OS, PFS | Surgery at 3 weeks |
| First CT scan assessment | Baseline/8-week | Baseline/8-week | Baseline/3-week |
| Reference standard for treatment sensitivity | Above/below median PFS | Above/below median PFS | Surgery at 3 weeks |
| Clinical trial number | NCT01642004, NCT01721759 | NCT01642004 | NCT00588445 |
| Available | | | |
| n | 153 patients | 65 patients | 46 patients |
| Included | | | |
| n | 92 patients | 50 patients | 46 patients |
| Randomization | | | |
| Ratio | 4:1 | 2:1 | 2:1 |
| Training | | | |
| n | 72 patients | 32 patients | 31 patients |
| Reference standard (training) | | | |
| Sensitive | 28 patients | 5 patients | 13 patients |
| Insensitive | 44 patients | 27 patients | 18 patients |
| Signature (training) | | | |
| AUC (95% CI) | 0.80 (0.69–0.89) | 0.68 (0.38–0.98) | 0.81 (0.61–0.92) |
| Delta features | 1. Volume (burden); 2. GLCM IMC1 (heterogeneity); 3. DWT1 (heterogeneity); 4. Sigmoid slope (boundaries) | 1. Volume (burden) | 1. Shape SI4 (boundaries); 2. LoG Z Entropy (heterogeneity); 3. GTDM Contrast (heterogeneity); 4. LoG X Entropy (heterogeneity) |
| Algorithm (42) | Random Forest | Random Forest | Random Forest |
| Validation | | | |
| n | 20 patients | 18 patients | 15 patients |
| Reference standard (validation) | | | |
| Sensitive | 5 patients | 6 patients | 7 patients |
| Insensitive | 15 patients | 12 patients | 8 patients |
| Signature (validation) | | | |
| AUC (95% CI) | 0.77 (0.55–1.00) | 0.67 (0.37–0.96) | 0.82 (0.53–0.97) |
| Sensitivity | 0.80 | 0.92 | 0.83 |
| Specificity | 0.53 | 0.45 | 0.88 |

Note: The EGFR signature was designed in a cohort of patients with metastatic colorectal cancer treated with anti-EGFR using CT scans acquired at baseline and 8 weeks (44). The signature was then transferred and validated in patients with NSCLC treated with gefitinib. Signature features are all delta features measuring the change in the imaging feature between baseline and first response assessment.

The investigators of these clinical trials obtained written informed consent from the patients. The studies were conducted in accordance with recognized ethical guidelines (Declaration of Helsinki), and the studies were approved by an Institutional Review Board.

Data were collected up to the completion date of the clinical trials. Patients were randomly assigned to either training (T) or validation (V) sets using either a 4:1 ratio (nivolumab: 72T:20V) or a 2:1 ratio (docetaxel: 32T:18V; gefitinib: 31T:15V) to ensure an adequate sample size in the validation set. We estimated that a sample size of a minimum of 14 patients was required in the validation set based on the following input and assumption: a type I error of 0.05, a power of 0.8, an AUC of 0.85, and an allocation ratio of 1 (35).

In NCT01642004 and NCT01721759, CT scan imaging was performed at baseline and again every 8 weeks until disease progression or withdrawal. In NCT00588445, CT scan imaging occurred at baseline and at 3 weeks, just prior to day-23 surgery. Patients with missing data were excluded. Additional trial details are included in Supplementary SI.1.

### Quality of CT scan acquisition

We selected 188 patients out of 264 eligible patients (Fig. 1, Table 1) based on eligibility criteria ensuring improved imaging quality for this quantitative retrospective analysis: (i) measurable lung lesions according to RECIST 1.1 at baseline; (ii) no significant imaging artifacts; (iii) lung reconstruction kernel; (iv) pixel spacing <1 mm; (v) slice thickness <10 mm; and (vi) CT scans available at baseline and first response evaluation (landmark).

Figure 1.

Disposition of study patients. Patients could be excluded for multiple reasons. The withdrawal boxes show the number of patients excluded at each step. CT scans acquired at sites are transferred to our academic core. Image selection and quality check using a computer-aided algorithm designed by machine learning. Step 1. Segmentation of the largest measurable lung tumor on CT scan by an expert radiologist at baseline in all patients (inclusion criteria), as well as on all available radiographic measurements. Steps 2–3. Tumor imaging phenotype in each patient based on imaging feature extraction in the largest measurable segmented lung lesion (1,160 imaging features characterizing changes between baseline and first CT assessment). Step 4. Dimension reduction using machine learning. Identification of reproducible, nonredundant, and informative candidate imaging features for model building. Step 5. Signature building in the training set to enhance strategic decision-making and predict treatment sensitivity. Step 6. Signature validation. Step 7. Transfer of the signature features for evaluation of g and d values using serial radiographic measurements. Step 8. A subset of imaging biomarkers is identified.

The radiomics quality score (RQS) has been proposed as a guideline to evaluate the quality of radiomics studies (36). For the current study, the RQS was estimated to be 28 out of 36 points (78%). More information about CT scan characteristics, CT scan quality (37–39), and RQS can be found in Supplementary SI.1. A depiction of the radiomics workflow is shown in Fig. 1.

### Lesion segmentation and feature extraction

The largest measurable lung tumor present at baseline was segmented in the baseline and in the first on-treatment response assessment CT scan (nivolumab, 8 weeks; docetaxel, 8 weeks; gefitinib, 3 weeks) in all patients that met the inclusion criteria. Segmentation was performed using an algorithm developed in-house that enables semiautomatic creation of contours on all available CT scans (40). Imaging features were extracted from lung tumors using a priori definitions of radiomics features (38). In total, 1,160 quantitative image features were extracted from the images of each lesion from both the baseline and the first on-treatment CT scan. Delta radiomics features were calculated to characterize the early changes in the features. Full details of lesion segmentation and feature extraction can be found in Supplementary SI.2 and SI.3.
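The delta-feature step described above can be sketched as follows. This is a minimal illustration that assumes a delta feature is the relative change from baseline; the study's exact definition is given in its supplementary material and is not reproduced here, so an absolute difference is equally plausible. The function name `delta_features` and the feature values are hypothetical.

```python
def delta_features(baseline, follow_up, eps=1e-12):
    """Relative change of each feature from baseline to the first on-treatment scan.

    ASSUMPTION: delta = (follow_up - baseline) / |baseline|. The study's exact
    definition is in its supplement and may differ.
    """
    deltas = {}
    for name, base_value in baseline.items():
        if name not in follow_up:
            continue  # feature missing at follow-up: skip rather than guess
        denom = max(abs(base_value), eps)  # guard against division by zero
        deltas["delta-" + name] = (follow_up[name] - base_value) / denom
    return deltas


# Hypothetical values for one lesion (not taken from the study).
baseline_scan = {"Volume": 12000.0, "GLCM_IMC1": -0.20}
week8_scan = {"Volume": 9000.0, "GLCM_IMC1": -0.25}
print(delta_features(baseline_scan, week8_scan))
```

With a relative definition, a lesion shrinking from 12,000 to 9,000 volume units yields a delta-Volume of -0.25 regardless of its absolute size, which keeps features on comparable scales across patients.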

### Signature building in each treatment arm

In the training set of each cohort, we developed a multivariable prediction model, i.e., the signature, to predict treatment sensitivity. Using machine learning, quantitative imaging features were combined by high-throughput mining to build the signature. Our ultimate aim was to generate simple-to-use, noninvasive clinical decision tools using standard-of-care CT scan images. Accordingly, the radiomics signature, ranging from 0 (highest treatment sensitivity) to 1 (highest treatment insensitivity) for each patient, was based on the analysis of the change of the largest measurable lung lesion identified at baseline on CT scan.

In the implementation, a “coarse” to “fine” strategy was developed to select optimal features from the large number of extracted quantitative image features to build the signature. The coarse selection approach consists of reproducibility analysis, redundancy analysis, and feature ranking, which together eliminate nonreproducible, redundant, and noninformative features. The fine selection approach was composed of “forward” search and feature combination, aiming to select the most significant features to build the best predictive model. To prevent overfitting, up to the top four features (in terms of prediction importance output by the machine-learning algorithm; ref. 41) of the identified image features in the best predictive model developed in the training set were integrated in the signature. Full details of the model building can be found in Supplementary SI.4.
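To make the redundancy filter and the four-feature cap concrete, here is a minimal sketch. It is not the study's pipeline (which is detailed in Supplementary SI.4): the correlation threshold of 0.9, the function names, and the importance scores are all hypothetical, and the reproducibility analysis and forward search are not reproduced. In practice the importance scores would come from the trained machine-learning model.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no linear information
    return sxy / math.sqrt(sxx * syy)


def select_features(columns, importance, max_corr=0.9, top_k=4):
    """Greedy redundancy filter followed by a top-k cut.

    columns    : dict of feature name -> list of per-patient values
    importance : dict of feature name -> importance score (higher is better)
    max_corr   : ASSUMED redundancy threshold on |Pearson r|
    top_k      : maximum number of features kept (four in the text above)
    """
    ranked = sorted(columns, key=lambda name: importance[name], reverse=True)
    kept = []
    for name in ranked:
        redundant = any(
            abs(pearson(columns[name], columns[k])) > max_corr for k in kept
        )
        if not redundant:
            kept.append(name)
        if len(kept) == top_k:
            break
    return kept


# Hypothetical delta features for five patients.
columns = {
    "delta-Volume": [1.0, 2.0, 3.0, 4.0, 5.0],
    "delta-Volume-copy": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated
    "delta-Entropy": [2.0, 1.0, 3.0, 1.0, 2.0],
}
importance = {"delta-Volume": 0.5, "delta-Volume-copy": 0.4, "delta-Entropy": 0.3}
print(select_features(columns, importance))
```

Ranking before filtering means that, of two redundant features, the more informative one is retained.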

Due to the limited number of patients available in the gefitinib cohort (n = 46 patients), we defined an alternative strategy for signature building. We trained (n = 202 patients) and validated (n = 100 patients) a four-feature treatment sensitivity signature (Random Forest algorithm; ref. 41) in a cohort of patients with metastatic colorectal cancer treated with anti-EGFR monoclonal antibodies. To prevent overfitting, the four features identified in the colorectal cancer signature were transferred to be calibrated to predict treatment sensitivity in the training set of the gefitinib cohort.

The radiomics signature was trained to predict the tumor sensitivity to systemic anticancer treatment. All patients with cancer were divided into two groups: sensitive and insensitive to treatment. In patients with NSCLC treated with nivolumab and docetaxel, the reference standard to determine treatment sensitivity was median progression-free survival (PFS). The reference standard for gefitinib-treated patients was derived from the analysis of the surgical specimen 2 days after stopping 21 days of gefitinib therapy. The independent validation dataset, consisting of unseen data that were not used for training, was used to evaluate the performance of the signature. Further details for the reference standards are provided in Supplementary SI.5.1.

### Generalizability of the features

Our primary objective was to evaluate one single lesion at two timepoints (baseline and first on-treatment assessment). As an ancillary study, we evaluated the generalizability of imaging features and compared them with alternative outcome measures. To this end, we used all measurable lesions when we studied the clinical value of imaging features using baseline measurement (one timepoint) or serial radiographic measurement (≥two timepoints).

### Estimating rates of tumors with exponential changes in radiomics features using serial radiographic measurements

We attempted to reduce the risk of type I error/overfitting and to generalize the features identified in the three radiomics signatures (nivolumab, docetaxel, and gefitinib). To this end, we modeled the evolution of these features across time using serial radiographic measurements of tumors. We evaluated if these models could be used to understand disease behavior during treatment, compare study interventions, and forecast overall survival (OS; ref. 42).

Previous studies have demonstrated the simultaneous occurrence of two processes in the overwhelming majority of tumors: exponential growth of the treatment-insensitive fraction of the tumor at a rate described by a growth rate constant designated g for growth, and exponential regression of the treatment-sensitive portion of the tumor at a rate described by a regression rate constant designated d for decay. Both processes occur exponentially, and the rates of growth and regression can be estimated using simple mathematical equations. Using the same equations, we were able to estimate the rates of exponential change in the radiomics features over time and, based on the rate of change for each radiomic feature, assigned tumors to one of three categories: (i) those in which the designated feature was observed to only exponentially grow/increase or become more prominent during treatment (gx); (ii) those in which the analyzed feature only decreased or disappeared exponentially in quantity during therapy (dx); and (iii) those in which the data were best described by an equation that considered that there had occurred concurrent exponential increase and disappearance of the radiomic feature examined, as portions of the tumor sensitive to the therapy disappeared while those resistant to the treatment increased in abundance (gd; ref. 42). Rates of growth or decay of each feature were obtained by using all available radiographic measurements from baseline to a landmark analysis at 8 months. The differences in OS between patients with g above median and below median were analyzed using a Cox proportional hazards model and log-rank test (Kaplan–Meier analysis), in which landmark analysis divided the follow-up time at the 8-month time point, and a P value less than 0.05 was considered significant.
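The text does not state the equations themselves. As a sketch, assuming the forms commonly used in the regression-growth literature, with the feature value normalized to its baseline value f(0) = 1, the three categories correspond to f(t) = exp(g·t) for gx, f(t) = exp(−d·t) for dx, and f(t) = exp(−d·t) + exp(g·t) − 1 for gd. The code below evaluates these models and estimates a single rate by grid search; the grid search is a stand-in for a proper nonlinear regression, and the function names and data are hypothetical.

```python
import math


def model_gx(t, g):
    """Growth only (ASSUMED form): f(t) = exp(g * t)."""
    return math.exp(g * t)


def model_dx(t, d):
    """Decay only (ASSUMED form): f(t) = exp(-d * t)."""
    return math.exp(-d * t)


def model_gd(t, g, d):
    """Concurrent growth and decay (ASSUMED form): exp(-d*t) + exp(g*t) - 1."""
    return math.exp(-d * t) + math.exp(g * t) - 1.0


def fit_one_rate(times, values, model, grid):
    """Grid-search the rate of a one-parameter model (gx or dx).

    times  : days since baseline; times[0] must be 0
    values : raw feature values; values[0] is the nonzero baseline value
    Returns (best_rate, sum_of_squared_errors).
    """
    baseline = values[0]
    normalized = [v / baseline for v in values]
    best = None
    for rate in grid:
        sse = sum((model(t, rate) - f) ** 2 for t, f in zip(times, normalized))
        if best is None or sse < best[1]:
            best = (rate, sse)
    return best


# Synthetic example: a feature growing at g = 0.01 per day, scanned every 8 weeks.
days = [0, 56, 112, 168]
volumes = [10.0 * math.exp(0.01 * t) for t in days]
grid = [i / 1000 for i in range(1, 101)]  # candidate rates 0.001 ... 0.100 per day
g_hat, sse = fit_one_rate(days, volumes, model_gx, grid)
print(g_hat, sse)
```

Because the gd model has two parameters, choosing among gx, dx, and gd requires a model-selection rule (for example, testing whether each fitted rate differs significantly from zero); that step is omitted here.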

### Baseline imaging predictors of radiographic progression

We evaluated whether baseline radiomics features could predict the best overall response of individual lung lesions. To this end, we classified all measurable lung lesions in the nivolumab cohort into two categories: progressive versus nonprogressive per iRECIST criteria (43). Supplementary SII.3 contains additional methodology details.

### Statistical analysis

Statistical analysis was conducted using MATLAB 2016a and SPSS 23.0. The reported P values were two-sided, with the level for statistical significance set at α = 0.05. The performance of the model was evaluated by computing the area under the ROC curve.
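For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen insensitive tumor receives a higher signature score than a randomly chosen sensitive one. The sketch below computes it by direct pairwise comparison and also derives sensitivity and specificity at a 0.5 cutoff. It is an illustration only, not the MATLAB/SPSS code used in the study; the function names, scores, and labels are hypothetical, and label 1 is assumed to mean treatment insensitive (signature closer to 1).

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = insensitive (positive class, ASSUMED), 0 = sensitive.
    Ties between a positive and a negative score count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold=0.5):
    """Sensitivity and specificity when score > threshold predicts class 1.

    Requires at least one case of each class.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical signature scores for six patients.
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))
print(sensitivity_specificity(scores, labels))
```

In this toy example, 8 of the 9 positive-negative pairs are ordered correctly, and the 0.5 cutoff misclassifies one patient in each class.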

### Performance of radiomics signature: nivolumab

In the training set (n = 72), the tumors of 28 patients were classified as sensitive to nivolumab. The nivolumab radiomics signature included four delta-radiomics features characterizing the change in tumor volume, heterogeneity, and margin sharpness: delta-Volume, delta-GLCM IMC1 (Gray-Level Co-Occurrence Matrix), delta-DWT1 (Discrete Wavelet Transform), and delta-sigmoid slope (Table 1). The nivolumab signature achieved an AUC [95% confidence interval (CI)] of 0.80 (0.69–0.89; P < 10⁻⁴) in its ability to identify the sensitivity of a patient's tumor to nivolumab.

In the validation set (n = 20), the tumors of 5 patients were classified as sensitive to nivolumab. The performance of the nivolumab signature in the validation set was AUC (95% CI) of 0.77 (0.55–1.00). Using Kaplan–Meier plots, the estimated median PFS of the overall population (95% CI) was 2.1 (1.3–2.9) months (Fig. 2A). The estimated median PFS (95% CI) was 2.0 (1.8–2.2) versus 6.3 (4.0–8.6) months for patients with (n = 57 patients) or without (n = 35 patients) a high-risk nivolumab signature (signature > 0.5), respectively (P < 10⁻⁴).
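As background for the median PFS estimates quoted above, the following sketch shows how a Kaplan–Meier median is obtained from follow-up times with censoring: it is the first event time at which the estimated survival probability falls to 0.5 or below. This is an illustration rather than the study's analysis code; the function name and the data are hypothetical, and confidence intervals are not computed.

```python
def kaplan_meier_median(times, events):
    """Return the first event time at which the Kaplan-Meier survival
    estimate falls to 0.5 or below, or None if it never does.

    times  : follow-up time per patient (e.g., months)
    events : 1 if progression or death was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for tt, e in data if tt == t]
        n_events = sum(same_time)
        if n_events:
            survival *= 1.0 - n_events / at_risk
            if survival <= 0.5:
                return t
        at_risk -= len(same_time)  # events and censored cases both leave the risk set
        i += len(same_time)
    return None


# Hypothetical PFS data in months; the third patient is censored at 4 months.
pfs_months = [2, 3, 4, 5, 6]
progressed = [1, 1, 0, 1, 1]
print(kaplan_meier_median(pfs_months, progressed))
```

Note that the censored patient shifts the estimate: the plain median of the five follow-up times is 4 months, whereas the Kaplan–Meier median is 5 months.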

Figure 2.

Probability of PFS and OS over time as a function of signature score and signature features. Prolonged PFS was observed in patients with low-risk/treatment-sensitive signatures (≤0.5) in both treatment study groups using baseline and 8-week CT scans. In the nivolumab cohort (A), median PFS (95% CI) was 2.0 months (1.8–2.2) for patients whose tumors had a signature score > 0.5 (predicted insensitivity, n = 57) and 6.3 months (4.0–8.6) for patients with a signature score ≤ 0.5 (predicted sensitivity, n = 35; P < 10⁻⁴). In the docetaxel cohort (B), median PFS (95% CI) was 2.1 months (2.0–2.3) for patients whose tumors had a signature score > 0.5 (predicted insensitivity, n = 39) and 6.2 months (5.5–7.0) for signature score ≤ 0.5 (predicted sensitivity, n = 11; P < 10⁻⁴). Using serial radiographic measurements and a landmark at 8 months after drug initiation, we observed that the rate of exponential increase (g) of the radiomic features included in the signatures was associated with OS in patients from both treatment groups (C–E, pooled groups). The magnitude of exponential increase in tumor volume (gVolume, C), tumor spatial heterogeneity (gGLCM-IMC1, D), or boundary irregularity (gShape-SI4, E) was associated with shorter OS.

### Performance of radiomics signature: docetaxel

In the training set (n = 32), the tumors of 5 patients were classified as sensitive to docetaxel. The radiomics signature for the docetaxel cohort was a single delta-radiomics feature, delta-Volume (Table 1). The performance of the docetaxel signature was AUC (95% CI) of 0.68 (0.38–0.98).

In the validation set (n = 18), the tumors of 6 patients were classified as sensitive to docetaxel, and the performance of the docetaxel radiomics signature was AUC (95% CI) of 0.67 (0.37–0.96). Using Kaplan–Meier analyses, we observed that patients without a high-risk docetaxel signature (signature ≤ 0.5, n = 11 patients) had a longer estimated median PFS (95% CI) than those with a high-risk signature (signature > 0.5, n = 39 patients): 6.2 (5.5–7.0) months versus 2.1 (2.0–2.3) months (P < 0.001; Fig. 2B).

### Performance of radiomics signature: gefitinib

In the training set (n = 31), the tumors of 13 patients were classified as sensitive to gefitinib. The gefitinib signature was composed of four delta-radiomics features characterizing the change in tumor shape and heterogeneity: delta-Shape-SI4, delta-LOG-X-Entropy, delta-LOG-Z-Entropy, and delta-GTDM-Contrast (Table 1). The performance of the gefitinib signature in the training set was AUC (95% CI) of 0.81 (0.61–0.92).

In the validation set (n = 15), the tumors of 7 patients were classified as sensitive to gefitinib. The performance of the gefitinib radiomics signature was AUC (95% CI) of 0.82 (0.53–0.97). Table 1 includes a summary of the patient information, treatment-specific radiomics signatures, and performance metrics.

### Estimating rates of tumors with exponential changes in radiomics features using serial radiographic measurements

In addition to assessing the radiomic features at baseline and in the first on-treatment tumor assessment, we were also interested in examining the kinetics of change in these same radiomic features as treatment was administered. To do this, we used the values of the eight radiomic features mentioned above from serial scans to assess which equation previously shown to describe the rates of change in tumor volume over time could best describe the kinetics of change in the radiomic features with therapy. Table 2 shows that in the majority of the 224 patients with data available for analysis, simple mathematical equations could be used to describe three categories for each radiomics feature in the analyzed tumors: those with only exponential increase in the designated radiomic feature (gx), those with only exponential decrease in the analyzed feature (dx), and those tumors in which the kinetic change in the studied radiomics feature was best described by an equation that included concurrent exponential increase and reduction (gd). Figure 3 shows the distribution (i.e., bimodal, trimodal) of g and d values for each radiomics feature.

Table 2.

Summary of the percentage of patients fitting g and d models for the eight radiomics features discovered in the three signatures.

| | Tumor burden: Average^a | Tumor burden: Uni | Tumor burden: Bi | Tumor burden: Volume | Heterogeneity: Average^a | Heterogeneity: Log X Entropy | Heterogeneity: Log Z Entropy | Heterogeneity: DWT1 HHL | Heterogeneity: GLCM IMC1 | Heterogeneity: GTDM contrast | Boundaries: Average^a | Boundaries: Shape SI4 | Boundaries: Sigmoid Slope |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **NIVOLUMAB** | | | | | | | | | | | | | |
| Patients analyzed, n (%): total included | 129 (81.6) | 122 (77.2) | 128 (81.0) | 137 (86.7) | 85 (54.1) | 62 (39.2) | 88 (55.7) | 93 (58.9) | 87 (55.1) | 97 (61.4) | 112 (70.6) | 114 (72.2) | 109 (69.0) |
| dx | 10 (6.6) | 11 (7) | 9 (5.7) | 11 (7) | 37 (23.4) | 33 (20.9) | 34 (21.5) | 39 (24.7) | 30 (19) | 49 (31) | 29 (18.4) | 20 (12.7) | 38 (24.1) |
| gx | 77 (48.5) | 68 (43) | 75 (47.5) | 87 (55.1) | 34 (21.7) | 39 (24.7) | 35 (22.2) | 31 (19.6) | 30 (19) | 36 (22.8) | 63 (39.6) | 61 (38.6) | 64 (40.5) |
| gd | 37 (23.6) | 40 (25.3) | 39 (24.7) | 33 (20.9) | 18 (11.7) | 23 (14.6) | 15 (9.5) | 21 (13.3) | 23 (14.6) | 10 (6.3) | 18 (11.1) | 29 (18.4) | 6 (3.8) |
| Patients not analyzed, n (%): total excluded | 29 (18.4) | 36 (22.8) | 30 (19.0) | 21 (13.3) | 73 (45.9) | 96 (60.8) | 70 (44.3) | 65 (41.1) | 71 (44.9) | 61 (38.6) | 47 (29.4) | 44 (27.8) | 49 (31.0) |
| Erroneous data | — | — | — | — | 1 (0.5) | — | 4 (2.5) | — | — | — | — | — | — |
| No measurement | 1 (0.6) | 1 (0.6) | 1 (0.6) | 1 (0.6) | 1 (0.6) | 1 (0.6) | 1 (0.6) | 1 (0.6) | 1 (0.6) | 1 (0.6) | 4 (2.2) | 1 (0.6) | 6 (3.8) |
| Two evaluations ≤20% difference | 17 (10.8) | 24 (15.2) | 17 (10.8) | 10 (6.3) | 39 (24.8) | 42 (26.6) | 42 (26.6) | 39 (24.7) | 45 (28.5) | 28 (17.7) | 23 (14.6) | 36 (22.8) | 10 (6.3) |
| Not fit | 11 (7.0) | 11 (7) | 12 (7.6) | 10 (6.3) | 25 (15.7) | 19 (12) | 23 (14.6) | 25 (15.8) | 25 (15.8) | 32 (20.3) | 20 (12.7) | 7 (4.4) | 33 (20.9) |
| **DOCETAXEL** | | | | | | | | | | | | | |
| Patients analyzed, n (%): total included | 49 (74.7) | 46 (69.7) | 53 (80.3) | 49 (74.2) | 36 (55.2) | 33 (50.0) | 33 (50.0) | 42 (63.6) | 30 (45.5) | 44 (66.7) | 41 (62.1) | 36 (54.5) | 46 (69.7) |
| dx | 3 (4.0) | 4 (6.1) | 2 (3) | 2 (3) | 14 (21.2) | 10 (15.2) | 13 (19.7) | 15 (22.7) | 9 (13.6) | 23 (34.8) | 8 (11.4) | 5 (7.6) | 10 (15.2) |
| gx | 28 (42.9) | 23 (34.8) | 32 (48.5) | 30 (45.5) | 15 (22.4) | 15 (22.7) | 12 (18.2) | 20 (30.3) | 14 (21.2) | 13 (19.7) | 24 (35.6) | 15 (22.7) | 32 (48.5) |
| gd | 17 (25.8) | 18 (27.3) | 18 (27.3) | 15 (22.7) | 7 (11.2) | 7 (10.6) | 8 (12.1) | 7 (10.6) | 7 (10.6) | 8 (12.1) | 9 (12.9) | 14 (21.2) | 3 (4.5) |
| Patients not analyzed, n (%): total excluded | 17 (25.3) | 20 (30.3) | 13 (19.7) | 17 (25.8) | 30 (44.8) | 33 (50.0) | 33 (50.0) | 24 (36.4) | 36 (54.5) | 22 (33.3) | 25 (37.9) | 30 (45.5) | 20 (30.3) |
| Erroneous data | — | — | — | — | — | — | — | — | — | — | — | — | — |
| No measurement | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Two evaluations ≤20% difference | 12 (18.7) | 19 (28.8) | 8 (12.1) | 10 (15.2) | 20 (30) | 24 (36.4) | 23 (34.8) | 18 (27.3) | 24 (36.4) | 10 (15.2) | 14 (20.5) | 22 (33.3) | 5 (7.6) |
| Not fit | 4 (6.6) | 1 (1.5) | 5 (7.6) | 7 (10.6) | 10 (14.9) | 9 (13.6) | 10 (15.2) | 6 (9.1) | 12 (18.2) | 12 (18.2) | 12 (17.4) | 8 (12.1) | 15 (22.7) |
| **GEFITINIB** | | | | | | | | | | | | | |
| Patients analyzed, n (%): total included | 5 (10.9) | 4 (8.7) | 5 (10.9) | 6 (13.0) | 3 (6.5) | 2 (4.3) | 3 (6.5) | 3 (6.5) | 3 (6.5) | 4 (8.7) | 4 (8.7) | 4 (8.7) | 4 (8.7) |
| dx | 0.7 (1.5) | 1 (2.2) | 1 (2.2) | — | 0.2 (0.4) | — | — | 1 (2.2) | — | — | 1 (2.2) | — | 2 (4.3) |
| gx | 0.3 (0.7) | — | — | 1 (2.2) | 1.8 (3.9) | — | 3 (6.5) | 2 (4.3) | 2 (4.3) | 2 (4.3) | 1 (2.2) | 1 (2.2) | 1 (2.2) |
| gd | 4 (8.7) | 3 (6.5) | 4 (8.7) | 5 (10.9) | 1 (2.2) | 2 (4.3) | — | — | 1 (2.2) | 2 (4.3) | 2 (4.3) | 3 (6.5) | 1 (2.2) |
| Patients not analyzed, n (%): total excluded | 41 (89.1) | 42 (91.3) | 41 (89.1) | 40 (87.0) | 43 (93.5) | 44 (95.6) | 43 (93.5) | 43 (93.5) | 43 (93.5) | 42 (91.3) | 42 (91.3) | 42 (91.3) | 42 (91.3) |
| Erroneous data | — | — | — | — | — | — | — | — | — | — | — | — | — |
| No measurement | — | — | — | — | — | — | — | — | — | — | 0.5 (1.1) | — | 1 (2.2) |
| Two evaluations ≤20% difference | 0.7 (1.5) | 1 (2.2) | 1 (2.2) | — | 1 (2.2) | 1 (2.2) | 1 (2.2) | 1 (2.2) | 1 (2.2) | 1 (2.2) | — | — | — |
| Not fit | 0.3 (0.7) | 1 (2.2) | — | — | 2 (4.3) | 3 (6.5) | 2 (4.3) | 2 (4.3) | 2 (4.3) | 1 (2.2) | 2 (4.3) | 2 (4.3) | 2 (4.3) |

^a Average: average value for the features included in the categories tumor burden, heterogeneity, and boundaries. Uni, unidimensional; Bi, bidimensional. The rates of decay (d) and growth (g) of radiomics features were computed using serial radiographic measurements. The eight features discovered in the three signatures (nivolumab, docetaxel, and gefitinib) were generalized and applied to the three cohorts. In the gefitinib cohort, patients had only two evaluations; hence, d and g values were not assessable in most patients.

Figure 3.

Distribution of the rates of patients with an exponential increase (g) or decrease (d) in the eight features included in the radiomic signatures. Using serial radiographic measurements, the eight features discovered in the three signatures (nivolumab, docetaxel, gefitinib) (A) were generalized and applied to all cohorts. Patients with an exponential increase (g values) or decrease (d values) in radiomic features are displayed using tumor burden (B), heterogeneity (C), and boundary features (D). This is a proof of concept that AI can be trained to differentiate the simultaneous occurrence of two processes in the overwhelming majority of tumors: exponential growth of the treatment-insensitive fraction of the tumor, at a rate described by a growth rate constant designated g for growth, and exponential regression of the treatment-sensitive portion of the tumor, at a rate described by a regression rate constant designated d for decay. Strikingly, the distribution is bimodal in the gefitinib cohort, suggesting a wider variability between sensitive and insensitive tumors.

To simplify the analysis and its interpretation, the eight features mentioned above were divided into three categories: tumor burden (volume), tumor heterogeneity (LOG-X-Entropy, LOG-Z-Entropy, DWT1-HHL, GLCM-IMC1, GTDM-Contrast), and boundaries (Shape-SI4, sigmoid-slope). Tumor burden was also established with unidimensional and bidimensional measurements. The average percentage (min–max) of tumors with estimable rates of decrease (dx), increase (gx), or concurrent increase and decrease (gd) in the various radiomic features was computed for the three categories of radiomics features (Table 2). For the tumor burden features, the rates for nivolumab were dx 7% (6–7), gx 49% (43–55), and gd 24% (21–25), whereas the rates for docetaxel were dx 4% (3–6), gx 43% (35–49), and gd 26% (23–27). For the heterogeneity features, the rates for nivolumab were dx 23% (19–31), gx 22% (19–25), and gd 12% (6–15), whereas the rates for docetaxel were dx 21% (14–35), gx 22% (18–30), and gd 11% (11–12). For the boundaries features, the rates for nivolumab were dx 18% (13–24), gx 40% (39–41), and gd 11% (4–18), whereas the rates for docetaxel were dx 11% (8–15), gx 36% (23–48), and gd 13% (4–21).
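The category summaries can be checked against Table 2 with a few lines of arithmetic. The sketch below recomputes the nivolumab tumor burden figures from the Uni, Bi, and Volume percentages in Table 2. The helper name `summarize` is hypothetical, and rounding to whole percentages is inferred from how the values are reported in the text.

```python
def summarize(percentages):
    """Format the mean (min–max) of per-feature percentages as whole percents."""
    mean = sum(percentages) / len(percentages)
    return f"{mean:.0f}% ({min(percentages):.0f}–{max(percentages):.0f})"


# Nivolumab, tumor burden columns of Table 2, in the order Uni, Bi, Volume.
nivolumab_tumor_burden = {
    "dx": [7.0, 5.7, 7.0],
    "gx": [43.0, 47.5, 55.1],
    "gd": [25.3, 24.7, 20.9],
}

for model, values in nivolumab_tumor_burden.items():
    print(model, summarize(values))
# dx 7% (6–7)
# gx 49% (43–55)
# gd 24% (21–25)
```

The recomputed values match those quoted for nivolumab, which suggests that each "Average" column in Table 2 is the mean across the features in that category.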

Figure 2C–E shows shorter OS for patients in whom the rate of exponential increase in a radiomics feature (g) was above the median for gVolume (P = 0.005), gGLCM-IMC1 (P = 0.02), and gShape-SI4 (P < 0.001). Therefore, an exponential increase in tumor volume, tumor heterogeneity, or shape irregularity can forecast shorter OS. The imaging feature Shape-SI4 provided clinically useful information across four treatment arms. Shape-SI4, originally identified in patients with colorectal cancer (44), was transferred to predict sensitivity to gefitinib in patients with NSCLC. Shape-SI4 was also associated with OS in patients with NSCLC treated with docetaxel or nivolumab. See additional information in Supplementary SI.6.2 and SII.4.
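The comparison above rests on splitting patients at the median g value and comparing survival between the two groups. A minimal sketch of those two steps follows, using a plain Kaplan-Meier estimator. The function names and the toy data are hypothetical, and the significance test behind the reported P values is not reproduced here.

```python
from statistics import median


def split_by_median(g_values):
    """Flag each patient whose growth rate g is above the cohort median."""
    cut = median(g_values)
    return [g > cut for g in g_values]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    `events` holds 1 for an observed death and 0 for a censored patient.
    """
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


# Invented example: growth rates, follow-up in months, and death indicators.
g = [0.001, 0.002, 0.003, 0.004]
months = [14, 12, 6, 4]
died = [0, 1, 1, 1]

high = split_by_median(g)
for label, flag in (("g above median", True), ("g at or below median", False)):
    t = [m for m, h in zip(months, high) if h == flag]
    e = [d for d, h in zip(died, high) if h == flag]
    print(label, kaplan_meier(t, e))
```

In a landmark analysis, only patients alive at the landmark time would enter the two groups, with follow-up counted from that time.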

### Baseline imaging predictors of radiographic progression

On a per-lesion analysis, the best overall response was objective radiographic progression in 136 (36.4%) lesions treated with nivolumab and evaluated at two or more timepoints (baseline and 8 weeks). The best overall response in the other lesions (63.6%) was pseudoprogression (n = 4, 1.1%), response (n = 26, 7.0%), or stability (n = 207, 55.5%). An analysis of baseline radiomics features of these lung lesions demonstrated that the best baseline predictors of progression were features associated with tumor heterogeneity [RUN PLU, AUC (95% CI) = 0.82 (0.72–0.92)], shape [Shape index 3, AUC (95% CI) = 0.82 (0.67–0.89)], and volume [AUC (95% CI) = 0.78 (0.67–0.89)]. These findings suggest that larger infiltrative lung lesions are more likely to progress per iRECIST criteria. Additional details can be found in Supplementary SII.3.

Baseline imaging predictors of radiographic progression in lung lesions treated with gefitinib were surrogates of tumor heterogeneity and were reported previously (39, 45).

Using standard-of-care CT images acquired in multicenter clinical trials, machine-learning techniques successfully performed a specific complex task: identifying a pattern of baseline and treatment-induced changes on CT images associated with sensitivity to systemic nivolumab, docetaxel, and gefitinib therapy in patients with a diagnosis of NSCLC (Fig. 4).

Figure 4.

Distribution of the rates of patients with an exponential increase (g) or decrease (d) in the eight features included in the radiomic signatures. Visual representation of the imaging features included in the signature. The changes in tumor imaging phenotype of the “most sensitive” patient treated with nivolumab are displayed below. The tumor was segmented, and its shape and volume are represented using volume rendering (A). As demonstrated, CT scan images are transformed to other mathematical spaces for feature extraction; e.g., the CT image is transformed to LOG space for computing the entropy value (spatial heterogeneity), and tumor pixels within the segmentation contour are transformed to a GLCM matrix (B). Using this information, a radiomic signature predicts treatment sensitivity, which is associated with patients' OS (C).

Using the baseline and first on-treatment assessments (nivolumab: 8 weeks, docetaxel: 8 weeks, gefitinib: 3 weeks), the radiomics signatures output a probability ranging from 0 to 1, where 0 corresponds to the highest treatment sensitivity and 1 to treatment insensitivity. The innovation of our work compared with the existing literature was that the signatures were dynamic, used a limited subset of features to reduce overfitting, and included features that were generalized to three treatments evaluated in multicenter studies (46, 47). The eight signature features were reproducible across image reconstruction settings and were robust across tumor sites (Supplementary SII.6). Once designed using machine learning, these signatures can be computed for a given patient on a laptop, based on the segmentation of the largest measurable lung lesion by an experienced radiologist on routinely acquired CT scans. Such an analysis could also be incorporated into hybrid imaging modalities, for example on the CT portion of a noncontrast PET/CT. Therefore, once fine-tuned and optimized in larger cohorts, the signatures could be leveraged to guide clinical decisions such as changing systemic therapies at an appropriate time.
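To make the idea of a dynamic signature concrete, the sketch below turns baseline and first on-treatment feature values into delta features and maps them to a probability between 0 and 1. Everything specific in it is an assumption for illustration: the delta is taken as a simple difference, a logistic function stands in for the machine-learning classifier, and the weights and feature values are invented. Only the four feature names come from the gefitinib signature.

```python
import math


def delta_features(baseline, follow_up):
    """Change in each feature from baseline to first on-treatment scan.

    A simple difference is assumed here; the definition used in the study
    is not given in this section.
    """
    return {name: follow_up[name] - baseline[name] for name in baseline}


def signature_probability(deltas, weights, intercept=0.0):
    """Logistic stand-in for the classifier; 1 means treatment insensitivity."""
    z = intercept + sum(weights[name] * deltas[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values for one patient (not study data).
baseline = {"Shape-SI4": 0.62, "LOG-X-Entropy": 4.1,
            "LOG-Z-Entropy": 3.9, "GTDM-Contrast": 0.30}
week_3 = {"Shape-SI4": 0.55, "LOG-X-Entropy": 3.6,
          "LOG-Z-Entropy": 3.5, "GTDM-Contrast": 0.22}

# Hypothetical weights: an increase in a feature raises the probability.
weights = {"Shape-SI4": 2.0, "LOG-X-Entropy": 1.5,
           "LOG-Z-Entropy": 1.5, "GTDM-Contrast": 3.0}

deltas = delta_features(baseline, week_3)
print(round(signature_probability(deltas, weights), 3))
```

Restricting the model to at most four inputs, as the signatures do, keeps a calculation like this small enough to run on any laptop once the features have been extracted.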

These signatures can be understood by both clinical oncologists and radiologists as noninvasive in vivo surrogates of biological changes following treatment (Table 1). The eight imaging biomarkers included in the signatures fall into three categories: (i) indicators of change in tumor burden, (ii) indicators of change in tumor spatial heterogeneity, and (iii) characterization of tumor-parenchyma boundaries. Therefore, we can assume that we identified three imaging hallmarks that appear to be prognostic and generalizable quantitative CT radiomics response biomarkers predicting sensitivity to therapies: a decrease in tumor volume, heterogeneity, and tumor-parenchyma invasiveness. The combination of these biomarkers into a signature successfully identified tumors sensitive to cancer treatments.

Using serial radiographic measurements and a landmark analysis at 8 months, we demonstrated that a substantial percentage of tumors exhibit an exponential increase in tumor volume (gVolume), tumor spatial heterogeneity on contrast-enhanced images (gGLCM-IMC1), or shape irregularity (gShape-SI4). We demonstrated that the magnitude of this exponential increase can be leveraged to forecast shorter OS. In sensitive tumors, we observed tumor shrinkage, an increase in tumor homogeneity (decrease in heterogeneity), and a progressive regularization of tumor contours. This is the first demonstration that the evolution of radiomics features deciphering tumor phenotype under systemic therapy selection pressure follows exponential kinetics, and it is consistent with previous work demonstrating that the quantity of tumor increases and decreases exponentially.

Changes in tumor volume predicted treatment sensitivity in all cohorts. However, tumor volume played a more important role in the chemotherapy signature (docetaxel) than in the immunotherapy signature (nivolumab), and was not of importance in the anti-EGFR signature (gefitinib). This is interesting because size-based response criteria were originally developed to assess the response of tumors to cytotoxic chemotherapies, and many have questioned their general value in assessing targeted molecular agents, which are often said to be cytostatic, and immunotherapy agents, which have been reported to lead to unconventional patterns of response and progression. We demonstrated that temporal changes in intratumoral spatial heterogeneity were associated with sensitivity of NSCLC to anti–PD-1 (nivolumab) as well as anti-EGFR (gefitinib) therapies. Because CT scans are standard of care, noninvasive, informative of the entire tumor burden, and can be performed serially, they are well suited to address spatial and temporal heterogeneity. Tumor heterogeneity is a hallmark of cancer and, some have argued, can emerge from the inherent dynamic evolution and adaptation of clones in the presence of drug selection pressure, albeit more likely over prolonged periods of time and not over weeks (48). However, contrast-enhanced CT scans may capture macroscopic patterns of iodine accumulation linked to tumor neovascularization occurring over shorter time periods. This neovasculature is marked by heterogeneous and excessive blood flow, reduced drug delivery, hypoxia, immune evasion, tumor progression, and metastasis (48). Our modeling framework using the largest measurable lung lesion is supported by studies demonstrating that, under drug selection pressure, a similar dynamic is observed in the majority of individual lesions within the same tumor site as well as in lesions with different anatomic locations (49).
The incremental value of imaging surrogates of tumor heterogeneity and the kinetics or rates of their evolution should therefore be investigated prospectively as potential candidates to guide precision medicine approaches in systemic therapies with unconventional patterns of response and progression. Pilot results support our findings in the field of immune therapies because lung tumor heterogeneity on contrast-enhanced CT scan has been linked to OS (50) and immune contexture (51).

Early changes in tumor-parenchyma boundaries were strongly associated with tumor sensitivity in patients treated with nivolumab and gefitinib. Although these features are influenced by segmentation (39) and the possibility of change in the shape of lung tumors during respiration, they might capture macroscopically the growth pattern at tumor–lung parenchyma interfaces in NSCLC. Complex lung interfaces are indeed associated with aggressive malignant tumors and poorer survival (52, 53).

In an attempt to generalize the features included in the three signatures, we succeeded in describing the kinetics of change of the eight radiomics features in the majority of patients. We estimated rate constants for both the regression (decrease, d) and growth (increase, g) of the radiomic features using serial radiographic measurements from multiple timepoints. This is the first demonstration that the effect of a treatment on the rates of growth and/or decay in radiomics features can be estimated. Because OS remains the gold standard for efficacy in oncology, we performed a landmark analysis for OS and could demonstrate that increased gVolume (tumor burden), gGLCM-IMC1 (heterogeneity), or gShape-SI4 (boundary invasiveness) was associated with shorter OS. This is the first demonstration that the rate of growth of radiomics features can be used to predict OS (42).

In an ancillary lesion-based analysis conducted on baseline lung lesions, we observed that small invasive lung tumors with high heterogeneity (low RUN primitive length uniformity) are more likely to progress per iRECIST criteria, which further corroborates the clinical significance of the hallmarks of treatment sensitivity used in the nivolumab signature.

Our study has limitations. This is a proof-of-concept study with a relatively small sample size. The sample was representative of a population of patients with NSCLC in a large multicenter clinical trial. Using a multicenter clinical trial and randomizing patients to training and validation sets reduced the risk of overfitting. There was no apparent selection bias, with the covariates balanced between included and excluded patients, and between training and validation cohorts for each treatment. However, there was a selection based on the presence of a measurable lung lesion. The high frequency of measurable lung lesions in NSCLC in this series made our model applicable to 61% of patients with refractory NSCLC treated with nivolumab and 100% of patients with early stage NSCLC treated with gefitinib. The overall classification performance in docetaxel was likely underfitted. This was because the dataset was unbalanced, with a minority (16%) of tumors sensitive to treatment. Hence, the final machine-learning model included a single feature to predict treatment sensitivity because more complex models would have been overfitted. The radiomics pipeline is complex, which makes the selection and identification of imaging biomarkers difficult to adopt widely in nonacademic institutions. Nonetheless, we extensively described our methodology and clinically relevant features. In addition, we have identified a limited subset of imaging biomarkers that could be easily implemented and computed on any laptop. The study was not designed to evaluate how various time intervals alter feature selection and classification. However, delta features identified at 3 weeks (NCT00588445) were generalizable at 8 weeks (NCT01642004, NCT01721759).

The radiomics signatures were applicable in selected patients with measurable lung tumors reaching predefined clinical and imaging quality criteria at baseline and at first CT evaluation (Fig. 1). The output to be predicted by radiomics signatures was PFS rather than OS because OS suffered from limitations such as unbalanced number of events (nivolumab: 33 deaths occurring with a median follow-up of 8.6 months), the occurrence of crossovers with new treatment lines in patients experiencing disease progression (nivolumab and docetaxel), and the excellent outcome of resectable early stage NSCLC (gefitinib). Perhaps more or different information would be obtained with evaluations earlier than 8 weeks (nivolumab and docetaxel), as demonstrated by the value of a 3-week evaluation (gefitinib).

In conclusion, this study is a proof of concept that AI support could provide clinicians with an early indication of the likelihood of success of treatment with the new generation of systemic anticancer therapies using conventional imaging techniques. Computers excelled in mining and integrating large amounts of data from a quantitative CT analysis of a single lung lesion segmented by an expert radiologist. Using these data, early changes in tumor imaging phenotype on standard-of-care CT scans were translated into a quantitative and synthetic signature to predict treatment sensitivity. Treatment sensitivity was associated with changes in the interface between lung tumor and normal lung parenchyma, as well as with heterogeneity of the lung tumor. Once further prospectively validated, these signatures could be used clinically to enhance the strategic decision-making of a practicing clinical oncologist optimizing precision treatment. Consequently, AI-generated on-treatment signatures could allow for more accurate treatment decision-making, which could constitute the basis for the implementation of adapted treatment guided by quantitative CT scan interpretation in patients with NSCLC treated with systemic therapies.

M. Fronheiser is an employee for Bristol-Myers Squibb. S. Du is an employee for Bristol-Myers Squibb. W. Hayes is an employee for and holds ownership interest (including patents) in Bristol-Myers Squibb. D.K. Leung is an employee for Bristol-Myers Squibb. A. Roy is an employee for and holds ownership interest (including patents) in Bristol-Myers Squibb. L.H. Schwartz is a paid advisory board member for Roche and Novartis, and reports receiving commercial research grants from Merck and Boehringer Ingelheim. No potential conflicts of interest were disclosed by the other authors.

Conception and design: L. Dercle, M. Fronheiser, S. Du, W. Hayes, D.K. Leung, A. Roy, A.T. Fojo, L.H. Schwartz, B. Zhao

Development of methodology: L. Dercle, M. Fronheiser, L. Lu, S. Du, W. Hayes, D.K. Leung, A. Roy, A.T. Fojo, L.H. Schwartz, B. Zhao

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): L.H. Schwartz, B. Zhao

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): L. Dercle, M. Fronheiser, L. Lu, S. Du, D.K. Leung, A. Roy, J. Wilkerson, A.T. Fojo, L.H. Schwartz, B. Zhao

Writing, review, and/or revision of the manuscript: L. Dercle, M. Fronheiser, L. Lu, S. Du, W. Hayes, D.K. Leung, A. Roy, A.T. Fojo, L.H. Schwartz, B. Zhao

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): D.K. Leung, P. Guo, L.H. Schwartz, B. Zhao

Study supervision: L. Dercle, D.K. Leung, A.T. Fojo, L.H. Schwartz, B. Zhao

Authors acknowledge financial support from the NIH (U01 CA225431) and Bristol-Myers Squibb. L. Dercle's work was partially funded by grants from Fondation Philanthropia and Fondation Nuovo-Soldati. The content is solely the responsibility of the authors and does not necessarily represent the funding sources.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Yamamoto S, Korn RL, Oklu R, Migdal C, Gotway MB, Weiss GJ, et al. ALK molecular phenotype in non-small cell lung cancer: CT radiogenomic characterization. 2014;272:568–76.
2. K, Yeh YC, D'Angelo SP, Moreira AL, Kuk D, Sima CS, et al. Associations between mutations and histologic patterns of mucin in lung adenocarcinoma: invasive mucinous pattern and extracellular mucin are associated with KRAS mutation. Am J Surg Pathol 2014;38:1118–27.
3. Kim TJ, Lee CT, Jheon SH, Park JS, Chung JH. Radiologic characteristics of surgically resected non-small cell lung cancer with ALK rearrangement or EGFR mutations. Ann Thorac Surg 2016;101:473–80.
4. Zhou JY, Zheng J, Yu ZF, Xiao WB, Zhao J, Sun K, et al. Comparative analysis of clinicoradiologic characteristics of lung adenocarcinomas with ALK rearrangements or EGFR mutations. 2015;25:1257–66.
5. Yang Y, Yang Y, Zhou X, Song X, Liu M, He W, et al. EGFR L858R mutation is associated with lung adenocarcinoma patients with dominant ground-glass opacity. Lung Cancer 2015;87:272–7.
6. Liu Y, Kim J, Qu F, Liu S, Wang H, Balagurunathan Y, et al. CT features associated with epidermal growth factor receptor mutation status in patients with lung adenocarcinoma. 2016;280:271–80.
7. Rizzo S, Petrella F, Buscarino V, De Maria F, Raimondi S, Barberis M, et al. CT radiogenomic characterization of EGFR, K-RAS, and ALK mutations in non-small cell lung cancer. 2016;26:32–42.
8. Lee HJ, Kim YT, Kang CH, Zhao B, Tan Y, Schwartz LH, et al. Epidermal growth factor receptor mutation in lung adenocarcinomas: relationship with CT characteristics and histologic subtypes. 2013;268:254–64.
9. Choi CM, Kim MY, Hwang HJ, Lee JB, Kim WS. Advanced adenocarcinoma of the lung: comparison of CT characteristics of patients with anaplastic lymphoma kinase gene rearrangement and those with epidermal growth factor receptor mutation. 2015;275:272–9.
10. Shi Z, Zheng X, Shi R, Song C, Yang R, Zhang Q, et al. Radiological and clinical features associated with epidermal growth factor receptor mutation status of exon 19 and 21 in lung adenocarcinoma. Sci Rep 2017;7:364.
11. Hsu JS, Huang MS, Chen CY, Liu GC, Liu TC, Chong IW, et al. Correlation between EGFR mutation status and computed tomography features in patients with advanced pulmonary adenocarcinoma. J Thorac Imaging 2014;29:357–63.
12. Ozkan E, West A, Dedelow JA, Chu BF, Zhao W, Yildiz VO, et al. CT gray-level texture analysis as a quantitative imaging biomarker of epidermal growth factor receptor mutation status in adenocarcinoma of the lung. AJR Am J Roentgenol 2015;205:1016–25.
13. Liu Y, Kim J, Balagurunathan Y, Li Q, Garcia AL, Stringfield O, et al. Clin Lung Cancer 2016;17:441–8.e6.
14. Yoon HJ, Sohn I, Cho JH, Lee HY, Kim JH, Choi YL, et al. Decoding tumor phenotypes for ALK, ROS1, and RET fusions in lung adenocarcinoma using a radiomics approach. Medicine (Baltimore) 2015;94:e1753.
15. Wu W, Parmar C, Grossmann P, Quackenbush J, Lambin P, Bussink J, et al. Exploratory study to identify radiomics classifiers for lung cancer histology. Front Oncol 2016;6:71.
16. Liu F, Zhao B, Krug LM, Ishill NM, Lim RC, Guo P, et al. Assessment of therapy responses and prediction of survival in malignant pleural mesothelioma through computer-aided volumetric measurement on computed tomography scans. J Thorac Oncol 2010;5:879–84.
17. Zhao B, Oxnard GR, Moskowitz CS, Kris MG, Pao W, Guo P, et al. A pilot study of volume measurement as a method of tumor response evaluation to aid biomarker development. Clin Cancer Res 2010;16:4647–53.
18. Chow DS, Qi J, Guo X, Miloushev VZ, Iwamoto FM, Bruce JN, et al. Semiautomated volumetric measurement on postcontrast MR imaging for analysis of recurrent and residual disease in glioblastoma multiforme. 2014;35:498–503.
19. Chang K, Zhang B, Guo X, Zong M, Rahman R, Sanchez D, et al. Multimodal imaging patterns predict survival in recurrent glioblastoma patients treated with bevacizumab. Neuro-oncol 2016;18:1680–7.
20. Ha R, Mema E, Guo X, Mango V, Desperito E, Ha J, et al. Three-dimensional quantitative validation of breast magnetic resonance imaging background parenchymal enhancement assessments. 2016;45:297–303.
21. Ha R, Mema E, Guo X, Mango V, Desperito E, Ha J, et al. Quantitative 3D breast magnetic resonance imaging fibroglandular tissue analysis and correlation with qualitative assessments: a feasibility study. Quant Imaging Med Surg 2016;6:144–50.
22. Koshkin VS, Bolejack V, Schwartz LH, Wahl RL, Chugh R, Reinke DK, et al. Assessment of imaging modalities and response metrics in Ewing sarcoma: correlation with survival. J Clin Oncol 2016;34:3680–5.
23. Aerts HJ, Grossmann P, Tan Y, Oxnard GG, Rizvi N, Schwartz LH, et al. Defining a radiomic response phenotype: a pilot study using targeted therapy in NSCLC. Sci Rep 2016;6:33860.
24. Coroller TP, Agrawal V, Narayan V, Hou Y, Grossmann P, Lee SW, et al. Radiomic phenotype features predict pathological response in non-small cell lung cancer. 2016;119:480–6.
25. Mattonen SA, Palma DA, Johnson C, Louie AV, Landis M, Rodrigues G, et al. Detection of local cancer recurrence after stereotactic ablative radiation therapy for lung cancer: physician performance versus radiomic assessment. Int J Radiat Oncol Biol Phys 2016;94:1121–8.
26. Cunliffe A, Armato SG 3rd, Castillo R, Pham N, Guerrero T, Al-Hallaq HA. Lung texture in serial thoracic computed tomography scans: correlation of radiomics-based features with radiation therapy dose and radiation pneumonitis development. Int J Radiat Oncol Biol Phys 2015;91:1048–56.
27. Fried DV, Tucker SL, Zhou S, Liao Z, Mawlawi O, Ibbott G, et al. Prognostic value and reproducibility of pretreatment CT texture features in stage III non-small cell lung cancer. Int J Radiat Oncol Biol Phys 2014;90:834–42.
28. Coroller TP, Grossmann P, Hou Y, Rios Velazquez E, Leijenaar RT, Hermann G, et al. 2015;114:345–50.
29. Huang Y, Liu Z, He L, Chen X, Pan D, Ma Z, et al. Radiomics signature: a potential biomarker for the prediction of disease-free survival in early-stage (I or II) non-small cell lung cancer. 2016;281:947–57.
30. N, Qian W, Guan Y, Tan M, Qiu Y, Liu H, et al. Fusion of quantitative image and genomic biomarkers to improve prognosis assessment of early stage lung cancer patients. IEEE Trans Biomed Eng
2016
;
63
:
1034
43
.
31.
Oxnard
GR
,
Zhao
B
,
Sima
CS
,
Ginsberg
MS
,
James
LP
,
Lefkowitz
RA
, et al
Variability of lung tumor measurements on repeat computed tomography scans taken within 15 minutes
.
J Clin Oncol
2011
;
29
:
3114
9
.
32.
Zhao
B
,
James
LP
,
Moskowitz
CS
,
Guo
P
,
Ginsberg
MS
,
Lefkowitz
RA
, et al
Evaluating variability in tumor measurements from same-day repeat CT scans of patients with non-small cell lung cancer
.
2009
;
252
:
263
72
.
33.
Eisenhauer
EA
,
Therasse
P
,
Bogaerts
J
,
Schwartz
LH
,
Sargent
D
,
Ford
R
, et al
New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1)
.
Eur J Cancer
2009
;
45
:
228
47
.
34.
Planchard
D
,
Popat
S
,
Kerr
K
,
Novello
S
,
Smit
EF
,
Faivre-Finn
C
, et al
Metastatic non-small cell lung cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up
.
Ann Oncol
2019
;
30
:
863
70
.
35.
Obuchowski
NA
.
ROC analysis
.
Am J Roentgenol
2005
;
184
:
364
72
.
36.
Lambin
P
,
Leijenaar
RT
,
Deist
TM
,
Peerlings
J
,
de Jong
EE
,
van Timmeren
J
, et al
Radiomics: the bridge between medical imaging and personalized medicine
.
Nat Rev Clin Oncol
2017
;
14
:
749
62
.
37.
Dercle
L
,
Lu
L
,
Lichtenstein
P
,
Yang
H
,
Wang
D
,
Zhu
J
, et al
Impact of variability in portal venous phase acquisition timing in tumor density measurement and treatment response assessment: metastatic colorectal cancer as a paradigm
.
JCO Clin Cancer Inform
2017
:
1
8
.
38.
Zhao
B
,
Tan
Y
,
Tsai
W-Y
,
Qi
J
,
Xie
C
,
Lu
L
, et al
Reproducibility of radiomics for deciphering tumor phenotype with imaging
.
Sci Rep
2016
;
6
:
23428
.
39.
Huang
Q
,
Lu
L
,
Dercle
L
,
Lichtenstein
P
,
Li
Y
,
Yin
Q
, et al
Interobserver variability in tumor contouring affects the use of radiomics to predict mutational status
.
J Med Imaging
2018
;
5
:
011005
.
40.
Tan
Y
,
Schwartz
LH
,
Zhao
B
.
Segmentation of lung lesions on CT scans using watershed, active contours, and Markov random field
.
Med Phys
2013
;
40
:
043502
.
41.
Breiman
L
.
Random forests
.
Mach learn
2001
;
45
:
5
32
.
42.
Wilkerson
J
,
Abdallah
K
,
Hugh-Jones
C
,
Curt
G
,
Rothenberg
M
,
Simantov
R
, et al
Estimation of tumour regression and growth rates during treatment in patients with advanced prostate cancer: a retrospective analysis
.
Lancet Oncol
2017
;
18
:
143
54
.
43.
Seymour
L
,
Bogaerts
J
,
Perrone
A
,
Ford
R
,
Schwartz
LH
,
Mandrekar
S
, et al
iRECIST: guidelines for response criteria for use in trials testing immunotherapeutics
.
Lancet Oncol
2017
;
18
:
e143
e52
.
44.
Dercle
L
,
Lu
L
,
Schwartz
LH
,
Qian
M
,
Tejpar
S
,
Eggleton
P
, et al
Radiomics response signature for identification of metastatic colorectal cancer sensitive to therapies targeting EGFR pathway
.
JNCI
:
Journal of the National Cancer Institute
,
2020
.
45.
Li
Y
,
Lu
L
,
Xiao
M
,
Dercle
L
,
Huang
Y
,
Zhang
Z
, et al
CT slice thickness and convolution kernel affect performance of a radiomic model for predicting EGFR status in non-small cell lung cancer: a preliminary study
.
Sci Rep
2018
;
8
:
17913
.
46.
Sun
R
,
Limkin
EJ
,
Vakalopoulou
M
,
Dercle
L
,
Champiat
S
,
Han
SR
, et al
A radiomics approach to assess tumour-infiltrating CD8 cells and response to anti-PD-1 or anti-PD-L1 immunotherapy: an imaging biomarker, retrospective multicohort study
.
Lancet Oncol
2018
;
19
:
1180
91
.
47.
Limkin
E
,
Sun
R
,
Dercle
L
,
Zacharaki
E
,
Robert
C
,
Reuzé
S
, et al
Promises and challenges for the implementation of computational medical imaging (radiomics) in oncology
.
Ann Oncol
2017
;
28
:
1191
206
.
48.
Hanahan
D
,
Weinberg
RA
.
Hallmarks of cancer: the next generation
.
Cell
2011
;
144
:
646
74
.
49.
Terranova
N
,
Girard
P
,
Ioannou
K
,
U
,
Munafo
A
.
Assessing similarity among individual tumor size lesion dynamics: the CICIL methodology
.
CPT Pharmacometrics Syst Pharmacol
2018
;
7
:
228
36
.
50.
Trebeschi
S
,
Kurilova
I
,
Călin
AM
,
Lambregts
DMJ
,
Smit
EF
,
Aerts
H
, et al
Radiomic biomarkers for the prediction of immunotherapy outcome in patients with metastatic non-small cell lung cancer
.
J Clin Oncol
2017
;
35
:
e14520
.
51.
Grossmann
P
,
Stringfield
O
,
El-Hachem
N
,
Bui
MM
,
Rios Velazquez
E
,
Parmar
C
, et al
Defining the biological basis of radiomic phenotypes in lung cancer
.
Elife
2017
;
6
.
52.
Grove
O
,
Berglund
AE
,
Schabath
MB
,
Aerts
HJ
,
Dekker
A
,
Wang
H
, et al
Quantitative computed tomographic descriptors associate tumor shape complexity and intratumor heterogeneity with prognosis in lung adenocarcinoma
.
PLoS One
2015
;
10
:
e0118261
.
53.
OS
,
Watson
D
.
Texture analysis of aggressive and nonaggressive lung tumor CE CT images
.
IEEE Trans Biomed Eng
2008
;
55
:
1822
30
.