Immunotherapy by immune checkpoint inhibitors has become a standard treatment strategy for many types of solid tumors. However, the majority of patients with cancer will not respond, and predicting response to this therapy is still a challenge. Artificial intelligence (AI) methods can extract meaningful information from complex data, such as image data. In clinical routine, radiology or histopathology images are ubiquitously available. AI has been used to predict the response to immunotherapy from radiology or histopathology images, either directly or indirectly via surrogate markers. While none of these methods are currently used in clinical routine, academic and commercial developments are pointing toward potential clinical adoption in the near future. Here, we summarize the state of the art in AI-based image biomarkers for immunotherapy response based on radiology and histopathology images. We point out limitations, caveats, and pitfalls, including biases, generalizability, and explainability, which are relevant for researchers and health care providers alike, and outline key clinical use cases of this new class of predictive biomarkers.

Response prediction in immunotherapy—one of the major challenges of oncology

The unprecedented results of immune checkpoint inhibitors (ICI) in patients with melanoma led to the extensive use of immunotherapy in many tumor types (1). However, while some patients achieve excellent responses, many others do not respond but can still suffer serious toxicity. Accurate predictive biomarkers are therefore needed to stratify patients for ICI more precisely, to optimize patient selection, and to minimize undesirable side effects. Several molecular biomarkers for ICI response prediction have been introduced. These include microsatellite instability (MSI), the tumor mutational burden (TMB), the expression of programmed death ligand-1 (PD-L1), and the number of tumor-infiltrating lymphocytes (TIL; refs. 2, 3). However, the predictive performance of each biomarker, individually or in combination, is suboptimal: some tumors are resistant despite the presence of the biomarker, and others respond in its absence (4). Hence, there is a need for improved predictive biomarkers to support clinical decisions, ideally with high reproducibility, high accuracy, and low cost.

Artificial intelligence in medical imaging

Artificial intelligence (AI) is a branch of computer science that integrates different technologies able to replicate human brain functions such as learning and problem solving. Machine learning (ML) is a subset of AI techniques capable of teaching a computer to detect patterns in datasets. ML has been used in oncology for many years, for example to define gene signatures of treatment response (5). A particularly powerful method within ML is the artificial neural network (ANN). An ANN is an interconnected group of units that collectively can perform complex computations and is loosely modeled after biological neural networks. Multi-layered ANNs are particularly powerful, and using them in ML is called “Deep Learning” (DL). In the last decade, DL has provided strong performance gains in many scientific domains. One of its most common fields of application is computer-based image processing. In image processing, ANNs typically use a mathematical operation called convolution to reduce the raw pixel information to relevant concepts. ANNs relying on convolutions are called convolutional neural networks (CNN). For decades, teaching computer programs to automatically read and interpret digital images was a very hard task, but in the last decade, the use of CNNs coupled with better hardware and larger data collections has resulted in methodologic breakthroughs. Today, DL with CNNs is the state-of-the-art method for almost any image processing task in nonmedical and medical applications.
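As a minimal illustration of the convolution operation described above, the following sketch slides a small kernel over a toy image and records one response per position. The image, the kernel values, and the function name `convolve2d` are assumptions made for illustration; in a CNN the kernel weights are learned from data rather than set by hand, and deep learning frameworks typically implement this operation as cross-correlation (without flipping the kernel), as done here.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation, the sliding-window operation used in CNN layers."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel
            acc = 0
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Toy 4x4 "image": dark left half, bright right half (illustrative values)
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-set kernel (an assumption) that responds to left-to-right intensity increases
vertical_edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge_kernel)
for row in feature_map:
    print(row)
```

The resulting feature map is nonzero only where the window straddles the dark-to-bright boundary, which is the sense in which a convolution reduces raw pixels to a more relevant concept (here, "an edge is present").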

Medical applications of AI

As of 2022, dozens of AI-based systems have received regulatory approval to be used in clinical routine (6). Many of these systems are tools to analyze digital medical images, including radiology, histopathology, dermoscopy, and endoscopy images. In oncology, image data is abundant and instrumental for medical decisions: the suspicion of a malignant tumor is usually confirmed with radiology imaging, which also serves to determine the spread of the disease. The diagnosis and most components of the stage of a malignant tumor are usually obtained through histopathology, i.e., visual examination of a piece of tumor tissue under a microscope by an expert pathologist. Hence, for almost every single patient with a solid tumor, radiology and histopathology images are available. The central paradigm of image-based biomarkers in oncology is that these routinely available images contain much more information than is currently being used in clinical care and that this information can be extracted by AI.

Image-based biomarkers for cancer immunotherapy response prediction

In this work, we will review applications of AI to extract biomarkers for ICI response from radiology and histopathology images. The methods can be classified either as a “surrogate” strategy or an “end-to-end” strategy. The “surrogate” strategy means that AI is used to predict TMB, MSI status, TILs or PD-L1 expression. The “end-to-end” strategy means that AI is used to directly predict treatment response (Fig. 1). Together, these approaches have the potential to yield new diagnostic methods for better treatment decisions in clinical routine. However, before clinical adoption, it is important to discuss potential limitations and enable clinicians to understand and interpret the output of these systems.

Classical radiomics and DL radiomics

Radiology imaging modalities, such as CT, MRI, and PET, are powerful tools for cancer detection, characterization, and follow-up, providing a comprehensive view of the entire tumor repeatedly throughout the course of the disease. As early as 2014, ML was used to quantify tumor features in CT images that are linked to clinical outcome (7). This method has been termed “radiomics” (7). Classical radiomics follows a two-step approach: first, a quantifiable set of handcrafted features is derived from imaging data; second, ML methods are trained to predict clinically relevant categories from these features. Handcrafted features provide information about the intensity, shape, and texture of the tumor phenotype. However, handcrafting sets of features limits the type of information that ML models can learn, potentially reducing performance. To overcome this, a more recent approach is “Deep Radiomics”, which uses DL (mostly CNNs) to predict a target category directly from image data, learning features and their combination in one go. CNNs can learn a larger number of features at different levels of abstraction (Fig. 2A). The extracted features are thereby selected and weighted on the basis of the task at hand, which provides more flexibility and can improve performance.
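The two-step logic of classical radiomics can be sketched as follows. This is a toy illustration under stated assumptions, not the pipeline of any cited study: the three features (mean intensity, intensity variance as a crude texture proxy, and voxel count as a crude size proxy), the nearest-centroid classifier, the class labels, and all image values are hypothetical choices. Real radiomics pipelines compute many more standardized features and normalize them before model training.

```python
from statistics import mean, pvariance


def handcrafted_features(image, mask):
    """Step 1: summarize the masked tumor region with a few handcrafted features."""
    voxels = [
        image[r][c]
        for r in range(len(image))
        for c in range(len(image[0]))
        if mask[r][c]
    ]
    # [mean intensity, intensity variance (texture proxy), voxel count (size proxy)]
    return [mean(voxels), pvariance(voxels), len(voxels)]


def fit_centroids(feature_rows, labels):
    """Step 2: fit a nearest-centroid classifier, a stand-in for any classical ML model."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, y in zip(feature_rows, labels) if y == label]
        centroids[label] = [mean(column) for column in zip(*rows)]
    return centroids


def predict(centroids, features):
    """Assign the label whose centroid is closest in (unnormalized) feature space."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


# Toy data (hypothetical): one shared tumor mask and two training lesions
mask = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
homogeneous_lesion = [[0, 10, 10], [0, 10, 10], [0, 0, 0]]
heterogeneous_lesion = [[0, 10, 50], [0, 50, 10], [0, 0, 0]]

training_features = [
    handcrafted_features(homogeneous_lesion, mask),
    handcrafted_features(heterogeneous_lesion, mask),
]
training_labels = ["biomarker_low", "biomarker_high"]  # hypothetical categories
centroids = fit_centroids(training_features, training_labels)

new_lesion = [[0, 12, 48], [0, 45, 15], [0, 0, 0]]
prediction = predict(centroids, handcrafted_features(new_lesion, mask))
print(prediction)
```

The fixed feature list in `handcrafted_features` makes the limitation discussed above concrete: the classifier can only use what these three numbers capture, whereas a CNN would learn its own features from the pixels.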

Radiomics to predict surrogate biomarkers for immunotherapy response

Several studies have used AI to predict surrogate markers of immunotherapy response from radiology images. These markers include MSI (8–10), TMB (11–13), TILs (14–16), and PD-L1 (Table 1; refs. 17–19). The basic idea behind these studies is that the molecular properties of tumors change the tumor's phenotype, which can be observed in radiology images (Fig. 2B). Many of these studies use classical radiomics to classify a discrete surrogate biomarker, such as microsatellite stable versus MSI-high, high versus low TMB, high versus low CD8-cell infiltration, or PD-L1 positive versus negative. These studies showed promising results: the area under the receiver operating characteristic curve (AUC) usually falls within the range of 0.70 to 0.90. Other studies have used DL methods, showing similar performances for MSI and PD-L1 prediction (17, 20). The ability to predict the status of such biomarkers from radiology images opens exciting opportunities for clinical practice. For example, monitoring changes of these biomarkers during treatment, analyzing the whole tumor and not just a small part of tumor tissue, and the noninvasiveness are clear advantages compared with invasive tissue-based methods. However, the performance of these imaging-based biomarkers is not perfect, and it remains to be shown if it is sufficient for clinical decision-making.
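Because the AUC is the main performance metric quoted in these studies, a small worked example may help with its interpretation. The sketch below computes the AUC from its rank-based definition: the probability that a randomly chosen biomarker-positive case receives a higher model score than a randomly chosen biomarker-negative case, with ties counted as one half. The labels and scores are invented for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical cohort: 1 = biomarker positive (e.g., MSI-high), 0 = biomarker negative
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In this toy cohort, 10 of the 12 positive/negative pairs are ranked correctly, giving an AUC of about 0.83. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported range of 0.70 to 0.90 still leaves a substantial share of pairs misranked.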

Radiomics for end-to-end prediction of immunotherapy response

While “surrogate” approaches predict an established biomarker from image data, “end-to-end” approaches directly predict immunotherapy response, using the response evaluation criteria in solid tumors (RECIST), progression-free survival, or overall survival (OS) as endpoints (Table 1; refs. 21–23). Successful applications of this approach have been shown in melanoma (23), lung cancer (21, 22), and bladder cancer, among other tumor types, and are generally based on “classical” handcrafted radiomics, not DL radiomics. In general, these approaches have achieved good performance, with AUC values greater than 0.70 for predicting ICI response. Furthermore, most studies report that these predictive radiomics scores are significantly associated with surrogate biomarkers, an analysis intended to explain the underlying biology of the radiomics-defined responsive phenotypes (21, 22). OS is a less common response endpoint because it depends strongly on other factors, such as previous treatments. However, it is the most relevant endpoint in model development for reaching clinical practice (24). In summary, end-to-end imaging biomarkers may play a key role in complementing surrogate biomarkers to improve the understanding of the tumor phenotype.

The rise of computational pathology

Histopathology images reflect properties of tumors that are associated with immunotherapy response. Compared with radiology, the spatial scale of histopathology is much finer, such that phenotypes at cellular and subcellular levels can be directly observed (Fig. 2B). Even routine tissue slides stained with hematoxylin and eosin (H&E) are sufficient to identify many different types of immune cells in the lymphoid and myeloid spectrum. A key disadvantage of histopathology, however, is that it requires invasive procedures. Also, while radiomics biomarkers can be recomputed in serial imaging during the course of the disease, histopathology image biomarkers are usually only measured in the initial tumor sample. In the last 5 to 10 years, AI approaches have been widely used to extract biomarkers from H&E slides of solid tumors (25). AI can extract diagnostic (26), prognostic (27), and predictive (28) information from H&E slides. This use of AI methods in pathology image analysis is referred to as “computational pathology”.

Quantification of IHC

One of the applications of DL methods is to facilitate the quantification of established biomarkers in IHC slides. For example, the expression level of the PD-L1 protein on tumor cells and immune cells (combined positivity score) is assessed by manual observation of IHC. Several studies have used DL to automate the subjective scoring of PD-L1 status (29–33). Similarly, the number of TILs in H&E or IHC slides is associated with survival and immunotherapy response in many tumor types (34, 35). As early as 2006, several years before DL was used as a tool in biomedicine, handcrafted image analysis pipelines with classical ML methods were used to count TILs (36). More recently, several studies have used DL for TIL quantification (37, 38). In addition to quantifying lymphocytes, ML methods have also been used to quantify other types of innate and lymphoid immune cells, demonstrating that cell counts are prognostic of survival in multiple tumor types (39, 40).

Extracting molecular biomarkers from H&E slides

However, the abilities of DL are not limited to simple quantification tasks such as counting cells. DL can solve complex visual pattern recognition tasks and can piece together subtle visual cues related to cell numbers, cellular shape and textures, relative cell positions and phenotypes of connective tissue. For example, by training DL systems on raw histopathology slides, it is possible to infer the MSI status in colorectal, gastric, and endometrial cancer (41–50). This is particularly relevant for immunotherapy because MSI is a clinically approved biomarker to select patients for immunotherapy, independent of the tumor type (46). MSI prediction systems have been made explainable. When they are queried for relevant visual features driving the classifications, immune cell–rich tumor regions are highlighted (50). Similarly, DL has been used to predict TMB directly from H&E in multiple major tumor types, including lung, breast, and colorectal cancer (51–54). Also, PD-L1 expression levels have been inferred from H&E directly, without the need for a dedicated IHC (55, 56). In addition, the expression level of gene signatures which predict ICI response can be inferred by DL from H&E slides, in hepatocellular carcinoma (57), lung cancer (58), and multiple other tumor types (48). This could help to solve some of the practical problems associated with widespread clinical use of these signatures. Likewise, DL can predict immunotherapy-sensitive tumor subtypes. For example, the luminal subtype of urothelial carcinoma which conveys a better anti–PD-L1 response (59) has been predicted from H&E slides alone (60). Relatedly, virus positivity in head and neck and gastric cancer—which associates with better ICI response—can be inferred from H&E slides with DL (45, 61). Finally, some mutations in clinically relevant driver genes have been shown to increase the probability of immunotherapy response and these mutations can be inferred from H&E. 
A prominent example is the detection of mutant BRAF in melanoma (62), which predicts benefit from anti–PD-1 therapy (63). In summary, routinely available H&E slides of solid tumors seem to harbor a wealth of information, some of which is related to ICI response and can be extracted by DL.

Computational pathology for end-to-end prediction of immunotherapy response

All of these above-mentioned approaches use DL to predict a known biological marker from H&E images. An alternative strategy is to train DL directly on response or outcome data. This has been attempted by at least two studies (64, 65) that used CNNs or graph neural networks (GNN) to predict immunotherapy responses. These studies reported an AUC of 0.778 for prediction of responders in melanoma and an AUC of 0.69 for predicting response in lung cancer. Another report about the prediction of response from the morphology of cancer cell nuclei is publicly available (66). In general, such end-to-end prediction studies are quite difficult to perform for practical reasons: they require the DL system to be trained on clinical outcomes, ideally RECIST data. In practice, it is very difficult to collect a sufficient number of pathology tissue specimens of ICI-treated patients with matched response data. In addition, cancer immunotherapy is often administered in late lines of therapy, after patients have received one or more regimens of chemotherapy and other treatments. However, tissue for histopathology is typically acquired at the initial diagnosis, and re-biopsies at later time points are not commonly performed in most tumor types. This means that DL systems are trained on treatment-naïve tumor tissue while patients might start immunotherapy only months later after multiple previous lines of treatment have failed. This is a strong conceptual limitation that could only be resolved by a repetitive sampling of tissue, or a move of immunotherapies towards earlier lines of treatment.

Key limitations

AI biomarkers have a number of conceptual limitations. The first limitation is data quality. Beyond the large amount of data that AI models require to achieve accurate, generalizable results, this data must be of high quality (67). If a model is trained with noisy or artefactual images, many more cases will be necessary for it to converge and achieve good performance. The second limitation is generalization. If the training data is not representative of real-world populations, DL models can fail to generalize. This is especially an issue in medical contexts, where data distributions vary markedly between different countries or even different hospitals. Without adequate precautions, such batch effects can inflate performance statistics (68). Mitigation strategies are to train on diverse datasets (42) or to augment the data (69). The third limitation is bias. AI models can be biased, which means that their performance can depend on patient characteristics like age, gender, or ethnicity (70). For AI models to be deployed in medicine, large-scale validation studies with predefined performance metrics are required to guarantee model performance in the real world (26). The fourth limitation is the quality of the ground truth. When a model is developed using molecular biomarkers as a surrogate of response to immunotherapy, its performance will be limited by the predictive capacity of the established molecular biomarker. This is a motivation for a clear definition of the ground truth for AI biomarker training, but also a motivation for end-to-end training of biomarkers on clinical outcome data. In all of these efforts, data standardization and quality control systems are paramount before application in clinical routine. Best practice guidelines for such quality aspects are formalized and collected in the Equator Network (https://www.equator-network.org/), which also includes AI-specific aspects, such as in the STARD-AI (71) and TRIPOD-AI (72) guidelines.
In addition, for radiology, an Image Biomarker Standardization Initiative (73) as well as a radiomics quality score (74) have been reported with the aim of ensuring reproducible and trustworthy predictive models. Beyond these research-focused guidelines, the FDA has published a list of guiding principles to promote the safe, effective, and high-quality application of AI and ML in the medical field (75).
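One simple way to look for the bias and generalization problems listed among the key limitations is to report performance separately for each patient subgroup or contributing site instead of a single pooled number. The sketch below is a minimal, hypothetical example: the record fields (`subgroup`, `predicted`, `observed`), the site names, and all values are assumptions, and accuracy is used only because it is the simplest metric to follow by hand.

```python
from collections import defaultdict


def accuracy_by_subgroup(records):
    """Return the fraction of correct predictions within each subgroup."""
    tallies = defaultdict(lambda: [0, 0])  # subgroup -> [correct, total]
    for record in records:
        tally = tallies[record["subgroup"]]
        tally[0] += int(record["predicted"] == record["observed"])
        tally[1] += 1
    return {group: correct / total for group, (correct, total) in tallies.items()}


# Hypothetical validation records from two contributing hospitals
records = [
    {"subgroup": "hospital_A", "predicted": 1, "observed": 1},
    {"subgroup": "hospital_A", "predicted": 0, "observed": 0},
    {"subgroup": "hospital_A", "predicted": 1, "observed": 1},
    {"subgroup": "hospital_A", "predicted": 1, "observed": 0},
    {"subgroup": "hospital_B", "predicted": 1, "observed": 0},
    {"subgroup": "hospital_B", "predicted": 0, "observed": 1},
    {"subgroup": "hospital_B", "predicted": 1, "observed": 1},
    {"subgroup": "hospital_B", "predicted": 0, "observed": 0},
]

per_group = accuracy_by_subgroup(records)
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)
```

A large gap between subgroups, as in this toy example, would be a signal to investigate batch effects or under-representation before deployment. The same stratified reporting can be applied to age, gender, or ethnicity groups, and to metrics other than accuracy.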

Multimodal models

One way to combine the benefits of radiology and histology, as well as other data types, would be to develop multimodal AI models, which can integrate multiple data types. Multimodality also helps with interpretability of the resulting models because many image features only make sense in the light of specific host factors, such as age, immune status, comorbidities, and genomics. ML models have been used to extract immunotherapy biomarkers from non-image data such as serum profiles from liquid biopsies of patients with cancer (76–78). Combining such data with image data could further improve the predictive performance. On the technical side, transformer neural networks have achieved remarkable performance in nonmedical tasks, especially for combinations of different types of image data. However, evaluating such multimodal models exacerbates the practical problems associated with data collection: for most researchers at academic institutions, it is currently very difficult if not impossible to collect radiology, histopathology, and clinical data for the same set of patients. Systematic data collection in clinical trials could be a solution to this problem if the data is made available to researchers with low barriers.

Explainability

The increasing use of AI for developing immune surrogate biomarkers and its potential impact on clinical decision-making have raised the need for humans to understand these algorithms. This field has been termed “Explainable AI” (XAI). Many classical ML algorithms, e.g., logistic regression or decision trees, are intrinsically explainable by their structure. However, models with higher complexity, such as DL models, tend to lack explainability. To address this trade-off between performance and explainability, several post hoc techniques have been developed for understanding the decision-making procedure of these so-called black-box models. XAI aims to comprehend model predictions and explain them in human-understandable terms. This could increase trust among all stakeholders: medical doctors and patients, who rely on fair decisions; regulatory agencies, for quality control; and developers, for improving the product.
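The text does not name a specific post hoc technique, so the following sketch uses occlusion sensitivity as one commonly used example: image patches are masked one at a time, and the resulting drop in the model score is recorded as a heatmap. The function names, the patch size, and the toy "black box" (which simply averages one image corner) are assumptions for illustration; in practice the model would be a trained DL network.

```python
def occlusion_map(model, image, patch=2, baseline=0):
    """Score drop caused by masking each patch; larger drops mark more influential regions."""
    height, width = len(image), len(image[0])
    reference_score = model(image)
    heatmap = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy so the input is not modified
            for r in rows:
                for c in cols:
                    occluded[r][c] = baseline
            drop = reference_score - model(occluded)
            for r in rows:
                for c in cols:
                    heatmap[r][c] = drop
    return heatmap


def toy_black_box(image):
    # Hypothetical stand-in for a trained model: mean intensity of the top-right 2x2 block
    return (image[0][2] + image[0][3] + image[1][2] + image[1][3]) / 4


image = [[8] * 4 for _ in range(4)]  # uniform toy image
heatmap = occlusion_map(toy_black_box, image)
for row in heatmap:
    print(row)
```

Only the top-right patch produces a score drop, which correctly reveals the region the toy model relies on. Applied to a real classifier, a pathologist or radiologist can inspect such a heatmap and judge whether the highlighted regions are biologically plausible.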

Regulatory approval

Routine clinical implementation of imaging AI-based tools should be driven by clear demonstration of clinical value and strict ethical and regulatory requirements. An added complexity for validating AI biomarkers, compared with other medical devices, is the ability for AI systems to learn from real-world data in real time. These evolving biomarkers require appropriately tailored regulatory frameworks. AI-based imaging biomarkers are tools embedded in software applications that are intended to be used, alone or in combination, for predicting or monitoring cancer response, therefore being considered as Medical Device Software. The framework to develop, clinically qualify and implement a software tool as a medical device will depend on the local regulations where the device is planned to be used. In the United States, the FDA has defined a framework to enable developers, users and the Agency itself to evaluate and monitor Clinical Decision Support (CDS) software from its premarket development through post-market performance accounting for its iterative nature, while still ensuring its continued safety and effectiveness evaluation. In Europe, CDS software should be compliant with the European Union Medical Devices Regulation (MDR 2017/745), that sets the standards of performance, quality, safety, and efficacy (79).

Embedding in routine workflows

The constraints on implementing AI-based tools in clinical practice are numerous; they relate particularly to a lack of clinical qualification, but also to the complexity of conducting prospective clinical trials that evaluate biomarkers. Moreover, advances in the field are sometimes hampered by the reluctance of part of the medical community to embrace these technologies, owing to a lack of confidence among potential users and their presumed resistance to the heavier workload that high-throughput imaging analysis may involve. Importantly, the implementation of AI-based tools should not increase physician workload but should reduce it and facilitate radiologists’ and pathologists’ workflows, enabling standardized data reporting. Therefore, efforts are necessary to integrate these software applications within the clinical routine analysis platforms of radiology and pathology departments. In this regard, all the stakeholders (i.e., manufacturers, researchers, clinicians, patients) involved in the development of imaging AI-based tools should work together to accelerate the validation and implementation of these tools in clinical routine and truly impact clinical practice. To be implemented in routine clinical practice, imaging AI-based tools must be accessible, user-friendly, rapid to compute, and able to promote equality in healthcare. Finally, the results must be considered as a decision support tool to assist physicians, rather than a substitute for expert physician decision-making.

R. Perez-Lopez reports grants from AstraZeneca and Roche Pharma outside the submitted work. J.N. Kather reports personal fees from Owkin, Panakeia, MSD, Eisai, and Bayer outside the submitted work. No disclosures were reported by the other authors.

R. Perez-Lopez is supported by La Caixa Foundation, a CRIS Foundation Talent Award (TALENT19–05), the FERO Foundation, the Instituto de Salud Carlos III-Investigacion en Salud (PI18/01395 and PI21/01019), and the Prostate Cancer Foundation (18YOUN19). M. Ligero is supported by PERIS PIF-Salut Grant. J.N. Kather is supported by the German Federal Ministry of Health (DEEP LIVER, ZMVI1–2520DAT111) and the Max-Eder-Programme of the German Cancer Aid (grant #70113864).

1. Havel JJ, Chowell D, Chan TA. The evolving landscape of biomarkers for checkpoint inhibitor immunotherapy. Nat Rev Cancer 2019;19:133–50.
2. Center for Drug Evaluation, Research. FDA approves pembrolizumab for adults and children with TMB-H solid tumors. In: U.S. Food and Drug Administration. 17 Jun 2020 [cited 25 Apr 2022]. Available from: https://www.fda.gov/drugs/drug-approvals-and-databases/fda-approves-pembrolizumab-adults-and-children-tmb-h-solid-tumors.
3. Planchard D, Popat S, Kerr K, Novello S, Smit EF, Faivre-Finn C, et al. Metastatic non–small cell lung cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment, and follow-up. Ann Oncol 2018;29:iv192–237.
4. Lee JS, Ruppin E. Multiomics prediction of response rates to therapies to inhibit programmed cell death 1 and programmed cell death 1 ligand 1. JAMA Oncol 2019;5:1614–8.
5. Wiesweg M, Mairinger F, Reis H, Goetz M, Kollmeier J, Misch D, et al. Machine learning reveals a PD-L1–independent prediction of response to immunotherapy of non–small cell lung cancer by gene expression context. Eur J Cancer 2020;140:76–85.
6. Benjamens S, Dhunnoo P, Meskó B. The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database. NPJ Digit Med 2020;3:118.
7. Aerts HJWL, Velazquez ER, Leijenaar RTH, Parmar C, Grossmann P, Carvalho S, et al. Decoding tumor phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun 2014;5:4006.
8. Li Z, Zhong Q, Zhang L, Wang M, Xiao W, Cui F, et al. Computed tomography–based radiomics model to preoperatively predict microsatellite instability status in colorectal cancer: a multicenter study. Front Oncol 2021;11:666786.
9. Pei Q, Yi X, Chen C, Pang P, Fu Y, Lei G, et al. Pretreatment CT-based radiomics nomogram for predicting microsatellite instability status in colorectal cancer. Eur Radiol 2022;32:714–24.
10. Cao Y, Zhang G, Zhang J, Yang Y, Ren J, Yan X, et al. Predicting microsatellite instability status in colorectal cancer based on triphasic enhanced computed tomography radiomics signatures: a multicenter study. Front Oncol 2021;11:687771.
11. He B, Dong D, She Y, Zhou C, Fang M, Zhu Y, et al. Predicting response to immunotherapy in advanced non–small cell lung cancer using tumor mutational burden radiomic biomarker. J Immunother Cancer 2020;8:e000550.
12. Veeraraghavan H, Friedman CF, DeLair DF, Ninčević J, Himoto Y, Bruni SG, et al. Machine learning–based prediction of microsatellite instability and high tumor mutation burden from contrast-enhanced computed tomography in endometrial cancers. Sci Rep 2020;10:17769.
13. Liu E-T, Zhou S, Li Y, Zhang S, Ma Z, Guo J, et al. Development and validation of an MRI-based nomogram for the preoperative prediction of tumor mutational burden in lower-grade gliomas. Quant Imaging Med Surg 2022;12:1684–97.
14. Sun R, Limkin EJ, Vakalopoulou M, Dercle L, Champiat S, Han SR, et al. A radiomics approach to assess tumor-infiltrating CD8 cells and response to anti–PD-1 or anti–PD-L1 immunotherapy: an imaging biomarker, retrospective multicohort study. Lancet Oncol 2018;19:1180–91.
15. Chen S, Feng S, Wei J, Liu F, Li B, Li X, et al. Pretreatment prediction of immunoscore in hepatocellular cancer: a radiomics-based clinical model based on Gd-EOB-DTPA-enhanced MRI imaging. Eur Radiol 2019;29:4177–87.
16. Liao H, Zhang Z, Chen J, Liao M, Xu L, Wu Z, et al. Preoperative radiomic approach to evaluate tumor-infiltrating CD8+ T cells in hepatocellular carcinoma patients using contrast-enhanced computed tomography. Ann Surg Oncol 2019;26:4537–47.
17. Mu W, Jiang L, Shi Y, Tunali I, Gray JE, Katsoulakis E, et al. Noninvasive measurement of PD-L1 status and prediction of immunotherapy response using deep learning of PET/CT images. J Immunother Cancer 2021;9:e002118.
18. Iwatate Y, Hoshino I, Yokota H, Ishige F, Itami M, Mori Y, et al. Radiogenomics for predicting p53 status, PD-L1 expression, and prognosis with machine learning in pancreatic cancer. Br J Cancer 2020;123:1253–61.
19. Tang C, Hobbs B, Amer A, Li X, Behrens C, Canales JR, et al. Development of an immune-pathology informed radiomics model for non–small cell lung cancer. Sci Rep 2018;8:1922.
20. Zhang W, Yin H, Huang Z, Zhao J, Zheng H, He D, et al. Development and validation of MRI-based deep learning models for prediction of microsatellite instability in rectal cancer. Cancer Med 2021;10:4164–73.
21. Trebeschi S, Drago SG, Birkbak NJ, Kurilova I, Călin AM, Delli Pizzi A, et al. Predicting response to cancer immunotherapy using noninvasive radiomic biomarkers. Ann Oncol 2019;30:998–1004.
22. Ligero M, Garcia-Ruiz A, Viaplana C, Villacampa G, Raciti MV, Landa J, et al. A CT-based radiomics signature is associated with response to immune checkpoint inhibitors in advanced solid tumors. Radiology 2021;299:109–19.
23. Dercle L, Zhao B, Gönen M, Moskowitz CS, Firas A, Beylergil V, et al. Early readout on overall survival of patients with melanoma treated with immunotherapy using a novel imaging analysis. JAMA Oncol 2022;8:385–92.
24. Alban TJ, Chan TA. Immunotherapy biomarkers: the long and winding road. Nat Rev Clin Oncol 2021;18:323–4.
25. Echle A, Rindtorff NT, Brinker TJ, Luedde T, Pearson AT, Kather JN. Deep learning in cancer pathology: a new generation of clinical biomarkers. Br J Cancer 2021;124:686–96.
26. Kleppe A, Skrede O-J, De Raedt S, Liestøl K, Kerr DJ, Danielsen HE. Designing deep learning studies in cancer diagnostics. Nat Rev Cancer 2021;21:199–211.
27. Skrede O-J, De Raedt S, Kleppe A, Hveem TS, Liestøl K, Maddison J, et al. Deep learning for prediction of colorectal cancer outcome: a discovery and validation study. Lancet 2020;395:350–60.
28. Cifci D, Foersch S, Kather JN. Artificial intelligence to identify genetic alterations in conventional histopathology. J Pathol 2022;257:430–44.
29. Hondelink LM, Hüyük M, Postmus PE, Smit VTHBM, Blom S, von der Thüsen JH, et al. Development and validation of a supervised deep learning algorithm for automated whole-slide programmed death-ligand 1 tumor proportion score assessment in non–small cell lung cancer. Histopathology 2022;80:635–47.
30. Liu J, Zheng Q, Mu X, Zuo Y, Xu B, Jin Y, et al. Automated tumor proportion score analysis for PD-L1 (22C3) expression in lung squamous cell carcinoma. Sci Rep 2021;11:15907.
31. Kapil A, Meier A, Zuraw A, Steele KE, Rebelatto MC, Schmidt G, et al. Deep semi supervised generative learning for automated tumor proportion scoring on NSCLC tissue needle biopsies. Sci Rep 2018;8:17343.
32. Wu J, Liu C, Liu X, Sun W, Li L, Gao N, et al. Artificial intelligence–assisted system for precision diagnosis of PD-L1 expression in non–small cell lung cancer. Mod Pathol 2022;35:403–11.
33. Wang X, Chen P, Ding G, Xing Y, Tang R, Peng C, et al. Dual-scale categorization based deep learning to evaluate programmed cell death ligand 1 expression in non–small cell lung cancer. Medicine 2021;100:e25994.
34. Hendry S, Salgado R, Gevaert T, Russell PA, John T, Thapa B, et al. Assessing tumor-infiltrating lymphocytes in solid tumors: A practical review for pathologists and proposal for a standardized method from the International Immuno-oncology biomarkers Working Group: Part 2: TILs in melanoma, gastrointestinal tract carcinomas, non–small cell lung carcinoma and mesothelioma, endometrial and ovarian carcinomas, squamous cell carcinoma of the head and neck, genitourinary carcinomas, and primary brain tumors. Adv Anat Pathol 2017;24:311–35.
35. Yu Y, Zeng D, Ou Q, Liu S, Li A, Chen Y, et al. Association of survival and immune-related biomarkers with immunotherapy in patients with non–small cell lung cancer: a meta-analysis and individual patient-level analysis. JAMA Netw Open 2019;2:e196879.
36. Galon J, Costes A, Sanchez-Cabo F, Kirilovsky A, Mlecnik B, Lagorce-Pagès C, et al. Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Science 2006;313:1960–4.
37. Schirris Y, Engelaer M, Panteli A, Horlings HM, Gavves E, Teuwen J. WeakSTIL: weak whole-slide image level stromal tumor-infiltrating lymphocyte scores are all you need. arXiv [eess.IV]. 2021. Available from: http://arxiv.org/abs/2109.05892.
38. Shaban M, Khurram SA, Fraz MM, Alsubaie N, Masood I, Mushtaq S, et al. A novel digital score for abundance of tumor infiltrating lymphocytes predicts disease-free survival in oral squamous cell carcinoma. Sci Rep 2019;9:13341.
39. Kather JN, Hörner C, Weis C-A, Aung T, Vokuhl C, Weiss C, et al. CD163+ immune cell infiltrates and presence of CD54+ microvessels are prognostic markers for patients with embryonal rhabdomyosarcoma. Sci Rep 2019;9:9211.
40. Kather JN, Suarez-Carmona M, Charoentong P, Weis C-A, Hirsch D, Bankhead P, et al. Topography of cancer-associated immune cells in human solid tumors. eLife 2018;7:36967.
41. Kather JN, Pearson AT, Halama N, Jäger D, Krause J, Loosen SH, et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat Med 2019;25:1054–6.
42. Echle A, Grabsch HI, Quirke P, van den Brandt PA, West NP, Hutchins GGA, et al. Clinical-grade detection of microsatellite instability in colorectal tumors by deep learning. Gastroenterology 2020;159:1406–16.
43. Bilal M, Raza SEA, Azam A, Graham S, Ilyas M, Cree IA, et al. Development and validation of a weakly supervised deep learning framework to predict the status of molecular pathways and key mutations in colorectal cancer from routine histology images: a retrospective study
.
Lancet Digit Health
2021
;
3
:
e763
72
.
44.
Yamashita
R
,
Long
J
,
Longacre
T
,
Peng
L
,
Berry
G
,
Martin
B
, et al
.
Deep learning model for the prediction of microsatellite instability in colorectal cancer: a diagnostic study
.
Lancet Oncol
2021
;
22
:
132
41
.
45.
Muti
HS
,
Heij
LR
,
Keller
G
,
Kohlruss
M
,
Langer
R
,
Dislich
B
, et al
.
Development and validation of deep learning classifiers to detect Epstein–Barr virus and microsatellite instability status in gastric cancer: a retrospective multicenter cohort study
.
The Lancet Digital Health
2021
;
3
:
e654
64
.
46.
Echle
A
,
Laleh
NG
,
Schrammen
PL
,
West
NP
,
Trautwein
C
,
Brinker
TJ
, et al
.
Deep learning for the detection of microsatellite instability from histology images in colorectal cancer: a systematic literature review
.
ImmunoInformatics
2021
;
3–4
:
100008
.
47.
Schrammen
PL
,
Laleh
NG
,
Echle
A
,
Truhn
D
,
Schulz
V
,
Brinker
TJ
, et al
.
Weakly supervised annotation-free cancer detection and prediction of genotype in routine histopathology
.
J Pathol
2022
;
256
:
50
60
.
48.
Kather
JN
,
Heij
LR
,
Grabsch
HI
,
Loeffler
C
,
Echle
A
,
Muti
HS
, et al
.
Pan-cancer image-based detection of clinically actionable genetic alterations
.
Nature Cancer
2020
;
1
:
789
99
.
49.
Schmauch
B
,
Romagnoni
A
,
Pronier
E
,
Saillard
C
,
Maillé
P
,
Calderaro
J
, et al
.
A deep learning model to predict RNA-seq expression of tumors from whole slide images
.
Nat Commun
2020
;
11
:
3877
.
50.
Echle
A
,
Laleh
NG
,
Quirke
P
,
Grabsch
HI
,
Muti
HS
,
Saldanha
OL
, et al
.
Artificial intelligence for detection of microsatellite instability in colorectal cancer—a multicentric analysis of a prescreening tool for clinical application
.
ESMO Open
2022
;
7
:
100400
.
51.
Jain
MS
,
Massoud
TF
.
Predicting tumor mutational burden from histopathological images using multiscale deep learning
.
bioRxiv
2020
:.
2020.06.15.153379
.
52.
Xu
H
,
Park
S
,
Lee
SH
,
Hwang
TH
.
Using transfer learning on whole slide images to predict tumor mutational burden in bladder cancer patients
.
bioRxiv
2019
:
554527
.
53.
Chen
S
,
Xiang
J
,
Wang
X
,
Zhang
J
,
Yang
S
,
Huang
J
, et al
.
Pan-cancer computational histopathology reveals tumor mutational burden status through weakly supervised deep learning
.
arXiv [cs.CV]
.
2022
.
Available from
: http://arxiv.org/abs/2204.03257.
54.
Niu
Y
,
Wang
L
,
Zhang
X
,
Han
Y
,
Yang
C
,
Bai
H
, et al
.
Predicting tumor mutational burden from lung adenocarcinoma histopathological images using deep learning
.
Front Oncol
2022
;
12
:
927426
.
55.
Sha
L
,
Osinski
BL
,
Ho
IY
,
Tan
TL
,
Willis
C
,
Weiss
H
, et al
.
Multi-field-of-view deep learning model predicts non–small cell lung cancer programmed death-ligand 1 status from whole-slide hematoxylin and eosin images
.
J Pathol Inform
2019
;
10
:
24
.
56.
Ebert
MP
,
Meindl-Beinker
NM
,
Gutting
T
,
Maenz
M
,
Betge
J
,
Schulte
N
, et al
.
Second-line therapy with nivolumab plus ipilimumab for older patients with esophageal squamous cell cancer (RAMONA): a multicenter, open-label, phase II trial
.
The Lancet Healthy Longevity
2022
;
3
:
e417
27
.
57.
Zeng
Q
,
Klein
C
,
Caruso
S
,
Maille
P
,
Laleh
NG
,
Sommacale
D
, et al
.
Artificial intelligence predicts immune and inflammatory gene signatures directly from hepatocellular carcinoma histology
.
J Hepatol
2022
;
77
:
116
27
.
58.
Shen
C
,
Schlager
C
,
Rajan
D
,
Pouryahya
M
,
Lin
M
,
Mountain
V
, et al
.
Abstract 1922: Application of an interpretable graph neural network to predict gene expression signatures associated with tertiary lymphoid structures in histopathological images
.
Cancer Res
2022
;
82
:
1922
.
59.
Rosenberg
JE
,
Hoffman-Censits
J
,
Powles
T
,
van der Heijden
MS
,
Balar
AV
,
Necchi
A
, et al
.
Atezolizumab in patients with locally advanced and metastatic urothelial carcinoma who have progressed following treatment with platinum-based chemotherapy: a single-arm, multicenter, phase II trial
.
Lancet
2016
;
387
:
1909
20
.
60.
Woerl
A-C
,
Eckstein
M
,
Geiger
J
,
Wagner
DC
,
Daher
T
,
Stenzel
P
, et al
.
Deep learning predicts molecular subtype of muscle-invasive bladder cancer from conventional histopathological slides
.
Eur Urol
2020
;
78
:
256
64
.
61.
Kather
JN
,
Schulte
J
,
Grabsch
HI
,
Loeffler
C
,
Muti
H
,
Dolezal
J
, et al
.
Deep learning detects virus presence in cancer histology
.
bioRxiv
2019
:
690206
.
62.
Kim
RH
,
Nomikou
S
,
Coudray
N
,
Jour
G
,
Dawood
Z
,
Hong
R
, et al
.
Deep learning and pathomics analyses reveal cell nuclei as important features for mutation prediction of BRAF-mutated melanomas
.
J Invest Dermatol
2022
;
142
:
1650
8
.
63.
Wolchok
JD
,
Chiarion-Sileni
V
,
Gonzalez
R
,
Rutkowski
P
,
Grob
J-J
,
Cowey
CL
, et al
.
Overall survival with combined nivolumab and ipilimumab in advanced melanoma
.
N Engl J Med
2017
;
377
:
1345
56
.
64.
Hu
J
,
Cui
C
,
Yang
W
,
Huang
L
,
Yu
R
,
Liu
S
, et al
.
Using deep learning to predict anti–PD-1 response in melanoma and lung cancer patients from histopathology images
.
Transl Oncol
2021
;
14
:
100921
.
65.
Xie
C
,
Vanderbilt
C
,
Feng
C
,
Ho
D
,
Campanella
G
,
Egger
J
, et al
.
Computational biomarker predicts lung ICI response via deep learning-driven hierarchical spatial modelling from H&E. Research Square
.
2022
.
66.
Madabhushi
A
,
Wang
X
,
Barrera
C
.
Predicting response to immunotherapy using computer extracted features of cancer nuclei from hematoxylin and eosin (HandE) stained images of non–small cell lung cancer (NSCLC)
.
US Patent 11,055,844
.
2021
.
Available
: https://patents.google.com/patent/US11055844B2/en.
67.
Schömig-Markiefka
B
,
Pryalukhin
A
,
Hulla
W
,
Bychkov
A
,
Fukuoka
J
,
Madabhushi
A
, et al
.
Quality control stress test for deep learning-based diagnostic model in digital pathology
.
Mod Pathol
2021
;
34
:
2098
108
.
68.
Howard
FM
,
Dolezal
J
,
Kochanny
S
,
Schulte
J
,
Chen
H
,
Heij
L
, et al
.
The impact of site-specific digital histology signatures on deep learning model accuracy and bias
.
Nat Commun
2021
;
12
:
4423
.
69.
Yamashita
R
,
Long
J
,
Banda
S
,
Shen
J
,
Rubin
DL
.
Learning Domain-agnostic visual representation for computational pathology using medically irrelevant style transfer augmentation
.
IEEE Trans Med Imaging
2021
;
40
:
3945
54
.
70.
Obermeyer
Z
,
Powers
B
,
Vogeli
C
,
Mullainathan
S
.
Dissecting racial bias in an algorithm used to manage the health of populations
.
Science
2019
;
366
:
447
53
.
71.
Sounderajah
V
,
Ashrafian
H
,
Golub
RM
,
Shetty
S
,
De Fauw
J
,
Hooft
L
, et al
.
Developing a reporting guideline for artificial intelligence–centered diagnostic test accuracy studies: the STARD-AI protocol
.
BMJ Open
2021
;
11
:
e047709
.
72.
Collins
GS
,
Dhiman
P
,
Navarro
CLA
,
Ma
J
,
Hooft
L
,
Reitsma
JB
, et al
.
Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence
.
BMJ Open
2021
;
11
:
e048008
.
73.
Zwanenburg
A
,
Vallières
M
,
Abdalah
MA
,
Aerts
HJWL
,
Andrearczyk
V
,
Apte
A
, et al
.
The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping
.
Radiology
2020
;
295
:
328
38
.
74.
Lambin
P
,
Leijenaar
RTH
,
Deist
TM
,
Peerlings
J
,
de Jong
EEC
,
van Timmeren
J
, et al
.
Radiomics: the bridge between medical imaging and personalized medicine
.
Nat Rev Clin Oncol
2017
;
14
:
749
62
.
75.
U.S. Food and Drug Administration
.
Good machine learning practice for medical device development: guiding principles
. Available from: https://www.fda.gov/medical-devices/software-medical-device-samd/good-machine-learning- practice-medical-device-development-guiding-principles.
76.
Park
Y
,
Kim
MJ
,
Choi
Y
,
Kim
NH
,
Kim
L
,
Hong
SPD
, et al
.
Role of mass spectrometry-based serum proteomics signatures in predicting clinical outcomes and toxicity in patients with cancer treated with immunotherapy
.
J Immunother Cancer
2022
;
10
:
e003566
.
77.
Wei
C
,
Wang
M
,
Gao
Q
,
Yuan
S
,
Deng
W
,
Bie
L
, et al
.
Dynamic peripheral blood immune cell markers for predicting the response of patients with metastatic cancer to immune checkpoint inhibitors
.
Cancer Immunol Immunother
2022
.
78.
Kato
S
,
Li
B
,
Adashek
JJ
,
Cha
SW
,
Bianchi-Frias
D
,
Qian
D
, et al
.
Serial changes in liquid biopsy-derived variant allele frequency predict immune checkpoint inhibitor responsiveness in the pan-cancer setting
.
Oncoimmunology
2022
;
11
:
2052410
.
79.
European Union. Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, amending Directive 2001/83/EC, Regulation (EC) No 178/2002 and Regulation (EC) No 1223/2009 and repealing Council Directives 90/385/EEC and 93/42/EEC (Text with EEA relevance). Available from:
https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32017R0745.
80.
Klein
S
,
Quaas
A
,
Quantius
J
,
Löser
H
,
Meinel
J
,
Peifer
M
, et al
.
Deep learning predicts HPV association in oropharyngeal squamous cell carcinomas and identifies patients with a favorable prognosis using regular H&E stains
.
Clin Cancer Res
2021
;
27
:
1131
8
.
81.
Sirinukunwattana
K
,
Domingo
E
,
Richman
SD
,
Redmond
KL
,
Blake
A
,
Verrill
C
, et al
.
Image-based consensus molecular subtype (imCMS) classification of colorectal cancer using deep learning
.
Gut
2021
;
70
:
544
54
.