Abstract
Cancer is largely a disease of the tumor cell genome. As a result, the majority of genetics research in oncology has concentrated on the role of tumor somatic mutations, as well as inherited risk variants, in disease susceptibility and response to targeted treatments. The advent and success of cancer immunotherapies, however, have opened new perspectives for the investigation of the role of inherited genetic variation in codetermining outcome and safety. It is increasingly likely that the entirety of germline genetic variation involved in regulating immune responses accounts for a significant fraction of the observed variability in responses to cancer immunotherapies. Although germline genetic data from patients treated with cancer immunotherapies are still scarce, this line of research benefits from a vast body of knowledge derived from studies into autoimmune and infectious disease phenotypes, thus not requiring a start from a blank slate. Here, we discuss how a thorough investigation of genomic variation relevant for individuals’ variability in (auto)immune responses can contribute to the discovery of novel treatment approaches and drug targets, and yield predictive biomarkers to stratify cancer patient populations in precision and personalized medicine settings.
Introduction
The entirety of factors that influence antitumor immunity or tolerance has been described as the cancer–immune set point, defined as the variable threshold above which an immune response is likely to occur. An individual's genetic differences comprise one aspect of this multidimensional conceptual framework (1). As such, it is possible that differences in response to cancer immunotherapy (CIT) are in part driven by how our genomes configure our immune system to respond to immunologic challenges in general, and cancer in particular.
The toolbox of human genomics research has improved our understanding of the pathophysiology and causes of (auto)immunity and infectious diseases. In general, a complex pattern of common genetic variation underlies most complex disease phenotypes (2). Unlike genetic disorders with high penetrance caused by single genes that obey simple Mendelian inheritance, these phenotypes reflect the combined contributions of a multitude of variants, each of them exhibiting a subtle, mostly additive effect size (2). Applied to the cancer–immune set point framework, complex genetics controls the probability of mounting an immune response to cancer, the likelihood of response to a specific therapy, and the risk of developing a therapy-associated immune-mediated adverse event (imAE).
Here, we classify germline genomic variation according to its potential role in cancer immunology (Fig. 1), review initial published evidence, and make a case for large-scale genetic and integrated analyses of CIT-relevant phenotypes.
Genome-Wide Discovery and the Blank Slate
A prime example of what human genomics can contribute to our understanding of complex immune-related traits is the progress achieved by the International Multiple Sclerosis (MS) Genetics Consortium. The first genome-wide study from this consortium, which was published in 2007, identified risk variants in IL2RA, IL7RA as well as the MHC locus (3), and its most recent effort resulted in a genomic map of more than 200 risk loci, explaining almost 50% of the genetic contribution of MS (4). The identification of these risk loci, combined with the use of gene regulation and protein interaction data, made it possible to map associated loci to specific immune cell types (5). While supporting the established role of B cells, these studies also suggest microglia as important players in disease pathophysiology. It is too early to find evidence for successful clinical translation, but the results have inspired further investigation of implicated pathways and potential therapeutic strategies (6). Overall, it has been shown that drug targets have significantly higher chances of getting approved if there is supportive genetic data (7, 8).
In contrast to the trailblazing efforts that defined complex disease genomics for inflammatory disorders, germline genomics research in CIT does not need to start from a blank slate as it can benefit from preexisting knowledge of a wide spectrum of autoimmunity-associated variants and genes (Fig. 1A). There is considerable overlap in the genetic architecture of complex diseases; immune-related traits cluster among each other as well as with infectious disease phenotypes (9). Some genes have been implicated in many immune-relevant traits (7). For example, a nonsynonymous variant (rs2476601) in the gene coding protein tyrosine phosphatase nonreceptor type 22 (PTPN22) is associated with a range of autoimmune diseases, including type 1 diabetes and lupus erythematosus (10, 11). PTPN22 has also been suggested as a target for CIT, based on experiments in genetically modified mice that showed a link between PTPN22 phosphatase activity and antitumor immunity (12). This suggestion is also supported by human genetics evidence. The autoimmune risk variant rs2476601 is associated with decreased risk of skin cancer, as well as better overall survival and increased risk for hyperthyroidism and hypothyroidism in patients treated with anti–PD-L1.
Generalizing this approach, our knowledge of the polygenic architecture of complex immune traits allows us to define and analyze credible sets of common variants in the context of CIT. Although genome-wide screens are warranted once a critical mass of data is available, more focused approaches that evaluate the contributions of individual candidate genes can reduce the multiple testing burden and generate valuable insight in relatively smaller clinical cohorts (Fig. 2). With regard to such credible sets of variants, we are not limited to SNPs associated with immune-relevant, clinically defined disease phenotypes, but can also make use of recent studies into the genetic contributions to variations in the baseline immune responses inferred simply from the profile of cells found in the peripheral blood. Sayaman and colleagues reported a comprehensive investigation of the role of germline genetic variation in shaping the tumor immune landscape, using immune traits derived from The Cancer Genome Atlas (13). Among other things, they found variants in IFIH1 and STING1 associated with differences in IFN signaling, and SNPs in RBL1 associated with the abundance of various T-cell subsets.
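The reduction in multiple testing burden mentioned above can be illustrated with a minimal sketch. It assumes a simple Bonferroni correction, and the test counts are illustrative assumptions rather than figures from any study cited here; the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Illustrative numbers only (assumptions, not taken from the text).
genome_wide_tests = 1_000_000  # order of magnitude often assumed for independent common variants
candidate_tests = 200          # hypothetical credible set of immune-related variants

gw = bonferroni_threshold(0.05, genome_wide_tests)
cand = bonferroni_threshold(0.05, candidate_tests)

print(f"genome-wide threshold:   {gw:.1e}")
print(f"candidate-set threshold: {cand:.1e}")
```

Under these assumptions, the candidate-set threshold is several orders of magnitude less stringent, which is why a focused analysis may detect an effect in a cohort that would be underpowered for a genome-wide screen.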
Large-scale analyses of blood-cell phenotypes in >500,000 participants in the UK Biobank yielded >5,000 independent genetic associations with variables including cell counts, relative frequencies of white blood cells, and hematopoiesis (14). A different study involving 1,000 healthy Western Europeans found that phenotypic variation in innate immune cells has a stronger genetic component than in the adaptive immune system, and suggested a strong genetic control of cell-surface expression of several immune cell markers (15). For example, a variant close to the gene coding for sphingosine 1-phosphate receptor S1P1 (CD363) was associated with cell-surface protein expression of CD69 in CD16hi natural killer (NK) cells. A third study of 3,757 Sardinians also identified >100 genetic associations with immune cell traits (16). Of note, CD28 levels on diverse T-cell subsets were affected by SNPs in the genomic locus harboring the CD28 and CTLA4 genes, but also in trans by a variant in proximity of BACH2. All of these associations were also previously found to be associated with several autoimmune diseases (16). It is likely that genetic association study results for CIT-relevant outcome and toxicity phenotypes will be enriched for variants previously implicated in autoimmunity and immune-cell traits, and the wealth of existing literature will be very useful for generating mechanistic hypotheses and informing downstream forward translational research.
In clinical trial settings, precise measures of response and outcome (overall and progression-free survival) are usually available and can be used as endpoints for genetic association studies. This is more challenging in “real-world” settings, where outcome often must be estimated from other variables including date of treatment onset or death. But comparable with most complex, heterogeneous, and “fuzzy” phenotypes, large sample sizes can alleviate the problem and increase the signal-to-noise ratio. Another promising approach is the utilization of intermediate phenotypes, which are quantitative and heritable biological traits that can be derived from multiple sources including molecular analyses of the tumor microenvironment. For example, we can now aim to identify genetic predictors of tumor immune phenotypes that have been shown to be predictive for outcome in patients treated with CIT (Fig. 2; refs. 17, 18). It is conceivable that different germline and tumor genetic profiles predispose patients to develop inflamed, immune-excluded or immune desert tumors. Furthermore, several gene signatures derived from tumor transcriptomic data have been shown to predict patient outcomes (19, 20), and the strength of such signatures could be codetermined by germline genetic variation.
In contrast to the investigation of clinical outcomes, studies focusing on intermediate phenotypes can provide a more direct insight into the biology underlying a statistical association, thereby possibly offering a shortcut for the nomination of novel drug targets and combinations, as well as functional investigations into relevant pathways.
Autoimmune Polygenic Risk
If the entirety of immune-relevant genomic variation contributes to the positioning of an individual on a spectrum between tolerance and immunity, or between immune suppression and inflammation, then one large area of focus should be the evaluation of autoimmune polygenic risk in the context of CIT outcome and safety phenotypes (Fig. 1B). Polygenic risk scores (PRS) have emerged as promising biomarkers for the prediction of disease risk, not only in the area of cardiovascular disorders, but also oncology (21). These risk scores also have become increasingly available for a multitude of phenotypes and are systematically curated in a free online database (22).
It has been shown that certain preexisting autoimmune diseases as well as the occurrence of imAE upon treatment are associated with better response to checkpoint inhibitors (23). This link between autoimmunity and antitumor immunity likely reflects a relationship between an individual patient's propensity to respond to therapy and their propensity to respond to any immune agonist (24). It is thus reasonable to hypothesize that PRS for autoimmune diseases might be predictive of both outcome and the risk for imAE in patients treated with CIT (25). In fact, one study has demonstrated an association of PRS for dermatologic autoimmunity with outcome. A high PRS for vitiligo and psoriasis, as well as a low PRS for atopic dermatitis, was found to be associated with longer overall survival in patients with bladder cancer treated with the anti–PD-L1 atezolizumab (26). Of note, this finding possibly reflects the fact that psoriasis is largely driven by Th17 biology, in contrast to a Th2 polarization for atopic dermatitis (27). Th2 polarization, mechanistically, is associated with poor immune responses to cancer (28, 29).
Genetic risk for hypothyroidism, estimated using a PRS derived from UK Biobank data, was found to be associated with increased risk for thyroid dysfunction in patients with cancer treated with atezolizumab, and also with lower risk of death among patients with triple-negative breast cancer (30). A similar investigation in patients with non–small cell lung cancer treated with diverse immune checkpoint inhibitors yielded similar results with respect to hypothyroidism risk, although no association with outcome was observed (31).
PRS, which consist of a multitude of single variants with small effect sizes, have the potential to be developed into biomarkers for outcome and safety. It would also be of interest to dissect them into their functional components to identify or prioritize potential therapeutic targets. One possible approach to this problem is the generation of partitioned polygenic scores according to factors of disease heterogeneity, as successfully demonstrated for type 2 diabetes (32). Another strategy could be the mapping of statistically associated genetic loci to different immune-cell subtypes according to gene expression patterns derived from single-cell RNA sequencing (33).
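As a rough illustration of how such a score is assembled and then split into components, the sketch below computes a PRS as a weighted sum of effect-allele dosages and partitions it by a per-variant label. The variant identifiers, weights, and labels are hypothetical, and the partition is assumed to be given; published partitioned scores derive it from data (for example, by clustering variants on their trait associations), which is not reproduced here.

```python
from typing import Dict, Mapping


def polygenic_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies) over scored variants.

    Variants missing from the genotype data are skipped.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


def partitioned_scores(
    dosages: Mapping[str, float],
    weights: Mapping[str, float],
    partition: Mapping[str, str],
) -> Dict[str, float]:
    """Split the same weighted sum into one partial score per partition label."""
    scores: Dict[str, float] = {}
    for variant, w in weights.items():
        if variant in dosages:
            label = partition.get(variant, "unassigned")
            scores[label] = scores.get(label, 0.0) + w * dosages[variant]
    return scores


# Hypothetical inputs for illustration only.
weights = {"rsA": 0.12, "rsB": -0.05, "rsC": 0.30}            # per-allele effect sizes
partition = {"rsA": "T cell", "rsB": "B cell", "rsC": "T cell"}  # assumed cell-type labels
patient = {"rsA": 2, "rsB": 1, "rsC": 0}                       # effect-allele dosages

total = polygenic_score(patient, weights)
parts = partitioned_scores(patient, weights, partition)
print(total, parts)
```

By construction, the partial scores sum to the overall score, so each component can be tested separately against outcome or imAE risk without changing the underlying weights.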
Autoimmune PRS, possibly in combination with other genetic and nongenetic predictors, may be of importance to manage the risk of imAE in patients treated with CIT while maintaining treatment efficacy. This is especially relevant in the context of combination therapies, because high-grade imAE are more likely to occur with CIT combinations versus monotherapies (34).
Although the use of autoimmune PRS is of great value, polygenic scores derived directly from genome-wide association studies (GWAS) in patients treated with CIT would also be expected to contribute new insights. However, the GWAS approach will likely require sample sizes in the tens of thousands, similar to available studies in the field of autoimmune diseases, and such numbers are possibly more difficult to accrue in cancer. It is also important to note that the predictive potential of PRS is heavily biased toward patients of White European ancestry, a consequence of the same bias in the underlying GWAS (35). A greater diversity in large-scale genetic association studies is needed to alleviate health disparities.
Battle of the Genomes
An attractive feature of human genetics research in oncology is the interaction of two (closely related) genomes, one (the tumor) evolving in a framework codefined by the other (the host). The selective pressure underlying tumor evolutionary trajectories is exerted by components of the immune system, whose properties are in part determined by inherited genetic variation. Both genome sequences being available, it is thus possible to investigate whether the likelihood for a somatic mutation to become clonal is associated with germline genetic factors (Fig. 1C). Such studies have been referred to as “genome-to-genome” association studies in infectious disease research, and were successful in finding footprints of selective pressure exerted by the host genome on human immunodeficiency virus and hepatitis C virus (36, 37).
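In its simplest form, a single germline-somatic pair can be tested by cross-tabulating carrier status against presence of the somatic mutation. The sketch below implements a two-sided Fisher's exact test in pure Python for such a 2x2 table. It is an illustrative simplification: the counts are hypothetical, and real genome-to-genome analyses would typically use regression models with covariates and correct for the very large number of variant pairs tested.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: germline carriers / non-carriers.
    Columns: tumors with / without the somatic mutation.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x mutated tumors among carriers,
        # with all table margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts: 1 of 10 carriers vs. 11 of 14 non-carriers have the mutation.
p_value = fisher_exact_two_sided(1, 9, 11, 3)
print(f"p = {p_value:.4g}")
```

A depletion of the somatic mutation among carriers, as in this toy table, is the kind of footprint that would be expected if the germline variant contributed to immune recognition of the mutated clone.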
The central question is whether we can use germline genomic variation to predict the evolutionary path a tumor will take to avoid immune recognition. Marty and colleagues have shown that a given somatic mutation in a tumor is less likely to occur if the patient carries classical class I or class II human leukocyte antigen (HLA) alleles that are predicted to present peptides containing the mutation (38, 39). Such studies depend on the quality of computational algorithms for antigen presentation prediction, and they might strongly benefit from improvements made in this active field of research.
Not all “genome-to-genome” associations need to be primarily related to the immune system. For example, an intronic variant in the RBFOX1 gene, which encodes an RNA-binding protein involved in splicing, was found to be strongly associated with tumor somatic mutations in SF3B1, which encodes a component of the U2 snRNP spliceosome (40). However, the same study also showed an association of germline variants on chromosomes 5 and 18 with increased risk for mutations in CD86, which encodes a ligand for the CD28 costimulatory receptor and the CTLA-4 checkpoint.
It is as yet unknown whether autoimmune polygenic risk, which is likely to be a codeterminant of a patient's cancer–immune set point, also impacts the somatic mutation pattern of that patient's tumor, thus possibly exerting both a direct and an indirect effect on therapeutic outcome. To summarize, genome-to-genome studies can identify possible drivers of tumor evolution, and they also might be useful to determine whether germline factors associated with outcome are due to a direct role of these variants in immune responses, or possibly result in tumors with different molecular characteristics and associated prognoses.
Genetics of Signal 1 and Antigen Presentation
The primary importance of T-cell receptor (TCR) signaling in adaptive antitumor immunity is well established. Human genetic variation clearly plays a key role in regulating TCR signaling given the allelic variation in HLA proteins, the low-hanging fruit for most immune-related phenotypes in terms of genetic associations (Fig. 1D). An individual patient's HLA profile determines the spectrum of neoantigens that can be presented on the surface of tumor cells, ideally eliciting an antitumor immune response by CD8+ T cells. Specific HLA alleles are strongly associated with the risk for many autoimmune and infectious diseases, and at least in some cases causally linked to disease-specific self-antigens (41, 42). This is conceptually different in a cancer setting, where no two patients share the exact same mutational profile. It is therefore perhaps not surprising that a previously published association of the two HLA class I supertypes HLA-B44 and HLA-B62 with worse and better outcome, respectively, in patients treated with immune checkpoint blockade could not be replicated in a large meta-analysis (43, 44). Another study found a different allele, HLA-A*03, to be a predictive biomarker for poor outcome (45), but the same allele was not found to be associated with outcome in a different publication (46). Further studies and large-scale meta-analyses will be required to answer this question, and to shed light on a potential indication or treatment specificity of such associations.
Because tumor genomes can share important and common driver mutations, it is possible that statistical associations will be found for alleles that are predicted to present such shared neoantigens. Variability in HLA genotypes can also be used to estimate differences in the diversity of antigens presented to individual TCRs. For example, individuals who are homozygous for all class I HLA genes would be expected to be able to present on average fewer neoantigens than individuals with increased heterozygosity. In a study including 1,535 patients treated with CIT, maximum heterozygosity was associated with better outcome compared to cases homozygous at one or more loci (44). Furthermore, HLA diversity can also be measured in terms of evolutionary divergence, quantifying physicochemical characteristics of amino acids in HLA proteins that are relevant for peptide binding. Increased evolutionary divergence also has been found to be associated with better outcome (47). However, both results could not be replicated in a larger meta-analysis, raising doubt about the usefulness of HLA diversity metrics as univariable biomarkers to predict outcome (43, 46). It is likely that the number of different HLA alleles, and even the diversity of presented neoantigens, are not good proxies for immunogenicity and T cell–mediated immune responses. Other factors upstream and downstream of antigen presentation might mask possible effects, and one can speculate that these factors include imbalances in HLA protein expression, for example, through HLA loss or downregulation in the tumor. Furthermore, T-cell responses are often specific to only a few of the presented neoantigens (immunodominance), possibly restricting the relevance of a diverse antigen pool.
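The heterozygosity metric discussed above can be made concrete with a short sketch that classifies a class I genotype as maximally heterozygous or homozygous at one or more loci. The genotype encoding, allele resolution, and function names are assumptions made for illustration, and the example genotypes are invented; evolutionary divergence, which requires amino acid sequence comparisons, is not covered.

```python
from typing import Dict, Tuple

# Hypothetical encoding: one pair of allele names per class I locus.
Genotype = Dict[str, Tuple[str, str]]

CLASS_I_LOCI = ("HLA-A", "HLA-B", "HLA-C")


def homozygous_loci(genotype: Genotype) -> list:
    """Return the class I loci at which both alleles are identical."""
    return [locus for locus in CLASS_I_LOCI if genotype[locus][0] == genotype[locus][1]]


def is_maximally_heterozygous(genotype: Genotype) -> bool:
    """True if the two alleles differ at every class I locus."""
    return not homozygous_loci(genotype)


def n_distinct_alleles(genotype: Genotype) -> int:
    """Number of distinct class I alleles carried (between 3 and 6)."""
    return len({allele for locus in CLASS_I_LOCI for allele in genotype[locus]})


# Invented example genotypes.
patient_1 = {
    "HLA-A": ("A*02:01", "A*03:01"),
    "HLA-B": ("B*07:02", "B*44:02"),
    "HLA-C": ("C*05:01", "C*07:02"),
}
patient_2 = {
    "HLA-A": ("A*02:01", "A*02:01"),
    "HLA-B": ("B*07:02", "B*44:02"),
    "HLA-C": ("C*07:02", "C*07:02"),
}

for name, gt in (("patient_1", patient_1), ("patient_2", patient_2)):
    print(name, is_maximally_heterozygous(gt), homozygous_loci(gt), n_distinct_alleles(gt))
```

Whether two alleles count as different depends on the typing resolution chosen, which is one reason such a binary metric may be a coarse proxy for the diversity of peptides actually presented.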
In addition to their central role in antigen presentation, HLA class I proteins also contribute to the education of NK cells, showing differential binding capacities to killer cell immunoglobulin-like receptors (KIR) that are predominantly expressed on NK cells. KIRs are important regulators of NK-cell tolerance and activation, and they can have inhibitory or activating function (48). NK-cell education is a dynamic process that determines their degree of responsiveness to “missing self” phenotypes, for example, as a result of HLA loss or downregulation, a presumptive and relatively common immune escape mechanism in cancer (49, 50). HLA alleles can thus be grouped according to their interaction with specific KIR. For example, the HLA-Bw4 epitope is defined according to a specific amino acid sequence allowing Bw4 alleles to bind to KIR3DL1 receptors. KIR3DL1+ NK cells from Bw4 homozygous donors display increased responsiveness to HLA-deficient tumors in terms of IFNγ production, and KIR3DL1 predominantly occurs on KIR A haplotypes, which have been associated with improved response to pathogens (51, 52). Patients with follicular lymphoma carrying KIR2DL2, KIR3DL1, and their respective ligands HLA-C1 and HLA-Bw4 show improved outcome, duration of response and tumor shrinkage upon treatment with the anti-CD20 rituximab (53). Such associations remain to be investigated in large CIT-treated patient cohorts with solid tumors, which is of relevance given the high frequency of genetic and epigenetic immune escape mechanisms resulting in reduced HLA expression (50).
A Fast Track Through Puberty?
Decades of investigations into the genetics of complex diseases have created a solid basis for human genomics research in CIT, and also offer a blueprint for how to practically deliver on the promise of meaningful scientific insight (54, 55). Large-scale collaborations involving teams of experts in human genomics, cancer immunology, and clinical discovery research will be required to drive high-powered analyses, including the open sharing of data within the boundaries of an ethical framework protecting genetic privacy. Because of the historical focus in oncology on somatic mutations and the tumor genome, it will be important to inform and educate the clinical community of the potential and therefore the importance of collecting germline genomic information, to help physicians explain it to their patients, and to provide appropriate consent language for clinical trials and real-world data collections. The good news is that, with proper consent, the collection of patient samples for germline analysis is inexpensive and minimally invasive (i.e., peripheral blood).
One possible obstacle might be the necessity to generate as much additional data as possible beyond genome-wide SNP genotyping. From a statistical perspective, a clean genetic association signal requires the availability of covariates that also have an influence on the phenotype of interest. Examples include tumor mutational patterns affecting outcome, or cytomegalovirus, given its role in shaping blood cell traits such as the amount of effector memory T cells (15). The germline genome, tumor genome and the sum of all extrinsic and nongenetic factors are likely intertwined in a way that makes it difficult if not impossible to investigate the role of one without considering the others. This can be conceptualized as a “three body problem of cancer immunology.” Although a comprehensive and systematic collection of specimens and data is a challenging endeavor in terms of operationalization and cost, molecular data generation, including SNP genotyping and tumor sequencing, has become increasingly affordable. Another important consideration is the granularity of clinical phenotyping. For example, if we aim to identify genetic drivers of imAE, it is likely not sufficient to only capture high-grade events that require therapeutic intervention or regimen changes. Most imAE are low grade, but if we assume shared biology between low- and high-grade events, it will be important to capture both to build sufficiently sized and well-defined cohorts of cases and controls.
The ever-increasing availability of germline genomic data from patients with cancer treated in clinical trial or standard-of-care settings offers opportunities for highly collaborative initiatives aiming to shed light on a key component of the cancer–immune set point. In addition, the vast pool of existing knowledge about immune-relevant genomic variation fuels hopes for an expedited path from variant (statistical association) to function (understanding how a variant affects a disease process). It is therefore an exciting time for those working in the field of cancer immunology to embrace the potential of human genomics research.
Authors' Disclosures
C. Hammer and I. Mellman report other support from Genentech during the conduct of the study; other support from Genentech outside the submitted work.