Studies of the inherited or germline genome have identified rare mutations with large effects and common polymorphisms of more modest effect sizes that are associated with cancer risk. This research has substantially illuminated the etiology and development of cancer, with particular relevance to cancer prevention. In parallel, studies of the somatic or tumor genome have been instrumental in identifying the key drivers of cancer progression, significantly informing modern cancer therapy. While these studies have thus far largely been performed separately, integrative studies in which the germline and somatic genomes are mapped in the same individuals have the potential to yield novel and holistic insights into cancer biology. In this issue of Cancer Research, Liu and colleagues report the results of integrative germline–somatic analyses in over 12,000 patients with cancer across 11 cancer types, identifying several associations in which inherited variants that regulate the expression of a nearby gene in normal tissues are associated with tumor mutations in the same gene or with genome-wide somatic traits such as the tumor mutational burden. Although considerable follow-up work is required, the study is an important contribution to an emerging body of evidence demonstrating that the germline has a vital role in shaping the tumor genome.

See related article by Liu et al., p. 1191

The first genome-wide association study (GWAS) of cancer susceptibility was published in 2007, and the first tumor site-specific study from The Cancer Genome Atlas (TCGA) project was published in 2008. In the decade and a half since then, germline and somatic genomic studies of cancer have contributed immensely to our understanding of cancer etiology, development, progression, and treatment. However, a defining feature of these germline and somatic (or more specifically, tumor) studies so far has been that they have—with a few noteworthy exceptions—been performed independently of each other. Systematic efforts that integrate large-scale germline and somatic data sets to uncover interactions between host-inherited genetics and the tumor genome/epigenome are urgently required for insights into the mechanisms that underpin the evolution from normal tissue to metastasis and yield potential targets for early preventative or therapeutic intervention. Given recent rapid advances in immuno-oncology, germline–somatic associations must also be mapped if we are to comprehensively unravel the complex interplay between the host's immune system, the tumor, and immunotherapy response.

The relative paucity of combined germline–somatic studies is largely due to the lack of cancer cohorts that have both these genomic data types on the same individuals at the sample sizes required to power the discovery of associations. While the focus of TCGA has been on tumor multiomics, the TCGA data set does include germline (blood DNA) genotype calls for over 9,000 cancer cases spanning 30+ cancer types along with their matched tumor gene expression, copy number, methylation, and mutation data. These matched data have formed the basis for most of the germline–somatic association studies published to date. Some of these studies have identified germline genetic determinants with large effects on PTEN and SF3B1 somatic mutation status (1), suggested that roughly 13% of the variation in tumor mutational burden can be explained by an inherited polygenic component (2), and revealed that tumors developing on a background of elevated polygenic cancer risk are likely to have an earlier age of onset and lower overall burden of somatic aberrations (3). Whole-genome sequencing of more than 2,600 tumors and their matched normal tissue DNA samples by the Pan-Cancer Analysis of Whole Genomes Consortium, which partially overlaps the TCGA cohort, identified one genome-wide significant germline signal tagging a 30 kilobase deletion in APOBEC3B associated with APOBEC3B-like somatic mutagenesis across multiple cancer types (4). Consistent with what we know from GWAS of cancer risk, common inherited polymorphisms are likely to have associations of small-to-moderate effect sizes with tumor genomic traits, and therefore larger sample collections are required for germline–somatic association studies.

An innovative approach that has recently emerged to dramatically improve the power of germline–somatic association studies is to leverage tumor-only targeted gene panel sequencing cohorts that are currently available at scale and use these sequences to infer the germline genotype, followed by downstream germline–somatic association testing. This is not trivial and takes advantage of algorithms initially developed for imputation of common germline variant genotypes from low-coverage germline sequencing by exploiting patterns of linkage disequilibrium learnt from the low-coverage sequences and publicly available population genetic reference panels such as 1000 Genomes. One such algorithm, STITCH (5), was previously applied to matched tumor panel sequencing and germline array genotyping data (the latter being used only to validate the approach). Using the combination of on-target reads (i.e., reads that cover the genes targeted by the panel) and off-target reads (i.e., reads that are outside these regions and traditionally discarded) from the tumor DNA, STITCH was shown to recover the genome-wide germline genotypes with high accuracy (6). In this issue of Cancer Research, Liu and colleagues apply the STITCH algorithm to data from over 12,000 patients with cancer from the Dana-Farber Profile cohort spanning 11 cancer types and report the results of subsequent germline–somatic association analyses (7). They evaluated the hypothesis that specific germline genetic variants termed cis-acting expression quantitative trait loci (eQTL), which are associated with and likely causally regulate the expression of nearby genes (usually within 1 megabase of the variant) in normal tissues, are determinants of tumor mutational traits via their impact on gene expression.
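The distinction between on-target and off-target reads can be made concrete with a minimal Python sketch. It is an illustration only and not part of STITCH or of the pipeline used by the authors: the function names, the half-open coordinate convention, and the toy intervals are all assumptions introduced here.

```python
from bisect import bisect_right


def build_target_index(targets):
    """Group panel intervals by chromosome and sort them by start.

    `targets` is an iterable of (chrom, start, end) tuples. Intervals are
    assumed (for this sketch) to be half-open [start, end) and non-overlapping.
    """
    index = {}
    for chrom, start, end in targets:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def is_on_target(index, chrom, pos):
    """Return True if `pos` on `chrom` falls inside a targeted interval."""
    intervals = index.get(chrom, [])
    # Locate the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


def split_reads(index, read_positions):
    """Partition (chrom, pos) read positions into on- and off-target lists."""
    on_target, off_target = [], []
    for chrom, pos in read_positions:
        bucket = on_target if is_on_target(index, chrom, pos) else off_target
        bucket.append((chrom, pos))
    return on_target, off_target


# Hypothetical toy panel and read positions, for illustration only.
toy_index = build_target_index([("chr11", 100, 200), ("chr17", 50, 80)])
toy_reads = [("chr11", 150), ("chr11", 200), ("chr17", 49), ("chr17", 50), ("chr2", 10)]
toy_on, toy_off = split_reads(toy_index, toy_reads)
```

In the approach described above, both groups of reads are retained and passed to imputation; the off-target group is not discarded.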

Tumor samples from patients in the Profile cohort had been sequenced for a targeted panel of known oncogenes and tumor suppressor genes involved in well-established (and often clinically targetable) cancer signaling pathways. The authors examined tumor mutational traits at two levels—global and local. The global level included the tumor mutational burden (total missense mutations per megabase of the gene panel sequenced), a recognized biomarker of immunotherapy response in multiple cancer types. It also included tumor mutation counts (number of putative cancer driver genes that carried at least one missense mutation). Mutations in any one gene on the panel and hotspot mutations (specific mutations that were observed in over 5% of the samples) were considered to be at the local level. On the germline side of this study, the authors first identified over 28,000 genome-wide significant eQTLs for 114 of the cancer genes on the targeted sequencing panel using variant-gene expression associations available from the normal tissue-based Genotype-Tissue Expression (GTEx) project. Given that gene expression is often correlated across tissue types, the authors maximized the power to detect eQTLs by performing a meta-analysis of eQTL signals identified in single tissue types in GTEx. They then evaluated the relationship between these germline eQTL variants and the global and local tumor mutational traits.
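The two global traits follow directly from the definitions given above, and a short Python sketch shows the arithmetic. The record layout (dictionaries with `gene` and `consequence` keys), the function names, and the toy values are assumptions made for illustration; they are not taken from the study's code or data.

```python
def tumor_mutational_burden(mutations, panel_size_bp):
    """Missense mutations per megabase of sequenced panel.

    `mutations` is a list of dicts with "gene" and "consequence" keys
    (a hypothetical layout); `panel_size_bp` is the panel footprint in bases.
    """
    if panel_size_bp <= 0:
        raise ValueError("panel_size_bp must be positive")
    n_missense = sum(1 for m in mutations if m["consequence"] == "missense")
    return n_missense / (panel_size_bp / 1_000_000)


def tumor_mutation_count(mutations, driver_genes):
    """Number of putative driver genes carrying at least one missense mutation."""
    mutated = {
        m["gene"]
        for m in mutations
        if m["consequence"] == "missense" and m["gene"] in driver_genes
    }
    return len(mutated)


# Hypothetical toy tumor: values are made up for illustration only.
toy_mutations = [
    {"gene": "ATM", "consequence": "missense"},
    {"gene": "ATM", "consequence": "missense"},
    {"gene": "TP53", "consequence": "missense"},
    {"gene": "APC", "consequence": "synonymous"},
    {"gene": "GENE_X", "consequence": "missense"},  # not in the driver list
]
toy_tmb = tumor_mutational_burden(toy_mutations, panel_size_bp=2_000_000)
toy_count = tumor_mutation_count(toy_mutations, driver_genes={"ATM", "TP53", "APC"})
```

The burden counts every missense mutation, so two hits in the same gene both contribute, whereas the mutation count registers each driver gene at most once.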

At the global level, the authors found that eQTLs for GLI2, WRN, and CBFB were associated with the tumor mutational burden in ovarian, glioma, and esophagogastric cancers, respectively, while eQTLs for APC, WRN, GLI1, FANCA, and TP53 were associated with the tumor mutation counts in endometrial cancer. An eQTL for EPHA5 was also associated with the tumor mutation count in colorectal cancer. At the local level, a germline eQTL associated with reduced expression of ATM was found to be associated with lower risks of having somatic ATM mutations in eight cancer types including glioma, breast, prostate, ovarian, endometrial, colorectal, bladder, and non–small cell lung cancers.

Like all good research, this work gives rise to new scientific questions and future directions for potential exploration. First, the identified associations are between germline variants and tumor mutational traits, and it is not known whether these mutational associations develop at the precancer/tumor initiation stage or during cancer progression. Studies that map the evolution of these mutational associations in time, such as the NCI-sponsored PreCancer Genome Atlas, may eventually provide answers to this question. Second, only in a few instances were the authors able to demonstrate that the eQTL signal was in the normal tissue type that corresponds to the cancer type. Laboratory-based functional genomic experiments in the appropriate model system involving genome editing of the germline variant followed by RNA sequencing of the target gene may help conclusively establish the link between germline variant, gene expression, and induction of somatic mutation, but the best molecular and cellular read-outs incorporating the mutation in such an experiment will have to be determined with care. Third, the vast majority of germline variants identified by GWAS of cancer predisposition lie in non-coding regions of the genome, and mechanistic follow-up of these variants has thus far largely assessed their influence on the transcriptome and epigenome. The current study design can easily be repurposed to dissect the association between cancer risk variants and somatic mutations and mutational signatures. Fourth, many of the germline–somatic associations uncovered in this study involve genes such as ATM, TP53, APC, and EPHA5 that have been shown in at least one cancer type to be related to immunotherapy response. The germline eQTL variants for these genes are quite common in the population, with minor allele frequencies ranging from 6% to 45% and, unlike gene expression or somatic mutation, the germline is fixed at conception. This gives rise to the intriguing possibility that these variants could be evaluated in isolation or collectively as a multivariant score as pretreatment predictors of immunotherapy response (and toxicity) in clinical trials with matched germline data, as has recently been done for other germline variants and polygenic scores (8, 9).
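One common way to build such a multivariant score is a weighted sum of effect-allele dosages, sketched below in Python. This is an assumption about how a score might be constructed, not a method reported in the study: the variant identifiers and weights are hypothetical placeholders, and in practice the weights would have to come from an independent data set.

```python
def multivariant_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    `dosages` maps variant ID -> effect-allele dosage (0 to 2);
    `weights` maps variant ID -> per-allele weight.
    Every weighted variant must have a dosage; a missing one raises KeyError
    rather than being silently treated as zero.
    """
    missing = [variant for variant in weights if variant not in dosages]
    if missing:
        raise KeyError(f"no dosage for variants: {sorted(missing)}")
    return sum(weights[variant] * dosages[variant] for variant in weights)


# Hypothetical variant IDs and weights, for illustration only.
toy_weights = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 1.0}
toy_dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
toy_score = multivariant_score(toy_dosages, toy_weights)
```

Because the germline is fixed at conception, a score of this kind could in principle be computed once, before treatment, from matched germline data.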

No disclosures were reported.

S.P. Kar is supported by a United Kingdom Research and Innovation Future Leaders Fellowship (grant number MR/T043202/1).

1. Carter H, Marty R, Hofree M, Gross AM, Jensen J, Fisch KM, et al. Interaction landscape of inherited polymorphisms with somatic events in cancer. Cancer Discov 2017;7:410–23.
2. Sun X, Xue A, Qi T, Chen D, Shi D, Wu Y, et al. Tumor mutational burden is polygenic and genetically associated with complex traits and diseases. Cancer Res 2021;81:1230–9.
3. Namba S, Saito Y, Kogure Y, Masuda T, Bondy ML, Gharahkhani P, et al. Common germline risk variants impact somatic alterations and clinical features across cancers. Cancer Res 2023;83:20–7.
4. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 2020;578:82–93.
5. Davies RW, Flint J, Myers S, Mott R. Rapid genotype imputation from sequence without reference panels. Nat Genet 2016;48:965–9.
6. Gusev A, Groha S, Taraszka K, Semenov YR, Zaitlen N. Constructing germline research cohorts from the discarded reads of clinical tumor sequences. Genome Med 2021;13:179.
7. Liu Y, Gusev A, Kraft P. Germline cancer gene expression quantitative trait loci are associated with local and global tumor mutations. Cancer Res 2023;83:1191–203.
8. Khan Z, Hammer C, Carroll J, Di Nucci F, Acosta SL, Maiya V, et al. Genetic variation associated with thyroid autoimmunity shapes the systemic immune response to PD-1 checkpoint blockade. Nat Commun 2021;12:3355.
9. Robert C, Vagner S, Mariette X. Using genetics to predict toxicity of cancer immunotherapy. Nat Med 2022;28:2471–2.