Abstract
With the recent explosion in high-throughput genotyping technology, the amount and quality of SNP data have increased exponentially, facilitating the discovery of multiple uncommon SNPs in the human population. To provide unified and centralized resources for the scientific community, several repositories have been developed that aggregate numerous population studies and serve widely as references to filter natural variants in genetic analyses. However, they are largely biased toward European populations. The TP53 gene is the most frequently mutated gene in human cancer, and pathogenic germline TP53 variants are associated with several cancer susceptibility disorders such as Li–Fraumeni syndrome. For these reasons, it is essential that TP53 SNPs are rigorously evaluated to avoid misclassifications that could impair patient management. The recent discovery of numerous benign SNPs within the coding region of TP53 can be attributed to surveillance of both global repositories and population-specific databases, with the latter enabling the recognition of additional TP53 SNPs in Japanese, African, and Indian populations. This review summarizes the body of evidence behind the identification of 21 TP53 variants and the information defining them as bona fide SNPs. These findings illustrate the need to include populations of different ethnic origins in genetic studies and the substantial benefits that can be derived from that information.
Introduction
The emergence of massively parallel sequencing technology, also called next-generation sequencing (NGS), more than a decade ago represented a major technological advance over traditional Sanger sequencing (1). NGS technologies enabled ambitious large-scale sequencing projects that have transformed our understanding of human health and disease, including in the settings of Mendelian disorders and cancers (2). They have also ushered in a new era for the analysis of the human population, and, with it, the recognition that the human genome shows far more polymorphism than initially suspected and includes many SNP variants at frequencies well below the previously used 1% threshold (3). In 2014, the Exome Aggregation Consortium (ExAC) released exome data from 60,706 individuals (4). The resulting database was designed to be used as a novel reference set for determining SNP allele frequency. In 2016, the ExAC database was transformed into the Genome Aggregation Database (gnomAD). New data accompanied that transformation, thus creating a dataset with information from 123,136 exome sequences and 15,496 whole-genome sequences of unrelated individuals (5). ExAC and gnomAD have both been widely used as a substitute or complement for the single nucleotide polymorphism database (dbSNP). Nonetheless, a lack of diversity is an important concern with these various databases, as they tend to be highly biased toward European populations. For example, most of the larger databases like ExAC or gnomAD are more than 50% European. The most notable exception to that tendency is the 1000 Genomes Project, which is, conversely, largely non-European (80%; Supplementary Fig. S1). Furthermore, some specific populations, such as the Japanese, are nearly absent from all these repositories. That bias in the databases is highly detrimental to the accurate analysis of patients from populations not fully represented in them.
Historically, SNPs were defined as single-nucleotide variants differing between an individual and a reference genome. Furthermore, an arbitrary cutoff of 1% was assigned: SNPs with frequencies above that threshold were thought to be natural variations occurring in the human population, and those below it were thought to be associated with various genetic diseases (6, 7). This assumption was based on the analysis of protein-coding regions performed before the advent of NGS. Today, however, because of the discovery of the high frequency of rare, nonpathogenic variants (less than 1%) in the human population, the time has come to reassess that assumption (8). Multiple studies have now shown that rare variants with no functional or clinical consequences are scattered throughout the human genome. Furthermore, they show important population diversity, which leads to such classifications as “private,” “shared,” or “monomorphic” SNPs to define variants restricted to either families or specific ancestries (9). This issue also raises the question of what is or should be considered as a reference.
More than 300,000 cancer genomes (partial or total) have been sequenced via large sequencing projects, such as The Cancer Genome Atlas (TCGA) consortium, Project GENIE, and ICGC, or through individual studies (10–12). One major issue in these works is the lack of matched normal (germline) samples due to cost constraints or the unavailability of materials in retrospective studies. That dearth makes it difficult to distinguish true somatic mutations from constitutional benign variants. SNP databases like gnomAD are currently widely used to filter out potential SNPs. Although several thresholds based on allele number (AN) or allele frequency (AF) have been suggested to identify them, the frequency of true polymorphic variants can vary greatly from one gene to another, and no consensus on such thresholds has been reached as of this writing. Furthermore, the population bias in these databases may lead to variant misinterpretation for population-specific SNPs and, additionally, it is likely that the cancer repositories include rare germline SNPs.
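The effect of population bias on AF-based filtering can be illustrated with a minimal Python sketch. It is not the method used in any of the cited studies: the `Variant` class, the variant names, the frequencies, and the 1e-4 threshold are all hypothetical and chosen only for illustration, since, as noted above, no consensus threshold exists. The sketch compares a filter based on the global AF with one based on the highest AF observed in any single population.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Variant:
    name: str
    global_af: float                 # AF pooled over all individuals
    population_af: Dict[str, float]  # AF per ancestry group


def max_population_af(v: Variant) -> float:
    """Highest AF seen either globally or in any single population."""
    return max([v.global_af, *v.population_af.values()])


def is_candidate_snp(v: Variant, threshold: float = 1e-4) -> bool:
    """Flag a variant as a candidate SNP if any population reaches the
    (hypothetical) threshold, even when the pooled AF does not."""
    return max_population_af(v) >= threshold


# Hypothetical records: variant_A is rare globally but enriched in one group.
variants: List[Variant] = [
    Variant("variant_A", 0.00002,
            {"east_asian": 0.0012, "european": 0.0, "african": 0.0}),
    Variant("variant_B", 0.00001,
            {"east_asian": 0.00001, "european": 0.00002, "african": 0.0}),
]

flagged_global = [v.name for v in variants if v.global_af >= 1e-4]
flagged_popmax = [v.name for v in variants if is_candidate_snp(v)]

print("global AF filter:    ", flagged_global)
print("population-max filter:", flagged_popmax)
```

In this toy example the global filter flags nothing, whereas the population-aware filter flags `variant_A`. A variant enriched in an under-represented population can be missed in the same way when only pooled frequencies are consulted.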
TP53 is the most frequently mutated gene in cancer, and somatic mutations within it are usually associated with poor clinical outcomes (13, 14). In chronic lymphocytic leukemia (CLL), the analysis of TP53 mutations has been incorporated into routine clinical diagnostics to improve patient stratification, optimize therapeutic decisions, and enable novel targeted therapies. CLL is thus the first cancer to benefit from TP53 research (15). Li–Fraumeni syndrome (LFS) is a rare cancer predisposition disorder associated with an inherited germline mutation in TP53 (16). Carriers of the mutation are at risk of developing multiple primary cancers at a young age. Surveillance protocols for TP53 mutations have the potential to improve outcomes in individuals and families in the LFS setting as well (17).
As both germline and somatic mutations of TP53 are used in the clinic, it is essential that pathogenic variants are clearly distinguished from rare benign SNPs. Taking into consideration the two issues described above, that is, the high frequency of rare SNPs and the lack of diversity in some reference databases, we recently performed an extensive analysis of TP53 variants in both global and population-specific databases from different countries such as Japan, Korea, China, Taiwan, and Australia (Supplementary Figs. S2–S4; ref. 18). That analysis, buttressed by subsequent reports, revealed that the coding region of TP53 contains numerous functional germline polymorphisms including specific population variants, some of which are still considered as somatic pathogenic variants in a number of databases (Fig. 1A; ref. 19).
This work provides a summary of current knowledge on these SNPs and multiple lines of evidence showing that TP53 is far more polymorphic than previously described, including the identification of functional population-specific variants. For the present review, these variants have been reassessed, using either updated or novel population databases, to increase the accuracy of the analyses (Supplementary Table S1). Furthermore, because they are more relevant for diagnostics, only missense variants located in the main isoform of TP53 will be discussed here (NM_000546.6 or LRG_321t1). SNPs found in exons β and γ have been described previously (14).
SNPs in the p53 Signaling Pathway: Current Knowledge
As of this writing, two missense SNPs in TP53, rs1042522;p.Pro72Arg and rs1800371;p.Pro47Ser, have been ascertained and extensively characterized (Fig. 1A and B; Table 1; Supplementary Table S2). Both are included in ClinVar and considered benign according to the American College of Medical Genetics and Genomics (ACMG) criteria (20, 21). SNP rs1042522;p.Pro72Arg is the most frequent SNP in the coding region of TP53 (AF 0.4 to 0.8). It is common in all ethnicities but with a lower frequency of the ancestral allele (Pro) in the African population (Fig. 1A).
RS_ID | cDNA variant (a) | Protein variant (b) | Ethnicity | ClinVar (c) | ACMG_UMD |
---|---|---|---|---|---|
rs201382018 | c.31G>C | p.Glu11Gln | Asian | CIP (d) (LB6; VUS4) | Benign |
rs201753350 | c.91G>A | p.Val31Ile | Japanese/Korean | CIP (B1; LB6; VUS2) | Benign |
rs1800371 | c.139C>T | p.Pro47Ser | African | B (FDA approved) | Benign |
rs587780728 | c.145G>C | p.Asp49His | Japanese/Korean | LB (FDA approved) | Benign |
rs144386518 | c.173C>G | p.Pro58Arg | African | CIP (LB6; VUS1) | Benign |
rs1042522 | c.215C>G | p.Pro72Arg | Global | B (FDA approved) | Benign |
rs587782423 | c.217G>A | p.Val73Met | European (Finn?) | B (FDA approved) | Benign |
rs368771578 | c.319T>C | p.Tyr107His | African | B (FDA approved) | Benign |
rs11540654 | c.329G>A | p.Arg110His | Global | LB (FDA approved) | Likely benign |
rs563378859 | c.466C>T | p.Arg156Cys | Global | CIP (LB2; VUS4) | Likely benign |
rs150607408 | c.554G>A | p.Ser185Asn | African | LB (FDA approved) | Likely benign |
rs121912665 | c.566C>T | p.Ala189Val | Asian | CIP (B1; LB2; VUS3) | Benign |
rs146340390 | c.665C>T | p.Pro222Leu | European | CIP (B1; LB2; VUS5) | Uncertain significance |
rs144340710 | c.704A>G | p.Asn235Ser | Global; Asian low | B (FDA approved) | Benign |
rs72661119 | c.787A>G | p.Asn263Asp | South Asian | CIP (LB5; VUS2) | Benign |
rs149633775 | c.847C>T | p.Arg283Cys | Global; Asian low | VUS (FDA approved) | Benign |
rs55819519 | c.869G>A | p.Arg290His | Global; Asian low | B (FDA approved) | Benign |
rs145151284 | c.935C>G | p.Thr312Ser | Global; Asian low | B (FDA approved) | Benign |
rs17882252 | c.1015G>A | p.Glu339Lys | Global; European low | LB (FDA approved) | Likely benign |
rs773553186 | c.1073A>T | p.Glu358Val | Asian | CIP (LB4; VUS1) | Benign |
rs35993958 | c.1079G>C | p.Gly360Ala | Global | LB (FDA approved) | Benign |
Note: See Supplementary Table S2 for more information.
(a) TP53 variant description using the NM_000546.5 reference.
(b) TP53 protein variant descriptions according to RefSeq protein TP53alpha NP_000537.3.
(c) https://www.ncbi.nlm.nih.gov/clinvar/: last accessed October 2021.
(d) CIP, conflicting interpretations of pathogenicity in ClinVar.
Several studies have reported that TP53 mutation was preferentially observed with the Arg allele in patients who are heterozygous for rs1042522;p.Pro72Arg. It is currently unclear whether the variants associated with the Arg allele are more pathogenic due to a more prominent gain of function or, instead, those associated with the Pro allele are more hypomorphic and counter-selected (22–24). Any possible association of this SNP with a high risk of developing a specific cancer and/or a heterogeneous response to therapy remains both controversial and likely not sufficiently relevant for any clinical intervention (25–28).
The first report by Beckman and colleagues in 1994 of a north–south cline in the frequency of the rs1042522 Pro allele (from 17% in the Swedish Sámi to 63% in Nigerians) has been confirmed in numerous other studies (29, 30). An association with latitude and UV irradiation was hypothesized for that observation but more recent analyses have been unable to confirm it (31, 32). An extensive study involving multiple Chinese populations living at substantially different latitudes showed that p53 Arg72 is tightly associated with winter temperature (31). Compared with the Pro allele, the Arg allele shows greater proficiency for transactivating leukemia inhibitory factor (LIF), an essential gene for efficient egg implantation after fertilization (33). Consequently, it has been hypothesized that the Arg72 allele may provide a selective advantage to populations adapting to colder climates by reducing the risk of implantation failure (31). In contrast, the Pro allele has been associated with infertility, implantation failure after in vitro fertilization, and recurrent abortions (34, 35). A number of SNPs in the TP53 pathway have been shown to be associated with infertility, including several in LIF, MDM2, and MDM4, as well as in the ubiquitin-specific protease HAUSP that counteracts TP53 ubiquitination via MDM2 and MDM4. These observations lend credence to this important function of TP53 in fertility (36).
Variant rs1800371;p.Pro47Ser, first discovered in 1993, is the second most frequent missense SNP in the coding region of TP53 (AF 0.15). It has been shown to be specific to the African population, with functional activity indistinguishable from that of wild-type p53 (37). Analysis of the larger population datasets confirmed that rs1800371;p.Pro47Ser appears only in Africans, African-Americans, and Latin-Americans of Afro-Caribbean ancestry; it does not appear in any other populations (Supplementary Fig. S5). Although this variant does not display any obvious loss of activity in conventional functional assays, it has been shown to impair the phosphorylation of the nearby residue Serine 46, which does lead to a slight impairment of apoptosis induction (38). Variant rs1800371;p.Pro47Ser is also defective in the regulation of ferroptosis, a specific type of iron-dependent programmed cell death regulated by TP53 (39). This variant was associated with an increased risk of breast cancer in African-American premenopausal women. However, a relatively high P value for that finding (P = 0.023; OR, 1.72; 95% CI, 1.08–2.76) and a lack of association with overall breast cancer risk limit its clinical use (40).
A genome-wide association study reported a novel rare variant (rs78378222) in the polyadenylation signal sequence of TP53 (AATAAA to AATACA) and showed that it impaired TP53 stability, leading to a decrease of p53 expression and hampered apoptosis (Fig. 1C; ref. 41). This SNP, restricted to the African and European populations, is strongly associated with an increased risk of cancer (42–47).
In addition to TP53, it has been shown that SNPs in other genes associated with the TP53 pathway could influence its function and therefore cancer risks (Fig. 1C).
SNP309 (rs2279744; T/G) is located in the promoter of MDM2, the main negative regulator of p53. It changes a response element of the transcription factor SP1. Carriers of the G allele display an increase in MDM2 expression and impaired TP53 response (48). Further studies have refined this observation, showing that the estrogen pathway modulates MDM2 transcription, as the estrogen receptor binds to the same response element carrying rs2279744. Indeed, subsequent studies confirmed a strong gender bias, with women showing increased tumor formation depending on their hormonal status (49–51). SNP34091 (rs4245739; C/A) is localized in the 3′UTR of MDM4, another negative regulator of p53. The C allele leads to the creation of a microRNA binding site and a decrease in the stability of MDM4 RNA, predicting that the A allele will reduce p53 protein in the cell via a higher content of MDM4 (52, 53). Conflicting evidence has linked this SNP to various cancer risks (54).
Pathogenic TP53 Founder Mutations Restricted to Specific Populations
As of this writing, three specific TP53 variants have been associated with cancer predisposition in distinct populations.
rs121912664; p.R337H
Considering the high incidence of pediatric adrenocortical tumors (ACT) observed in south Brazil (State of Paraná), two breakthrough studies uncovered a specific TP53 germline variant (rs121912664; p.R337H) associated with that disease (55, 56). A haplotype analysis revealed a founder effect with a European/Portuguese-Iberic origin (57). A large-scale analysis of 171,649 newborns in the state of Paraná identified 461 (0.27%) carriers, leading to surveillance recommendations for the families (58). Variant rs121912664; p.R337H is localized in the tetramerization domain of TP53, where it causes a deficiency for oligomerization. However, a functional analysis based on the transactivation of various TP53 target genes showed only a modest loss of function for it (59).
Knock-in mouse models expressing variant p.R334H (human p.R337H homolog) have not demonstrated changes in cancer incidence compared with WT mice but are more susceptible to developing liver cancer upon carcinogen exposure. XIAP Associated Factor 1 (XAF1) is a tumor suppressor gene that regulates TP53 stability by preventing MDM2 ubiquitination and promoting p53-dependent apoptosis (60). Its expression is absent or low in a wide variety of cancers. A rare XAF1 germline variant (rs146752602; p.E134*), which is impaired for regulating TP53, has been found to cosegregate with rs121912664; p.R337H, resulting in a more aggressive cancer phenotype than that caused by rs121912664; p.R337H alone (61).
rs587782596; p.R181C
Two unrelated reports identified a founder mutation (rs587782596; p.R181C) in codon 181 in women originating from the Middle East with a family history of breast cancer (62, 63). This variant has drawn great interest because codon 181, localized in the H1 helix, has been shown to be essential to maintaining TP53 intradimer interactions (64). In contrast to other variants at codon 181, such as R181P or R181H, which are fully inactive, p.R181C shows defective apoptosis but retains substantial growth arrest activity (64). Knock-in mouse models for this variant are cancer prone with a low penetrance compared with TP53 hot-spot variants and with a spectrum of cancer types similar to that of wild-type mice (65). However, these knock-in mice display increased lipolysis and upregulation of fatty acid metabolism, giving them a surprisingly lean phenotype. LFS patients expressing this variant were shown to display increased oxidative metabolism, reinforcing the importance of the role of the TP53 pathway in the regulation of cellular metabolism (66, 67).
rs730882028; p.G334R
A rare founder TP53 mutation (rs730882028; p.G334R) has been identified in multiple families of Ashkenazi Jewish descent (68). Although the cancer spectrum in these families is similar to those observed in Li–Fraumeni with an increased frequency of ACTs, the age of onset appears to be delayed. Functional analysis has demonstrated that this variant, localized in the tetramerization domain, shows no impairment for the established TP53 regulated genes (WAF1 or MDM2) and retains potent growth arrest activity, but it displays a defect in the transactivation of nonclassical genes such as PCLO, PLTP, PLXNB3, and LCN15.
These three different founder variants described in TP53 are identified in different populations and associated with specific or late cancer onset, but more importantly, they affect p53 oligomeric status. The mild phenotype observed with these variants has also been shown to be associated with different penetrance in Li–Fraumeni patients. Carriers with variants retaining partial or total capacity to form tetramers have a higher median survival age compared with carriers of monomeric variants (69). It is tempting to speculate that other hypomorphic variants localized in the TET domain of TP53 or associated with a partial defect in their tetrameric state as observed for the variant at codon 181 could be associated with either specific cancer predispositions or, in cancer families, with a weak penetrance possibly difficult to identify (70). Furthermore, due to the central function of TP53 in multiple physiological pathways, this penetrance will likely be modulated by the variability of other genes acting either upstream or downstream of the pathway.
Indeed, the genetic variant rs146752602; p.E134* in XAF1, which modulates the penetrance and the severity of the disease for variant p.R337H, is likely not unique. We can expect that more TP53-variant modifying genes will be found, for both somatic and germline variants, with the capacity for TP53 loss-of-function enhancement or attenuation, the latter being more difficult to identify.
SNPs Specific to the Asian Population
Six Asian-specific SNPs have been identified in TP53 but their distribution in the various subpopulations is heterogeneous (Figs. 1 and 2A; Table 1). rs587780728;p.Asp49His was first described as a somatic variant in a patient with chronic myeloid leukemia in 1991 (71) but subsequent reports suggested it could be a genuine polymorphism (72, 73).
In 2016, Yamahushi and colleagues analyzed the germline TP53 status of 1,685 Japanese patients with cancer (HOPE_cohort) and reported six unrelated carriers of the rs587780728;p.Asp49His variant (Fig. 2A; ref. 74). Only one carrier, a 12-year-old boy with osteosarcoma, met the criteria for a Li–Fraumeni-like syndrome. That patient was, however, carrying a second germline TP53 variant, p.Ala159Asp, which is known to be pathogenic and was previously associated with LFS in other reports (75). These findings suggest that p.Ala159Asp was the true driver event in that patient.
More recent studies have shed light on specific features of rs587780728;p.Asp49His and provided multiple lines of evidence indicating that it is a genuine variant highly specific to the Japanese and Korean populations. First, an analysis of all large-scale population datasets with different ancestries showed that rs587780728;p.Asp49His is observed only in East Asia (Fig. 2A). Population-specific databases confirmed this observation and illustrated a higher AF in Korea and Japan (Fig. 2A). This variant was totally absent from African and European populations. Second, an analysis of the cancer mutation database UMD_TP53, which includes more than 150,000 patients with various types of cancer, showed that all patients displaying rs587780728;p.Asp49His had Asian, and predominantly Japanese, origins. These mutations were mostly defined as somatic because the germline statuses of these patients were not determined and therefore their true origin could not be fully assessed. Moreover, the five cell lines that carried this variant were derived from Japanese patients. Finally, this variant was not identified in three different population databases from China (18,000 individuals). A specific association with the Japanese population was also observed for variant rs121912665;p.Ala189Val, although it had a lower AF (0.005 to 0.0015) than rs587780728;p.Asp49His did (0.003; Fig. 2B).
As both variants are absent from the Chinese population but present specifically in the Korean and Japanese populations, it is likely that they appeared after the migration of humans from what is now China to the Korean peninsula and the Japanese archipelago either during the first (30,000 years ago) or second (3,000 years ago) wave of human migration from the Asian continent (Fig. 2B; ref. 76). Whether these variants are widely dispersed in the genetically distinct Japanese groups (Hondo, Ryukyu, or Ainu) or restricted to a single one is currently unknown.
Three other variants, rs201382018;p.Glu11Gln, rs201753350;p.Val31Ile, and rs773553186;p.Glu358Val, were also identified in East Asian populations including China, and were absent or highly infrequent in non-Asian populations (Fig. 2A). The few tumor cell lines expressing either rs201382018;p.Glu11Gln or rs201753350;p.Val31Ile were also derived from Japanese individuals (18). The sixth variant, rs72661119;p.Asn263Asp, was observed only in South Asia and was totally absent from all other Asian populations (Fig. 2A). Analysis of the UMD_TP53 database showed that this variant was observed in four cancer patients with Asian origins, including two from India. Unfortunately, determining any clinical value of the variant is difficult due to the paucity of large-scale analyses in the Indian population.
Large-scale case–control studies performed in Japanese patients with breast, colorectal, or pancreatic cancer have shown that the AFs of rs587780728;p.Asp49His, rs201753350;p.Val31Ile and rs201382018;p.Glu11Gln were statistically similar in both cases and controls (Supplementary Fig. S5; refs. 77–79). Taken together, these analyses provide solid support for the Asian specificity of these six variants. Of note, the sole use of global population datasets was insufficient for the classification of these variants as bona fide SNPs; only the use of the population subsets of these global databases or country-specific datasets allowed that endpoint to be reached.
SNPs Specific to the African Population
Three variants specific to the African population, rs144386518;p.Pro58Arg, rs368771578;p.Tyr107His, and rs150607408;p.Ser185Asn, were discovered in this analysis. Although the frequency at which they are observed is lower than that of rs1800371;p.Pro47Ser (AF ranging from 0.001 to 0.0001), they are entirely absent from the large population datasets originating from Europe or Asia (Supplementary Fig. S6). These variants have been shown to behave like wild-type p53 in various functional assays but a specific loss of function associated with a specific pathway regulated by TP53 or with a tissue-specific function cannot be excluded.
Interestingly, two of the African-specific SNPs, rs1800371;p.Pro47Ser and rs144386518;p.Pro58Arg, are in close proximity on the protein and associated with drastic changes targeting proline residues. The particular structure of this amino acid (the amino group is part of the cyclical ring of atoms) confers specific features such as protein stability and rigidity. Substitutions of proline by other residues are notorious for destabilizing proteins via the breakdown of secondary structures. Either of these two variants could have a minor effect on TP53 function, but when combined, they could be more highly detrimental. A haplotype analysis performed using the gnomAD population showed no linkage of these two SNPs therein, but more thorough investigations are needed before drawing a definitive conclusion. Although the study by Murphy and colleagues showing a slight association between rs1800371;p.Pro47Ser and breast cancer in the African-American population provided valuable initial information, other studies conducted in African or African-American populations and taking into account both genetic and environmental variables will be necessary to evaluate any potential risks associated with these two variants (40).
Other TP53 SNPs
Among the eleven remaining SNPs, four variants are restricted to the European and African populations (rs149633775;p.Arg283Cys, rs144340710;p.Asn235Ser, rs145151284;p.Thr312Ser, and rs55819519;p.Arg290His), one is specific to the European population (rs146340390;p.Pro222Leu), one is detected in African and Asian populations but absent from Europeans (rs17882252;p.Glu339Lys), one is found predominantly in the Finnish population (rs587782423;p.Val73Met), and four are observed in all populations albeit at various frequencies (rs1042522;p.Pro72Arg, rs35993958;p.Gly360Ala, rs563378859;p.Arg156Cys, and rs11540654;p.Arg110His; Supplementary Fig. S7). Dorling and colleagues recently evaluated a panel of 34 genes, including TP53, to detect variants associated with breast cancer (80). From their large cohort of 60,466 women with breast cancer and 53,461 controls, those authors were able to retrieve multiple TP53 SNPs displaying frequencies similar to those found in other datasets. Their ORs and 95% confidence intervals showed that seven of the eight variants with a significant number of cases in the cohort displayed no association with breast cancer and occurred at the same frequency in both patients and controls (Supplementary Fig. S8).
Although variants rs144340710;p.Asn235Ser and rs55819519;p.Arg290His are quite infrequent (AF 0.0004), their absence in the Asian population (including three large Chinese datasets) and low frequency in the African population are intriguing. That aspect merits further evaluation to explore a possible association with specific selection during human migration from Africa.
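The case–control reasoning used in this section (an OR whose 95% confidence interval spans 1 indicates no detectable association) can be sketched as follows. This is a generic Wald-interval calculation, not the statistical model used by Dorling and colleagues. The cohort sizes are taken from the text, but the carrier counts (40 and 35) are hypothetical values chosen for illustration, and the function name is ours.

```python
import math
from typing import Tuple


def odds_ratio_ci(case_carriers: int, case_noncarriers: int,
                  control_carriers: int, control_noncarriers: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    z = 1.96 gives an approximate 95% interval. A 0.5 continuity
    correction (Haldane-Anscombe) is applied if any cell is zero.
    """
    a, b, c, d = (float(x) for x in (case_carriers, case_noncarriers,
                                     control_carriers, control_noncarriers))
    if min(a, b, c, d) == 0:
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Cohort sizes from the text; carrier counts are HYPOTHETICAL.
n_cases, n_controls = 60_466, 53_461
carriers_cases, carriers_controls = 40, 35

or_value, ci_low, ci_high = odds_ratio_ci(
    carriers_cases, n_cases - carriers_cases,
    carriers_controls, n_controls - carriers_controls,
)
no_association = ci_low < 1.0 < ci_high
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}, "
      f"interval spans 1: {no_association}")
```

With these illustrative counts the carrier frequency is nearly identical in cases and controls, so the interval spans 1. That is the pattern expected for a benign SNP.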
TP53 SNPs Are Not Somatic Variants Associated with Clonal Hematopoiesis or Mosaicism
Today, caution needs to be exercised when considering the conventional view that biallelic SNPs should have an allelic fraction of 100% or 50% depending on whether they are respectively homozygous or heterozygous. In genetic analyses of blood cells identifying somatic variants with low allelic fraction (30%), it is important to consider a number of mechanisms that can lead to false-positives, such as mosaicism or clonal hematopoiesis (CH). In patients with cancer, circulating tumor cells can also cause the spurious detection of tumor-associated variants. This issue is currently a major concern for defining whether variants are germline, thus warranting specific recommendations for patients and their families, or somatic, and thus free of such consequences. This topic has been widely reviewed recently (81–83) and therefore will not be further developed here. Beyond this issue in diagnostics, there is another concerning the quality of the data included in the various population databases, because there is a strong possibility that many SNPs are somatic variants resulting from mosaicism or CH (84). Indeed, somatic mutational events occurring early after the post-zygotic stage can be found at high allelic fractions (greater than 30%) and can be mistakenly defined as germline SNPs. Similarly, variants in CH can be found at high allelic fractions, especially in older individuals. Indeed, somatic CH TP53 variants have been identified in larger cohorts of patients undergoing genetic testing (85, 86). TP53 variants resulting from mosaicism have also been reported but they are far less frequent (87). Therefore, it cannot be ruled out that some of the variants identified in this study may be spurious somatic events contaminating the various population databases. However, there are several independent arguments against this possibility.
First, the population specificity of most of these variants would imply, if they were somatic, that they arise preferentially on a specific genetic background, which is an unlikely hypothesis. Second, variants recovered from CH have been shown to be nonfunctional and to drive the clonal expansion of the cells. For example, using sequencing data from 260,686 TP53 variants observed in genetic testing and presumed to be germline, Fortuno and colleagues showed that those with variant allelic fractions lower than 37% were solely pathogenic, including several well-known hotspot variants such as c.524G>A or c.818G>A (88). None of the 21 variants described in this review were found at allelic fractions suggesting CH. A similar analysis of the NHLBI TOPMed database identified 4,229 individuals with CH including 86 TP53 variants, all of them defined as pathogenic and nonfunctional (89). Considering these observations, it appears rather unlikely that any of the variants described in this review resulted from somatic events.
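The allelic-fraction argument in this section can be made concrete with a simple binomial check: under the idealized assumption that a heterozygous germline variant is sampled at 50% of reads, how surprising is the observed number of variant reads? The sketch below is an illustration only. The read counts and the 0.001 significance cutoff are hypothetical, the function names are ours, and real pipelines must also account for reference bias, copy-number changes, and sequencing error.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def het_lower_tail_p(alt_reads: int, total_reads: int) -> float:
    """Probability of seeing this few (or fewer) variant reads if the
    variant were a heterozygous germline SNP sampled at 50%."""
    return binom_cdf(alt_reads, total_reads, 0.5)


def below_het_expectation(alt_reads: int, total_reads: int,
                          alpha: float = 0.001) -> bool:
    """True if the allelic fraction is significantly below 50%, which is
    compatible with mosaicism or clonal hematopoiesis (hypothetical
    cutoff; not a diagnostic rule)."""
    return het_lower_tail_p(alt_reads, total_reads) < alpha


# Hypothetical read counts at 100x depth.
low_vaf = below_het_expectation(30, 100)   # 30% allelic fraction
near_het = below_het_expectation(48, 100)  # 48% allelic fraction
print("30/100 reads below het expectation:", low_vaf)
print("48/100 reads below het expectation:", near_het)
```

At this depth, a 30% allelic fraction is very unlikely for a true heterozygous SNP, whereas 48% is entirely compatible with one. At low coverage the same fractions would not be distinguishable, which is one reason depth matters when judging germline status.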
Functionality of TP53 SNPs
Numerous in silico predictors of functional effects have been proposed, but they often give discordant results (90). Furthermore, the relation between the prediction of a deleterious effect on a protein and pathogenicity is far from straightforward. Such predictors vary widely in the features and scores they integrate, the types of models they use, and the level of analysis they perform. Consequently, different scores have different advantages and limitations. In a recent study, Cubuk and colleagues compared the effectiveness of 44 in silico tools on a set of pathogenic variants in various tumor suppressor genes including TP53 (91). They observed wide variation in the predictive performance of the tools and very poor specificity for most of them. Indeed, prediction tools such as SIFT or PolyPhen-2 give rise to a high level of misprediction, making them unsuitable for clinical use. Fortuno and colleagues have proposed an ACMG-compliant in silico approach to pathogenicity prediction for TP53 missense variants based on Align-GVGD (92). That approach, combined with BayesDel prediction, labeled 19 of the 21 variants (90%) discussed in this review as functional (Supplementary Table S2).
The importance of TP53 mutations in the clinic and the considerable diversity of the variants (more than 3,500 single nucleotide variants in the coding region of TP53 have been described so far) have led to numerous functional studies, ranging from analyses of all hotspot variants (93) to combinatorial libraries of mutations covering all the codons of the TP53 gene (94–96). Furthermore, the 21 variants described in this review were investigated in greater detail in a recent publication (18). Although different assays and endpoints were used, a correlation analysis showed excellent agreement among all these functional studies (97). The functional data resulting from these studies and structural information on TP53 were used to develop TP53_PROF (PRediction Of Functionality), a gene-specific machine learning model that predicts the functional consequences of every possible missense mutation in TP53 (98). Whether assessed with TP53_PROF or with data from the various functional analyses, none of the 21 SNPs described here displayed any obvious loss of function or was predicted to be deleterious (Fig. 3; Table 1).
The Status of TP53 SNPs in Various Databases
A survey of the most commonly used cancer mutation repositories showed that most of them are contaminated with the SNPs described in this review (Supplementary Table S2). Only a few variants were found in TCGA and ICGC, but the data from the GENIE project included 11 SNPs. As expected, variants rs11540654;p.Arg110His, rs144340710;p.Asn235Ser, rs55819519;p.Arg290His, and rs149633775;p.Arg283Cys, found predominantly in the European population, were the most prevalent (Supplementary Table S2). Moreover, the 21 variants can be found in the COSMIC (Catalogue of Somatic Mutations in Cancer) database, one of the most widely used references for somatic variants, and seven of them were predicted to be deleterious by the FATHMM prediction algorithm. The IARC germline TP53 mutation database (R20) describes TP53 variants in numerous families with either Li–Fraumeni or Li–Fraumeni-like syndrome. Seventeen TP53 SNPs were found in 67 IARC families (notably p.Arg290His and p.Asn235Ser; Supplementary Table S2). This database is used extensively as a reference for the validation of newly discovered TP53 variants in cancer-prone families and for the development of new tools to assess TP53 variant pathogenicity. Therefore, attentive and rigorous curation of this database is essential to ensure that it functions as an accurate reference for the scientific community.
This observation clearly shows that variant-calling pipelines differ widely among studies and need to be modified to take rare SNPs into account. Although large-scale repositories such as gnomAD are invaluable for filtering common SNPs, only the use of population subsets matching the populations analyzed will enable the discovery of low-frequency SNPs. The update of NCBI's dbSNP, which integrates information from the database of Genotypes and Phenotypes (dbGaP) and uses the allele frequency aggregator (ALFA) pipeline to compute allele frequencies for variants in dbGaP, has led to the release of more than 400 million SNPs with ancestry information. That work has resulted in a powerful resource that should prove useful for increasing the accuracy of genetic studies.
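The value of population-matched subsets can be illustrated with a short Python sketch. It compares the pooled allele frequency of a variant with its frequency in each population. The allele counts, the population labels, the function names, and the 0.1% threshold are all hypothetical and were chosen for illustration; they are not taken from gnomAD, ALFA, or any variant discussed in this review.

```python
# Hypothetical allele counts: population -> (alternate allele count, alleles genotyped).
EXAMPLE_COUNTS = {
    "european": (2, 120_000),
    "east_asian": (30, 20_000),
    "african": (0, 24_000),
}


def global_frequency(counts):
    """Pooled allele frequency across all populations."""
    total_alt = sum(alt for alt, _ in counts.values())
    total_alleles = sum(n for _, n in counts.values())
    return total_alt / total_alleles if total_alleles else 0.0


def population_frequencies(counts):
    """Allele frequency computed separately for each population."""
    return {pop: alt / n for pop, (alt, n) in counts.items() if n > 0}


def max_population_frequency(counts):
    """Return (population, frequency) for the population with the highest frequency."""
    freqs = population_frequencies(counts)
    top = max(freqs, key=freqs.get)
    return top, freqs[top]


def missed_by_global_filter(counts, threshold=0.001):
    """True if a variant falls below the threshold in the pooled data
    but reaches it in at least one population."""
    _, top_freq = max_population_frequency(counts)
    return global_frequency(counts) < threshold <= top_freq
```

With these invented counts, the pooled frequency (32 of 164,000 alleles) falls below the 0.1% threshold, whereas the frequency in the hypothetical East Asian subset (30 of 20,000 alleles) exceeds it, so a filter applied only to the pooled data would not recognize the variant as a population-specific SNP.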
Final Discussion and Remaining Questions
It is now undeniable that the coding region of TP53 is far more polymorphic than was previously thought. Moreover, it shows significant heterogeneity in both the frequency and the geographical distribution of the various SNPs. Figure 1A provides a novel map of missense TP53 SNPs. It comprises 19 new variants that must be considered in germline and somatic mutation screening. Although the frequency and the population-specific distribution of these SNPs are now well established and their likely nonpathogenicity is supported by multiple criteria, rigorous large-scale case–control studies in homogeneous populations are needed to determine definitively whether they are associated with cancer risk. This analysis is likely only the tip of the iceberg; a great number of TP53 variants, either very rare or restricted to specific populations, ethnic groups, or families, are surely waiting to be discovered.
An unanswered question raised by this observation is whether some of these population-specific variants have conferred a specific fitness advantage on their populations, as variant Arg72 has on the European population.
Importantly, the function of TP53 is not restricted to a simple role in the regulation of various stress responses; it also plays a fundamental physiologic role in maintaining cellular and tissue homeostasis. This latter aspect is accomplished via a network of multiple isoforms regulating several cellular processes including apoptosis, cell-cycle arrest (including senescence), DNA repair, metabolism, oxidative stress responses, and cellular differentiation/reprogramming, as well as non-cell autonomous functions that impact the tumor microenvironment (99).
Although cancer is usually associated with a qualitative change in TP53, with functional loss via inactivating mutations, there is increasing evidence that quantitative changes in TP53 are associated with developmental diseases (100). An increase in TP53 activity can result from chronic cellular stress, for example impaired ribosome biogenesis or compromised DNA repair, leading to increased apoptosis or decreased cellular proliferation. As discussed above, SNP rs1042522;p.Pro72Arg is associated with differential regulation of target genes, including LIF, which is essential for correct egg implantation and thereafter in early pregnancy. The Pro allele of this SNP has been associated with infertility, implantation failure after in vitro fertilization, and recurrent abortions (34, 35). The two infrequent TP53 variants rs35850753 and rs78378222 have been shown to be associated with human cranial development, specifically an increased head circumference and a larger intracranial volume (101). rs35850753 is located in TP53 intron 4, which contains the P2 promoter controlling expression of the ∆133TP53 transcript. This SNP was shown to be associated with higher expression of ∆133TP53 RNA and with cancer risk (102). A recent analysis by Di Giovannantonio and colleagues showed that these two SNPs are also associated with various anthropometric traits such as weight, basal metabolic rate, and standing height (103). These observations clearly show that TP53 SNPs have consequences that go beyond the tumor suppressor function of this gene and that specific traits associated with rare haplotypes could lead to favorable or unfavorable selection.
The last issue raised by the present analysis concerns the important bias associated with the poor representation of specific populations. This bias prevents an accurate assessment of population-specific SNPs and emphasizes the importance of “The Missing Diversity in Human Genetic Studies” recently discussed by Sirugo and colleagues (104). This issue is clearly not specific to TP53, but the high impact of TP53 mutations on human health, as well as the large body of functional and genetic studies currently available for the gene, makes it a paradigm for such analyses.
To conclude, this analysis illustrates not only the need to ensure proper representation of the human population by including different ethnic origins in genetic studies, but also the substantial benefits that recording such information provides. Indeed, omitting it for social or political reasons could only lead to discriminatory precision medicine.
Authors' Disclosures
No author disclosures were reported.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).