The success of genome-wide association studies (GWAS) in identifying common, low-penetrance variant-cancer associations for the past decade is undisputed. However, discovering additional high-penetrance cancer mutations in unknown cancer predisposing genes requires detection of variant-cancer association of ultra-rare coding variants. Consequently, large-scale next-generation sequence data with associated phenotype information are needed. Here, we used genotype data on 166,281 Icelanders, of which, 49,708 were whole-genome sequenced and 408,595 individuals from the UK Biobank, of which, 41,147 were whole-exome sequenced, to test for association between loss-of-function burden in autosomal genes and basal cell carcinoma (BCC), the most common cancer in Caucasians. A total of 25,205 BCC cases and 683,058 controls were tested. Rare germline loss-of-function variants in PTPN14 conferred substantial risks of BCC (OR, 8.0; P = 1.9 × 10−12), with a quarter of carriers getting BCC before age 70 and over half in their lifetime. Furthermore, common variants at the PTPN14 locus were associated with BCC, suggesting PTPN14 as a new, high-impact BCC predisposition gene. A follow-up investigation of 24 cancers and three benign tumor types showed that PTPN14 loss-of-function variants are associated with high risk of cervical cancer (OR, 12.7, P = 1.6 × 10−4) and low age at diagnosis. Our findings, using power-increasing methods with high-quality rare variant genotypes, highlight future prospects for new discoveries on carcinogenesis.

Significance:

This study identifies the tumor-suppressor gene PTPN14 as a high-impact BCC predisposition gene and indicates that inactivation of PTPN14 by germline sequence variants may also lead to increased risk of cervical cancer.

Basal cell carcinoma (BCC) arises in the basal cells of follicular or interfollicular skin. Mortality from BCC is very low and it rarely metastasizes. However, its high incidence and tendency to occur as many primary lesions means that it is a substantial clinical and economic burden (1). As with the other major forms of skin cancer, the main environmental risk factor is UV irradiation, derived primarily from exposure of unprotected skin to sunlight or artificial UV sources (2).

Some sequence variants confer high risk of BCC. Gorlin syndrome, characterized by a high propensity to develop early onset and multiple primary BCC, is caused by mutations in the PTCH1, PTCH2 or SUFU genes, components of the Hedgehog (Hh) signaling pathway (3, 4). Patients with xeroderma pigmentosum carry mutations in DNA excision repair genes and have high risks of developing early onset BCC (1). Approximately 33 common, low- or moderate-risk susceptibility loci have been reported from genome-wide association studies (GWAS; ref. 5). These loci often, but not always, affect pigmentation traits and UV sensitivity (6–8).

Several germline predisposition genes are mutated somatically in BCC tumors, for example, PTCH1, SUFU, TP53, MYCN, CASP8, and TERT (9–13). Even though the majority of tumors acquire mutations in the Hh pathway and/or TP53 genes, somatic sequence variants have also been reported in the protein tyrosine phosphatase gene PTPN14, a key modulator of the Hippo–YAP pathway (9, 14). That finding suggests that deregulation of Hippo–YAP signaling plays a key role in BCC pathogenesis, a notion with some experimental support (15, 16). However, germline sequence variants in PTPN14 or any other Hippo–YAP pathway components predisposing to cancer have hitherto not been found.

Here, we used large next-generation sequencing training sets from Iceland and the UK Biobank to enable the imputation of rare, loss-of-function variants in chip-typed individuals in each population. We used this extensive information on genetic variation to carry out gene-level loss-of-function burden tests to look for new BCC predisposition genes. We report that germline loss-of-function mutations in PTPN14 confer high risk of BCC.

Sequence variants that predispose to BCC can also confer risk of other forms of cancer. For example, patients with Gorlin syndrome are strongly predisposed to develop medulloblastoma and the BCC-associated TP53 variant confers increased risk of gliomas, prostate cancer, and colorectal adenoma (4, 12). The Hippo–YAP pathway has previously been implicated in human papillomavirus (HPV) pathogenesis and in a number of human cancers besides BCC (17–24). Interestingly, our follow-up investigations of the PTPN14 mutations revealed a dramatically increased risk of cancer of the uterine cervix and suggestive evidence of an association with growth hormone (GH) secreting pituitary adenomas.

An overview of the study populations and methods

In this study, we combine whole-genome and whole-exome sequence data with datasets based on imputation of thus identified sequence variants into chip-genotyped individuals and their relatives to perform loss-of-function burden tests for association with BCC. We have sequenced the whole genomes of 49,708 Icelanders to a median coverage of 37x and imputed the resulting sequence variants into 116,573 Icelanders who had chip genotypes and 290,482 of their close relatives. We also analyzed exome sequence data from 41,147 participants of British ancestry in the UK Biobank (25) and imputed the resulting sequence variants into the remaining 367,448 participants of British ancestry who had only chip genotypes (Fig. 1A). We applied a gene-based loss-of-function burden test, which tests for association with the combined count of all rare loss-of-function alleles [minor allele frequency (MAF) <2%] in each gene, to test for association among 25,205 BCC cases and 683,058 controls (Supplementary Tables S1–S2). To maximize the number of variants tested, we combined imputable loss-of-function variants (which are testable in chip-genotyped individuals) with non-imputable variants (which can be tested only in sequenced individuals) by including sequencing status as a covariate in the test for association. That way we were able to increase the sample size and hence the power to detect association with a gene, as well as more genes being tested in total (Fig. 1B and C; Supplementary Fig. S1A–S1H). Analyses that are restricted to imputed samples only would fail to include the rarest variants, for example, singletons—sequence variants only observed once in a dataset. The approaches used to restrict to high-quality sequence variants in the burden tests in Iceland are described in detail in the Supplementary Material. Results for loss-of-function burden tests from both populations were combined by a meta-analysis on the gene level.

Figure 1.

Gene-based loss-of-function burden association testing with BCC in Iceland and the UK Biobank. A, An overview of the datasets used in the study. Sequenced samples are whole-genome sequenced in Iceland and whole-exome sequenced in the UK Biobank. The data from the UK Biobank are restricted to individuals of white-British ancestry. Association testing in Iceland additionally uses familial imputation of first- and second-degree relatives of the individuals in the sequenced and imputed datasets. Genes with >4 loss-of-function allele carriers in at least one population were included in the association testing. As shown in B for Iceland and C for the UK Biobank, the number of genes tested is increased when sequenced and imputed samples are combined, irrespective of the number of loss-of-function allele carriers considered, >4, >100, or >1,000.

Figure 1.

Gene-based loss-of-function burden association testing with BCC in Iceland and the UK Biobank. A, An overview of the datasets used in the study. Sequenced samples are whole-genome sequenced in Iceland and whole-exome sequenced in the UK Biobank. The data from the UK Biobank are restricted to individuals of white-British ancestry. Association testing in Iceland additionally uses familial imputation of first- and second-degree relatives of the individuals in the sequenced and imputed datasets. Genes with >4 loss-of-function allele carriers in at least one population were included in the association testing. As shown in B for Iceland and C for the UK Biobank, the number of genes tested is increased when sequenced and imputed samples are combined, irrespective of the number of loss-of-function allele carriers considered, >4, >100, or >1,000.

Close modal

The Icelandic study population

Genotype data

The extensive genotype information on the Icelandic population has been previously described (26). In brief, Illumina technology, including GAIIx, HiSeq, HiSeqX, and NovaSeq machines, was used to sequence the whole genomes of 49,708 Icelanders to a median coverage of 36.9x (mean depth of at least 17.8x) at deCODE genetics. Genotypes of SNPs and indels were identified and their genotypes jointly called using Graphtyper (version 1.4; ref. 27). All the sequenced individuals had also been chip-typed and long-range phased (28) and information about haplotype sharing was used to improve genotype calls. The variants found were then imputed into 116,573 Icelanders whose DNA had been genotyped with various Illumina SNP chips and phased using long-range phasing. Using genealogic information, genotype probabilities for 290,482 untyped relatives of the genotyped individuals were calculated to further increase the sample size for association analysis and to increase the power to detect associations. To annotate sequence variants, predicted variant consequence terms as defined by the Sequence Ontology Project (29) were applied to variants via Ensembl release 92 and variant effect predictor (VEP) version 92.1 (30) according to our previously described scheme (31). As an example, the Sequence Ontology's definition of a splice region variant is: “A sequence variant in which a change has occurred within the region of the splice site, either within 1–3 bases of the exon or 3–8 bases of the intron.”

Phenotype data

The Icelandic BCC dataset consisted of 5,574 histologically confirmed cases with associated ICD10 (C44) and SNOMED-M (8090–8094, 8097, 8098) codes and 294,094 controls. The controls were recruited through different genetic research projects at deCODE genetics and collectively approximate a population sampling of the at-risk age groups. Association testing was performed with age, age squared, gender and an indicator function for the overlap of the lifetime of the individual with the time span of phenotype collection as covariates to account for differences between cases and controls. Other Icelandic cancer/benign tumors case–control datasets used were described previously (12, 32–34). Information on cancer diagnosis in genotyped individuals in Iceland is from the population-based Icelandic Cancer Registry (ICR; ref. 35), which has registered all solid, non-cutaneous cancers in the entire population of Iceland since 1955 but registration of skin and hematological cancers started in the 1980s. ICR registration is based on the ICD system and includes information on histology (systemized nomenclature of medicine, SNOMED). Collectively, over 94% of diagnoses in the ICR have histological confirmation, whereas for BCC, case definition is solely based on histological confirmation. Thyrotropin (thyroid stimulating hormone; mIU/L) was measured in blood samples of Icelanders seeking medical care between the years 1997 and 2017 at the Landspitali University Hospital or at the Clinical Laboratory in Mjodd, Reykjavik, Iceland. The measurements were normalized to a standard normal distribution using quantile–quantile normalization and then adjusted for center, gender, year of birth and age at measurement. For individuals for which more than one measurement was available we used the average of the normalized value. The study was approved by the National Bioethics Committee in Iceland (IRB approval no.VSN-20–004) and written informed consent was obtained from all participants.

The UK Biobank study population

Genotype data

We downloaded the FE version of the UK Biobank plink-formatted exome files (field 23160; see URLs) for 49,991 samples (41,147 of white-British ancestry). The quality control applied to variants in the UK Biobank exome sequencing data has been described, with singletons validated by visual inspection using criteria that have been shown to correlate well with Sanger validation (36). Variants were mapped in house to National Center for Biotechnology Information (NCBI) Genome Reference Consortium Build 38 (GRCh38). Whole-exome sequence variants passing filters (MAF > 0, genotyping yield > 0.5, HWE P > 0) were then used to create a haplotype reference panel that was phased using the phased genotype chip data of 408,595 individuals (25). The genotypes from the exome-sequencing were then imputed into the chip-typed individuals (367,448 of white British ancestry) using in house methods (26, 28). Variants consequence was annotated using VEP version 92.1 (30) and a single VEP annotation was assigned to each sequence variant according to our previously described scheme (31) using Refseq genes from Ensembl release 92. We visually examined the binary alignment files for the 10 variants from UK Biobank included in the combined analysis of PTPN14 gene burden in Iceland and the UK to confirm their quality. All passed a threshold of a mean sequencing depth >12 and an allelic balance >0.24.

Phenotype data

The UK Biobank BCC data consist of 19,631 histologically confirmed cases and 388,964 controls of European ancestry (self-reported white British with similar genetic ancestry based on principal component analysis and with consistent reported and genetically determined gender), recruited between 2006 and 2010 ages 40–69, and with follow-up until 2016 (25). In addition to limiting the set of controls to individuals of European ancestry, association tests included age, gender, and 40 principal components as covariates. Data on cancer diagnoses, year of diagnosis and deaths were provided to the UK Biobank by the Medical Research Information Service based at the National Health Service Information Center (UK) and The Information Services Division, a part of NHS Scotland (see URLs). Cancer registries collect information on cancer diagnoses from hospitals, cancer centers, treatment centers, hospices, nursing homes, private hospitals, cancer-screening programs, other cancer registers, general practices, death certificates, Hospital Episode statistics, and cancer waiting time data. Completeness of case ascertainment in English cancer registries has been reported as 98%–99%, in a study linking routine cancer registration with information from the Hospital Episode Statistics database (37). BCC was ascertained from cancer registry data using ICD10 (C44) and SNOMED-M (8090–8094, 8097, 8098) codes (data-fields 40006 and 40011, respectively), indicating cutaneous BCC. Cervical cancer was ascertained using ICD10 code C53 (Malignant neoplasm of cervix uteri). Pituitary adenoma was ascertained as ICD10 D35.2 (Benign neoplasm: Pituitary gland), acromegaly as E22.0 (Acromegaly and pituitary gigantism), hyperprolactinaemia as E22.1 (hyperprolactinaemia), and Cushing syndrome as E24 (Cushing syndrome). Ethical approval for UK Biobank was received from the North West Research Ethics Committee (REC 11/NW/0382) and written informed consent was obtained from all participants.

The 23andMe study population

The 23andMe BCC study population has been described previously (38). Briefly, BCC cases were self-reported in answers to online questionnaires. The validity of the self-reporting was determined by medical record review of a subset of cases. Subjects were genotyped on Illumina BeadChip platforms. Individuals were included in the study if they had >97% European ancestry as determined from genotype data. 23andMe, Inc., a genetics company provided free access to anonymized genetic and phenotypic data. All information came from research participants who provided informed consent to participation in research, in accord with 23andMe's human subjects protocol (reviewed and approved by Ethical and Independent Review Services, an AAHRPP accredited IRB).

Statistical analysis

We used logistic regression assuming an additive model in the case–control analyses to test for association between loss-of-function gene burdens and BCC/other cancers/benign tumors, treating disease status as the response and genotype counts as covariates, using likelihood ratio test to compute two-sided P values. Individuals were coded as 1 if they carry any of the loss-of-function variants (splice acceptor, splice donor, stop gained, frameshift, stop lost, start lost) in the autosomal gene being tested and as 0 otherwise. The model also includes available individual characteristics that correlate with phenotype status. In Iceland, these are: County of birth, gender, current age or age at death (first and second order terms included), availability of blood sample for the individual and an indicator function for the overlap of the lifetime of the individual with the time span of phenotype collection. For the UK Biobank association testing, 40 principal components were used to adjust for population substructure and age and gender were included as covariates in the logistic regression model. The association analysis for both the Icelandic and the UK Biobank datasets was done using software developed at deCODE genetics (26). We used linkage disequilibrium (LD) regression to account for inflation in test statistics due to cryptic relatedness and stratification (39). LD scores were downloaded from an LD score database (see URLs). With a set of 1.1 million variants, we regressed the χ2 statistics from our association tests on the LD scores and used the intercept as a correction factor. P values were then adjusted by dividing the corresponding χ2 values by the correction factors. The estimated correction factor for BCC based on LD score regression was 1.24 for the Icelandic and 1.07 for the UK datasets, respectively.

Meta-analysis was performed on the summary results from Iceland and the UK using a fixed effects inverse variance weighted method (40), in which the datasets were allowed to have different population frequencies for alleles and genotypes but were assumed to have a common odds ratio (OR) and weighted with the inverse of the variance of the effect estimate derived from the logistic regression.

The quantitative trait, normalized and adjusted measurements of thyrotropin, was tested using a linear mixed model implemented in BOLT-LMM (41).

RNA-sequencing analysis

For the RNA analysis, data from the GTEx database were used as described in the Supplementary Material.

Gene-burden

We collapsed rare and low frequency (<2% minor allele frequency) loss-of-function variants by autosomal genes for the gene-based burden tests performed (42, 43). Assuming that all loss-of-function variants have the same phenotypic effect, collapsing genotypes across the variants maximizes the power to detect association (44). Only variants with imputation information >0.8 were included in the genotype collection among the imputed samples. We report carrier status among imputed samples if genotype probability exceeds 0.9. Previously reported loss-of-function burden tests have used arbitrarily chosen frequency thresholds (45) up to 5%, with some as low as 0.1% MAF (46) as a way to attenuate the probability of false-positive loss-of-function variants in the burden test. Here, we filtered on loss-of-function MAF below 2% because pathogenic variants can be of higher MAF in populations with the founder effect, such as in Iceland (32, 47).

We used UCSC as a source for Refseq Protein coding genes. The number of Refseq autosomal protein coding genes is 18,482 (P value threshold: 0.05/18,482 = 3 × 10−6). We restricted our analysis to genes with at least five copies of loss-of-function alleles in at least one of the populations as we do not have the power to detect association with rarer combined loss-of-function variants. This approach resulted in 11,149 and 14,912 genes being tested for association with BCC in Iceland and the UK Biobank, respectively, and a total of 15,294 genes included in the meta-analysis (Fig. 1A; Supplementary Table S3).

Data availability statement and URLs

The authors declare that the data supporting the findings of this study are available within the article, its Supplementary Data files and upon reasonable request. The UK Biobank data can be obtained upon application (ukbiobank.ac.uk). The summary data from the loss-of-function gene-burden tests for BCC will be made available at http://www.decode.com/summarydata. URLs: UK Biobank cancer data: http://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/CancerLinkage.pdf, LD score database: https://data.broadinstitute.org/alkesgroup/LDSCORE/eur_w_ld_chr.tar.bz2, the UK Biobank exome sequence data: http://biobank.ctsu.ox.ac.uk/showcase/label.cgi?id=170.

We found 27,251 and 165,810 rare loss-of-function variants in 11,852 and 16,976 genes in Iceland and the UK Biobank, respectively, using whole-genome and whole-exome sequences (Fig. 1A). The difference in number of variants found in the two datasets is primarily because of differences in the nature of the two datasets. The Icelandic sample represents a much larger fraction of the Icelandic population than the UK Biobank of the British population (15% vs.0.08%), and the Icelandic population is a founder population whereas the UK population is a much larger and diverse population. Sequencing method, sequencing depth, calling and quality control of polymorphisms may also contribute to the difference. Of these variants, we were able to impute 16,373 (60.1%) and 37,218 (22.5%) into Icelandic and UK Biobank samples, respectively, with imputation information above 0.8 (Fig. 1A). The most common reason for failure of variants to impute adequately was a lack of carriers in the sequenced set (Supplementary Fig. S2). The greater fraction sampled of the Icelandic population than the UK one mostly explains the higher imputation rate of rare variants in Iceland (28).

Genome-wide results of the loss-of-function burden tests are shown in Supplementary Fig. S3 (Manhattan Plot) and detailed in the Supplementary Material Online. We found that the loss-of-function variant burden in one gene, PTPN14, was associated with BCC with an OR of 8.2 [95% confidence interval (CI), 4.4–15.3; P = 3.5 × 10−11]. Thirteen loss-of-function variants in PTPN14 were included in the analysis, three from Iceland and 10 from the UK, with no overlap of variants between the populations (Supplementary Table S4). The variants were distributed throughout the gene and did not cluster in a location that would suggest a partial loss-of-function, such as the last exon (32, 48). To validate and improve the Icelandic imputations, we used Sanger sequencing to confirm the PTPN14 variants in the sequenced and imputed carriers as well as in 53 chip-typed relatives (imputed as non-carriers) of the sequenced samples for two of the rarest loss-of-function variants. In total, 148 Icelandic samples from carriers and potential carriers were Sanger sequenced (Supplementary Tables S5 and S6). The additional sequencing resulted in the identification of three additional imputed carriers and the reclassification of five imputed carriers as non-carriers, changing the combined OR estimate to 8.0 (95% CI, 4.5–14.3; P = 1.9 × 10−12; Table 1; Supplementary Table S6).

Table 1.

Association results for a meta-analysis of PTPN14 loss-of-function burden tests with BCC in Iceland and in the UK Biobank.

Sequenced samplesImputed samplesBurden
DatasetN LOFGroupNon-CarrCarrCF (%)Non-CarrCarrCF (%)OR (95% CI)P
Icelanda Control 48,033 13 0.034 114,069 64 0.065 9.29 (4.54–19.03) 1.1 × 10−9 
  Case 1,658  2,428 12    
UK Biobank 10 Control 39,229 11 0.034 349,710 14 0.005 6.08 (2.27–16.25) 3.2 × 10−4 
  Case 1,904  17,719    
Combined 13 Control 87,262 24 — 463,779 78 — 8.02 (4.49–14.32) 1.9 × 10−12 
  Case 3,562  20,147 17    
Sequenced samplesImputed samplesBurden
DatasetN LOFGroupNon-CarrCarrCF (%)Non-CarrCarrCF (%)OR (95% CI)P
Icelanda Control 48,033 13 0.034 114,069 64 0.065 9.29 (4.54–19.03) 1.1 × 10−9 
  Case 1,658  2,428 12    
UK Biobank 10 Control 39,229 11 0.034 349,710 14 0.005 6.08 (2.27–16.25) 3.2 × 10−4 
  Case 1,904  17,719    
Combined 13 Control 87,262 24 — 463,779 78 — 8.02 (4.49–14.32) 1.9 × 10−12 
  Case 3,562  20,147 17    

Note: Sample size weighted meta-analysis of P-values generated with Fisher exact test in the Icelandic dataset (sequenced and imputed samples): P value = 3.4 × 10−10 and for the UK Biobank dataset: P value = 5.8 × 10−5 (see Supplementary Table S8).

Abbreviations: Carr, number of individuals carrying a loss-of-function variant (genotype probability >0.9); CF, carrier frequency; CI, confidence interval; N LOF, number of loss-of-function variants; Non-Carr, number of individuals who did not carry a loss-of-function variant; OR, odds ratio.

aPhenotype and imputed genotype information of first- and second-degree relatives of genotyped individuals is included in the association testing.

In the sequence data, we found two rare Gorlin syndrome loss-of-function variants in PTCH1 in Iceland (Supplementary Table S7). These sequence variants were not present in the BCC-affected carriers of the PTPN14 loss-of-function variants. Similarly in the UK Biobank, the BCC-affected carriers of the PTPN14 loss-of-function variants did not carry Gorlin syndrome variants.

All of the PTPN14 loss-of-function variants are very rare. The most common variants among the sequenced individuals were carried by 12 in Iceland [carrier frequency (CF) = 0.024%, ca. 1 in 4,000] and 4 in the UK (CF = 0.0097%, ca. 1 in 10,000; Supplementary Table S4). The combined carrier frequency of PTPN14 loss-of-function variants in Iceland was 0.034% among sequenced individuals and 0.065% among imputed individuals, whereas the corresponding frequencies were 0.034% and 0.005% in the UK Biobank (Table 1). The lower carrier frequency among imputed individuals in the UK Biobank was observed because only one of the 10 variants found there could be imputed. We estimate the population-attributable risk (49) for the loss-of-function variants in PTPN14 in Iceland to be 0.33%. Given how rare the loss-of-function variants are, we compared the probabilities of BCC cases among carriers of the PTPN14 loss-of-function variants as opposed to BCC cases among non-carriers using Fisher exact test and confirmed that the results presented in Table 1 are not sensitive to the choice of statistical test used (Supplementary Table S8).

The genome Aggregation Database (gnomAD, v2.1.1) of 141,456 individuals reports 27 individuals carrying a total of 23 loss-of-function variants in PTPN14 (CF = 0.019%). Most genes have fewer observed loss-of-function variants than expected (median o/e ratio = 0.48) and the o/e ratio for PTPN14 is 14/59.9 = 0.23 (canonical transcript; ref. 50). Among the 13 loss-of-function variants in the PTPN14 burden, two of the variants found in the UK, p.Ala928GlufsTer36 and p.Arg74Ter, were found in individuals of European ancestry in gnomAD (one copy each).

We were able to impute four of the 13 loss-of-function variants, three in Iceland and one in the UK Biobank (Supplementary Table S9). Three of the four associated nominally with increased risk of BCC, but none passed genome-wide significance on its own (Supplementary Table S4). Combined, association of the three Icelandic variants was genome-wide significant [P = 1.1 × 10−9, taking into account the number of autosomal protein coding RefSeq genes in the genome (P < 0.05/18,482 = 3.0 × 10−6; refs. 51, 52)]. Association of the 10 UK variants combined was nominally significant and with an effect consistent with that in Iceland (Table 1, P for effect heterogeneity = 0.49). Excluding the single most significant loss-of-function variant (c.-1_2delinsATGG found in Iceland, OR = 8.9, P = 9.7 × 10−8), the remaining loss-of-function variants associated with BCC with an OR of 7.3 (95% CI, 3.2–16.8; P = 3.3 × 10−6; Supplementary Table S10). The association with BCC remained significant if any one of the other variants was left out (Supplementary Table S10). Therefore, the combined loss-of-function burden effect was not overly dependent on any single variant. Moreover, the association could not have been detected with standard single-variant methods, but rather depended on the use of burden testing.

The high penetrance of PTPN14 loss-of-function variants is apparent in Fig. 2A and B that display BCC-free survival curves for carriers and non-carriers in Iceland and the UK Biobank (Fig. 2A and B). In Iceland, the probability of BCC by age 70 among carriers was 27% (95% CI, 12%–40%) compared with 3.4% (95% CI, 3.3%–3.6%) for non-carriers. In the UK Biobank, the corresponding numbers were 26% (95% CI, 2%–45%) and 6.4% (95% CI, 6.3%–6.5%). The Icelandic data span a broader range of ages, and we estimate the probability of BCC among carriers at age 80 as 44% (95% CI, 20%–60%) and among non-carriers 7.1% (95% CI, 6.9%–7.4%; Supplementary Table S11). These numbers are not as high as the estimated population incidence for BCC (1), probably because the BCC cases have histopathologically confirmed diagnoses and are therefore not inclusive of all BCC cases. The PTPN14 loss-of-function variant burden was not associated with age at diagnosis among BCC cases when tested using a linear model (Iceland: β = −4.9 years; 95% CI, −12.3 to 2.4; UK Biobank: β = 1.6 years; 95% CI, −4.7 to 7.9; Supplementary Fig. S4A and S4B, Supplementary Table S12).

Figure 2.

Comparison of BCC event-free survival rate among carriers and non-carriers of loss-of-function (LOF) sequence variants in the PTPN14 gene among chip-typed individuals in Iceland (A) and the UK Biobank (B) using Kaplan–Meier survival curves. Number of individuals in Iceland: 166,100, thereof 93 carriers. Included in the survival analysis are all born before 2015 as last year of diagnosis is 2014. Number of individuals in the UK Biobank: 408,595, thereof 33 carriers. Last year of diagnosis is 2016. Carrier status = 1 if probability of collapsed LOF PTPN14 variant genotype is larger than 0.9 in imputed samples.

Figure 2.

Comparison of BCC event-free survival rate among carriers and non-carriers of loss-of-function (LOF) sequence variants in the PTPN14 gene among chip-typed individuals in Iceland (A) and the UK Biobank (B) using Kaplan–Meier survival curves. Number of individuals in Iceland: 166,100, thereof 93 carriers. Included in the survival analysis are all born before 2015 as last year of diagnosis is 2014. Number of individuals in the UK Biobank: 408,595, thereof 33 carriers. Last year of diagnosis is 2016. Carrier status = 1 if probability of collapsed LOF PTPN14 variant genotype is larger than 0.9 in imputed samples.

Close modal

We tested for association between the loss-of-function variants in PTPN14 and 24 cancer and three benign tumor types. After accounting for multiple testing with Bonferroni correction (P < 0.05/27 = 1.9 × 10−3), cervical cancer associated with PTPN14 loss-of-function variants with a large effect estimate (OR, 12.7; 95% CI, 3.4–47.6; P = 1.6 × 10−4) with no evidence of heterogeneity between the effect estimates from Iceland and the UK Biobank (Phet = 0.51; Table 2). This finding is driven by the Icelandic data as there were no carriers of PTPN14 loss-of-function variants among the cervical cancer cases in the UK Biobank. However, because of how rare the loss-of-function variants are and how few cervical cancer cases are in the UK Biobank, the probability of seeing no carriers is 0.35, given the strength of the association in the Icelandic data. Among the Icelandic cervical cancer cases, we found that the carriers of the PTPN14 burden were younger at diagnosis than non-carriers (β = −15.2 years; 95% CI, −29.3 to −1.1; P = 0.036; Supplementary Table S12, Supplementary Fig. S5).

Table 2.

Association of the PTPN14 loss-of-function burden with BCC, 24 additional cancers and three benign tumor types in Iceland and the UK Biobank combined and by country of origin.

MetaIcelandUK BiobankIcelandUK Biobank
PhenotypeORPPhetORPORPN affN ctrlN affN ctrl
BCC of the skin 8.02 1.9 × 10−12 0.49 9.29 1.1 × 10−9 6.08 3.2 × 10−4 5,574 294,094 19,631 388,964 
Cervical cancer 12.68 1.6 × 10−4 0.51 13.06 1.4 × 10−4 0.00 0.69 899 135,331 1,065 219,786 
Pituitary adenomaa 8.72 8.0 × 10−3 0.88 8.10 0.03 10.60 0.13 464 292,460 945 407,562 
Ovarian cancer 7.60 0.02 0.41 8.26 0.02 0.00 0.59 874 145,125 1,512 219,339 
Leiomyoma of uterusa 2.42 0.03 0.72 2.14 0.15 2.89 0.09 6,683 117,118 19,461 201,435 
Hepatocellular carcinoma 6.83 0.14 0.76 7.01 0.14 0.00 0.84 253 371,305 232 396,453 
Thyroid cancer 3.35 0.17 0.65 3.45 0.16 0.00 0.73 1,285 298,173 655 407,852 
Non-Hodkins lymphoma 0.00 0.20 1.00 0.00 0.27 0.00 0.53 1,305 356,143 2,073 406,434 
Multiple myeloma 0.00 0.22 1.00 0.00 0.24 0.00 0.71 1,542 313,882 775 407,732 
Endometrial cancer 0.00 0.24 1.00 0.00 0.31 0.00 0.54 1,090 116,560 1,821 219,030 
Colorectal cancer 1.83 0.33 0.30 2.00 0.27 0.00 0.37 4,641 313,578 4,562 403,945 
Brain cancer 0.00 0.39 1.00 0.00 0.43 0.00 0.75 722 356,726 542 407,965 
Esophageal cancer 0.00 0.39 1.00 0.00 0.46 0.00 0.67 695 203,319 999 405,883 
Kidney cancer RCC 0.08 0.43 0.82 0.12 0.53 0.00 0.59 1,921 355,981 1,499 407,008 
Melanoma 1.81 0.45 0.53 1.17 0.88 3.16 0.33 1,784 297,884 4,603 403,904 
Testicular cancer 3.74 0.45 0.62 4.32 0.41 0.00 0.71 367 191,230 727 186,929 
CLL and SLL 0.00 0.47 1.00 0.00 0.53 0.00 0.71 488 200,747 777 406,105 
Squamous cell skin cancer 1.77 0.48 0.33 2.03 0.39 0.00 0.40 2,092 277,414 3,851 403,031 
UADT cancer 2.15 0.50 0.69 2.24 0.48 0.00 0.74 995 257,560 642 403,031 
Lung cancer 1.54 0.51 0.41 1.65 0.45 0.00 0.46 5,143 294,109 3,013 405,494 
Prostate cancer 1.40 0.53 0.26 0.90 0.87 3.15 0.20 6,357 117,090 9,243 177,661 
Bladder cancer 1.96 0.56 0.11 0.00 0.20 4.12 0.26 2,228 315,993 2,777 404,105 
Pancreatic cancer 1.63 0.67 0.64 1.73 0.63 0.00 0.68 1,328 277,762 902 405,980 
Gallbladder and gallways cancer 0.00 0.71 1.00 0.00 0.75 0.00 0.84 134 164,098 249 403,275 
Mesothelioma 0.00 0.74 1.00 0.00 0.81 0.00 0.82 79 179,848 312 387,815 
Breast cancer 1.18 0.75 0.15 0.55 0.42 2.41 0.22 6,908 292,623 14,507 394,088 
Colorectal adenomaa 1.17 0.76 0.14 1.35 0.57 0.00 0.16 10,929 235,619 12,752 395,755 
Gastric cancer 0.89 0.91 0.73 0.92 0.94 0.00 0.72 2,905 237,954 812 407,783 
MetaIcelandUK BiobankIcelandUK Biobank
PhenotypeORPPhetORPORPN affN ctrlN affN ctrl
BCC of the skin 8.02 1.9 × 10−12 0.49 9.29 1.1 × 10−9 6.08 3.2 × 10−4 5,574 294,094 19,631 388,964 
Cervical cancer 12.68 1.6 × 10−4 0.51 13.06 1.4 × 10−4 0.00 0.69 899 135,331 1,065 219,786 
Pituitary adenomaa 8.72 8.0 × 10−3 0.88 8.10 0.03 10.60 0.13 464 292,460 945 407,562 
Ovarian cancer 7.60 0.02 0.41 8.26 0.02 0.00 0.59 874 145,125 1,512 219,339 
Leiomyoma of uterusa 2.42 0.03 0.72 2.14 0.15 2.89 0.09 6,683 117,118 19,461 201,435 
Hepatocellular carcinoma 6.83 0.14 0.76 7.01 0.14 0.00 0.84 253 371,305 232 396,453 
Thyroid cancer 3.35 0.17 0.65 3.45 0.16 0.00 0.73 1,285 298,173 655 407,852 
Non-Hodkins lymphoma 0.00 0.20 1.00 0.00 0.27 0.00 0.53 1,305 356,143 2,073 406,434 
Multiple myeloma 0.00 0.22 1.00 0.00 0.24 0.00 0.71 1,542 313,882 775 407,732 
Endometrial cancer 0.00 0.24 1.00 0.00 0.31 0.00 0.54 1,090 116,560 1,821 219,030 
Colorectal cancer 1.83 0.33 0.30 2.00 0.27 0.00 0.37 4,641 313,578 4,562 403,945 
Brain cancer 0.00 0.39 1.00 0.00 0.43 0.00 0.75 722 356,726 542 407,965 
Esophageal cancer 0.00 0.39 1.00 0.00 0.46 0.00 0.67 695 203,319 999 405,883 
Kidney cancer RCC 0.08 0.43 0.82 0.12 0.53 0.00 0.59 1,921 355,981 1,499 407,008 
Melanoma 1.81 0.45 0.53 1.17 0.88 3.16 0.33 1,784 297,884 4,603 403,904 
Testicular cancer 3.74 0.45 0.62 4.32 0.41 0.00 0.71 367 191,230 727 186,929 
CLL and SLL 0.00 0.47 1.00 0.00 0.53 0.00 0.71 488 200,747 777 406,105 
Squamous cell skin cancer 1.77 0.48 0.33 2.03 0.39 0.00 0.40 2,092 277,414 3,851 403,031 
UADT cancer 2.15 0.50 0.69 2.24 0.48 0.00 0.74 995 257,560 642 403,031 
Lung cancer 1.54 0.51 0.41 1.65 0.45 0.00 0.46 5,143 294,109 3,013 405,494 
Prostate cancer 1.40 0.53 0.26 0.90 0.87 3.15 0.20 6,357 117,090 9,243 177,661 
Bladder cancer 1.96 0.56 0.11 0.00 0.20 4.12 0.26 2,228 315,993 2,777 404,105 
Pancreatic cancer 1.63 0.67 0.64 1.73 0.63 0.00 0.68 1,328 277,762 902 405,980 
Gallbladder and gallways cancer 0.00 0.71 1.00 0.00 0.75 0.00 0.84 134 164,098 249 403,275 
Mesothelioma 0.00 0.74 1.00 0.00 0.81 0.00 0.82 79 179,848 312 387,815 
Breast cancer 1.18 0.75 0.15 0.55 0.42 2.41 0.22 6,908 292,623 14,507 394,088 
Colorectal adenomaa 1.17 0.76 0.14 1.35 0.57 0.00 0.16 10,929 235,619 12,752 395,755 
Gastric cancer 0.89 0.91 0.73 0.92 0.94 0.00 0.72 2,905 237,954 812 407,783 

Note: Sex-specific controls are used in the association testing for sex-specific phenotypes.

Abbreviations: aff, affected; ctrl, controls; CLL and SLL, chronic lymphocytic leukemia and small lymphocytic lymphoma; UADT cancer, upper aerodigestive tract cancer (32).

aBenign tumor types.

We also saw a nominally significant association of PTPN14 variants with pituitary adenoma (OR = 8.7 and P = 8.0 × 10−3), which did not survive correction for multiple testing. Pituitary adenomas are classified according to the type of pituitary hormones secreted (if any; ref. 53). Combining the available phenotype data from Iceland and the UK Biobank, we examined a sub-phenotype of pituitary adenoma causing acromegaly (GH secreting) as well as indications of other types of pituitary adenoma, that is, hyperprolactinaemia (prolactin-secreting adenoma) and Cushing syndrome (ACTH-secreting adenoma). We also examined the blood level of thyrotropin. We observed an association only with acromegaly and with a large effect (OR, 47.7; 95% CI, 4.8–477.1; P = 1.0 × 10−3). This association was nominally significant in both Iceland and the UK Biobank (Iceland: P = 0.03, UK Biobank: P = 0.01). These observations indicate that the PTPN14 loss-of-function burden association with pituitary adenoma may be driven by GH-secreting subtypes (Supplementary Table S13).

We tested for association between loss-of-function variants in PTPN14 and pigmentation phenotypes but found no indication that the association with BCC is mediated through pigmentation traits (Supplementary Table S14).

We collected 161 PTPN14 variants found in the combined dataset predicted to have moderate biological impact and tested them for association with BCC (Supplementary Table S15, Supplementary Table S2). One variant, rs17022807, a splice region variant of MAF = 8.0% (mean of Iceland and the UK, with imputation information >0.9 in both datasets), associated with BCC with an OR of 1.08 (P = 3.1 × 10−5, Bonferroni corrected P value threshold 0.05/161 = 3.1 × 10−4; Supplementary Table S15). The association of rs17022807 and BCC replicated in BCC summary level GWAS data from 23andMe, Inc. (12,945 BCC cases and 274,252 controls). Combined analysis of rs17022807 in Iceland, the UK Biobank and 23andMe produced a genome-wide significant result (OR, 1.09; 95% CI, 1.06–1.12; P = 1.3 × 10−8; Supplementary Table S16). We note that rs17022807 cannot explain the association of the PTPN14 loss-of-function burden with BCC as there is no correlation between the genotype count of rs17022807 and the PTPN14 loss-of-function burden in either population tested (r2 = 4.0 × 10−5 and r2 = 6.3 × 10−6 in Iceland and the UK, respectively). Using GTEx RNA-sequence data from both sun exposed and non-sun exposed skin, we were not able to detect an effect of the annotated splice region variant rs17022807 on PTPN14 RNA abundance or splice site utilization.

No associations with BCC have been reported previously for common germline variants at the PTPN14 locus. To further investigate BCC associations at the locus, we conducted a meta-analysis of all variants in a 2-MB window around the splice region variant rs17022807 in the datasets from Iceland, the UK Biobank, and 23andMe. Two intronic variants in PTPN14 associated with BCC with lower P values than rs17022807. One is in strong LD with rs17022807 (rs3104458, r2 = 0.74, MAF = 9%), the other less so (rs6700380, r2 = 0.11, MAF = 37.7%; Supplementary Table S17; Fig. 3). It was not discernable whether rs6700380 and rs17022807 are independent signals (see Supplementary Method). We were not able to detect an association between rs17022807 and cervical cancer or pituitary adenoma in a combined analysis of Iceland and the UK Biobank (Supplementary Table S18).

Figure 3.

Regional association plot for the splice region variant rs17022807. The P values (−log10) of variant associations with BCC in the meta-analysis of Iceland, UK Biobank, and 23andMe are shown against their NCBI Build 38 positions in a 2 MB window around the index splice region variant rs17022807. Number of cases versus controls in Iceland, UK Biobank, and 23andMe: 5,574/294,094, 19,631/388,964, 12,945/274,252 or a total of 38,150/957,310. The colors of the genomic variants reflect the LD (r2) in the Icelandic dataset with the index variant. Only markers found in all three datasets are plotted to avoid inconsistency in calculated P values by sample size. The horizontal bar at the top of the figure denotes the P value for the combined PTPN14 loss-of-function (LOF) burden test in Iceland and the UK Biobank for BCC. There is no correlation between the genotypes of the rare LOF variants (not displayed in the plot) in the PTPN14 burden and rs17022807.

Figure 3.

Regional association plot for the splice region variant rs17022807. The P values (−log10) of variant associations with BCC in the meta-analysis of Iceland, UK Biobank, and 23andMe are shown against their NCBI Build 38 positions in a 2 MB window around the index splice region variant rs17022807. Number of cases versus controls in Iceland, UK Biobank, and 23andMe: 5,574/294,094, 19,631/388,964, 12,945/274,252 or a total of 38,150/957,310. The colors of the genomic variants reflect the LD (r2) in the Icelandic dataset with the index variant. Only markers found in all three datasets are plotted to avoid inconsistency in calculated P values by sample size. The horizontal bar at the top of the figure denotes the P value for the combined PTPN14 loss-of-function (LOF) burden test in Iceland and the UK Biobank for BCC. There is no correlation between the genotypes of the rare LOF variants (not displayed in the plot) in the PTPN14 burden and rs17022807.

Close modal

We conducted loss-of-function burden tests across autosomal genes and detected an association between PTPN14 high-impact variants and BCC. A genome-wide significant association with common variant/variants at the locus reinforced our conclusion that PTPN14 is a new BCC predisposition gene. Through an investigation of several cancer and benign tumor types, we found that the combined PTPN14 loss-of-function variants potentially confer high risk of cervical cancer and GH-secreting pituitary adenomas. We did not detect association between the common variant (rs17022807) and cervical cancer or pituitary adenoma; however, our power to detect such an association is much less than for BCC.

Very rare instances of homozygous loss-of-function variants in PTPN14 cause choanal atresia (bony and/or membranous occlusion of the nasal passages) and lymphedema (54, 55). However, no phenotypes in heterozygous carriers have been described.

The finding that PTPN14 is a causal gene in BCC pathogenesis is supported by a previous report of somatic mutations in the gene. Bonilla and colleagues (9) found that of 293 BCC tumors explored, 23% had presumptive loss-of-function and deleterious missense mutations in PTPN14 and that 61% of the variants were truncating. PTPN14 bridges the tumor-suppressive functions of p53 to the Hippo–YAP pathway, which plays a crucial role in maintaining the balance between basal stem cell proliferation and differentiation in epidermis (14–16, 20, 56). PTPN14 is transcriptionally activated by p53 or p63 (20, 57). Binding of PTPN14 to YAP and other components of the Hippo–YAP pathway promotes the cytoplasmic sequestration of YAP and thereby inhibits the growth promoting and potentially oncogenic activities of YAP in the nucleus (Fig. 4; refs. 58–60).

Figure 4.

PTPN14 and the Hippo–YAP pathway. Signaling through the Hippo pathway activates LATS1. LATS1 phosphorylates YAP, resulting in its sequestration in the cytoplasm. PTPN14 expression is activated by p53 or p63. PTPN14 binds to YAP, again resulting in sequestration of YAP in the cytoplasm. PTPN14 also inhibits YAP indirectly through stimulation of LATS1 kinase activity. PTPN14 can be inactivated by binding to HPV E7 proteins, by somatic sequence variant, and by germline sequence variant (as reported in this article). If inhibition of YAP is removed, the protein translocates to the nucleus where it acts as a transcriptional coactivator of target genes in cooperation with the TEAD family of transcription factors. Transcriptional activation by YAP can lead to expansion of stem and progenitor cell compartments, inhibition of differentiation, tissue overgrowth, and oncogenesis. For clarity, not all components of the pathways are depicted.

Figure 4.

PTPN14 and the Hippo–YAP pathway. Signaling through the Hippo pathway activates LATS1. LATS1 phosphorylates YAP, resulting in its sequestration in the cytoplasm. PTPN14 expression is activated by p53 or p63. PTPN14 binds to YAP, again resulting in sequestration of YAP in the cytoplasm. PTPN14 also inhibits YAP indirectly through stimulation of LATS1 kinase activity. PTPN14 can be inactivated by binding to HPV E7 proteins, by somatic sequence variant, and by germline sequence variant (as reported in this article). If inhibition of YAP is removed, the protein translocates to the nucleus where it acts as a transcriptional coactivator of target genes in cooperation with the TEAD family of transcription factors. Transcriptional activation by YAP can lead to expansion of stem and progenitor cell compartments, inhibition of differentiation, tissue overgrowth, and oncogenesis. For clarity, not all components of the pathways are depicted.

Close modal

High-risk forms of HPV are etiological agents of cervical cancer. Cellular transformation is carried out by the E6 and E7 oncoproteins of the virus, predominantly through their interactions with the tumor-suppressor proteins p53 and Rb, respectively. However, these interactions do not fully account for the transforming activity of the virus. Recently, it was found that high-risk HPV E7 proteins bind also to PTPN14 and target it for proteasomal degradation. As a result, differentiation is impaired and survival is enhanced in infected epithelial cells (19, 22, 23). Taken together with our observation that PTPN14 loss-of-function variants are associated with high risk of cervical cancer, these findings suggest that oncogenic transformation by HPV in the cervix may be potentiated by a germline loss-of-function of one of the PTPN14 alleles. This intriguing possibility warrants further investigation.

GH-secreting pituitary adenomas cause pituitary gigantism or acromegaly, depending on whether the tumor arises before or after epiphyseal closure. Acromegaly and pituitary gigantism are exceedingly rare with annual incidences of about 10 and 8 cases per million persons, respectively, and frequently have an underlying genetic defect (33, 53, 61). Rostomyan and colleagues (62) carried out genetic testing in a population of 143 patients with pituitary gigantism. Although AIP was the single most commonly mutated gene, more than 50% of patients had no identified genetic cause and the disease was more aggressive in these idiopathic patients. We confirmed that none of the pituitary adenoma-affected carriers of the PTPN14 loss-of-function variants carried a sequence variant in a known predisposing gene (AIP, CDH23, GPR101, CDKN1B, MEN1, and RET; refs. 61, 63, 64). Further analyses are needed to pursue our provocative observation of an association of PTPN14 loss-of-function variants with pituitary adenoma, apparently driven primarily by an association with the GH-secreting form.

S.N. Stacey reports employment with deCODE genetics/AMGEN. G. Thorleifsson reports employment with deCode genetic/Amgen. K. Norland reports employment with deCODE genetics/Amgen. S. Kristmundsdottir reports employment with deCODE/Amgen. H. Jonsson reports other support from deCODE genetics/Amgen during the conduct of the study. A. Oddsson reports employment with deCODE genetics. F. Zink reports employment with Amgen/deCODE genetics. H.P. Eggertsson reports employment with deCODE Genetics/AMGEN Inc. B.V. Halldorsson reports employment with deCODE genetics/Amgen. U. Thorsteinsdottir reports employment with deCODE genetics/Amgen. P. Sulem reports other support from deCODE genetics AMGEN outside the submitted work. No disclosures were reported by the other authors.

T. Olafsdottir: Conceptualization, data curation, software, formal analysis, visualization, methodology, writing–original draft, project administration, writing–review and editing. S.N. Stacey: Conceptualization, data curation, formal analysis, visualization, methodology, writing–original draft, project administration, writing–review and editing. G. Sveinbjornsson: Conceptualization, data curation, software, formal analysis, visualization, methodology, writing–review and editing. G. Thorleifsson: Conceptualization, data curation, formal analysis, methodology, writing–original draft, writing–review and editing. K. Norland: Conceptualization, data curation, software, formal analysis, methodology, writing–review and editing. B. Sigurgeirsson: Data curation, writing–review and editing. K. Thorisdottir: Data curation, writing–review and editing. A.K. Kristjansson: Data curation, writing–review and editing. L. Tryggvadottir: Data curation, writing–review and editing. K.Y. Sarin: Data curation, writing–review and editing. R. Benediktsson: Data curation, writing–review and editing. J.G. Jonasson: Data curation, writing–review and editing. A. Sigurdsson: Formal analysis, writing–review and editing. A. Jonasdottir: Formal analysis, writing–review and editing. S. Kristmundsdottir: Data curation, formal analysis, writing–review and editing. H. Jonsson: Data curation, formal analysis, writing–review and editing. A. Gylfason: Data curation, formal analysis, writing–review and editing. A. Oddsson: Data curation, visualization, writing–review and editing. R. Fridriksdottir: Formal analysis, writing–review and editing. S.A. Gudjonsson: Data curation, formal analysis, writing–review and editing. F. Zink: Data curation, visualization, writing–review and editing. S.H. Lund: Formal analysis, writing–review and editing. S. Rognvaldsson: Data curation, formal analysis, writing–review and editing. P. Melsted: Data curation, formal analysis, writing–review and editing. V. Steinthorsdottir: Data curation, writing–review and editing. J. Gudmundsson: Conceptualization, data curation, writing–review and editing. E. Mikaelsdottir: Data curation, writing–review and editing. P.I. Olason: Data curation, formal analysis, writing–review and editing. L. Stefansdottir: Formal analysis, writing–review and editing. H.P. Eggertsson: Data curation, writing–review and editing. B.V. Halldorsson: Data curation, writing–review and editing. U. Thorsteinsdottir: Conceptualization, methodology, writing–original draft, writing–review and editing. T.T. Agustsson: Data curation, writing–review and editing. K. Olafsson: Data curation, writing–review and editing. J.H. Olafsson: Data curation, writing–review and editing. P. Sulem: Conceptualization, formal analysis, methodology, writing–original draft, writing–review and editing. T. Rafnar: Conceptualization, data curation, formal analysis, methodology, writing–original draft, writing–review and editing. D.F. Gudbjartsson: Conceptualization, data curation, software, formal analysis, visualization, methodology, writing–original draft, writing–review and editing. K. Stefansson: Conceptualization, methodology, writing–original draft, writing–review and editing.

This research has been conducted using the UK Biobank Resource under Application Number 56270. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The RNA-sequencing data used for the analyses described in this article were obtained from the GTEx Portal on 03/10/2020 with dbGaP accession number phs000424.v8.p2 and the raw sequencing data from the SRA on 02/27/2020. We thank 23andMe for sharing their data and we thank the individuals who participated in the study and whose contribution made this work possible.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Verkouteren
JAC
,
Ramdas
KHR
,
Wakkee
M
,
Nijsten
T
. 
Epidemiology of basal cell carcinoma: scholarly review
.
Brit J Dermatol
2017
;
177
:
359
72
.
2.
Bray
F
,
Ferlay
J
,
Soerjomataram
I
,
Siegel
RL
,
Torre
LA
,
Jemal
A
. 
Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries
.
CA Cancer J Clin
2018
;
68
:
394
424
.
3.
Epstein
EH
. 
Basal cell carcinomas: attack of the hedgehog
.
Nat Rev Cancer
2008
;
8
:
743
54
.
4.
Foulkes
WD
,
Kamihara
J
,
Evans
DGR
,
Brugieres
L
,
Bourdeaut
F
,
Molenaar
JJ
, et al
Cancer surveillance in gorlin syndrome and rhabdoid tumor predisposition syndrome
.
Clin Cancer Res
2017
;
23
:
E62
E7
.
5.
Roberts
MR
,
Asgari
MM
,
Toland
AE
. 
Genome-wide association studies and polygenic risk scores for skin cancer: clinically useful yet?
Brit J Dermatol
2019
;
181
:
1146
55
.
6.
Bastiaens
MT
,
terHuurne
JAC
,
Kielich
C
,
Gruis
NA
,
Westendorp
RGJ
,
Vermeer
BJ
, et al
Melanocortin-1 receptor gene variants determine the risk of nonmelanoma skin cancer independently of fair skin and red hair
.
Am J Hum Genet
2001
;
68
:
884
94
.
7.
Gudbjartsson
DF
,
Sulem
P
,
Stacey
SN
,
Goldstein
AM
,
Rafnar
T
,
Sigurgeirsson
B
, et al
ASIP and TYR pigmentation variants associate with cutaneous melanoma and basal cell carcinoma
.
Nat Genet
2008
;
40
:
886
91
.
8.
Stacey
SN
,
Gudbjartsson
DF
,
Sulem
P
,
Bergthorsson
JT
,
Kumar
R
,
Thorleifsson
G
, et al
Common variants on 1p36 and 1q42 are associated with cutaneous basal cell carcinoma but not with melanoma or pigmentation traits
.
Nat Genet
2008
;
40
:
1313
18
.
9.
Bonilla
X
,
Parmentier
L
,
King
B
,
Bezrukov
F
,
Kaya
G
,
Zoete
V
, et al
Genomic analysis identifies new drivers and progression pathways in skin basal cell carcinoma
.
Nat Genet
2016
;
48
:
398
.
10.
Pellegrini
C
,
Maturo
MG
,
Di Nardo
L
,
Ciciarelli
V
,
Garcia-Rodrigo
CG
,
Fargnoli
MC
. 
Understanding the molecular genetics of basal cell carcinoma
.
Int J MolSci
2017
;
18
:
2485
.
11.
Rafnar
T
,
Sulem
P
,
Stacey
SN
,
Geller
F
,
Gudmundsson
J
,
Sigurdsson
A
, et al
Sequence variants at the TERT-CLPTM1L locus associate with many cancer types
.
Nat Genet
2009
;
41
:
221
7
.
12.
Stacey
SN
,
Sulem
P
,
Jonasdottir
A
,
Masson
G
,
Gudmundsson
J
,
Gudbjartsson
DF
, et al
A germline variant in the TP53 polyadenylation signal confers cancer susceptibility
.
Nat Genet
2011
;
43
:
1098
103
.
13.
Stacey
SN
,
Helgason
H
,
Gudjonsson
SA
,
Thorleifsson
G
,
Zink
F
,
Sigurdsson
A
, et al
New basal cell carcinoma susceptibility loci
.
Nat Commun
2015
;
6
:
6825
.
14.
Aylon
Y
,
Oren
M
. 
Tumor suppression by p53: bring in the Hippo!
Cancer Cell
2017
;
32
:
397
9
.
15.
Maglic
D
,
Schlegelmilch
K
,
Dost
AF
,
Panero
R
,
Dill
MT
,
Calogero
RA
, et al
YAP-TEAD signaling promotes basal cell carcinoma development via a c-JUN/AP1 axis
.
EMBO J
2018
;
37
:
e98642
.
16.
Debaugnies
M
,
Sánchez-Danés
A
,
Rorive
S
,
Raphaël
M
,
Liagre
M
,
Parent
MA
, et al
YAP and TAZ are essential for basal and squamous cell carcinoma initiation
.
EMBO Rep
2018
;
19
:
e45809
.
17.
Camargo
FD
,
Gokhale
S
,
Johnnidis
JB
,
Fu
D
,
Bell
GW
,
Jaenisch
R
, et al
YAP1 increases organ size and expands undifferentiated progenitor cells
.
Curr Biol
2007
;
17
:
2054
60
.
18.
Dong
J
,
Feldmann
G
,
Huang
J
,
Wu
S
,
Zhang
N
,
Comerford
SA
, et al
Elucidation of a universal size-control mechanism in Drosophila and mammals
.
Cell
2007
;
130
:
1120
33
.
19.
Hatterschide
J
,
Bohidar
AE
,
Grace
M
,
Nulton
TJ
,
Kim
HW
,
Windle
B
, et al
PTPN14 degradation by high-risk human papillomavirus E7 limits keratinocyte differentiation and contributes to HPV-mediated oncogenesis
.
Proc Natl Acad Sci USA
2019
;
116
:
7033
42
.
20.
Mello
SS
,
Valente
LJ
,
Raj
N
,
Seoane
JA
,
Flowers
BM
,
McClendon
J
, et al
A p53 super-tumor suppressor reveals a tumor suppressive p53-Ptpn14-Yap axis in pancreatic cancer
.
Cancer Cell
2017
;
32
:
460
.
21.
Moroishi
T
,
Hansen
CG
,
Guan
KL
. 
The emerging roles of YAP and TAZ in cancer
.
Nat Rev Cancer
2015
;
15
:
73
9
.
22.
Szalmas
A
,
Tomaic
V
,
Basukala
O
,
Massimi
P
,
Mittal
S
,
Konya
J
, et al
The PTPN14 tumor suppressor is a degradation target of human papillomavirus E7
.
J Virol
2017
;
91
:
e00057
17
.
23.
White
EA
,
Munger
K
,
Howley
PM
. 
High-risk human papillomavirus E7 proteins target PTPN14 for degradation
.
Mbio
2016
;
7
:
e01530
16
.
24.
Zhao
B
,
Lei
QY
,
Guan
KL
. 
The Hippo–YAP pathway: new connections between regulation of organ size and cancer
.
Curr Opin Cell Biol
2008
;
20
:
638
46
.
25.
Bycroft
C
,
Freeman
C
,
Petkova
D
,
Band
G
,
Elliott
LT
,
Sharp
K
, et al
The UK Biobank resource with deep phenotyping and genomic data
.
Nature
2018
;
562
:
203
9
.
26.
Gudbjartsson
DF
,
Helgason
H
,
Gudjonsson
SA
,
Zink
F
,
Oddson
A
,
Gylfason
A
, et al
Large-scale whole-genome sequencing of the Icelandic population
.
Nat Genet
2015
;
47
:
435
44
.
27.
Eggertsson
HP
,
Jonsson
H
,
Kristmundsdottir
S
,
Hjartarson
E
,
Kehr
B
,
Masson
G
, et al
Graphtyper enables population-scale genotyping using pangenome graphs
.
Nat Genet
2017
;
49
:
1654
60
.
28.
Kong
A
,
Masson
G
,
Frigge
ML
,
Gylfason
A
,
Zusmanovich
P
,
Thorleifsson
G
, et al
Detection of sharing by descent, long-range phasing and haplotype imputation
.
Nat Genet
2008
;
40
:
1068
75
.
29.
Eilbeck
K
,
Lewis
SE
,
Mungall
CJ
,
Yandell
M
,
Stein
L
,
Durbin
R
, et al
The Sequence Ontology: a tool for the unification of genome annotations
.
Genome Biol
2005
;
6
:
R44
.
30.
McLaren
W
,
Pritchard
B
,
Rios
D
,
Chen
Y
,
Flicek
P
,
Cunningham
F
. 
Deriving the consequences of genomic variants with the Ensembl API and SNP effect predictor
.
Bioinformatics
2010
;
26
:
2069
70
.
31.
Sveinbjornsson
G
,
Albrechtsen
A
,
Zink
F
,
Gudjonsson
SA
,
Oddson
A
,
Másson
G
, et al
Weighting sequence variants based on their annotation increases power of whole-genome association studies
.
Nat Genet
2016
;
48
:
314
7
.
32.
Rafnar
T
,
Sigurjonsdottir
GR
,
Stacey
SN
,
Halldorsson
G
,
Sulem
P
,
Pardo
LM
, et al
Association of BRCA2 K3326* with small-cell lung cancer and squamous cell cancer of the skin
.
J Natl Cancer Inst
2018
;
110
:
967
74
.
33.
Agustsson
TT
,
Baldvinsdottir
T
,
Jonasson
JG
,
Olafsdottir
E
,
Steinthorsdottir
V
,
Sigurdsson
G
, et al
The epidemiology of pituitary adenomas in Iceland, 1955–2012: a nationwide population-based study
.
Eur J Endocrinol
2015
;
173
:
655
64
.
34.
Rafnar
T
,
Gunnarsson
B
,
Stefansson
OA
,
Sulem
P
,
Ingason
A
,
Frigge
ML
, et al
Variants associating with uterine leiomyoma highlight genetic background shared by various cancers and hormone-related traits
.
Nat Commun
2018
;
9
:
3636
.
35.
Sigurdardottir
LG
,
Jonasson
JG
,
Stefansdottir
S
,
Jonsdottir
A
,
Olafsdottir
GH
,
Olafsdottir
EJ
, et al
Data quality at the Icelandic Cancer Registry: comparability, validity, timeliness, and completeness
.
Acta Oncol
2012
;
51
:
880
9
.
36.
Van Hout
CV
,
Tachmazidou
I
,
Backman
JD
,
Hoffman
JD
,
Liu
D
,
Pandey
AK
, et al
Exome sequencing and characterization of 49,960 individuals in the UK Biobank
.
Nature
2020
;
586
:
749
56
.
37.
Møller
H
,
Richards
S
,
Hanchett
N
,
Riaz
SP
,
Lüchtenborg
M
,
Holmberg
L
, et al
Completeness of case ascertainment and survival time error in English cancer registries: impact on 1-year survival estimates
.
Br J Cancer
2011
;
105
:
170
6
.
38.
Chahal
HS
,
Wu
W
,
Ransohoff
KJ
,
Yang
L
,
Hedlin
H
,
Desai
M
, et al
Genome-wide association study identifies 14 novel risk alleles associated with basal cell carcinoma
.
Nat Commun
2016
;
7
:
12510
.
39.
Bulik-Sullivan
BK
,
Loh
P-R
,
Finucane
HK
,
Ripke
S
,
Yang
J
,
Patterson
N
, et al
LD Score regression distinguishes confounding from polygenicity in genome-wide association studies
.
Nat Genet
2015
;
47
:
291
5
.
40.
Mantel
N
,
Haenszel
W
. 
Statistical aspects of the analysis of data from retrospective studies of disease
.
J Natl Cancer Inst
1959
;
22
:
719
48
.
41.
Loh
PR
,
Tucker
G
,
Bulik-Sullivan
BK
,
Vilhjálmsson
BJ
,
Finucane
HK
,
Salem
RM
, et al
Efficient Bayesian mixed-model analysis increases association power in large cohorts
.
Nat Genet
2015
;
47
:
284
90
.
42.
Helgason
H
,
Rafnar
T
,
Olafsdottir
HS
,
Jonasson
JG
,
Sigurdsson
A
,
Stacey
SN
, et al
Loss-of-function variants in ATM confer risk of gastric cancer
.
Nat Genet
2015
;
47
:
906
10
.
43.
Lee
S
,
Abecasis
GR
,
Boehnke
M
,
Lin
X
. 
Rare-variant association analysis: study designs and statistical tests
.
Am J Hum Genet
2014
;
95
:
5
23
.
44.
Li
B
,
Leal
SM
. 
Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data
.
Am J Hum Genet
2008
;
83
:
311
21
.
45.
Stitziel
NO
,
Kiezun
A
,
Sunyaev
S
. 
Computational and statistical approaches to analyzing variants identified by exome sequencing
.
Genome Biol
2011
;
12
:
227
.
46.
Cirulli
ET
,
White
S
,
Read
RW
,
Elhanan
G
,
Metcalf
WJ
,
Tanudjaja
F
, et al
Genome-wide rare variant analysis for thousands of phenotypes in over 70,000 exomes from two cohorts
.
Nat Commun
2020
;
11
:
542
.
47.
Levy-Lahad
E
,
Catane
R
,
Eisenberg
S
,
Kaufman
B
,
Hornreich
G
,
Lishinsky
E
, et al
Founder BRCA1 and BRCA2 mutations in Ashkenazi Jews in Israel: frequency and differential penetrance in ovarian cancer and in breast-ovarian cancer families
.
Am J Hum Genet
1997
;
60
:
1059
67
.
48.
Ruark
E
,
Snape
K
,
Humburg
P
,
Loveday
C
,
Bajrami
I
,
Brough
R
, et al
Mosaic PPM1D mutations are associated with predisposition to breast and ovarian cancer
.
Nature
2013
;
493
:
406
10
.
49.
Population attributable risk (PAR) population attributable risk (PAR)
.
In
:
Kirch
W
,
editor
.
Encyclopedia of Public Health
.
Dordrecht
:
Springer Netherlands
; 
2008
. p.
1117
8
.
50.
Karczewski
KJ
,
Francioli
LC
,
Tiao
G
,
Cummings
BB
,
Alföldi
J
,
Wang
Q
, et al
The mutational constraint spectrum quantified from variation in 141,456 humans
.
Nature
2020
;
581
:
434
43
.
51.
O'Leary
NA
,
Wright
MW
,
Brister
JR
,
Ciufo
S
,
Haddad
D
,
McVeigh
R
, et al
Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation
.
Nucleic Acids Res
2016
;
44
:
D733
D45
.
52.
Kent
WJ
,
Sugnet
CW
,
Furey
TS
,
Roskin
KM
,
Pringle
TH
,
Zahler
AM
, et al
The human genome browser at UCSC
.
Genome Res
2002
;
12
:
996
1006
.
53.
Melmed
S
. 
Pituitary-tumor endocrinopathies
.
N Engl J Med
2020
;
382
:
937
50
.
54.
Au
AC
,
Hernandez
PA
,
Lieber
E
,
Nadroo
AM
,
Shen
YM
,
Kelley
KA
, et al
Protein tyrosine phosphatase PTPN14 is a regulator of lymphatic function and choanal development in humans
.
Am J Hum Genet
2010
;
87
:
436
44
.
55.
Bordbar
A
,
Maroofian
R
,
Ostergaard
P
,
Kashaki
M
,
Nikpour
S
,
Gordon
K
, et al
A homozygous loss-of-function mutation in PTPN14 causes a syndrome of bilateral choanal atresia and early infantile-onset lymphedema: PTPN14 mutation in lymphedema-choanal atresia
.
Meta Gene
2017
;
14
:
53
8
.
56.
Zhang
HY
,
Pasolli
HA
,
Fuchs
E
. 
Yes-associated protein (YAP) transcriptional coactivator functions in balancing growth and differentiation in skin
.
Proc Natl Acad Sci USA
2011
;
108
:
2270
5
.
57.
Perez
CA
,
Ott
J
,
Mays
DJ
,
Pietenpol
JA
. 
p63 consensus DNA-binding site: identification, analysis and application into a p63MH algorithm
.
Oncogene
2007
;
26
:
7363
70
.
58.
Liu
X
,
Yang
N
,
Figel
SA
,
Wilson
KE
,
Morrison
CD
,
Gelman
IH
, et al
PTPN14 interacts with and negatively regulates the oncogenic function of YAP
.
Oncogene
2013
;
32
:
1266
73
.
59.
Michaloglou
C
,
Lehmann
W
,
Martin
T
,
Delaunay
C
,
Hueber
A
,
Barys
L
, et al
The tyrosine phosphatase PTPN14 is a negative regulator of YAP activity
.
PLoS ONE
2013
;
8
:
e61916
.
60.
Wilson
KE
,
Li
YW
,
Yang
N
,
Shen
H
,
Orillion
AR
,
Zhang
JM
. 
PTPN14 forms a complex with kibra and LATS1 proteins and negatively regulates the YAP oncogenic function
.
J Biol Chem
2014
;
289
:
23693
700
.
61.
Hannah-Shmouni
F
,
Trivellin
G
,
Stratakis
CA
. 
Genetics of gigantism and acromegaly
.
Growth Horm IGF Res
2016
;
30–31
:
37
41
.
62.
Rostomyan
L
,
Daly
AF
,
Petrossians
P
,
Nachev
E
,
Lila
AR
,
Lecoq
AL
, et al
Clinical and genetic characterization of pituitary gigantism: an international collaborative study in 208 patients
.
Endocr Relat Cancer
2015
;
22
:
745
57
.
63.
Beckers
A
,
Petrossians
P
,
Hanson
J
,
Daly
AF
. 
The causes and consequences of pituitary gigantism
.
Nat Rev Endocrinol
2018
;
14
:
705
20
.
64.
Pepe
S
,
Korbonits
M
,
Iacovazzo
D
. 
Germline and mosaic mutations causing pituitary tumours: genetic and molecular aspects
.
J Endocrinol
2019
;
240
:
R21
r45
.