Abstract
Endometrial cancer, the most commonly diagnosed cancer of the female reproductive tract in developed countries, has a heritable component. To date, 16 genetic risk regions have been robustly identified by genome-wide association studies (GWAS) of endometrial cancer. Post-GWAS analyses, including expression quantitative trait loci analysis and laboratory-based functional studies, have been successful in identifying genes and pathways involved in endometrial carcinogenesis. Mendelian randomization studies have confirmed factors causal for endometrial cancer risk, including increased body mass index and early onset of menarche. In this review, we summarize findings from GWAS and post-GWAS analyses of endometrial cancer. We discuss clinical implications of these findings, current knowledge gaps, and future directions for the study of endometrial cancer genetics.
Introduction
Endometrial cancer, a malignancy of the lining of the uterus, is the most common gynecological cancer diagnosed in developed countries, accounting for approximately 5% of all female cancers (http://gco.iarc.fr/today/home); age-standardized rates are steadily increasing (1). Endometrial carcinoma has traditionally been classified into two groups defined by histologic subtype: endometrioid and nonendometrioid. Endometrioid endometrial cancers, which comprise the majority of endometrial cancers (80%), develop from glandular cells in the lining of the endometrium and are generally associated with hyperplasia. These tumors are estrogen dependent and tend to be low grade with a favorable overall prognosis (2). Conversely, nonendometrioid endometrial cancers (commonly serous papillary or clear cell histology) are not estrogen dependent and often exhibit a more aggressive clinical course with a very poor prognosis (2).
Endometrial cancer treatment has remained almost static over the last four decades, relying principally on surgery, with full hysterectomy the most common and effective treatment for early-stage disease (reviewed in ref. 3). However, the prognosis for advanced, recurrent, or metastatic disease is still poor, with high rates of recurrence and low 3-year survival rates: endometrial cancer patients with local (vaginal) metastasis have a 3-year survival of 73%, those with pelvic metastasis 8%, and those with distant metastasis 15% (4). Furthermore, the median survival time for advanced-stage endometrial cancer is generally less than 12 months (5, 6).
Women with a family history of endometrial cancer have an approximately 2-fold increased risk of developing the disease (7, 8). Although some of the associations between family history and endometrial cancer risk are attributable to shared environmental and/or lifestyle risk factors, twin studies have estimated the heritability to be between 27% and 52% (9–12).
The currently known genetic architecture of endometrial cancer is displayed in Fig. 1 and is consistent with a polygenic model of inheritance. Rare germline pathogenic (i.e., high-risk) variants in cancer syndrome genes, namely the DNA mismatch repair genes associated with Lynch syndrome (MLH1, MSH2, MSH6, and PMS2) or PTEN in the context of Cowden's disease, explain approximately 3% to 5% of endometrial cancer cases at the population level (reviewed by ref. 13). Evidence also supports the contribution of rare pathogenic variants in other DNA repair–related genes to endometrial cancer risk, including POLD1, POLE, and possibly BRCA1 (reviewed by ref. 13). Pathogenic variants in other genes are likely to exist, but we would expect the frequency of such variants to be low, and thus they could account for only a small proportion of endometrial cancer cases. In contrast, although each common endometrial cancer predisposition variant [such as the 16 risk variants identified to date in genome-wide association studies (GWAS; refs. 14–17)] has only a modest effect on risk, together they are likely to explain far more of the familial relative risk of the disease.
Figure 1. Schematic of the known genetic architecture of endometrial cancer. Low-frequency genetic variants from PTEN and the mismatch repair genes (MLH1, MSH2, MSH6, and PMS2) are considered high-risk variants (estimated risk > 4-fold). Common variants (frequency > 1%) identified by GWAS are considered low-risk variants (estimated risk < 2-fold). No variants associated with moderate risk (estimated risk ∼2- to 4-fold) are currently established. Although variants in other genes have been implicated in predisposition to endometrial cancer, including STK11, POLD1, and POLE, their clinical utility for the purpose of altering patient management is unclear, because risk estimates are imprecise and/or based on a limited number of studies with potential for ascertainment bias (13).
This review will summarize the findings of recent endometrial cancer GWAS and review post-GWAS analyses, including Mendelian randomization studies and the functional follow-up of endometrial cancer genetic risk regions to identify candidate target genes.
Genome-Wide Association Studies of Endometrial Cancer Risk
Since the late 2000s, genotyping arrays, consisting of hundreds of thousands of common genetic variants across the genome, have revolutionized the study of the genetic basis of complex traits. For many diseases, GWAS have been remarkably successful in unlocking the biology of disease and driving therapeutic development (18); early indications suggest that this will also be true for endometrial cancer.
In recent years, a series of GWAS in European populations have identified 16 endometrial cancer genetic risk regions (Table 1). The first endometrial cancer GWAS, reported in 2011, identified a novel genetic risk region at 17q12, intronic to HNF1B (19). This study involved a stage I GWAS of 1,265 endometrioid endometrial cancer cases and 5,190 controls, followed by stage II validation of 47 genetic variants in 3,957 cases and 6,886 controls. This finding was directionally concordant in a subsequent GWAS by the Epidemiology of Endometrial Cancer Consortium (E2C2), including 4,989 cases across two stages (20). A meta-analysis of these two GWAS with a third GWAS from the National Study of Endometrial Cancer Genetics, totaling 4,907 cases, identified a new, intergenic risk region at chromosome 6p22.3 (14).
Table 1. Genetic risk regions and candidate target genes identified by endometrial cancer genetic association studies to date
Risk locus | Other relevant traits identified by GWAS at this locusᵃ | Candidate target/closest geneᵇ | Evidence for targeting | Involvement in relevant pathways from network analysis | Studies reporting endometrial cancer risk association |
---|---|---|---|---|---|
1p34.3 | Cancer: ovarian | CDCA8 | eQTL (24) | - | (24) |
2p16.1 | Cancer: Hodgkin's lymphoma | BCL11A | - | - | (24) |
6p22.3 | Cancer: melanoma, neuroblastoma; Anthropometric: height, BMI; Steroid hormone–related: bone mineral density | SOX4, CASC15 | - | - | (14, 24) |
6q22.31 | Cancer: bladder; Anthropometric: height, hip circumference; Steroid hormone–related: age of menarche | HEY2, NCOA7 | Bioinformatic prediction (15) for both genes | HEY2: Notch signaling; NCOA7: endometrial cancer signaling | (14, 15, 24) |
8q24.21 | Cancer: acute lymphoblastic leukemia, breast, diffuse large B-cell lymphoma, follicular lymphoma, glioma, Hodgkin's lymphoma, ovarian, pancreatic; Anthropometric: height | MIR1204, MIR1205, MIR1207, MIR1208, MYC | MIR1204, MIR1205, MIR1207, MIR1208: bioinformatic prediction (15); MYC: chromatin looping, bioinformatic prediction (15) | MYC: molecular mechanisms of cancer, Wnt/β-catenin signaling, estrogen-mediated S-phase entry, G1–S checkpoint regulation | (15, 24) |
9p21.3 | Cancer: acute lymphoblastic leukemia, breast, chronic lymphocytic leukemia, glioma, glioblastoma, lung, melanoma, multiple myeloma, nasopharyngeal, oral cavity, prostate; Anthropometric: BMI; Endometriosis | CDKN2A/B | - | - | (24) |
11p13 | Anthropometric: BMI, waist–hip ratio | CCDC73, EIF3M, RCN1, TCP11L1, WT1-AS | eQTL (24) for all five genes | EIF3M: EIF2 signaling | (24) |
12p12.1 | Cancer: esophageal, renal cell; Anthropometric: waist–hip ratio | SSPN | - | - | (24) |
12q24.12 | Cancer: breast, colorectal, esophageal; Anthropometric: BMI | SH2B3 | eQTL (24) | - | (24) |
12q24.21 | Cancer: esophageal, laryngeal squamous cell, pancreatic, prostate; Steroid hormone–related: mammographic density | SNORA27 | - | - | (24) |
13q22.1 | Cancer: breast, chronic lymphocytic leukemia, pancreatic, prostate; Steroid hormone–related: age of menarche | KLF5 | Functional analyses; bioinformatic prediction (15) | Adipogenesis pathway | (14, 15, 24) |
15q15.1 | Cancer: Ewing sarcoma; Anthropometric: BMI, height; Steroid hormone–related: age of menopause | BMF, GPR176, SRP14-AS1, SRP14 | Bioinformatic prediction (15) for all four genes | - | (15, 24) |
15q21.2 | Anthropometric: BMI, height; Steroid hormone–related: bone mineral density, estradiol levels, follicle-stimulating hormone | CYP19A1 | eQTL and association with estradiol levels (17) | Estrogen-dependent breast cancer signaling | (15, 17, 24) |
17q11.2 | Cancer: breast, prostate; Anthropometric: BMI, height, hip circumference, waist circumference | EVI2A, NF1, SUZ12 | eQTL (24) for all three genes | NF1: molecular mechanisms of cancer | (24) |
17q12 | Cancer: ovarian, pancreatic, prostate, testicular germ cell; Anthropometric: BMI, height | HNF1B | Functional analyses; eQTL (16) | Pigment epithelium-derived factor (PEDF) signaling | (14–16, 19, 24) |
17q21.32 | Cancer: ovarian; Anthropometric: body fat percentage, BMI, height, obesity | SNX11 | eQTL (24) | - | (24) |
ᵃ From GWAS catalog (https://www.ebi.ac.uk/gwas/home), accessed May 2018.
ᵇ Bolded genes have evidence for being a candidate target gene, and unbolded genes are closest genes.
The identification of additional endometrial cancer genetic risk regions has required collaboration and data sharing in order to achieve the sample sizes (and hence statistical power) required for the identification of variants with modest effect sizes. To this end, the Endometrial Cancer Association Consortium (ECAC) was established to conduct large-scale meta-analyses of GWAS data. ECAC is an ongoing collaboration which currently includes 12 research groups, based in Europe, Australia, and the USA, providing access to genetic and other information for more than 9,000 endometrial cancer cases of European ancestry.
Around the time that ECAC was established, a custom genotyping array (“iCOGS”) was designed to test genetic associations with the risk of hormone-related cancers. The content of the iCOGS ∼200K SNP genotyping array (21) is enriched for genetic variants with some prior evidence of association with one or more hormonal cancers, including variants selected on the basis of the first endometrial cancer GWAS. The genotyping of 4,402 ECAC cases on the iCOGS genotyping array, combined with previous GWAS data, led to the identification of five novel endometrial cancer risk regions (Table 1; ref. 15) and allowed for fine-mapping studies of the known signals at the HNF1B (16) and CYP19A1 (17) risk regions (the latter had been previously identified in several candidate gene studies; e.g., ref. 22).
Recently, ECAC genotyped an additional 2,689 endometrial cancer cases using the OncoArray genotyping array (23). Along with data from the iCOGS projects, E2C2, the first release of the UK Biobank, and a new GWAS conducted by the Women's Health Initiative, this resulted in the largest endometrial cancer meta-GWAS to date (12,906 cases), enabling us to identify a further nine genetic regions associated with endometrial cancer risk (24). One previously identified region, near AKT1 on 14q32 (15), was not replicated by this analysis at a genome-wide level of significance, bringing the total number of established genetic risk regions for endometrial cancer to 16.
We have estimated that common genetic variants of the type that can be tagged by standard GWAS arrays potentially account for approximately 28% of the familial relative risk of endometrial cancer (24), and that the 16 risk variants identified to date account for approximately one quarter of this figure, suggesting that many more genetic risk variants remain to be found.
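As an illustration of how such a proportion can be computed, the sketch below uses a widely used approximation under a multiplicative model: a variant with risk-allele frequency p and per-allele relative risk r contributes a familial relative risk of λ = (p·r² + q)/(p·r + q)² to first-degree relatives (q = 1 − p), and the fraction explained is the sum of ln λ over variants divided by ln λ₀. This is not necessarily the method used in ref. 24. The value λ₀ = 2 is taken from the approximately 2-fold familial risk cited in the Introduction, and the allele frequencies and relative risks are invented for illustration only.

```python
import math

def locus_lambda(freq, rr):
    """Familial relative risk attributable to one variant
    (multiplicative model; freq = risk-allele frequency, rr = per-allele RR)."""
    q = 1.0 - freq
    return (freq * rr ** 2 + q) / (freq * rr + q) ** 2

def fraction_frr_explained(variants, lambda0=2.0):
    """Share of the log familial relative risk explained by a set of variants.
    lambda0 = 2.0 is an assumption based on the ~2-fold familial risk."""
    total = sum(math.log(locus_lambda(p, r)) for p, r in variants)
    return total / math.log(lambda0)

# Hypothetical (frequency, per-allele relative risk) pairs, NOT real loci.
hypothetical_variants = [(0.30, 1.10), (0.45, 1.08), (0.12, 1.15)]
frac = fraction_frr_explained(hypothetical_variants)
print(f"Fraction of familial relative risk explained: {frac:.2%}")
```

Because each common variant has a modest effect, its individual contribution is small, which is why many variants are needed to account for an appreciable share of familial risk.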
One limitation of the endometrial cancer GWAS conducted to date is that they have been almost exclusively restricted to European-ancestry populations. An early GWAS of 832 Chinese endometrial cancer cases did not find any associations at the GWAS significance threshold, likely due to its small sample size and the necessity of using mostly European-ancestry cases in the replication stages (25) because of a lack of genotyped East-Asian ancestry cases. The expansion of adequately powered endometrial cancer GWAS to wider populations is therefore a priority, as indeed it is for all types of cancer, in order to identify the risk variants which are most relevant to women with different ethnic backgrounds (26).
A second limitation is that the very small numbers of genotyped endometrial cancer cases with nonendometrioid histologies (e.g., only 434 of the 12,906 in the recent ECAC meta-analysis were of serous/mixed-serous histology, the most common nonendometrioid histology) have precluded meaningful subtype-specific analyses of the type which have proved fruitful in the study of ovarian cancer susceptibility (27). Although there is currently no evidence for a difference in genetic architecture between endometrial cancer subtypes, the limited data currently available for nonendometrioid histologies do not allow for well-powered analyses of these subtypes. Thus, GWAS or sequencing studies using additional cases of rarer endometrial cancer subtypes are needed to increase statistical power, especially if subtypes can be analyzed separately to provide cleaner histologic phenotyping.
Mendelian Randomization Studies of Endometrial Cancer Risk Factors
Aside from genetic variants, numerous other factors have been reported as being associated with endometrial cancer risk, but observational studies alone are not always able to distinguish true, causal associations from artifactual associations caused by confounding or reverse causality. Mendelian randomization uses genetic variants known to be associated with a putative risk factor as “instruments” in an instrumental variable analysis, thus testing the association of the risk factor in the absence of confounding (28). The growing use of Mendelian randomization methods to examine risk factors for endometrial cancer (Table 2) has been facilitated by the success of GWAS in identifying the genetic variants associated with many of these proposed risk factors for endometrial cancer.
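The instrumental variable logic described above can be sketched with the inverse-variance weighted (IVW) estimator, one common summary-statistic approach to Mendelian randomization. The studies in Table 2 may have used different estimators (e.g., weighted genetic risk scores). All effect sizes below are invented, and the function name is hypothetical.

```python
import math

def ivw_estimate(instruments):
    """Inverse-variance weighted causal estimate.

    instruments: list of (beta_exposure, beta_outcome, se_outcome), where
    beta_exposure is the variant's effect on the risk factor and
    beta_outcome / se_outcome its log odds ratio for disease and standard error.
    Returns (causal log odds ratio per unit of exposure, standard error).
    """
    num = sum(bx * by / se ** 2 for bx, by, se in instruments)
    den = sum(bx ** 2 / se ** 2 for bx, by, se in instruments)
    return num / den, math.sqrt(1.0 / den)

# Hypothetical summary statistics for four instruments (illustration only).
instruments = [
    (0.08, 0.050, 0.020),
    (0.05, 0.030, 0.018),
    (0.10, 0.072, 0.025),
    (0.04, 0.020, 0.015),
]
beta, se = ivw_estimate(instruments)
odds_ratio = math.exp(beta)
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
print(f"OR per unit exposure: {odds_ratio:.2f} ({ci[0]:.2f}-{ci[1]:.2f})")
```

With a single instrument the estimator reduces to the Wald ratio (beta_outcome / beta_exposure). The estimate is only interpretable as causal if the variants affect disease solely through the risk factor, which is why pleiotropy matters for these analyses.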
Table 2. Mendelian randomization studies assessing the causal relationship between putative risk factors and endometrial cancer risk
Trait assessed (number of variants used in risk score) | Number of cases | Number of controls | Association results | Comments | Reference |
---|---|---|---|---|---|
Type 2 Diabetes (49 variants) | 1,287 | 8,273 | 0.91 (0.79–1.04); P = 0.16 | Endometrioid endometrial cancer cases only | (30) |
Increased fasting glucose (36 variants) | 1,287 | 8,273 | 1.00 (0.67–1.50); P = 0.99 | Endometrioid endometrial cancer cases only | (30) |
Increased fasting insulin (18 variants) | 1,287 | 8,273 | 2.34 (1.06–5.14); P = 0.03 | Endometrioid endometrial cancer cases only | (30) |
Early insulin secretion (17 variants) | 1,287 | 8,273 | 1.40 (1.12–1.76); P = 0.003 | Endometrioid endometrial cancer cases only | (30) |
BMI (32 variants) | 1,287 | 8,273 | 3.86 (2.24–6.64); P = 1 × 10⁻⁶ | Endometrioid endometrial cancer cases only | (30) |
BMI (97 variants) | 3,376 | 3,867 | 1.13 (1.04–1.22); P = 0.002 | Endometrioid endometrial cancer cases only. Association did not persist after adjustment for measured BMI. | (65) |
BMI (77 variants) | 6,609 | 37,926 | 2.011 (1.94–2.28); P = 3.4 × 10⁻¹⁷ | All endometrial cancer cases. Remained significant after adjustment for measured BMI (OR 1.23; P = 5.3 × 10⁻⁴). | (29) |
Waist–hip ratio (34 variants) | 6,609 | 37,926 | 1.02 (0.99–1.04); P = 0.09 | All endometrial cancer cases. Waist–hip ratio variants are those that were associated with this trait among women. | (29) |
Waist–hip ratio (47 variants) | 6,609 | 37,926 | 0.97 (0.63–1.31); P = 0.86 | All endometrial cancer cases. Waist–hip ratio variants are those that were associated with this trait in men and women. | (29) |
Serum estradiol level (1 variant) | 6,608 | 37,925 | 1.15 (1.11–1.21); P = 4.8 × 10⁻¹¹ | All endometrial cancer cases. CYP19A1 variant rs727478 used to predict serum estradiol level (10% increase per A-allele). | (17) |
Menarche (age of onset) (237 variants) | 6,609 | 37,926 | 0.78 (0.70–0.87); P = 1.0 × 10⁻⁵ | All endometrial cancer cases. Menarche variants were adjusted for genetically predicted BMI. | (32) |
Adult height (814 variants) | 12,906 | 108,979 | 1.00 (0.95–1.06); P = 0.90 | All endometrial cancer cases. | (24) |
Obesity is the strongest risk factor for endometrial cancer, with observational studies reporting up to an 8-fold increased risk in obese women (body mass index, BMI ≥ 40 kg/m²) compared with lean women (BMI < 25 kg/m²). Mendelian randomization analysis has confirmed this relationship, finding strong evidence for a relationship between obesity, as measured by BMI, but not as measured by waist–hip ratio, and endometrial cancer risk (29). An earlier Mendelian randomization study of diabetes-related traits found a significant relationship between increased insulin levels and endometrial cancer risk, but did not find type 2 diabetes or glucose levels to be associated with endometrial cancer, suggesting that observed associations between type 2 diabetes and endometrial cancer may be consequences of residual confounding (30).
Excessive endogenous and exogenous estrogen exposure, unopposed by progesterone, is a well-established risk factor for the development and progression of endometrial cancer (31). A Mendelian randomization analysis using the genetic variant most strongly associated with serum estradiol levels in postmenopausal women verified the relationship between postmenopausal estrogen levels and endometrial cancer (17). Further, each year of delay in menarche (which would be expected to reduce lifetime estrogen exposure) has been confirmed by Mendelian randomization as producing an approximately 20% reduction in endometrial cancer risk, even after adjusting for the effects of genetically predicted BMI (24, 32).
The status of other hypothesized risk factors for endometrial cancer, including polycystic ovary syndrome (PCOS), endometriosis, and uterine fibroids, remains unclear. The epidemiologic associations reported for these risk factors (or lack thereof) may be confounded by coexisting conditions (e.g., PCOS with infertility/anovulation) or measurement limitations (e.g., under- and misdiagnosis of endometriosis). Future Mendelian randomization studies to investigate these epidemiologic risk factors, in conjunction with results from observational studies, will provide important information that can be used clinically to identify women at risk of endometrial cancer, and to inform prevention strategies.
Pleiotropy and Cross-Disease GWAS Studies
One of the conclusions that can be drawn from the ever-growing catalog of complex-trait GWAS results is that pleiotropy is very widespread (reviewed by ref. 18). Endometrial cancer is no exception to this pattern; of the 16 endometrial cancer genetic risk regions identified to date, 14 are coincident with risk regions for other cancers (within 1 Mb), 12 regions are associated with anthropometric traits, 1 region with endometriosis, and 6 with traits associated with steroid hormone levels (as of May 2018; Table 1). Although individual risk variants at these regions are not commonly shared between endometrial cancer and these traits, it is anticipated that the different trait-associated variants regulate the same target genes (see functional follow-up section below).
Some of the pleiotropy observed with endometrial cancer has been supported by linkage disequilibrium (LD) Score regression analyses (33). LD Score regression of endometrial cancer GWAS summary statistics against publicly available GWAS data for 224 noncancer traits found several BMI-related traits to be significantly genetically correlated with endometrial cancer risk (24). A similar study performed using endometriosis GWAS summary statistics revealed a significant genetic correlation between this disease and endometrial cancer risk (rg = 0.23; P = 9.3 × 10⁻³; ref. 34). LD Score regression analyses between endometrial cancer and other cancer types are in progress and will likely yield intriguing results.
Given the widespread pleiotropy observed across the genome, it is not surprising that pleiotropic cross-disease GWAS meta-analysis has been used successfully to increase power and identify regions relevant to multiple diseases (35–37). A cross-cancer GWAS of endometrial cancer with colorectal cancer identified a risk region at 12q24.12, where the most significant association is with a missense variant located in the SH2B3 gene (38). Subsequent analyses in larger cohorts have found this variant to be independently associated with the risk of both of these cancers (24, 39). Future large-scale cross-cancer GWAS meta-analyses are planned to identify genetic risk regions important for carcinogenesis across multiple tissues. Cross-disease meta-analysis of GWAS data from endometrial cancer and endometriosis identified a risk region relevant to both diseases at 9p23 within the PTPRD gene (34). Further, a region that fell short of genome-wide significance in this cross-disease analysis (ref. 34; 12p12.1 locus rs2278868; P = 5.5 × 10⁻⁶) was subsequently identified as a risk region for endometrial cancer in a larger cohort (24). Meta-analyses of GWAS data from other relevant traits or diseases (e.g., uterine fibroids) could also provide insights into pathways relevant for endometrial cancer etiology.
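The gain in power from combining GWAS of two diseases can be illustrated with a fixed-effect, inverse-variance weighted meta-analysis of a single variant. This is a simplified sketch: the cited cross-disease studies may have used other methods (e.g., approaches that allow effects to differ between diseases), and the effect sizes below are invented.

```python
import math

def fixed_effect_meta(estimates):
    """Fixed-effect meta-analysis of one variant.

    estimates: list of (beta, se) pairs, one per study, all reported for the
    same effect allele. Returns (pooled beta, pooled se, z, two-sided P).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled_beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return pooled_beta, pooled_se, z, p

# Hypothetical log odds ratios and standard errors for the same variant
# in two disease GWAS (illustration only).
study_a = (0.06, 0.015)
study_b = (0.05, 0.020)
pooled_beta, pooled_se, z, p = fixed_effect_meta([study_a, study_b])
print(f"pooled beta={pooled_beta:.4f}, se={pooled_se:.4f}, P={p:.2e}")
```

When a variant acts in the same direction on both diseases, the pooled standard error is smaller than either study's own, so a signal that is subthreshold in each study alone can become more significant in the combined analysis.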
Enrichment of Endometrial Cancer Risk Variants in Functional Elements
The correlation structure of common genetic variants means that the genetic variant with the most statistically significant association with disease at a particular locus in a GWAS is not necessarily causally associated with the disease—the apparent association may well be driven by a different causal variant(s) correlated through linkage disequilibrium. It is therefore conventional to think of the most significant variant at a locus as merely the “lead” or “index” variant for a wider set of correlated variants, any of which is a credible causal variant. Definitions vary, but these sets of credible causal variants are usually delineated according to the extent of the linkage disequilibrium with the lead variant, and the difference between the level of statistical significance of the lead variant and that of the candidate variants (40).
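A minimal sketch of one such definition follows. It keeps variants that are in LD with the lead variant above an r² threshold and whose likelihood of being causal is within a fixed ratio of the lead variant's, using the large-sample approximation that the likelihood ratio is exp(Δχ²/2). The thresholds (r² ≥ 0.2, ratio ≤ 100:1) and the variant records are assumptions chosen for illustration, not values taken from the cited studies.

```python
import math
from dataclasses import dataclass

@dataclass
class Variant:
    name: str
    r2_with_lead: float  # LD (r^2) with the lead variant
    chi2: float          # 1-df association test statistic

def credible_set(variants, min_r2=0.2, max_likelihood_ratio=100.0):
    """Return (lead variant, credible causal variants).

    A variant is kept if its r^2 with the lead is >= min_r2 and the lead is
    at most max_likelihood_ratio times more likely, i.e. the drop in chi2
    relative to the lead is <= 2 * ln(max_likelihood_ratio).
    """
    lead = max(variants, key=lambda v: v.chi2)
    max_drop = 2.0 * math.log(max_likelihood_ratio)
    kept = [
        v for v in variants
        if v.r2_with_lead >= min_r2 and lead.chi2 - v.chi2 <= max_drop
    ]
    return lead, kept

# Hypothetical variants at one locus (names and statistics are invented).
locus = [
    Variant("lead", 1.00, 40.0),
    Variant("varB", 0.90, 35.0),  # strongly correlated, similar signal
    Variant("varC", 0.80, 28.0),  # correlated, but signal too much weaker
    Variant("varD", 0.10, 38.0),  # strong signal but not in LD with the lead
]
lead, kept = credible_set(locus)
print(lead.name, [v.name for v in kept])
```

Tightening either threshold shrinks the set that has to be carried into functional follow-up, at the cost of a higher chance of excluding the true causal variant.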
Of the credible causal variants identified from the endometrial cancer GWAS risk studies, only three identified to date are exonic. This observation is consistent with the distributions of the GWAS-identified variants for most other complex diseases (reviewed by ref. 18). The sets of credible causal risk variants identified in GWAS of other diseases are enriched for localization to active epigenetic marks, characteristic of regulatory elements, mapped from trait-relevant cells or tissues (41). Correspondingly, credible causal variants from the most recently identified nine endometrial cancer genetic risk regions demonstrated a greater enrichment in active epigenetic marks from endometrial cancer cell lines and tissues, compared with credible causal variants for related (i.e., endometriosis) or unrelated (i.e., schizophrenia) diseases (24). Also, significantly more credible causal endometrial cancer risk variants localized to active epigenetic marks from estrogen-stimulated endometrial cancer cells in comparison with such marks from unstimulated cells (24). These findings thus support the use of these epigenetic marks in identifying functional (i.e., likely causal) variants at endometrial cancer risk regions and further highlight the role of estrogen in endometrial cancer development.
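One simple way to test for this kind of enrichment is to count how many credible causal variants fall inside regions carrying an active epigenetic mark and compare that count with a background set of variants using a hypergeometric test. The sketch below does this with the standard library only. It is not the method of ref. 24, and all coordinates are invented positions on a single hypothetical chromosome.

```python
from math import comb

def overlaps(pos, intervals):
    """True if a position falls in any half-open interval [start, end)."""
    return any(start <= pos < end for start, end in intervals)

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n variants are drawn from N, of which K overlap marks."""
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return total / comb(N, n)

# Hypothetical data: regions with an active mark, background variants,
# and the credible causal variants (a subset of the background).
marked_regions = [(100, 200), (500, 600)]
background = list(range(0, 1000, 10))
credible = [110, 150, 520, 580, 700]

N = len(background)
K = sum(overlaps(pos, marked_regions) for pos in background)
n = len(credible)
k = sum(overlaps(pos, marked_regions) for pos in credible)
p_value = hypergeom_upper_tail(k, K, n, N)
print(f"{k}/{n} credible variants in marks vs {K}/{N} background; P={p_value:.4f}")
```

In practice the background set needs to be matched to the credible variants (e.g., for allele frequency and local LD), otherwise the comparison can overstate enrichment.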
Functional Follow-up Studies of Endometrial Cancer Risk GWAS
The identification of the target genes that mediate the effects of GWAS variants is an important step for clinical translation of findings, e.g., the identification of opportunities for drug repositioning (42). Experimental and/or bioinformatic studies are required for target gene identification but are very often neglected (43). Several such functional studies have been performed to date for endometrial cancer risk GWAS and are discussed below.
The approaches taken in functional follow-up studies of GWAS are, in part, determined by the locations of the credible causal variants. For example, at the 17q12 endometrial cancer risk region, three candidate causal variants are located in an extended region of the HNF1B promoter, and so reporter gene assays were performed to assess effects on promoter activity (16). In endometrial cancer cells, the risk alleles of two of the 17q12 credible causal variants were associated with enhanced HNF1B promoter activity, and the effect of one of these variants (rs11263763) was supported by an association between the risk allele of rs11263763 and increased HNF1B expression in endometrial tumors (16). It is intriguing to note that rare HNF1B variants, reported to abrogate HNF1B expression, reduce secretion of insulin which Mendelian randomization has shown to be an endometrial cancer risk factor (30). This finding is consistent with the upregulation of HNF1B promoter activity and transcription by the risk allele of rs11263763, potentially providing a mechanism for the effect of the endometrial cancer risk variation at this region through increased insulin secretion.
Because of extensive LD between common variants, GWAS association signals often map to large genomic intervals, making identification of likely causal variants and their target genes extremely challenging. Moreover, it is not obvious which genes may be targeted for regulation, especially as functional elements such as enhancers can regulate genes through long-range chromatin looping interactions, up to two megabases away (reviewed in ref. 44; Fig. 2). Indeed, in general, GWAS variants regulate the nearest gene only one third of the time (45, 46) and likely target multiple genes through long-range chromatin looping (47). To address these issues, bioinformatic approaches that use correlations between gene expression and epigenomic features to identify enhancers and their corresponding target genes have been applied to endometrial cancer risk loci and revealed a number of candidate target genes (Table 1; ref. 15). However, these candidate target genes still require validation by other means, as described below.
Figure 2. Long-range gene regulation via chromatin looping events and functional consequences of genetic variants in epigenomic features. A, Chromatin looping bringing an enhancer (characterized by active histone marks) into proximity with a gene promoter, allowing transcription factor (TF) binding and gene transcription. B, The same scenario with a variant allele (A to G change), resulting in further TF binding and increased gene expression.
Long-range chromatin looping can be assessed experimentally to identify genes that may be targeted by credible causal variants through regulatory features such as enhancers. These approaches center on the chromatin conformation capture (3C) technique which identifies interacting genomic regions (reviewed in ref. 48). A 3C method was used at three endometrial cancer risk loci to identify looping between regions containing credible causal variants and the promoters of MYC (8q24.21 risk locus; ref. 49), KLF5 (13q22.1 risk locus; ref. 15), and AKT1 and ZBTB42 (the nonreplicated 14q32 risk locus; ref. 50). Notably, the interactions between credible causal variants and these four genes had been bioinformatically predicted (Table 1; ref. 15).
Looping interactions alone do not provide evidence of gene regulation. Therefore, reporter gene assays in endometrial cancer cells have been used to assess the effects of looping credible causal variants on promoter activity at endometrial cancer risk loci. The risk allele of a looping variant (rs9600103) at the 13q22.1 risk locus increased the activity of a minimal promoter (15). At the 14q32 risk locus, an allele of a looping variant (rs2494737) enhanced the activity of both a canonical and an alternative promoter of AKT1, but had no effect on ZBTB42 promoter activity (50). No study has yet examined the effect of looping candidate causal risk variants on MYC promoter activity at the 8q24 risk region.
Evidence from expression quantitative trait loci (eQTL) analyses has demonstrated that GWAS variants are enriched for variants that associate with gene expression (51, 52), indicating that these variants likely affect gene regulation and providing an approach to identify target genes at GWAS loci (52, 53). In the most recent endometrial cancer risk GWAS (24), eQTL analyses were performed using data from a variety of tissue sources to identify genes whose expression associated with risk variants from the newly identified loci (Table 1). Several of the identified eQTL genes were either tumor suppressors (NF1; ref. 54), negative regulators of oncoproteins (SH2B3; ref. 55), or oncogenes [CDCA8 (56) and WT1-AS (57)]. Consistent with these functions, risk alleles were associated with decreased expression of NF1 and SH2B3 and increased expression of CDCA8 and WT1-AS (24).
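At its simplest, an eQTL test asks whether expression of a gene changes with the number of copies of an allele that an individual carries. The following minimal sketch, under that assumption, estimates the least-squares slope of expression on risk-allele dosage. The function name and the toy data are hypothetical and are not drawn from ref. 24; published analyses additionally adjust for covariates and correct for multiple testing.

```python
from statistics import mean


def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on risk-allele dosage (0, 1, or 2)."""
    if len(dosages) != len(expression):
        raise ValueError("dosages and expression must have the same length")
    mean_x, mean_y = mean(dosages), mean(expression)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("no genotype variation among samples")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    return sxy / sxx


# Toy data, invented for illustration: expression falls as dosage rises.
dosages = [0, 0, 1, 1, 2, 2]
expression = [5.0, 5.2, 4.6, 4.4, 4.0, 3.8]
slope = eqtl_slope(dosages, expression)
print(f"change in expression per risk allele: {slope:.2f}")
```

A negative slope corresponds to a risk allele associated with decreased expression, and a positive slope to increased expression.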
Network Analysis of Candidate Target Genes of Endometrial Cancer Risk Variants
Analysis of the 25 candidate target genes identified to date (Table 1) has revealed a network which contains 18 of these genes (24). Major network hubs were either known oncoproteins or tumor suppressors, including the protein encoded by the candidate target gene MYC and proteins encoded by genes that are somatically mutated in endometrial cancer (CCND1, CTNNB1, and TP53; ref. 58). An enrichment of these and other network genes was observed in corresponding pathways such as cyclins and cell-cycle regulation, Wnt/β-catenin, and P53 signaling. Interestingly, given the role of obesity in increasing endometrial cancer risk, there was also an enrichment of network genes in an adipogenesis pathway. As further candidate target genes are identified, this network will likely be refined and additional networks revealed. Furthermore, network genes may point to additional endometrial cancer genetic risk regions yet to be identified by GWAS.
Future Functional Follow-up Approaches for Endometrial Cancer Risk GWAS
Although candidate target genes have been identified at 11 of the 17 risk regions (Table 1), it is clear that other approaches are needed to identify further candidate target genes. The transcriptome-wide association study method (45) enables gene expression to be predicted in endometrial cancer GWAS datasets [from existing studies of tissue with genotype and gene expression data, e.g., the Genotype Tissue Expression project (59) and The Cancer Genome Atlas (60)], thus allowing the testing of associations between imputed gene expression and endometrial cancer risk. In addition, a number of global 3C techniques are now available (48) to systematically assess chromatin interactions across the entire genome, rather than being restricted to single loci, and could be used to assess all endometrial cancer genetic risk regions for candidate target genes. However, these approaches still do not provide definitive evidence that credible causal variants directly affect gene expression, and additional studies are required to determine variant function. The final steps to validate the functionality of credible causal variants and their target genes and determine their effects on cellular phenotypes will require systems such as CRISPR/Cas9, which could be used to generate isogenic cell lines that differ by the alleles of credible causal variants (reviewed in ref. 61) or to activate/inactivate chromatin encompassing these variants (reviewed in ref. 62).
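The two steps of the transcriptome-wide association approach described above can be sketched as follows. This is a minimal illustration only: the variant names, prediction weights, and genotypes are hypothetical, and the association step is reduced to a difference in mean imputed expression between cases and controls, whereas published methods use regression-based tests.

```python
from statistics import mean


def imputed_expression(genotype, weights):
    """Step 1: predict expression as a weighted sum of allele dosages."""
    return sum(weight * genotype[variant] for variant, weight in weights.items())


def case_control_difference(samples, weights):
    """Step 2 (simplified): mean imputed expression in cases minus controls."""
    cases = [imputed_expression(g, weights) for g, is_case in samples if is_case]
    controls = [imputed_expression(g, weights) for g, is_case in samples if not is_case]
    return mean(cases) - mean(controls)


# Hypothetical weights, as if learned from a reference panel with both
# genotype and expression data.
weights = {"snp1": 0.5, "snp2": -0.2}

# Hypothetical GWAS samples: (allele dosages, case status).
samples = [
    ({"snp1": 2, "snp2": 0}, True),
    ({"snp1": 1, "snp2": 0}, True),
    ({"snp1": 0, "snp2": 1}, False),
    ({"snp1": 1, "snp2": 2}, False),
]

difference = case_control_difference(samples, weights)
print(f"case minus control imputed expression: {difference:.2f}")
```

Because the weights come from a separate reference panel, expression never has to be measured in the GWAS samples themselves.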
Conclusions and Future Directions
Plans by ECAC to conduct a significantly expanded meta-GWAS, including imputation to denser reference panels, are likely to yield additional endometrial cancer risk loci. Sample sizes will also be increased by conducting cross-disease meta-analysis studies of endometrial cancer and related diseases; an additional benefit of such studies is the potential to identify novel pleiotropic risk loci, and hence to gain insights into shared underlying biology. Looking further ahead, large-scale sequencing initiatives are required for the identification of rarer variants (i.e., not tagged by standard GWAS-type arrays) associated with modest-to-high risks of endometrial cancer.
The next phase of research will be to progress identified endometrial cancer risk variants to translational and clinical outcomes for patients. Results from Mendelian randomization analyses determining causality of epidemiologic risk factors have the potential to identify at-risk women for altered screening management. Another avenue for clinical translation will be the development of polygenic risk scores for endometrial cancer. Application of this approach in BRCA1/2 pathogenic variant carriers can predict which individuals are most likely to develop breast or ovarian cancer (63). Similarly, a polygenic risk score could be developed to predict which women from Lynch Syndrome families are most likely to be diagnosed with endometrial cancer, and who would derive the most benefit from increased screening and/or risk-reducing steps. The development of polygenic risk scores reinforces the importance of performing GWAS in non-European populations. Given the differences in allele frequencies, effect sizes, and LD patterns across ethnic groups, polygenic risk scores developed using primarily European sample sets may not be translatable to other populations. Therefore, expansion of endometrial cancer GWAS to non-European populations is an essential research priority so that comparable polygenic risk scores will be available for all ethnic groups.
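A polygenic risk score is commonly computed as a weighted sum of risk-allele counts, with each weight set to the per-allele log odds ratio estimated by GWAS. The following minimal sketch assumes that form. The variant names, odds ratios, and genotypes are hypothetical and are not taken from any endometrial cancer study.

```python
import math


def polygenic_risk_score(dosages, log_odds_ratios):
    """Weighted sum of risk-allele dosages (0, 1, or 2 per variant)."""
    missing = set(log_odds_ratios) - set(dosages)
    if missing:
        raise ValueError(f"missing genotypes for: {sorted(missing)}")
    return sum(log_odds_ratios[v] * dosages[v] for v in log_odds_ratios)


# Hypothetical per-allele odds ratios, converted to log odds ratios.
weights = {
    "variant_a": math.log(1.10),
    "variant_b": math.log(1.05),
    "variant_c": math.log(1.20),
}

# Hypothetical risk-allele counts for one woman.
woman = {"variant_a": 2, "variant_b": 0, "variant_c": 1}

score = polygenic_risk_score(woman, weights)
print(f"polygenic risk score: {score:.3f}")
```

Because the weights are population-specific estimates, the same code applied with weights from a European GWAS may rank women from other populations poorly, which is the limitation noted above.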
A key ongoing challenge is the identification of the causal genes that mediate the effects of endometrial cancer risk variants. Although post-GWAS studies of endometrial cancer risk have identified candidate target genes at individual risk regions, endometrial cancer functional genomic data need to be generated in order to systematically assess all risk regions. Furthermore, it has not yet been experimentally shown that the regulation of candidate target genes contributes to endometrial cancer risk through effects on cellular phenotypes. Importantly, such experimental studies are necessary to spur the development of new therapies through the identification of new drug targets, or targets for which drugs already exist. Indeed, the use of a drug target with a genetic basis appears to improve the likelihood of successful drug development (64), highlighting the inherent potential for clinical translation from studies identifying the likely causal genes underlying endometrial cancer risk loci.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
T.A. O'Mara is supported by a National Health and Medical Research Council (NHMRC) Early Career Fellowship (APP1111246), P.F. Kho is supported by an Australian Government Research Training Program PhD Scholarship and QIMR Berghofer Postgraduate Top-Up Scholarship, and A.B. Spurdle is supported by an NHMRC Senior Research Fellowship (APP1061779).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.