Abstract
Background: Although high-risk mutations in identified major susceptibility genes (DNA mismatch repair genes and MUTYH) account for some familial aggregation of colorectal cancer, their population prevalence and the causes of the remaining familial aggregation are not known.
Methods: We studied the families of 5,744 colorectal cancer cases (probands) recruited from population cancer registries in the United States, Canada, and Australia and screened probands for mutations in mismatch repair genes and MUTYH. We conducted modified segregation analyses using the cancer history of first-degree relatives, conditional on the proband's age at diagnosis. We estimated the prevalence of mutations in the identified genes, the prevalence of and hazard ratio (HR) for unidentified major gene mutations, and the variance of the residual polygenic component.
Results: We estimated that 1 in 279 of the population carry mutations in mismatch repair genes (MLH1 = 1 in 1,946, MSH2 = 1 in 2,841, MSH6 = 1 in 758, PMS2 = 1 in 714), 1 in 45 carry mutations in MUTYH, and 1 in 504 carry mutations associated with an average 31-fold increased risk of colorectal cancer in unidentified major genes. The estimated polygenic variance was reduced by 30% to 50% after allowing for unidentified major genes and decreased from 3.3 for age <40 years to 0.5 for age ≥70 years (equivalent to sibling relative risks of 5.1 to 1.3, respectively).
Conclusions: Unidentified major genes might explain one third to one half of the missing heritability of colorectal cancer.
Impact: Our findings could aid gene discovery and development of better colorectal cancer risk prediction models. Cancer Epidemiol Biomarkers Prev; 26(3); 404–12. ©2016 AACR.
Introduction
One of the most important risk factors for colorectal cancer is having a family history of the disease. First-degree relatives of persons diagnosed with colorectal cancer are, on average, at an approximately 2-fold increased risk of colorectal cancer compared with those without a family history (familial relative risk; ref. 1). An estimated 3% to 5% of colorectal cancers are caused by high-risk mutations in the identified major colorectal cancer susceptibility genes (2): DNA mismatch repair (MMR) genes (3) and constitutional 3′ end deletions of EPCAM (4, 5) implicated in Lynch syndrome, the adenomatous polyposis coli (APC) gene implicated in familial adenomatous polyposis (6–8), and the MUTYH gene implicated in colorectal polyps and subsequently cancer (MUTYH-associated polyposis; ref. 9). Current estimates of MMR gene mutation carriers in the general population, inferred from the prevalence of mutations in cases and the risk of colorectal cancer for carriers, range widely from approximately 1 in 300 to 1 in 3,000 depending on differing assumptions and genes (10–16). With the availability of cost-effective sequencing technologies, improved precision in estimates of mutation prevalence would be useful for devising cost-effective genetic testing protocols.
Less than half of the excess risk of colorectal cancer associated with family history (familial aggregation) is explained by mutations in the above identified genes, and only two studies have attempted to explain the remainder of the familial aggregation (17, 18). Aaltonen and colleagues could not confidently distinguish between different modes of inheritance for the hypothetical unidentified major genes (17). Jenkins and colleagues estimated that 1 in 588 of the population carry major gene mutations associated with a recessively inherited risk, and these mutations would explain 15% of all colorectal cancers diagnosed before the age of 45 years (18). Both these studies relied on relatively small numbers of families and did not consider the existence of both polygenic and major genes.
While much research has been conducted on the search for other major colorectal cancer susceptibility genes in addition to those described above, only a few have been confirmed (19). Genome-wide association studies have identified at least 45 independent genetic susceptibility markers (single-nucleotide polymorphisms, SNP) that are reliably associated with small increments in the risk of developing colorectal cancer (20).
The aim of this article was to use population-based family data to estimate the prevalence of mutations in the identified major colorectal cancer susceptibility genes (MMR genes and MUTYH), the prevalence, average penetrance, and likely mode of inheritance for the unidentified major gene mutations, and the variance of the residual polygenic component before and after allowing for different major gene scenarios.
Materials and Methods
Sample
The sample consists of nuclear families from the Colon Cancer Family Registry that has been described in detail previously (21, 22). The current study used data for the first-degree relatives of the incident colorectal cancer cases (probands) who had been recruited irrespective of family history from state or regional population cancer registries in the United States (Washington, California, Arizona, Minnesota, Colorado, New Hampshire, North Carolina), Australia (Victoria), and Canada (Ontario) between 1997 and 2012. Families were excluded if the proband was known to have an APC mutation. Informed consent was obtained from all study participants, and the study protocol was approved by the Institutional Research Ethics Review Board at each recruiting site of the Colon Cancer Family Registry.
Data collection
Information on demographics, personal characteristics, personal and family history of cancer, cancer-screening history, history of polyps, polypectomy, and other surgeries was obtained by questionnaires from all probands at baseline recruitment, which was about 1–2 years after diagnosis of their colorectal cancer, and from all participating relatives. The questionnaires are available from the Colon Cancer Family Registry website (23). We sought confirmation of all reported cancer diagnoses and ages at diagnosis for relatives using pathology reports, medical records, cancer registry reports, and death certificates, where possible. We attempted to obtain blood or buccal samples from all participants and tumor tissue from all affected participants.
MMR gene mutation screening
All probands had their colorectal cancers tested for MMR deficiency, defined by tumor microsatellite instability (MSI) and/or lack of MMR protein expression by immunohistochemistry (IHC). Probands with an MMR-deficient tumor were screened for germline mutations in MMR genes. MLH1, MSH2, and MSH6 mutations were identified using Sanger sequencing or denaturing high-performance liquid chromatography (dHPLC), followed by confirmatory DNA sequencing. Large duplication and deletion mutations, including those involving EPCAM, which lead to MSH2 methylation, were detected by Multiplex Ligation Dependent Probe Amplification (MLPA) according to the manufacturer's instructions (MRC Holland; refs. 21, 24, 25). PMS2 mutations were identified using a modified protocol from Senter and colleagues (26) where exons 1–5, 9, and 11–15 were amplified in three long-range PCRs followed by nested exon-specific PCR/sequencing. The remaining exons (6, 7, 8, and 10) were amplified and sequenced directly from genomic DNA. Large-scale deletions in PMS2 were detected using the P008-A1 MLPA kit according to the manufacturer's specifications (MRC Holland). Germline variants were classified for pathogenicity based on the 5-class system for the quantitative assessment of variant pathogenicity (27) and the application of a multifactorial likelihood model developed for MMR gene variants (28), as applied to variants cataloged within the InSiGHT database (29), where classes 4 and 5 were considered pathogenic (30). For variants not yet classified by InSiGHT, we considered a variant pathogenic if it resulted in a stop codon, frameshift, or large deletion, or if it removed a canonical splice site. The relatives of probands with a pathogenic MMR germline mutation who provided a blood sample underwent testing for the specific mutation identified in the proband.
MUTYH mutation testing
Population-based probands were tested for 12 previously identified MUTYH variants: c.536A>G p.(Tyr179Cys), c.1187G>A p.(Gly396Asp), c.312C>A p.(Tyr104Ter), c.821G>A p.(Arg274Gln), c.1438G>T p.(Glu480Ter), c.1171C>T p.(Gln391Ter), c.1147delC p.(Ala385ProfsTer23), c.933+3A>C p.(Gly264TrpfsX7), c.1437_1439delGGA p.(Glu480del), c.721C>T p.(Arg241Trp), c.1227_1228dup p.(Glu410GlyfsX43), and c.1187-2A>G p.(Leu397CysfsX89) using the MassArray MALDI-TOF Mass Spectrometry (MS) system (Sequenom; ref. 31). To confirm the MUTYH mutation and identify additional mutations, screening of the entire MUTYH coding region, promoter, and splice site regions was performed on all samples exhibiting MS mobility shifts using dHPLC (Transgenomic Wave 3500HT System, Transgenomic). All MS-detected variants and WAVE mobility shifts were submitted for sequencing for mutation confirmation (ABI PRISM 3130XL Genetic Analyser). That is, if a heterozygous MUTYH mutation was identified, then the MUTYH gene was screened for any additional mutations not captured by the Sequenom genotyping screen to ensure all potential compound heterozygous carriers were identified. The relatives of probands with a pathogenic MUTYH germline mutation who provided a blood sample underwent testing for the specific variant identified in the proband. For the current study, MUTYH gene mutation status was recorded as monoallelic or biallelic mutation-positive or negative, with no distinction between different variants.
Statistical analysis
We used modified segregation analysis to fit a range of genetic models to the observed colorectal cancer family histories for the proband and their first-degree relatives. Individuals were assumed to be at risk of colorectal cancer from birth until the earliest of the following: diagnosis of colorectal cancer or any other cancer (except skin cancer); first polypectomy; death; and the earlier of last known age at baseline interview or age 80 years.
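As a minimal sketch of this censoring rule (not the authors' code), the end of follow-up for one person could be computed as below. The function name, the argument names, and the treatment of ties are assumptions made for illustration.

```python
from typing import Optional, Tuple

MAX_FOLLOW_UP_AGE = 80  # analysis cut-off stated in the text


def censoring_age(
    last_known_age: float,
    crc_age: Optional[float] = None,
    other_cancer_age: Optional[float] = None,  # any other cancer except skin cancer
    polypectomy_age: Optional[float] = None,
    death_age: Optional[float] = None,
) -> Tuple[float, bool]:
    """Return (age at end of follow-up, True if follow-up ended with colorectal cancer)."""
    # Start from the earlier of last known age and the age-80 cut-off.
    candidates = [min(last_known_age, MAX_FOLLOW_UP_AGE)]
    # Add every event age that is known for this person.
    for age in (crc_age, other_cancer_age, polypectomy_age, death_age):
        if age is not None:
            candidates.append(age)
    end_age = min(candidates)
    # Assumption: a colorectal cancer tied with another event counts as affected.
    affected = crc_age is not None and crc_age <= end_age
    return end_age, affected


print(censoring_age(70, crc_age=62))
print(censoring_age(75, crc_age=66, polypectomy_age=55))
```

In the second call the person is censored at the polypectomy, so the later colorectal cancer does not count as an event.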
The colorectal cancer incidence $\lambda_i(t,k)$ for individual $i$ at age $t$ in sex group $k$ ($k = 1$ for males or 2 for females) was assumed to depend on genotype according to a parametric survival analysis model, $\lambda_i(t,k) = \lambda_0(t,k)\exp(G_i + P_i(t))$, where $\lambda_0(t,k)$ is the sex-specific baseline incidence at age $t$, $G_i$ is the natural logarithm of the relative risk associated with the major genotype, and $P_i(t)$ is the polygenic component for age $t$.
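The incidence model can be illustrated with a short sketch. The baseline incidence used here is a hypothetical number chosen only for the example; the 31.1-fold relative risk is the mixed dominant estimate reported later in Table 3.

```python
import math


def crc_incidence(baseline: float, log_rr_major: float, polygenic: float) -> float:
    """Incidence = baseline * exp(G + P), with G the log relative risk of the
    major genotype and P the polygenic component at that age."""
    return baseline * math.exp(log_rr_major + polygenic)


baseline = 0.001  # hypothetical baseline incidence per person-year
noncarrier = crc_incidence(baseline, log_rr_major=0.0, polygenic=0.0)
carrier = crc_incidence(baseline, log_rr_major=math.log(31.1), polygenic=0.0)
print(noncarrier, carrier, carrier / noncarrier)
```

Because the major genotype and the polygenic component enter additively on the log scale, their effects on incidence multiply.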
The major genotype was defined by six components representing each of the genes MLH1, MSH2, MSH6, PMS2, MUTYH, and one representing the hypothetical unidentified major genes. We fitted models in which the unidentified major genes were autosomal with a normal and a mutant allele unlinked to mutations in the MMR genes or MUTYH. We also fitted models in which the average relative risk for the unidentified major genes was assumed to be age dependent. We used the published age-, sex-, and country-specific incidences for MLH1 and MSH2 mutation carriers (32), and published age- and sex-specific incidences for MSH6, PMS2, and MUTYH mutation carriers (26, 33, 34).
The polygenic component for age $t$, $P_i(t)$, was assumed to be normally distributed with zero mean and variance $\sigma_p^2(t)$, and was approximated by the hypergeometric polygenic model (35, 36). We also fitted models where the variance of the polygenic “modifying” component was allowed to take a different value, $\sigma_m^2$, for MMR gene and MUTYH carriers.
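A minimal sketch of the population distribution used in the hypergeometric polygenic approximation is given below. It assumes the usual construction in which the number of risk alleles across N biallelic loci is Binomial(2N, 1/2) and is rescaled to mean 0 and variance $\sigma_p^2$. The number of loci (3) is an illustrative choice, not a value taken from this section, and transmission within families is not shown.

```python
import math
from typing import List, Tuple


def hypergeometric_polygenic(sigma2: float, n_loci: int = 3) -> Tuple[List[float], List[float]]:
    """Discrete approximation to a N(0, sigma2) polygenic component.

    Returns the 2N + 1 possible values and their population probabilities.
    """
    n_alleles = 2 * n_loci
    # Binomial(2N, 1/2) has mean N and variance N/2, so this scale gives variance sigma2.
    scale = math.sqrt(sigma2) / math.sqrt(n_loci / 2)
    values = [scale * (j - n_loci) for j in range(n_alleles + 1)]
    probs = [math.comb(n_alleles, j) * 0.5 ** n_alleles for j in range(n_alleles + 1)]
    return values, probs


values, probs = hypergeometric_polygenic(sigma2=3.28)  # variance estimate for age <40 y
mean = sum(v * p for v, p in zip(values, probs))
variance = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
print(mean, variance)
```

The check at the end confirms that the discrete distribution has the intended zero mean and variance.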
To compute the baseline colorectal cancer incidence $\lambda_0(t,k)$, we constrained the overall incidence of colorectal cancer to agree with the national age- and sex-specific incidences (1998–2002) separately for Australia, Canada, and the United States (37). Other cancers were ignored in this model.
We assumed that the sensitivity of the mutation testing of probands for MMR genes and MUTYH was 80% (38), and we examined the effect of varying this sensitivity. For relatives, we assumed the mutation screening for the proband's mutation (i.e., predictive testing) was 100% sensitive and specific.
The genetic models were specified in terms of colorectal cancer incidence for MMR gene and MUTYH mutation carriers, the frequency ($q_A$) of the putative high-risk allele “A” of the unidentified major genes component, the average relative risk of colorectal cancer for carriers of mutations in the unidentified major genes, and the variances of the polygenic and modifying components ($\sigma_p^2$ and $\sigma_m^2$). Maximum likelihood estimation was used to estimate parameters. The estimates we present are the parameter values under which the observed data were most likely (i.e., the values most consistent with the data). Maximum likelihood is the optimal method for making such estimates, and provides confidence intervals (CI). We adjusted for ascertainment by maximizing the likelihood of each pedigree conditioned on the colorectal cancer status of the proband and his or her age at diagnosis (but not the mutation carrier status, as this information was not known at the time of recruitment).
The relative goodness of fit for nested models was tested by the likelihood ratio test. Akaike's Information Criterion (39) [AIC = −2 × log-likelihood + 2 × (no. of parameters)] was used to assess goodness of fit between non-nested models (40).
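The AIC formula can be checked against the log-likelihoods and parameter counts reported later in Table 3. The sketch below is illustrative; the dictionary layout and function name are not from the paper.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's Information Criterion: -2 * log-likelihood + 2 * (no. of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params


# (log-likelihood, number of parameters) as reported in Table 3
models = {
    "Polygenic": (-7218.1, 10),
    "Mixed dominant": (-7212.5, 12),
    "Mixed recessive": (-7216.1, 12),
    "Mixed codominant": (-7212.5, 13),
}

for name, (ll, k) in models.items():
    print(f"{name}: AIC = {aic(ll, k):.1f}")

best = min(models, key=lambda name: aic(*models[name]))
print("Lowest AIC:", best)
```

From the rounded log-likelihood, the polygenic model comes out 0.1 higher than the 14,456.1 printed in Table 3, presumably because the table was computed from unrounded values.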
The expected versus observed number of affected relatives under each fitted model was assessed using the Pearson $\chi^2$ goodness-of-fit statistic. The expected number of probands carrying MMR gene and MUTYH mutations, among families that had undergone mutation testing and given their cancer family history, was computed using Bayes' theorem (41). Statistical methods are described further in the Supplementary Data.
Results
A total of 5,744 families were eligible for inclusion, including 37,634 first-degree relatives of probands of whom 50% were female and 806 (2%) had been diagnosed with colorectal cancer (Table 1). Nearly two-thirds of the families were recruited from the United States (63%), with 16% and 21% of families recruited from Australia and Canada, respectively. Seventy-three percent of the probands were Caucasian, whereas the rest were African American (17%), Asian (6%), Latino (1%), Native American (1%), and unknown (2%).
Relative of proband | All: total no. | All: no. of CRC affected (%) | All: mean age at CRC diagnosis (SD) | Australia: total no. | Australia: no. of CRC affected (%) | Australia: mean age at CRC diagnosis (SD) | United States: total no. | United States: no. of CRC affected (%) | United States: mean age at CRC diagnosis (SD) | Canada: total no. | Canada: no. of CRC affected (%) | Canada: mean age at CRC diagnosis (SD) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Proband | 5,744 | 5,744 (100) | 52.5 (11.6) | 911 | 911 (100) | 45.8 (8.0) | 3,626 | 3,626 (100) | 54.7 (11.8) | 1,207 | 1,207 (100) | 50.7 (10.9) |
Father | 5,737 | 305 (5) | 61.6 (11.0) | 911 | 68 (7) | 61.3 (12.2) | 3,626 | 164 (5) | 61.9 (10.8) | 1,200ᵃ | 73 (6) | 61.3 (10.5) |
Mother | 5,737 | 234 (4) | 61.5 (12.1) | 911 | 48 (5) | 61.7 (11.1) | 3,626 | 142 (4) | 62.2 (12.4) | 1,200ᵃ | 44 (4) | 59.2 (12.0) |
Sibling | 15,095 | 255 (2) | 56.0 (13.3) | 2,228 | 26 (1) | 47.2 (14.1) | 9,437 | 183 (2) | 57.3 (12.4) | 3,430 | 46 (1) | 55.6 (14.4) |
Offspring | 11,065 | 12 (0.1) | 40.3 (14.4) | 1,772 | 2 (0.1) | 23.0 (8.5) | 6,884 | 8 (0.1) | 46.9 (11.0) | 2,409 | 2 (0.1) | 31.5 (16.3) |
Abbreviation: CRC, colorectal cancer.
ᵃ Seven probands had no data for father and mother.
Approximately 7% of all probands (N = 386) had been found to have an MMR-deficient colorectal tumor and therefore had been screened for germline mutations in the MMR genes, while two-thirds of all probands (N = 3,796) had been tested for germline mutations in MUTYH. Of the probands who were screened, 136 had an MMR gene mutation (49 in MLH1, 39 in MSH2, 24 in MSH6, and 24 in PMS2) and 81 had a MUTYH mutation (63 monoallelic and 18 biallelic; Table 2). There were no EPCAM mutation carriers identified.
Relative of proband | MMR gene mutation families (n = 136): total no. | MMR gene mutation families: no. of CRC affected (%) | MMR gene mutation families: mean age at CRC diagnosis (SD) | MUTYH mutation families (n = 81): total no. | MUTYH mutation families: no. of CRC affected (%) | MUTYH mutation families: mean age at CRC diagnosis (SD) | Noncarrier/unidentified carrier status families (n = 5,528): total no. | Noncarrier/unidentified carrier status families: no. of CRC affected (%) | Noncarrier/unidentified carrier status families: mean age at CRC diagnosis (SD) |
---|---|---|---|---|---|---|---|---|---|
Proband | 136 | 136 (100) | 42.9 (10.5) | 81 | 81 (100) | 50.1 (12.3) | 5,528 | 5,528 (100) | 52.7 (11.5) |
Father | 136 | 26 (19) | 49.0 (14.4) | 81 | 8 (10) | 67.8 (7.0) | 5,501ᵃ | 271 (5) | 62.7 (10.0) |
Mother | 136 | 16 (12) | 51.4 (12.6) | 81 | 0 (0) | – | 5,501ᵃ | 218 (4) | 62.3 (11.7) |
Sibling | 375 | 27 (8) | 41.7 (11.5) | 181 | 4 (2) | 63.3 (9.9) | 14,494 | 224 (2) | 57.6 (12.5) |
Offspring | 207 | 0 (0) | – | 150 | 0 (0) | – | 10,665 | 12 (0.1) | 40.3 (14.4) |
NOTE: One proband had both an MMR gene and a monoallelic MUTYH germline mutation.
Abbreviation: CRC, colorectal cancer.
ᵃ Seven probands had no data for father and mother.
All seven models that incorporated a polygenic component and the hypothetical unidentified major genes provided significantly better fits than the model that included only MMR gene and MUTYH mutation carriers (all P < 0.001; Supplementary Table S1). The mixed dominant model was essentially identical to a mixed codominant model in terms of fit (likelihood ratio test, P = 0.94), but was more parsimonious because it used fewer parameters. All other models were rejected when compared with the mixed codominant model (likelihood ratio test, all P < 0.001).
When we allowed the polygenic variance to vary by age, the mixed dominant model for the unidentified major genes was the most parsimonious (i.e., had the lowest AIC) compared with all other models fitted (Table 3). Under this model, we estimated 0.19% (95% CI, 0.04–1.08) of the population carry mutations in unidentified major genes, and these are associated with on average a 31-fold (95% CI, 12–83) increased risk of colorectal cancer. The estimated variance of the polygenic component was 3.28 for age <40 years, 0.92 for age 40–49 years, 0.46 for age 50–59 years, 0.79 for age 60–69 years, and 0.52 for age ≥70 years. The proportion of familial variance after adjusting for the identified major genes explained by the unidentified major genes was 13%, 54%, 58%, 33%, and 36% for ages <40, 40–49, 50–59, 60–69, and ≥70 years, respectively (Fig. 1). The estimated population carrier frequency for mutations in MLH1, MSH2, MSH6, PMS2, and monoallelic and biallelic MUTYH are shown in Table 4.
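One way to approximately recover these proportions is to compare the polygenic variance from the polygenic-only model with the residual polygenic variance from the mixed dominant model (both from Table 3). Whether the authors computed the proportions exactly this way is an assumption; with the rounded tabulated values the sketch lands within about two percentage points of the figures quoted above.

```python
AGE_BANDS = ["<40", "40-49", "50-59", "60-69", ">=70"]

# Polygenic variance estimates by age band, taken from Table 3
polygenic_only = [3.74, 2.02, 1.11, 1.19, 0.80]
mixed_dominant = [3.28, 0.93, 0.46, 0.78, 0.52]


def proportion_explained(var_before: float, var_after: float) -> float:
    """Share of the polygenic variance removed once major genes are allowed for."""
    return (var_before - var_after) / var_before


proportions = [
    proportion_explained(before, after)
    for before, after in zip(polygenic_only, mixed_dominant)
]
for band, prop in zip(AGE_BANDS, proportions):
    print(f"age {band}: {100 * prop:.0f}%")
```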
Model | No. Par | LL | AIC | Pᵃ | $q_A$ (95% CI) | RR Het (95% CI) | RR Hom (95% CI) | $\sigma_p^2$ (<40 y) (95% CI) | $\sigma_p^2$ (40–49 y) (95% CI) | $\sigma_p^2$ (50–59 y) (95% CI) | $\sigma_p^2$ (60–69 y) (95% CI) | $\sigma_p^2$ (≥70 y) (95% CI) | q(MLH1) (95% CI) | q(MSH2) (95% CI) | q(MSH6) (95% CI) | q(PMS2) (95% CI) | q(MUTYH) (95% CI) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Polygenic | 10 | −7,218.1 | 14,456.1 | 0.01 | – | – | – | 3.74 (1.47–9.51) | 2.02 (1.17–3.48) | 1.11 (0.64–1.91) | 1.19 (0.74–1.90) | 0.80 (0.42–1.54) | 0.000261 (0.000198–0.000342) | 0.000181 (0.000134–0.000244) | 0.000664 (0.000447–0.000987) | 0.000701 (0.000474–0.001047) | 0.01113 (0.00950–0.01304) |
Mixed dominant | 12 | −7,212.5 | 14,449.0 | 1.0 | 0.000992 (0.00018–0.00541) | 31.1 (11.6–83.4) | 31.1 (11.6–83.4) | 3.28 (1.10–9.74) | 0.93 (0.26–3.32) | 0.46 (0.12–1.81) | 0.78 (0.27–2.27) | 0.52 (0.16–1.64) | 0.000257 (0.000195–0.000338) | 0.000176 (0.000130–0.000238) | 0.000660 (0.000444–0.000982) | 0.000701 (0.000471–0.001042) | 0.01113 (0.00950–0.01304) |
Mixed recessive | 12 | −7,216.1 | 14,456.2 | 0.007 | 0.151 (0.057–0.403) | 1.0 | 10.8 (3.5–33.4) | 3.28 (1.24–8.64) | 1.50 (0.70–3.21) | 0.69 (0.27–1.79) | 0.82 (0.35–1.94) | 0.64 (0.25–1.64) | 0.000261 (0.000198–0.000343) | 0.000180 (0.000133–0.000244) | 0.000663 (0.000446–0.000985) | 0.000703 (0.000473–0.001045) | 0.01109 (0.00947–0.01299) |
Mixed codominant | 13 | −7,212.5 | 14,451.0 | – | 0.000992 (0.00018–0.00541) | 31.1 (11.6–83.4) | 31.1 (11.6–83.4) | 3.28 (1.10–9.74) | 0.93 (0.26–3.32) | 0.46 (0.12–1.81) | 0.78 (0.27–2.27) | 0.52 (0.16–1.64) | 0.000257 (0.000195–0.000338) | 0.000176 (0.000130–0.000238) | 0.000660 (0.000444–0.000982) | 0.000701 (0.000471–0.001042) | 0.01113 (0.00950–0.01304) |
Abbreviations: AIC, Akaike's Information Criterion; hom, homozygous; het, heterozygous; Par, number of parameters estimated in the model; RR, relative risk as compared with noncarriers; LL, log-likelihood; qA, estimated high-risk allele frequency for the unidentified major genes; q, minor allele frequency; σ2p, variance of the polygenic component; –, not applicable.
ᵃ For all models, the P value refers to the comparison with the mixed codominant model using the log-likelihood ratio test.
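The P values in Table 3 can be approximately recovered from the tabulated log-likelihoods and parameter counts. The sketch below assumes the standard likelihood ratio test, in which twice the log-likelihood difference is referred to a chi-square distribution with degrees of freedom equal to the difference in parameter counts. To stay dependency-free it uses closed-form chi-square tail probabilities for 1 and 3 degrees of freedom only; the helper names are illustrative.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (closed forms for df = 1 and 3)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df = 1 and df = 3 are implemented in this sketch")


def lrt_p_value(ll_full: float, k_full: int, ll_reduced: float, k_reduced: int) -> float:
    """Likelihood ratio test of a reduced model against the fuller model it is nested in."""
    statistic = max(2.0 * (ll_full - ll_reduced), 0.0)
    return chi2_sf(statistic, k_full - k_reduced)


CODOMINANT = (-7212.5, 13)  # (log-likelihood, no. of parameters) from Table 3
p_dominant = lrt_p_value(*CODOMINANT, -7212.5, 12)
p_recessive = lrt_p_value(*CODOMINANT, -7216.1, 12)
p_polygenic = lrt_p_value(*CODOMINANT, -7218.1, 10)
print(p_dominant, p_recessive, p_polygenic)
```

With the rounded inputs, the three results agree with the tabulated 1.0, 0.007, and 0.01 to the precision shown in the table.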
Gene | Carrier frequency, % (95% CI) | 1 in (95% CI) |
---|---|---|
Unidentified major genes | 0.198 (0.036–1.079) | 504 (93–2,778) |
MLH1 | 0.051 (0.039–0.068) | 1,946 (1,480–2,564) |
MSH2 | 0.035 (0.026–0.048) | 2,841 (2,101–3,846) |
MLH1 or MSH2 | 0.087 (0.065–0.115) | 1,155 (868–1,539) |
MSH6 | 0.132 (0.089–0.196) | 758 (509–1,126) |
PMS2 | 0.140 (0.094–0.208) | 714 (480–1,062) |
Any MMR gene | 0.359 (0.248–0.520) | 279 (192–403) |
MUTYH monoallelic | 2.214 (1.891–2.591) | 45 (39–53) |
MUTYH biallelic | 0.012 (0.009–0.017) | 8,073 (5,881–11,080) |
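The carrier frequencies in Table 4 appear to follow from the allele frequencies in Table 3 under Hardy-Weinberg proportions. That assumption is not stated explicitly in this section, but it reproduces the tabulated "1 in" values, including the MUTYH monoallelic row, which matches the at-least-one-allele expression. A minimal sketch, with illustrative function names:

```python
def carrier_frequency(q: float) -> float:
    """Probability of carrying at least one mutant allele, given allele frequency q."""
    return 1.0 - (1.0 - q) ** 2


def biallelic_frequency(q: float) -> float:
    """Probability of carrying two mutant alleles, given allele frequency q."""
    return q ** 2


# Allele frequencies from the mixed dominant row of Table 3
allele_freq = {
    "Unidentified major genes": 0.000992,
    "MLH1": 0.000257,
    "MSH2": 0.000176,
    "MSH6": 0.000660,
    "PMS2": 0.000701,
    "MUTYH": 0.01113,
}

one_in = {gene: round(1.0 / carrier_frequency(q)) for gene, q in allele_freq.items()}
mutyh_biallelic_one_in = round(1.0 / biallelic_frequency(allele_freq["MUTYH"]))
print(one_in, mutyh_biallelic_one_in)
```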
Table 5A shows the expected versus observed number of relatives of the probands who developed colorectal cancer before age 80 years. Consistent with the AIC, the expected numbers from the mixed dominant model are closest to the observed numbers.
Model | 1 parent | 1 sibling | 2 siblings | 1 parent and 1 sibling | χ2 |
---|---|---|---|---|---|
Observed | 478 | 175 | 14 | 28 | |
Expected, polygenic | 466.9 | 189.8 | 9.6 | 21.7 | 5.3 |
Expected, mixed dominant | 462.4 | 179.6 | 9.4 | 24.2 | 3.5 |
Expected, mixed recessive | 451.9 | 200.1 | 10.8 | 22.4 | 7.0 |
Expected, mixed codominant | 462.4 | 179.6 | 9.4 | 24.2 | 3.5 |
NOTE: The χ2 value measures the difference between the observed and expected numbers of affected relatives; the lower the χ2, the better the fit of the model.
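The χ2 column of Table 5A can be reproduced from the tabulated counts with the usual Pearson statistic, the sum of (observed − expected)² / expected over the four family-history categories. The sketch below is illustrative and uses only the numbers shown in the table.

```python
from typing import Sequence


def pearson_chi2(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Columns: 1 parent, 1 sibling, 2 siblings, 1 parent and 1 sibling (Table 5A)
observed = [478, 175, 14, 28]
expected = {
    "Polygenic": [466.9, 189.8, 9.6, 21.7],
    "Mixed dominant": [462.4, 179.6, 9.4, 24.2],
    "Mixed recessive": [451.9, 200.1, 10.8, 22.4],
    "Mixed codominant": [462.4, 179.6, 9.4, 24.2],
}

chi2_by_model = {name: pearson_chi2(observed, exp) for name, exp in expected.items()}
for name, value in chi2_by_model.items():
    print(f"{name}: chi2 = {value:.1f}")
```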
Table 5B shows the expected and observed number of probands who are mutation carriers for each MMR gene and monoallelic and biallelic MUTYH mutations. The expected numbers from the mixed dominant model with an age-dependent polygenic variance were closest to the observed numbers and had the lowest χ2 compared with other models. In general, all the models closely predicted the number of mutation carriers.
Model | MLH1 | MSH2 | MSH6 | PMS2 | MUTYH biallelic | MUTYH monoallelic | χ2 |
---|---|---|---|---|---|---|---|
Number of families | 3,319 | 3,319 | 3,319 | 3,319 | 3,796 | 3,796 | |
Observed | 49 | 39 | 24 | 24 | 18 | 63 | |
Expected, polygenic | 49.3 | 43.8 | 24.9 | 24.9 | 18.3 | 66.6 | 0.8 |
Expected, mixed dominant | 48.7 | 42.5 | 24.7 | 24.6 | 18.2 | 66.6 | 0.5 |
Expected, mixed recessive | 49.4 | 43.9 | 24.7 | 24.7 | 17.9 | 66.3 | 0.8 |
Expected, mixed codominant | 48.7 | 42.5 | 24.7 | 24.6 | 18.2 | 66.6 | 0.5 |
NOTE: The χ2 value measures the difference between the observed and expected numbers of mutation carriers; the lower the χ2, the better the fit of the model.
In all the fitted models above, the sensitivity of mutation testing was fixed at 0.80. When we refitted the models assuming the sensitivity was 0.90, the impact was negligible. Model estimates were virtually identical when the unidentified major genes were fitted as a separate locus to the MMR mutations and MUTYH (not shown).
Results were not materially different when we restricted analyses to Caucasian families (not shown). The relative risks for the unidentified major genes did not vary appreciably by age in the major gene models (not shown). There was virtually no evidence of a difference between the size of the polygenic variance for noncarriers, $\sigma_p^2$, and the modifying variance, $\sigma_m^2$, for any of the models (not shown).
Discussion
We have used a large population-based family dataset from the Colon Cancer Family Registry, and existing penetrance estimates, to produce new estimates of the population prevalence of high-risk mutations in the identified major susceptibility genes for colorectal cancer: the DNA MMR genes and MUTYH. We estimated that 1 in 279 (95% CI, 192–403) of the population carry mutations in mismatch repair genes (MLH1 = 1 in 1,946, MSH2 = 1 in 2,841, MSH6 = 1 in 758, PMS2 = 1 in 714), and 1 in 45 carry mutations in MUTYH.
Previously, researchers have inferred these carrier frequencies from the carrier frequency for cases, risk for the general population, and risk for mutation carriers (Supplementary Table S2; refs. 10–16). None, except those estimated by Song and colleagues (16), were gene specific. Previous estimates of population carrier frequencies for the four MMR mutations combined (or MLH1 and MSH2 mutations combined) were similar to our estimates, except for those obtained by Dunlop and colleagues (11). This discrepancy might be explained by different screening methods and by the fact that knowledge about which mutations are truly pathogenic has improved substantially over time (30). For MUTYH mutations, a systematic review and meta-analysis estimated the population carrier frequency of monoallelic MUTYH mutations to be 1 in 60 and biallelic MUTYH mutations to be approximately 1 in 7,000, similar to our estimates (42).
We then sought to explain the residual familial aggregation of this disease. We considered a polygenic component that proposes there are multiple independent loci, and across loci and at each locus, the alleles have a multiplicative effect on risk. We also considered the existence of one or more unidentified major genes (genes for which there are mutations associated with a high risk of colorectal cancer), and allowed for different modes of disease inheritance (dominant, recessive, and codominant).
We found evidence that there exist as yet unidentified major colorectal cancer susceptibility genes, and their mode of inheritance was most likely dominant (although this does not necessarily mean that they were all dominant). It is important to note that the apparent dominant component might also reflect missed mutations in MMR genes, MUTYH, or APC because the mutation screening techniques used were not 100% sensitive and not all probands had been screened. We estimated that 1 in 504 (95% CI, 93–2,778) of the population carry unidentified mutations associated with an average 31-fold increased risk of colorectal cancer. The estimated polygenic variance was reduced by 30%–50% after allowing for these unidentified major genes, after which it decreased from 3.3 for age <40 years to 0.5 for age ≥70 years (equivalent to sibling relative risks of 5.1 to 1.3, respectively).
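The correspondence between polygenic variance and sibling relative risk quoted here is consistent with the standard result for a log-normally distributed polygenic risk, in which the sibling relative risk is exp(σ²/2) because siblings share half of the polygenic variance. The section does not spell out the formula, so its use below is an assumption; the variance values are the mixed dominant estimates from Table 3.

```python
import math


def sibling_relative_risk(polygenic_variance: float) -> float:
    """Sibling relative risk implied by a log-normal polygenic model: exp(variance / 2)."""
    return math.exp(polygenic_variance / 2.0)


rr_under_40 = sibling_relative_risk(3.28)  # polygenic variance for age <40 years
rr_70_plus = sibling_relative_risk(0.52)   # polygenic variance for age >=70 years
print(rr_under_40, rr_70_plus)
```

The first value comes out slightly above the 5.1 quoted in the text, which may reflect rounding or truncation of the underlying estimate.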
The term “missing heritability” has been variously defined over the last decade to refer to the fact that not all the causes of familial aggregation, or of familial aggregation considered to be due to genetic factors, have been found (43). The latter has been addressed by assuming an all-or-nothing unmeasured liability model that makes untestable assumptions (44). For the purposes of discussion here, we assume that heritability encapsulates both genetic and nongenetic causes of familial aggregation. In this regard, it is plausible for common cancers that nontrivial heritability is due to nongenetic factors (45). In this article, we have fitted a polygenic component to capture familial aggregation not explained by the major genes. It is based on an underlying genetic model of Fisher (1918; ref. 46), but given we are studying nuclear families it also represents nongenetic familial factors. That is, although it is labeled polygenic, it could also reflect the effect of environmental and lifestyle factors shared by first-degree relatives. Given that the familial variance is proportional to the log of the relative risk attributable to the familial component, the unidentified major genes might explain one-third to one-half of the missing heritability of colorectal cancer across the ages of 40 to 70 years.
The polygenic component will also capture the currently identified, and as yet unidentified, common SNPs associated with colorectal cancer risk. For example, the current 45 independent susceptibility SNPs explain 22% of familial aggregation (20). It is likely this proportion will increase as larger studies are conducted, such as the OncoArray initiative, and as more informative statistical strategies are used to devise risk-prediction SNP-based scores other than the current highly conservative paradigm of considering each SNP individually and applying stringent penalties for multiple testing. The common SNPs identified to date are not necessarily causal, and they could also be tagging rare causal variants (as was the case for HOXB13 and prostate cancer; ref. 47).
Our analyses suggest a role for rare variants in as yet undiscovered susceptibility genes associated with high risk. Individually, they could be very rare and difficult to discover. One recent attempt to resolve this issue was a whole-exome sequencing study that identified some high-risk mutations in candidate susceptibility genes such as POT1, POLE2, and MRE11 (19). The authors concluded that the study “probably discounts the existence of further major high-penetrance susceptibility genes, which individually account for >1% of the familial risk.” Therefore, if both their and our findings are correct, there are likely to be perhaps hundreds of major genes each contributing little to the missing heritability. As well as sample size, the authors recognized that restriction to exomes limited their ability to identify pathogenic mutations outside of transcribed regions, and that targeted capture is insufficiently sensitive to detect copy number variation. We, therefore, agree with the authors in their conclusion that there is a need for very large-scale sequencing studies that would benefit from including highly informative families.
Strengths of our study include the large number of families ascertained regardless of family history, the standardized questionnaires and protocols used by the Colon Cancer Family Registry, and the use of sophisticated statistical techniques that properly adjust for ascertainment and account for residual familial aggregation of disease (thereby avoiding bias). We also used a systematic approach for screening and testing of germline mutations in both MMR genes and MUTYH.
When predicting the number of relatives with colorectal cancer, we did not differentiate family history of colorectal cancer in terms of tumor location within the bowel. This approach was supported by findings from a large study in Utah, which reported similarly elevated risks of colorectal cancer associated with a family history of colorectal cancer regardless of tumor location (proximal colon, distal colon, and rectum; ref. 48).
The response of the population-based probands approached to participate was 72% (49). MMR gene and MUTYH mutation carriers have both been associated with better colorectal cancer survival than noncarriers (50–52). Therefore, if probands with better prognosis are more likely to participate in the study, survivor bias could potentially lead to an overestimation of the mutation frequency. Data on participation differences by prognostic characteristics were not available to assess this.
A potential limitation of our study is inaccurate reporting of family colorectal cancer history. Of the 806 colorectal cancer diagnoses reported by first-degree relatives, 26% were confirmed by pathology reports, clinic records, or cancer registries. Previous studies have found reported colorectal cancer history in first-degree relatives to be reasonably accurate (85%–90% agreement; ref. 53), so even though most of the colorectal cancer diagnoses in relatives were not confirmed, this is unlikely to have had a great impact on our results.
Another potential limitation of our study is the reliance on external estimates of colorectal cancer relative risks for carriers of MMR gene and MUTYH mutations. To help mitigate this weakness, we used estimates based on the largest studies available, and all used data from the same source, the Colon Cancer Family Registry (26, 32–34). Future studies should focus on incorporating the explicit effects of other colorectal cancer susceptibility genes such as STK11 (54), BMPR1A (55), SMAD4, PTEN (56), POLE, and POLD1 (57) as well as the explicit effects of identified common low-risk alleles (20). In addition to colorectal cancer risk, it is known that MMR gene mutations increase the risks of other cancers such as endometrial and ovarian cancer (58). Our analyses can be extended to incorporate such information.
The polygenic variance describes the range of familial risk across a population at a given age. For example, given the estimated variances by age for the mixed dominant model, the familial relative risk was 5.1, 1.6, 1.3, 1.5, and 1.3 for ages <40, 40–49, 50–59, 60–69, and ≥70 years, respectively. Although we found no evidence that the polygenic effects differed for carriers of an MMR gene mutation compared with noncarriers, this does not imply that they are due to the same variants. Some studies have shown that the common genetic variants identified through GWAS as being associated with risk for the general population are not relevant for MMR gene mutation carriers (59). If future studies identify specific genetic modifiers of colorectal cancer risk for MMR gene or MUTYH mutation carriers, it should be possible to extend the current analyses to allow for this level of complexity.
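The correspondence between polygenic variance and familial relative risk can be sketched numerically. The Python snippet below is a minimal illustration rather than the fitted model: it assumes the common rare-disease approximation for a log-normally distributed polygenic risk, in which the relative risk for a first-degree relative is exp(covariance) and the covariance between first-degree relatives is half the polygenic variance. The function names are our own. Applied to the rounded variances of 3.3 and 0.5, it gives relative risks of about 5.2 and 1.3, close to the 5.1 and 1.3 reported for the youngest and oldest age groups; the small difference for the youngest group may reflect rounding of the variance.

```python
import math


def sibling_relative_risk(variance):
    """Familial relative risk for a first-degree relative.

    Assumption: log-normal polygenic risk and a rare disease, so the
    relative risk is exp(covariance), with covariance = variance / 2.
    """
    return math.exp(variance / 2.0)


def variance_from_sibling_rr(sibling_rr):
    """Inverse mapping: polygenic variance implied by a sibling relative risk."""
    return 2.0 * math.log(sibling_rr)


# Rounded variances quoted in the text for the youngest and oldest age groups.
for label, variance in [("<40", 3.3), (">=70", 0.5)]:
    rr = sibling_relative_risk(variance)
    print(f"age {label}: variance {variance} -> sibling RR {rr:.2f}")
```

Because the variance is linear in the log of the relative risk under this approximation, a proportional reduction in variance corresponds to the same proportional reduction in log familial relative risk.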
In conclusion, we have used a large population-based family study to estimate the prevalence of mutations in the identified major colorectal cancer susceptibility genes, as well as the prevalence and relative risk of yet-to-be-discovered, high-risk susceptibility genes. This is an essential step in the development of a high-quality risk prediction model for colorectal cancer, which is a major clinical and public health goal. Subsequently, screening programs can be optimized at an individual level to attain maximum benefit, however that may be defined. This study also provides a guidepost for future gene discovery research and will justify, and guide, the use of next-generation sequencing to find these genes. The results show that our current understanding of hereditary predisposition to colorectal cancer is incomplete and support the existence of as yet undiscovered rare but highly penetrant mutations, while also underscoring that the polygenic component is still largely unresolved.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Disclaimer
The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating centers in the CFRs, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government or the CFR. Authors had full responsibility for the design of the study, the collection of the data, the analysis and interpretation of the data, the decision to submit the manuscript for publication, and the writing of the manuscript. The ideas and opinions expressed herein are those of the author(s) and endorsement by the State of California, Department of Public Health, the National Cancer Institute, and the Centers for Disease Control and Prevention or their contractors and subcontractors is not intended nor should be inferred.
Authors' Contributions
Conception and design: A.K. Win, M.A. Jenkins, G.G. Giles, J.L. Hopper, R.J. MacInnis
Development of methodology: A.K. Win, M.A. Jenkins, A.C. Antoniou, A. Lee, J.L. Hopper, R.J. MacInnis
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): A.K. Win, M.A. Jenkins, D.D. Buchanan, M. Clendenning, C. Rosty, D.J. Ahnen, S.N. Thibodeau, G. Casey, S. Gallinger, L. Le Marchand, R.W. Haile, J.D. Potter, N.M. Lindor, P.A. Newcomb, J.L. Hopper
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): A.K. Win, M.A. Jenkins, J.G. Dowty, A.C. Antoniou, A. Lee, M. Clendenning, Y. Zheng, J.L. Hopper, R.J. MacInnis
Writing, review, and/or revision of the manuscript: A.K. Win, M.A. Jenkins, J.G. Dowty, A.C. Antoniou, G.G. Giles, D.D. Buchanan, M. Clendenning, C. Rosty, D.J. Ahnen, G. Casey, L. Le Marchand, R.W. Haile, J.D. Potter, N.M. Lindor, P.A. Newcomb, J.L. Hopper, R.J. MacInnis
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): S. Gallinger, N.M. Lindor, P.A. Newcomb
Study supervision: M.A. Jenkins, J.L. Hopper, R.J. MacInnis
Other (software development; implementation of analysis): A. Lee
Acknowledgments
The authors thank all study participants of the Colon Cancer Family Registry and staff for their contributions to this project. We also thank Associate Professor James McCaw for use of his UNIX computer cluster.
Grant Support
This work was supported by grant UM1 CA167551 from the National Cancer Institute, NIH, and through cooperative agreements with the following Colon Cancer Family Registry (CCFR) centers: Australasian Colorectal Cancer Family Registry (U01/U24 CA097735), Mayo Clinic Cooperative Family Registry for Colon Cancer Studies (U01/U24 CA074800), Ontario Familial Colorectal Cancer Registry (U01/U24 CA074783), Seattle Colorectal Cancer Family Registry (U01/U24 CA074794), and USC Consortium Colorectal Cancer Family Registry (U01/U24 CA074799). Seattle CCFR research was also supported by the Cancer Surveillance System of the Fred Hutchinson Cancer Research Center, which was funded by contract numbers N01-CN-67009 (1996–2003) and N01-PC-35142 (2003–2010) and contract no. HHSN2612013000121 (2010–2017) from the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute with additional support from the Fred Hutchinson Cancer Research Center. The collection of cancer incidence data used in this study was supported by the California Department of Public Health as part of the statewide cancer reporting program mandated by California Health and Safety Code Section 103885; the National Cancer Institute's Surveillance, Epidemiology and End Results Program under contract HHSN261201000035C awarded to the University of Southern California, and contract HHSN261201000034C awarded to the Public Health Institute; and the Centers for Disease Control and Prevention's National Program of Cancer Registries, under agreement U58DP003862-01 awarded to the California Department of Public Health. P.A. Newcomb, M.A. Jenkins, J.G. Dowty, J.L. Hopper, N.M. Lindor, R.J. MacInnis, and Y. Zheng received support for this study by grant R01CA170122 from NIH. M.A. Jenkins, J.L. Hopper, and G.G. Giles received further support from Centre for Research Excellence grant APP1042021 and program grant APP1074383 from National Health and Medical Research Council (NHMRC), Australia. A.K. Win is an NHMRC Early Career Fellow. M.A. Jenkins is an NHMRC Senior Research Fellow. J.L. Hopper is an NHMRC Senior Principal Research Fellow. D.D. Buchanan is a University of Melbourne Research at Melbourne Accelerator Program (R@MAP) Senior Research Fellow. A.C. Antoniou is a Cancer Research UK Senior Research Fellow (C12292/A11174).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.