Large germline deletions within the mismatch repair gene MSH2 account for a significant proportion (up to 20%) of all deleterious mutations of this gene which are associated with Lynch syndrome. An exons 1 to 6 deletion of MSH2, originally reported in nine families, has been associated with a founding event within the United States, which genealogic studies had previously dated to 1727, and the number of present day carriers was estimated to be 18,981. Here, we report the development of a robust multiplex PCR which has assisted in the detection of 32 new families who carry the MSH2 American Founder Mutation (AFM). By offering testing to family members, 126 carriers of the AFM have been identified. Extensive genealogic studies have connected 27 of the 41 AFM families into seven extended pedigrees. These extended families have been traced back to around the 18th century without any evidence of further convergence between them. Characterization of the genomic sequence flanking the deletion and the identification of a common disease haplotype of between 0.6 and 2.3 Mb in all probands provides evidence for a common ancestor between these extended families. The DMLE+2.2 software predicts an age of ∼500 years (95% confidence interval, 425–625) for this mutation. Taken together, these data are suggestive of an earlier founding event than was first thought, which likely occurred in a European or a Native American population. The consequences of this finding would be that the AFM is significantly more frequent in the United States than was previously predicted. [Cancer Res 2008;68(7):2145–53]

Lynch syndrome, which is also known as hereditary nonpolyposis colorectal cancer, is estimated to account for approximately one third of all heritable colon cancers and 2.8% of all colon cancers in the Western world (1).6 This amounts to ∼4,500 cases a year in the United States alone (2). It is currently accepted that Lynch syndrome arises as a result of mutations in mismatch repair genes, giving the syndrome a dominant inheritance pattern. Four mismatch repair genes, MLH1, MSH2, PMS2, and MSH6, are typically implicated in the transmission of Lynch syndrome, with the majority of mutations having been identified in either MLH1 or MSH2.

6 H. Hampel, W.L. Frankel, S. Ramsey, et al., unpublished observations.

A broad spectrum of mutations has been identified throughout these genes, but because large rearrangements were difficult to detect until recently, the majority of mutations identified to date have been point mutations or small insertions/deletions. With the development of improved techniques, such as multiplex ligation-dependent probe amplification (MLPA; ref. 3) and semiquantitative PCR (4), the number of large genomic rearrangements identified has increased considerably. One such mutation is a deletion of exons 1 to 6 of MSH2, which was initially identified in nine families from the United States (5, 6). The MSH2 American Founder Mutation (AFM) was characterized by long-range PCR and sequencing of the breakpoints, which allowed this deletion to be distinguished from other similar, but distinct, exons 1 to 6 deletions. Subsequent genealogic studies linked three of the nine families and traced them through 12 generations to a putative common ancestor who migrated to the United States from Germany in the early 18th century (7). Based on this information, it was calculated that 18,981 [95% confidence interval (95% CI), 6,038–34,466] Americans carry the AFM (8).

In this study, we sought to gain further insight into the origin, occurrence, and spread of the MSH2 AFM in the United States. We developed a robust multiplex PCR which can be performed using standard conditions and thus allows for the accurate identification of AFM carriers. Using this detection method, we identified a further 32 families who carry the AFM. Haplotypes flanking the mutation were characterized in 29 of these new families, as well as four of the original families (throughout this article, we refer to the original families, families A–I, as families 1–9, respectively). The use of an intraallelic coalescent model generated an estimate for the age of the mutation, which may have considerable implications for the predicted origin and prevalence of the MSH2 AFM.

Patient samples. This study was advertised to cancer genetic counselors in the United States through their e-mail listserv, their newsletter “Perspectives in Genetic Counseling,” and their research directory. In addition, four families were identified through our own population-based genetic testing study for Lynch syndrome (9).6 In return for assistance with our genealogic analysis, free predictive genetic testing was offered to at-risk members of any family with an exons 1 to 6 deletion in MSH2 whose breakpoints were confirmed to be consistent with the AFM. Most of these patients' deletions were identified originally at an outside commercial laboratory (Myriad Genetic Laboratories, Inc. or Mayo Medical Laboratories), with the AFM-specific breakpoints being confirmed in our CLIA-approved laboratory in all but four of the cases studied. These four cases, which had different breakpoints, were not included in this study.

Genealogic analysis. Each proband with an exons 1 to 6 deletion of MSH2 who enrolled in this study provided a copy of their pedigree as drawn by their local genetic counselor and named a family member contact person to help with the genealogy work. The family member contact was asked to review the family tree to provide as much of the following information as possible on the oldest individuals in the pedigree (generally grandparents): full names, dates of birth, places of birth, dates of death, places of death, sibling names, and parent names. In addition, where possible, we tried to ascertain from which side of the family the mutation had been inherited to help limit the amount of genealogic work necessary. This information was then used by the study genealogist (M.E.B.) to develop a tentative genealogy based on a general consensus of information7 and from other pedigrees that have been posted on various Internet sites.8

7 Found at www.ancestry.com.

8 Usually a combination of information found at either www.ancestry.com or www.familysearch.org and from privately known/researched information.

After developing a “consensus” pedigree, the genealogist then attempted to find original documentation or other verifiable information to prove or disprove each claim (especially with respect to individuals born before 1850, when the United States censuses first began to list all household members). This was accomplished using censuses, wills, marriage and death records, deeds, cemetery records, diaries, and other related information that can be accessed via the Internet to determine names, dates, and geographic locations for the generation being analyzed. This information was then verified by cross-correlation with the same information for later and/or earlier generations using standard genealogic techniques.

MLPA. MLPA was carried out according to the manufacturer's instructions (MRC-Holland9). Briefly, 125 ng of genomic DNA was denatured and hybridized for 18 h at 60°C with the probe mix. Samples were then treated with ligase-65 at 54°C for 15 min. Subsequent PCR reactions were performed using a 6-carboxyfluorescein (FAM)–labeled primer. PCR products were resolved using an ABI3700 sequencer, and the results were analyzed with Genotyper software (Applied Biosystems). Deletions were suspected when the peak height was 60% or less of the peak height of the normal controls.
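The 60% criterion can be written as a simple ratio test. The following is an illustrative sketch only: the function name, probe labels, and peak heights are hypothetical, and it assumes that sample and control peak heights have already been normalized in the same way.

```python
from statistics import mean

DELETION_RATIO_CUTOFF = 0.60  # from the text: 60% or less of the normal-control peak height


def suspected_deletions(sample_peaks, control_peaks, cutoff=DELETION_RATIO_CUTOFF):
    """Return the probes whose sample peak is <= cutoff x the mean control peak.

    sample_peaks:  dict mapping probe name -> peak height in the test sample
    control_peaks: dict mapping probe name -> list of peak heights in normal controls
    """
    flagged = []
    for probe, height in sample_peaks.items():
        reference = mean(control_peaks[probe])
        if height / reference <= cutoff:
            flagged.append(probe)
    return flagged


# Hypothetical, pre-normalized peak heights for three MSH2 probes.
sample = {"MSH2_ex1": 520.0, "MSH2_ex6": 490.0, "MSH2_ex7": 1010.0}
controls = {
    "MSH2_ex1": [1000.0, 980.0, 1020.0],
    "MSH2_ex6": [950.0, 1000.0, 1050.0],
    "MSH2_ex7": [1000.0, 990.0, 1010.0],
}
print(suspected_deletions(sample, controls))  # ['MSH2_ex1', 'MSH2_ex6']
```

A flagged probe indicates only a suspected deletion; as described in the following section, the breakpoint still has to be confirmed by PCR and sequencing.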

PCR-based screening and breakpoint characterization. Previously published primer sets (6) for this procedure were not ideal due to their tendency to produce false-negative results, primarily because the product was relatively large and spanned three complete Alu repeats and one partial Alu repeat. For this study, we positioned the forward primer (5′-gcctggcgtcaaacgtt-3′) in a region between two repeat elements that lacked homology to other human sequences. The reverse primer (5′-tgagtcattttggggatcagtt-3′) is very similar to the F3 primer used by Wagner et al. (6); however, in combination with this new forward primer, it produces a breakpoint-specific 562-bp amplicon under standard PCR conditions. To ensure that poor DNA quality was not a reason for a negative result, this amplicon was multiplexed with an 811-bp amplicon (forward 5′-aagcatctcacctcatcctaacaca-3′, reverse 5′-ggatcacacctgccttaaattgcat-3′) from the BRAF locus. Each 15-μL reaction, containing 7.5 μL of GoTaq master mix (Promega), 25 ng of genomic DNA, and 5 pmol of each of the four primers, was cycled under the following conditions: 95°C for 2 min, 30 cycles of 95°C for 30 s, 60°C for 30 s, 72°C for 45 s, and a final extension at 72°C for 8 min. Samples with a breakpoint-specific product were treated with ExoSAP-IT (USB Corporation) and sequenced with the forward primer to confirm the presence of an AFM-specific breakpoint.
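The logic of reading the multiplex result can be summarized in a few lines. This is a minimal sketch under stated assumptions: the function name, the result strings, and the sizing tolerance are hypothetical, and the rule that a lane without the control band is never scored is a conservative choice made for this illustration rather than a documented laboratory procedure.

```python
AFM_AMPLICON_BP = 562      # breakpoint-specific product (from the text)
CONTROL_AMPLICON_BP = 811  # BRAF control product (from the text)


def interpret_multiplex(band_sizes_bp, tolerance_bp=20):
    """Classify one gel lane from its observed band sizes (in bp).

    tolerance_bp is a hypothetical allowance for sizing error on an agarose gel.
    """
    def has_band(expected_bp):
        return any(abs(size - expected_bp) <= tolerance_bp for size in band_sizes_bp)

    if not has_band(CONTROL_AMPLICON_BP):
        # Without the control band, a missing AFM band could reflect poor DNA quality.
        return "uninformative: control amplicon absent, repeat the reaction"
    if has_band(AFM_AMPLICON_BP):
        # A breakpoint-specific product is followed up by sequencing.
        return "AFM product present: confirm breakpoint by sequencing"
    return "negative: no AFM breakpoint product detected"


print(interpret_multiplex([560, 815]))
print(interpret_multiplex([811]))
print(interpret_multiplex([]))
```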

Diploid-to-haploid conversion. Haploid-converted clones from patient 8 were created using the conversion technology of Yan et al. (10). In brief, human lymphoblastoid cell lines were electrofused with a specifically designed mouse cell line (E2). Unfused mouse parental cells were negatively selected by sodium hypoxanthine, aminopterin, and thymidine (HAT), and unfused human lymphocytes were negatively selected by Geneticin. Hybrid cells were maintained in DMEM, including 10% fetal bovine serum, 0.5 mg/mL Geneticin, 1× HAT, and penicillin-streptomycin. To ensure that each clone contained one of the two homologues of chromosome 2, clones were typed at two polymorphic microsatellite loci (D2S2952 and D2S434).

Genotype analysis. Twelve microsatellite and seven single-nucleotide polymorphism (SNP) markers (Supplementary Table S1 and Supplementary Fig. S1) were used to obtain a haplotype spanning ∼12.5 Mb across the MSH2 locus. Markers were typed in AFM-positive probands and a set of 118 Caucasian control DNAs (Coriell Institute). SNPs were typed using standard PCR and sequencing protocols. For the microsatellite analysis, the reverse primer carried an M13 tail which, in combination with a FAM-labeled M13 oligo, allowed the products to be sized using an ABI7000. Each 25-μL PCR reaction contained 12.5 μL of AmpliTaq Gold master mix (Applied Biosystems), 25 ng of genomic DNA, 10 pmol of forward primer, 2 pmol of tailed reverse primer, and 10 pmol of the FAM-labeled M13 primer. Reactions were multiplexed when possible and cycled using the following profile: 96°C for 10 min, 50 cycles of 96°C for 30 s, 60°C for 30 s, 72°C for 30 s, and a final extension at 72°C for 10 min.

Estimating allele age. To estimate the age of the AFM, we took two approaches, both of which use the haplotype data generated from affected individuals and data obtained from a set of unaffected controls. The first method, described by Risch et al. (11), gives a separate age estimate for each marker using the following calculation: age (in generations) = ln(LD) / ln(1 − θ), wherein LD is the linkage disequilibrium index (12) and θ is the recombination fraction between the marker of interest and the mutation. Genetic distances were calculated using a conversion factor of 1.1 Mb/cM, which was obtained from a comparison of centimorgan and megabase values for the 2p22-2p16 region (based on deCODE mapping data; ref. 13). The second method uses an intraallelic coalescent model to assess the linkage disequilibrium across the marker set as a whole. The analysis is performed using the DMLE+2.2 software (14).10 In addition to the genotype data, marker locations, population growth rates, and an estimate for the proportion of disease-bearing chromosomes being analyzed are used by the software. To estimate the proportion of disease-bearing chromosomes being studied, we used the following data: a 6% lifetime risk for colorectal cancer (CRC; American Cancer Society),11 2.8% of CRCs are due to Lynch syndrome,6 and 6.8% of Lynch syndrome cases are due to the AFM.6 For the population growth rate, we took data from the U.S. census, which records population figures back to 1790.12 The census records show that the population growth rates have decreased with time; to account for this, we used a growth rate across the whole time period [1.65-fold per generation (fpg), where a generation is taken as 25 y] along with two extreme growth rates (one from the 19th century, 1.96 fpg, and one from the 20th century, 1.38 fpg), which are taken to represent our outer CIs.

10 Freely available from www.dmle.org.
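The single-marker calculation can be sketched as follows. Several points here are assumptions rather than details given in the text: the LD index is taken in the commonly used form δ = (p_d − p_n) / (1 − p_n), where p_d and p_n are the frequencies of the associated allele on disease and control chromosomes; θ is approximated by the genetic distance in Morgans, which is reasonable only for short distances; and the allele frequencies and marker distance in the example are hypothetical.

```python
import math

MB_PER_CM = 1.1            # conversion factor quoted in the text
YEARS_PER_GENERATION = 25  # generation length used in the text


def ld_index(p_disease, p_normal):
    """Assumed form of the LD index: delta = (p_d - p_n) / (1 - p_n)."""
    return (p_disease - p_normal) / (1.0 - p_normal)


def recombination_fraction(distance_mb, mb_per_cm=MB_PER_CM):
    """Approximate theta by the genetic distance in Morgans (1 cM ~ 1% recombination)."""
    return (distance_mb / mb_per_cm) / 100.0


def age_in_generations(p_disease, p_normal, distance_mb):
    """age = ln(LD) / ln(1 - theta), as given in the text."""
    delta = ld_index(p_disease, p_normal)
    theta = recombination_fraction(distance_mb)
    if not 0.0 < delta < 1.0:
        raise ValueError("LD index must lie strictly between 0 and 1")
    if not 0.0 < theta < 1.0:
        raise ValueError("recombination fraction must lie strictly between 0 and 1")
    return math.log(delta) / math.log(1.0 - theta)


# Hypothetical marker: allele on 80% of disease and 20% of control chromosomes, 1.1 Mb away.
generations = age_in_generations(p_disease=0.80, p_normal=0.20, distance_mb=1.1)
print(f"{generations:.1f} generations, {generations * YEARS_PER_GENERATION:.0f} years")
```

Because each marker is evaluated independently, markers whose associated allele is common among controls or that lie very close to the mutation can give unstable estimates, which is one motivation for also using a method that considers all markers jointly.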

Identification of AFM carriers. As part of a large study of all newly diagnosed colorectal and endometrial cancer patients from the Columbus metropolitan region of Ohio (9),6 microsatellite instability (MSI) status was determined in 1,566 CRC patients. All 306 MSI-positive cases were screened via MLPA for large-scale alterations within the MSH2 gene. Eight of these samples gave aberrant results typical of a deletion or insertion in MSH2, four of which suggested the deletion of exons 1 to 6 of MSH2. In addition, 32 patients from throughout the United States were identified as having an exons 1 to 6 deletion of MSH2 at an outside commercial laboratory (Mayo Medical Laboratories and Myriad Genetic Laboratories, Inc.) and were brought to the attention of this study. Due to the sequence composition of this region of the genome (i.e., excessive numbers of Alu-like repeats; refs. 4, 15), it is essential to confirm the exact breakpoint sequence of this deletion, as it is known that other exons 1 to 6 deletions occur as a result of recombination between alternate repeat elements or even at different locations within the same repeat elements (16, 17). To this end, each of the 36 samples was amplified using the new breakpoint-specific primers, and any breakpoint-specific products were sequenced to confirm the precise location of the break. All four of our population-based cases and 28 of the 32 outside cases were found to carry the same AFM-specific mutation (Fig. 1).

Figure 1.

Detection and characterization of the AFM. A, agarose gel showing the AFM, breakpoint specific, PCR product at 562 bp, along with a control band at 811 bp. Lane 1, 100 bp ladder; lane 2, AFM carrier DNA; lane 3, wild-type DNA; lane 4, no template control. B, sequencing chromatogram flanking the breakpoint. Alignments below represent wild-type sequences 5′ and 3′ to the breakpoint, with mismatches displayed in a red font. The predicted breakpoint region (10 bp) is shown in a bold font.

As motivation for assisting us with our genealogic studies, we offered free predictive testing to any at-risk individuals from our mutation-positive families. These relatives were tested (often in large family groups) after receiving thorough genetic counseling, which is a key component of the testing process. Through this incentive, we received samples from 171 relatives of these 41 AFM probands for mutation screening, 85 (49.7%) of whom were shown to carry the AFM.

For each proband who carried the AFM, a geographic location was recorded and plotted onto a map of the United States (Fig. 2). From these data, we were able to see that present-day families were primarily located within three states: Kentucky, Ohio, and Texas. To address the possibility that ascertainment bias accounts for the apparent prevalence within Ohio and Kentucky, we sought to obtain additional data on AFM carriers from two of the nation's most prominent genetics laboratories, the Mayo Clinic13 and Myriad Genetics.14 In keeping with HIPAA guidelines, both laboratories were unable to provide certain patient-specific information; however, they were able to report that the incidences of the AFM among Lynch syndrome patients in their cohorts were 4.3% and 7%, respectively, and to provide the geographic locations of their AFM-positive probands (Fig. 2).

Figure 2.

Map of the United States depicting the location of patients, representing each of 76 families (at the time of diagnosis) and subfounders. Patients described in this study are shown as squares (numbered 1 through 41), patients identified through Myriad Genetics Laboratories, Inc., are shown as triangles, and patients identified through the Mayo Clinic are shown as circles. The locations of common ancestors for the subfounder pedigrees are displayed as stars, wherein A, C, D, and E correspond to marriage locations of the common ancestors, and B, F, and G correspond to the regions in which the common ancestors were born. Due to the complexity of pedigrees A and B, there is some ambiguity for these subfounder locations.

Throughout the course of our study, we have actively sought to identify a carrier of the AFM outside of the United States, with efforts focusing on European countries as the most likely source of any founding individuals. To date, we have screened three samples from France (two French and one Belgian), two from Sweden, three from Italy, two from Poland, six from Germany (five Germans and one Armenian), one from Scotland, and four from Ireland (three Northern and one Republic) known to have an MSH2 exons 1 to 6 deletion, and none of them has proved to have the AFM-specific breakpoints.

Genotype characterization. To link the probands genetically and rule out the possibility that the mutation was occurring de novo in the affected families, we typed a set of polymorphic microsatellite markers and SNPs flanking the mutation site. A panel of 12 microsatellites with an average interval of 956 kb (range, 120 kb–2 Mb) was chosen. We optimized microsatellite selection based on three criteria: (a) the repeat had to be highly polymorphic to reduce the likelihood of similarity by chance, (b) the microsatellite and its flanking sequence had to be unique within the genome, and (c) the repeat could not be located close to or within a more complex repeat, such as an Alu, which can often introduce additional size variation due to poly(A) tracts and can cause problems with primer binding due to its excessive prevalence. The use of a haploid cell line from patient 8 enabled us to generate an inferred haplotype for 34 samples (29 from this study and 5 from the original study; Fig. 3) and showed that all patients shared a common disease haplotype. Seven informative SNPs were typed at selected regions to establish whether a specific microsatellite differed from the consensus because of a recombination event rather than a mutation. Twenty-one of the patients have disease-specific haplotypes in excess of 4.99 Mb; however, the core haplotype shared by all 34 of the patients is between 0.59 Mb (Clen32-Clen30) and 2.26 Mb (rs10495934-Clen25). To ensure that allele matching was not a chance event, a panel of 118 control DNAs was typed with the 19 markers to determine the degree of variation for each marker (Fig. 3).
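One way to read the minimum and maximum sizes of the core haplotype is that the minimum spans the outermost markers still shared by every proband, whereas the maximum spans the nearest flanking markers at which sharing is lost, because the ancestral recombination could lie anywhere in between. The sketch below illustrates that reading only; the function name, marker positions, and sharing pattern are hypothetical and are not taken from the study data.

```python
def core_haplotype_bounds(positions_kb, mutation_kb, shared_by_all):
    """Return (minimum_kb, maximum_kb) for the haplotype shared by all probands.

    positions_kb:  marker positions in kb, sorted in ascending order
    mutation_kb:   position of the mutation in kb
    shared_by_all: parallel list of booleans, True where every proband carries
                   the disease-associated allele at that marker
    maximum_kb is None if sharing is never lost on one side of the mutation.
    """
    def scan(indices):
        # Walk outward from the mutation until the first non-shared marker.
        last_shared = None
        for i in indices:
            if not shared_by_all[i]:
                return last_shared, i
            last_shared = i
        return last_shared, None

    left = [i for i, p in enumerate(positions_kb) if p < mutation_kb][::-1]
    right = [i for i, p in enumerate(positions_kb) if p > mutation_kb]
    inner_left, outer_left = scan(left)
    inner_right, outer_right = scan(right)

    # With no shared marker on a side, the mutation itself is the inner bound.
    min_left = positions_kb[inner_left] if inner_left is not None else mutation_kb
    min_right = positions_kb[inner_right] if inner_right is not None else mutation_kb
    minimum = min_right - min_left

    if outer_left is None or outer_right is None:
        maximum = None
    else:
        maximum = positions_kb[outer_right] - positions_kb[outer_left]
    return minimum, maximum


# Hypothetical marker map (kb) with the mutation at 47,400 kb.
positions = [45000, 46500, 47200, 47600, 48100, 49500]
shared = [False, False, True, True, False, False]
print(core_haplotype_bounds(positions, 47400, shared))  # (400, 1600)
```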

Figure 3.

Genotype data spanning the MSH2 locus of 36 DNAs. The 36 samples comprise 34 patients and 2 haploid clones. Disease associated genotypes are highlighted in gray, with the disease allele depicted in red. The size of the minimum and maximum conserved haplotypes along with the frequency of the disease associated allele among a panel of control DNAs are shown at the bottom. The patients have been ordered according to their subfounder groups, and where possible, we have displayed family specific haplotypes, which differ from the disease haplotype, in a colored font. Based on these data it is likely that patient 34 is part of subfounder group A, due to the presence of a family specific haplotype at the 3′ end of the gene (blue font).

Genealogic studies. Excluding one proband (20) whose father (from whom the mutation was inherited) was adopted, there were 40 families (31 families identified in this study and 9 families from the original study) with the AFM available for genealogic studies. Using the techniques described in Materials and Methods, 27 of these ostensibly unrelated families were linked into seven pedigrees, which we refer to herein as the “subfounder” groups, denoted A to G (Figs. 4 and 5). We have been unable, thus far, to link the seven subfounder groups to a single common ancestor to elucidate the true founder of this mutation in the United States. However, the oldest of the seven subfounder pedigrees (B) seems to trace back to a husband and wife who were born in the United States in about 1700, with the other pedigrees all tracing back to couples born in the mid to late 1700s. These individuals all seem to have been born, where known, exclusively in either Virginia or North Carolina. In the case of subfounder group A, which contains highly intermarried lines descending primarily from two early pioneering families, there are three different progenitor couples to which the eight current day families (1, 29, 30, 31, 33, 35, 39, and 40) can be linked, and as such, there are five different individuals who could have introduced the AFM into this subfounder group (Fig. 4A, the five potential founders are marked with an asterisk). From the attainable data, we can trace the migration of six of the seven subfounder groups (A, B, C, D, E, and G) through Kentucky (where some families remained), with data showing that many of the “founding” couples were married in relatively close geographic proximity to one another (A, southwestern Virginia ca. 1778; C, western North Carolina ca. 1820; D, northeastern Tennessee15 ca. 1785; E, northeastern Tennessee16 ca. 1789; Fig. 2). Although a mixture of European ancestries is evident among these subfounder groups, the predominant ancestry seems to be from the British Isles, primarily Scottish. Various families in most of the subfounder groups also have reported Native American ancestries, specifically Cherokee, but thus far this seems verifiable in only three or four families. These Native American links are certainly possible because the geographic placement of the progenitor couples, as given above, is either just inside or very close to the known Cherokee territorial boundaries of the time.11

15 Present-day northeastern Tennessee was part of North Carolina during the described period.

Figure 4.

Parsimonious pedigrees showing the genealogically linked families. Pedigrees show direct blood line links only; siblings and spouses are included when they seem to carry the mutation or if they are equally likely to be a founder lineage, respectively (full pedigree data can be accessed online, http://internalmedicine.osu.edu/genetics/990.cfm). Individuals are labeled with their initials and year of birth, with the presence of a Lynch syndrome–associated cancer (endometrial, uterine, stomach, ovarian, pancreatic, ureter and renal pelvis, biliary tract, brain, small bowel, sebaceous adenomas, and keratoacanthoma) being depicted as a solid black symbol when known. Pedigrees are drawn to an approximate scale of 25 y/generation as displayed at the left hand side of each pedigree. Asterisks have been placed at the top right corner of the five individuals who could all have introduced the mutation into subfounder group A.

Figure 5.

Parsimonious pedigrees showing the genealogically linked families, continued.

Subfounder group D includes families 6 and 7 from the original AFM study (7). These two families connect to three newly identified AFM families (24, 25, and 28) through the wife of an individual who was originally thought to be an obligate carrier of the mutation because he appeared to link three of the original families (Fig. 5A). Given the certainty of the new genealogy connections, the genealogy of family 5 (the only other family that was linked to families 6 and 7) was scrutinized. It was discovered that there was an error in the original genealogy due to two men sharing the same name. Given the locations of these men at the time (one in Illinois and the other in Alabama), it is clear that they were two different individuals and that family 5 does not link to families 6 and 7, contrary to the description in the original paper. This also showed that the true AFM “founder family” could not be the family who immigrated to the United States in 1727 from Germany as reported (7).

Estimate of allele age. It has been shown that the rate of recombination and population frequency of flanking marker alleles can be used to estimate the age of a mutation. In this study, we used two established methods: a single marker approach (11) and a combined marker approach (14). Using a marker-by-marker approach, we generated age estimates at all loci with respect to the AFM mutation (Supplementary Table S2; refs. 18, 19). Each marker estimate is generated independently from all the others, based on its location with respect to the mutation (recombination fraction) and the frequency with which the disease allele is found within the normal population. When all markers are considered, we get a broad estimate of 4 to 98 generations (100–2,450 years), with 13 of 15 markers giving a range of 4 to 26 generations (100–650 years). The two markers which define the outer limits of the fully conserved haplotype (rs10495934 and Clen25) give an estimate of 23 generations (575 years) and 19 generations (475 years), respectively.

The second method, which considers the linkage disequilibrium across all markers in a single analysis, is implemented in the DMLE+2.2 software. To use this method, we provided the program with an estimate of 32,150 (see Materials and Methods) for the number of disease chromosomes currently present in the U.S. population (based on data from the 2000 census). Three separate analyses were performed, each using a different estimate for the population growth rate (Fig. 6). The first analysis, which used a fast growth rate estimated from the 19th century censuses, gave an age estimate of 375 years (15 generations; 95% CI, 13–18). The second analysis, which used a slow growth rate estimated from the 20th century censuses, gave an age estimate of 775 years (31 generations; 95% CI, 23–40). The third analysis used a growth rate estimate from across both time periods and gave an age estimate of 500 years (20 generations; 95% CI, 17–25).
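The figure of 32,150 disease chromosomes appears to follow from multiplying the three proportions given in Materials and Methods by the U.S. population. The sketch below reproduces that arithmetic under two assumptions that the text does not spell out: that the 2000 census resident population (281,421,906) was the population figure used, and that each carrier contributes one disease chromosome. It also shows the generation-to-year conversion at 25 y per generation.

```python
US_POPULATION_2000 = 281_421_906  # 2000 U.S. census resident population (assumed input)
LIFETIME_CRC_RISK = 0.06          # from Materials and Methods
LYNCH_FRACTION_OF_CRC = 0.028     # from Materials and Methods
AFM_FRACTION_OF_LYNCH = 0.068     # from Materials and Methods
YEARS_PER_GENERATION = 25         # generation length used in the text


def estimated_disease_chromosomes(population=US_POPULATION_2000):
    """Population x CRC risk x Lynch fraction x AFM fraction (one chromosome per carrier)."""
    return (population * LIFETIME_CRC_RISK
            * LYNCH_FRACTION_OF_CRC * AFM_FRACTION_OF_LYNCH)


def generations_to_years(generations):
    """Convert an age in generations to years."""
    return generations * YEARS_PER_GENERATION


print(round(estimated_disease_chromosomes()))  # 32150

# Point estimates and 95% CI bounds (in generations) reported for the three analyses.
for label, point, low, high in [("fast growth", 15, 13, 18),
                                ("slow growth", 31, 23, 40),
                                ("overall growth", 20, 17, 25)]:
    print(label, generations_to_years(point),
          (generations_to_years(low), generations_to_years(high)))
```

For the overall growth rate, the interval of 17 to 25 generations converts to 425 to 625 years, which matches the range quoted in the abstract.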

Figure 6.

Age estimates for the AFM. The posterior probability distribution plots of the mutation age (in generations), as estimated by the software DMLE+2.2, are shown when population growth rates of 1.96-fold (A), 1.38-fold (B), and 1.65-fold (C) per generation (25 y) are assumed. The dotted lines show the 95% CIs for the calculation.

Given that mutations in the mismatch repair gene MSH2 account for a significant proportion (1–2%) of all cases of colorectal cancer, it is important to assess the risk that can be attributed to specific, frequently occurring mutations. In this study, we have taken both molecular-based and genealogic-based approaches to gain a better understanding of the prevalence and spread of the AFM of MSH2. Screening a large cohort of Lynch syndrome families added 32 ostensibly unrelated families to the nine that had been identified previously (6). Genotype analysis in 34 of these families showed a conserved haplotype of some 5 Mb in most cases and a much smaller haplotype (0.59–2.67 Mb) in a single case (34), which allowed for an age calculation of ∼500 years for the AFM. Whereas haplotyping shows that these modern day AFM carriers are all distantly related to one another, the oldest genealogic links are to individuals born during the early 18th century in Virginia. As it becomes ever more difficult to extend these genealogies before 1700, due to the sparsity of recorded data from these times, it is apparent that we may never know who the original AFM founding individual was. This MSH2 AFM has some similarities to a recently described AFM in the APC gene which was brought to the United States from England around 1630 and affects two large U.S. kindreds (20).

Our new genealogic findings and haplotyping data do, however, provide very strong evidence against the hypothesis that the AFM arrived in the United States as a single event in the early 18th century. The records link 27 of the AFM families into seven extended families, each with a common ancestor born between 1700 and the early 1800s. It seems very unlikely that these seven lineages could all converge within a period short enough to be consistent with a single common ancestor having arrived in the United States during the early 18th century, a time of significant European immigration. With this in mind, we are left with two hypotheses, both of which are supported by the mutation age calculations. Under the first, the subfounder families were in the United States for several generations before the current subfounder common ancestors, which would allow either for the mutation having been brought into the United States by a single European immigrant during an earlier period or for it having been introduced into these European lines by a Native American individual during the early colonial period. Under the second, the mutation originated within Europe several generations before it arrived on the shores of the United States, probably through several individuals.

The lack of evidence for the AFM outside of the United States is perhaps more supportive of the first hypothesis. However, it is also possible that most carriers of the mutation in Europe emigrated, or that many failed to pass on the mutation to subsequent generations, so we might predict a significant reduction of cases, or even an absence of cases, in the founding country. In addition, our search for the AFM in Europe has by no means been exhaustive, so there may be AFM carriers in Europe whom we have not yet been able to identify. We have speculated, based on our genealogic studies, that Scotland in particular is a potential European source for the mutation; however, we have only been able to identify a single candidate (an individual with a known exons 1–6 deletion of MSH2) from this region, who was shown not to carry the AFM. Samples we have screened from an ancestrally related population (Ireland) were also negative for the mutation, and to the best of our knowledge, it has not been described elsewhere in the British Isles. Until evidence in support of one of these two hypotheses is obtained, whether it is the identification of a single common ancestor within the United States or the presence of a case of the AFM in another country, we cannot draw firm conclusions as to the origins of this mutation. Nevertheless, our current data show that the AFM is significantly more prevalent within the United States than was previously thought, and is clearly a prominent cause of Lynch syndrome, particularly in Ohio, Kentucky, and Texas. This is an important public health issue because there is ample evidence that individuals who are identified as having Lynch syndrome can benefit from highly targeted cancer control measures (21–23).

With this in mind, we would promote the introduction of a single-site diagnostic PCR, such as the one described here, as a first line of molecular screening in patients whose tumors stain negatively for MSH2 at the protein level. Such a diagnostic test would not only reduce the cost of genetic screening in a significant proportion of Lynch syndrome cases, but would also allow for a far more rapid diagnosis than techniques such as MLPA and complete gene sequencing.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

Grant support: NIH grants CA67941 and CA16058 and NSF grant 0112050.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

We thank Thierry Frebourg, Annika Lindblom, Alessandra Viel, Jan Lubinski, Elisabeth Mangold, Timm Goecke, Patrick Morrison, Andrew Green, Malcolm Dunlop, and Rebecca Barnetson for the provision and screening of MSH2 exons 1 to 6 deletion cases; Thomas Prior and the Ohio State University Molecular Pathology Laboratory for confirming the breakpoints in all the probands and for providing testing to the at-risk family members; Douglas Crews for his contributions to the genealogic studies; Richard Wenstrup, Lynn Anne Burbidge, and Cynthia Frye of Myriad Genetic Laboratories, Inc., for the provision of epidemiologic data; and the families who participated in this research.

1. Hampel H, Frankel WL, Martin E, et al. Screening for the Lynch syndrome (hereditary nonpolyposis colorectal cancer). N Engl J Med 2005;352:1851–60.
2. Lynch HT, de la Chapelle A. Hereditary colorectal cancer. N Engl J Med 2003;348:919–32.
3. Schouten JP, McElgunn CJ, Waaijer R, Zwijnenburg D, Diepvens F, Pals G. Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification. Nucleic Acids Res 2002;30:e57.
4. Heath KE, Day IN, Humphries SE. Universal primer quantitative fluorescent multiplex (UPQFM) PCR: a method to detect major and minor rearrangements of the low density lipoprotein receptor gene. J Med Genet 2000;37:272–80.
5. Nakagawa H, Hampel H, de la Chapelle A. Identification and characterization of genomic rearrangements of MSH2 and MLH1 in Lynch syndrome (HNPCC) by novel techniques. Hum Mutat 2003;22:258.
6. Wagner A, Barrows A, Wijnen JT, et al. Molecular analysis of hereditary nonpolyposis colorectal cancer in the United States: high mutation detection rate among clinically selected families and characterization of an American founder genomic deletion of the MSH2 gene. Am J Hum Genet 2003;72:1088–100.
7. Lynch HT, Coronel SM, Okimoto R, et al. A founder mutation of the MSH2 gene and hereditary nonpolyposis colorectal cancer in the United States. JAMA 2004;291:718–24.
8. Lynch HT, de la Chapelle A, Hampel H, et al. American founder mutation for Lynch syndrome. Prevalence estimates and implications. Cancer 2006;106:448–52.
9. Hampel H, Frankel W, Panescu J, et al. Screening for Lynch syndrome (hereditary nonpolyposis colorectal cancer) among endometrial cancer patients. Cancer Res 2006;66:7810–7.
10. Yan H, Papadopoulos N, Marra G, et al. Conversion of diploidy to haploidy. Nature 2000;403:723–4.
11. Risch N, de Leon D, Ozelius L, et al. Genetic analysis of idiopathic torsion dystonia in Ashkenazi Jews and their recent descent from a small founder population. Nat Genet 1995;9:152–9.
12. Bengtsson BO, Thomson G. Measuring the strength of associations between HLA antigens and diseases. Tissue Antigens 1981;18:356–63.
13. Kong A, Gudbjartsson DF, Sainz J, et al. A high-resolution recombination map of the human genome. Nat Genet 2002;31:241–7.
14. Reeve JP, Rannala B. DMLE+: Bayesian linkage disequilibrium gene mapping. Bioinformatics 2002;18:894–5.
15. Charbonnier F, Baert-Desurmont S, Liang P, et al. The 5′ region of the MSH2 gene involved in hereditary non-polyposis colorectal cancer contains a high density of recombinogenic sequences. Hum Mutat 2005;26:255–61.
16. van der Klift H, Wijnen J, Wagner A, et al. Molecular characterization of the spectrum of genomic deletions in the mismatch repair genes MSH2, MLH1, MSH6, and PMS2 responsible for hereditary nonpolyposis colorectal cancer (HNPCC). Genes Chromosomes Cancer 2005;44:123–38.
17. Stella A, Surdo NC, Lastella P, et al. Germline novel MSH2 deletions and a founder MSH2 deletion associated with anticipation effects in HNPCC. Clin Genet 2007;71:130–9.
18. Labuda M, Labuda D, Korab-Laskowska M, et al. Linkage disequilibrium analysis in young populations: pseudo-vitamin D-deficiency rickets and the founder effect in French Canadians. Am J Hum Genet 1996;59:633–43.
19. Stephens M, Scheet P. Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation. Am J Hum Genet 2005;76:449–62.
20. Neklason DW, Stevens J, Boucher KM, et al. American founder mutation for attenuated familial adenomatous polyposis. Clin Gastroenterol Hepatol 2008;6:46–52.
21. Lindor NM, Petersen GM, Hadley DW, et al. Recommendations for the care of individuals with an inherited predisposition to Lynch syndrome: a systematic review. JAMA 2006;296:1507–17.
22. Vasen HF, de Vos Tot Nederveen Cappel WH. An evidence-based review on surveillance for Lynch syndrome. Dis Colon Rectum 2006;49:1797–8; author reply 1799.
23. Schmeler KM, Lynch HT, Chen LM, et al. Prophylactic surgery to reduce the risk of gynecologic cancers in the Lynch syndrome. N Engl J Med 2006;354:261–9.
